Fix FreeWebNovel extractor for new AJAX chapter pagination system #2747
This should resolve dteviot#2745
…iting 20+ minutes for content to download, only for the actual epub building to fail because dompurify is missing.
…into ExperimentalTabMode
Ok, minor additional tweak. I added a pre-start check for DOMPurify and zip.js, mostly because you can in fact install the plugin from a git clone without those dependencies. If you do, it works fine right up until the last moment, when it fails to generate the epub file. Which is not much fun if you're fetching a series with 900+ chapters at one chapter every 25 seconds. Anyway, it now pops a warning if they're not present and loaded.
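For illustration, a pre-start check of this kind could look roughly like the sketch below. This is not the code from the PR: `findMissingLibraries` is a hypothetical helper, and the global names `DOMPurify` and `zip` are assumptions about how the two libraries expose themselves once loaded.

```javascript
// Hypothetical helper (not from the PR): returns the names in `names`
// that are not defined as properties of `scope`.
function findMissingLibraries(scope, names) {
    return names.filter((name) => typeof scope[name] === "undefined");
}

// In the extension, `scope` would be the page's global object. Plain objects
// are used here so the sketch runs anywhere. The global names "DOMPurify"
// and "zip" are assumptions.
const missing = findMissingLibraries({ DOMPurify: {} }, ["DOMPurify", "zip"]);
if (missing.length > 0) {
    console.warn(`Missing libraries: ${missing.join(", ")}`);
}
```

Running the check before any chapter is fetched means a broken install shows up immediately instead of after the whole download.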
dteviot left a comment
Firstly, thank you very much for your hard work.
I'd love to accept this as it stands and I'd fix the minor issues myself. But the innerHTML is a killer. That said, if you'd like me to try and fix that one, I'll have a go.
let parent = tNode.parentNode;
if (parent) {
    let tempSpan = content.ownerDocument.createElement("span");
    tempSpan.innerHTML = tNode.nodeValue;
You can't do assignment to innerHTML. Extensions will be rejected for doing this.
removeUnwantedElementsFromContentElement(content) {
    // Remove ads injected by third-party ad networks (such as SSP ads and PubFuture networks)
    // whose div IDs start with 'bg-ssp-' or 'pf-'
    util.removeChildElementsMatchingSelector(content, "div[id^='bg-ssp-']");
You can combine the selectors with a comma, e.g.
util.removeChildElementsMatchingSelector(content, "div[id^='bg-ssp-'], div[id^='pf-']");
let url = `${baseNovelUrl}?ajax=chapters&page=${page}`;
try {
    let response = await HttpClient.fetchJson(url);
    if (response && response.json && response.json.code === 200 && response.json.html) {
Use the optional chaining operator:
if (response?.json?.code === 200 && response.json.html) {
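To show why the shorter condition is equivalent, here is a small standalone sketch. `hasChapterHtml` is a hypothetical helper, not something in the PR, and the response shape (`json.code`, `json.html`) is taken only from the diff excerpt above.

```javascript
// Hypothetical helper: true when a chapter-list response carries usable HTML.
function hasChapterHtml(response) {
    // `?.` evaluates to undefined when `response` or `response.json` is
    // null or undefined, instead of throwing a TypeError. Once the code
    // check has passed, `response.json` is known to exist.
    return response?.json?.code === 200 && Boolean(response.json.html);
}

console.log(hasChapterHtml(null));                                    // false
console.log(hasChapterHtml({ json: { code: 404, html: "<li></li>" } })); // false
console.log(hasChapterHtml({ json: { code: 200, html: "<li></li>" } })); // true
```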
"manifest_version": 3,
"name": "WebToEpub",
- "version": "1.0.12.75",
+ "version": "1.0.12.77",
You should not be bumping this; the build process does it. But it won't hurt.
There was a problem hiding this comment.
The build process did this. And then pushed it to my branch. Somehow.
0cda6a0 and afa8d0f are both by github-actions[bot]
…into ExperimentalTabMode
Ok, I think that should address all the feedback. Of course, now I expect the bot will bump the version again.
This should fix the issue with FreeWebNovel.com now paginating their chapter listing.
This also includes a couple of other assorted tweaks, mostly related to content cleanup.
The source site inserts "FreeWebnovel.com" into the text repeatedly, using Unicode mathematical alphanumeric characters. NFKD normalization detects these, and they are stripped (including one case where the markup is reewebnovel.com. Typos in watermarks! Also lol, watermarking content the site has itself stolen)!
There's also some cleanup of advertisements, as well as fixing some broken source content where HTML tags somehow got escaped. (This site is a pirate aggregator, and apparently their scraper for the actual source sites is slightly broken in some cases.)
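As a rough illustration of the NFKD watermark detection mentioned above (a sketch only, not the code in this PR): `containsWatermark` is a hypothetical helper, and the sample string is made up, with the first four letters written as mathematical bold code points.

```javascript
// Hypothetical helper (not from the PR): reports whether `text` contains the
// watermark once compatibility characters have been folded to plain letters.
function containsWatermark(text, watermark = "freewebnovel.com") {
    // NFKD replaces "font variant" code points, such as U+1D41F
    // (mathematical bold small f), with their ASCII counterparts.
    const folded = text.normalize("NFKD").toLowerCase();
    return folded.includes(watermark);
}

// Made-up sample: "free" in mathematical bold, written as escapes so the
// code points are explicit, followed by plain ASCII.
const disguised = "\u{1D41F}\u{1D42B}\u{1D41E}\u{1D41E}webnovel.com";
console.log(containsWatermark(disguised));     // true
console.log(containsWatermark("plain prose")); // false
```

This only shows detection. Stripping the match from the original text needs extra care, because the normalized string and the original do not have the same length.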
Anyway, tests pass, and the couple of novels I experimentally fetched all seem to work now.
Disclosure: I used LLM tools to generate this, though I did read the actual scraper JS file pretty carefully and made some edits. Less so the tests (though they do pass!).
Should fix #2745