Android Authority feed #614
Comments
I also noticed that the embedded YouTube videos are stripped out, but unlike the headers, they seem oddly integrated in the HTML as they only appear in the JSON data. I understand if this can't be improved in the parser. "TED Tumblewords" app section: "Carrion" app section:
Thanks for all the details. The missing headers should be simple. I'll see what I can do for that YouTube video JSON.
I have a custom parser I'll test a little more before adding to the next release (jocmp/mercury-parser#27). I haven't found a way to grab the YouTube videos - it looks like they're doing something weird with JavaScript for non-browser clients (like Capy). That said, headers are working. Stay tuned!
Sounds promising! Thank you! If you made a custom parser for this feed specifically, is it because you don't consider header elements to be relevant to the content for most feeds?
I think headers are important. I want to avoid breaking the core parser since I don't fully understand all the different parts yet, so adding a custom fix is easiest for now. In general, sites can misuse headers, which is why the previous maintainers built the parser that way. Luckily they left a note on why they built it that way.
Updated as of 2024.12.1085-dev
So far, articles seem to render properly with the update to the parser! Thank you so much! It's a feed that I only check occasionally, however, so I'll let you know if I notice other things down the line. One small thing, albeit not important;
Here's the HTML for that section: No special element or meaningful class names that would allow you to target these polls in the custom parser, but after comparing with some of their other articles that contain polls, it appears that the classes Should these be stripped out? Or maybe it could also be useful to the reader to know that there's actually a poll there, I'm not sure.
@PhilC813 good find. Let me see if they're using JS for these. It would be nice to allow this functionality. If nothing else, I'll remove it to avoid the jarring markup.
In this article: The titles of the apps on display are stripped out in Capy. Could this be improved? There's also another example of a poll when you'll need to confirm your results with #699!
Yep! I'll roll that in with the polls since they also use Here's a preview from my markup tester. "Google Gemini" and "Mozilla Thunderbird" were previously hidden.
@PhilC813 the heading and poll updates are available as of 2025.01.1096-dev!
Feed URL
https://www.androidauthority.com/feed/
Add any details, links, or screenshots about the article layout that's missing or wrong
In the following article, the names of the different sections, which are headers ("h2" elements), are stripped out in full content mode.
Article:
https://www.androidauthority.com/new-android-apps-658839/
Here's the stripped out HTML, which is rather simple:
<h2 id="[number]">[Header]</h2>
I would assume that the same would occur for other articles in the feed.
Unless I'm mistaken, header elements shouldn't be stripped out regardless of the feed; they are usually relevant to the content.
Thank you!!
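For anyone who wants to check other articles in the feed for the same problem, here is a minimal sketch of one way to list the "h2" headers that exist in the source HTML but are missing from the parsed output. It only uses Python's standard library and does not reflect how the parser itself works. The helper names (`HeadingCollector`, `h2_headings`, `missing_headings`) and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects the text content of every <h2> element in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h2 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is still reported while in an <h2>.
        if self._in_h2:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self.headings.append("".join(self._buffer).strip())
            self._in_h2 = False


def h2_headings(html):
    """Return the text of all <h2> elements, in document order."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.headings


def missing_headings(source_html, parsed_html):
    """Return headers present in the source but absent from the parsed output."""
    kept = set(h2_headings(parsed_html))
    return [h for h in h2_headings(source_html) if h not in kept]


# Hypothetical sample markup, shaped like the <h2 id="[number]"> pattern above.
source = '<h2 id="1">First app</h2><p>Text</p><h2 id="2">Second app</h2><p>More</p>'
parsed = "<p>Text</p><p>More</p>"
print(missing_headings(source, parsed))
```

An empty list would mean every "h2" header in the source survived parsing.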