Switch between RSS content and Web page content (store both) #1999

danielyrovas · 2023-08-02T07:04:39Z

I have tested my changes
- The language test currently fails because I added a line to the English language JSON file
I read this document: https://miniflux.app/faq.html#pull-request

Please let me know what you think of this change.
It solves an issue that has irked me when using the interface: downloading the web page overwrites the original RSS content, and the web page content is sometimes less well formatted than the RSS content, or is empty or junk. With this change, people can hit the Download button just to see what the downloaded page looks like, and still return to the RSS content to continue reading.

The feature works like this:
Whenever an entry is loaded, if there is downloaded content available, that will be displayed and the Download button becomes Show RSS Content.

Otherwise the page loads like normal, showing the Download button and the RSS page content.

If the Show RSS Content button is selected, the original RSS content is displayed and the button turns to the regular Download button.

If this Download button is selected, the page is processed and saved like normal (re-downloading it even if it has been saved before).

The /fetch-content API method now returns both content and web_content as JSON.
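To make the behaviour described above concrete, here is a minimal sketch of the display logic and the JSON shape. It is not the code from this pull request: the `Entry`, `View`, `ViewFor`, and `FetchContentJSON` names are hypothetical, and only the `content` and `web_content` keys and the two button labels come from the description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entry is a hypothetical, trimmed-down stand-in for the real entry model.
type Entry struct {
	Content    string `json:"content"`     // original RSS content
	WebContent string `json:"web_content"` // downloaded page content, empty if never downloaded
}

// View describes what the entry page would show (hypothetical type).
type View struct {
	Body        string
	ButtonLabel string
}

// ViewFor picks the body and the button label for an entry.
// showRSS is true once the reader has selected "Show RSS Content".
func ViewFor(e Entry, showRSS bool) View {
	if e.WebContent != "" && !showRSS {
		return View{Body: e.WebContent, ButtonLabel: "Show RSS Content"}
	}
	// No downloaded content, or the reader asked for the RSS version.
	return View{Body: e.Content, ButtonLabel: "Download"}
}

// FetchContentJSON serialises both fields, mirroring the described response.
func FetchContentJSON(e Entry) ([]byte, error) {
	return json.Marshal(e)
}

func main() {
	e := Entry{Content: "rss summary"}
	fmt.Println(ViewFor(e, false).ButtonLabel) // nothing downloaded yet

	e.WebContent = "full page"
	fmt.Println(ViewFor(e, false).ButtonLabel) // downloaded content is shown
	fmt.Println(ViewFor(e, true).ButtonLabel)  // reader switched back to RSS

	out, err := FetchContentJSON(e)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Selecting Download from the RSS view would re-run the scraper and overwrite `WebContent`, which this sketch leaves out.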

danielyrovas · 2023-08-02T09:20:57Z

reader/processor/processor.go

@@ -80,14 +80,15 @@ func ProcessFeedEntries(store *storage.Storage, feed *model.Feed, user *model.Us
 				logger.Error(`[Processor] Unable to crawl this entry: %q => %v`, entry.URL, scraperErr)
 			} else if content != "" {
 				// We replace the entry content only if the scraper doesn't return any error.
-				entry.Content = content
+				// TODO: document change
+				entry.WebContent = content
 			}
 		}

 		rewrite.Rewriter(url, entry, feed.RewriteRules)


This is problematic: I didn't look at how the rewriter gets its data. I'll add a commit which fixes this.

The rewriter now operates on web_content. As far as I can see, the rules do not operate on RSS content, only on content from the original web page.

danielyrovas · 2023-08-13T02:37:44Z

I merged the commits and rebased onto main.

fguillot · 2023-08-15T04:22:51Z

Looks like the unit tests are failing after your rebase

danielyrovas · 2023-08-15T04:38:52Z

Yeah, I think I'll need to change some of those tests. TestRewriteWithYoutubeLink checks the Content field instead of the WebContent field, and I imagine the other failing tests are similar.

internal/api/entry.go

internal/ui/entry_scraper.go

danielyrovas · 2024-03-28T10:21:00Z

Hi @fguillot, I've updated my pull request. Do you mind taking a look at it?

fguillot

Why should the rewrite rules apply only to WebContent? They should apply to both Content and WebContent, especially for feeds that provide full HTML content by default.
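One way to read this suggestion, sketched under assumptions: run each rule over both fields instead of only one. The `Entry` type, the `RewriteFunc` type, and the `ApplyToBoth` helper are hypothetical illustrations, not the signature of the project's `rewrite.Rewriter`.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is a hypothetical stand-in holding both versions of the content.
type Entry struct {
	Content    string // original RSS content
	WebContent string // downloaded page content, may be empty
}

// RewriteFunc is a hypothetical rule: it takes HTML and returns rewritten HTML.
type RewriteFunc func(html string) string

// ApplyToBoth runs every rule over Content and WebContent alike, so feeds
// that already ship full HTML in the RSS content are rewritten too.
func ApplyToBoth(e *Entry, rules ...RewriteFunc) {
	for _, rule := range rules {
		e.Content = rule(e.Content)
		if e.WebContent != "" {
			e.WebContent = rule(e.WebContent)
		}
	}
}

func main() {
	// Example rule for illustration only: upgrade plain HTTP links.
	upgrade := func(html string) string {
		return strings.ReplaceAll(html, "http://", "https://")
	}

	e := &Entry{
		Content:    `<img src="http://example.org/a.png">`,
		WebContent: `<img src="http://example.org/b.png">`,
	}
	ApplyToBoth(e, upgrade)
	fmt.Println(e.Content)
	fmt.Println(e.WebContent)
}
```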

oldherl · 2024-08-09T08:00:30Z

Any progress on this?

wolfhechel · 2024-08-28T06:16:41Z

To me, this seems to change the way the current v1/fetch-content endpoint works. Clients expect the returned content key to hold the web content, so this would break compatibility with current clients such as my own.

Surely such a change should come with a bumped version prefix. Otherwise, what's the point of the v1 prefix at all?

Also, I don't see why fetch-content needs to change at all, since its purpose is to return the web content. An easy solution would be to keep returning the web content as content, as before, and return the original content as entry_content.

Clients are generally okay with additional API information, not so much with removed or changed information.
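A rough sketch of that additive shape, assuming a hypothetical `FetchContentResponse` type: `content` keeps carrying the web content and `entry_content` is the new key from the suggestion above. `LegacyPayload` and `DecodeAsLegacy` are made up here to mimic an existing client that only knows about `content`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FetchContentResponse is a hypothetical response shape for the suggestion:
// the meaning of "content" is unchanged, and the original content is added.
type FetchContentResponse struct {
	Content      string `json:"content"`       // web content, as v1 clients expect
	EntryContent string `json:"entry_content"` // original feed content (new key)
}

// NewFetchContentResponse builds the response from both versions.
func NewFetchContentResponse(webContent, rssContent string) FetchContentResponse {
	return FetchContentResponse{Content: webContent, EntryContent: rssContent}
}

// LegacyPayload mimics an existing client that only reads "content".
type LegacyPayload struct {
	Content string `json:"content"`
}

// DecodeAsLegacy round-trips the response through JSON the way an old
// client would; encoding/json ignores unknown keys by default.
func DecodeAsLegacy(resp FetchContentResponse) (string, error) {
	raw, err := json.Marshal(resp)
	if err != nil {
		return "", err
	}
	var legacy LegacyPayload
	if err := json.Unmarshal(raw, &legacy); err != nil {
		return "", err
	}
	return legacy.Content, nil
}

func main() {
	resp := NewFetchContentResponse("scraped page", "rss summary")
	got, err := DecodeAsLegacy(resp)
	if err != nil {
		panic(err)
	}
	// The old client still receives the web content under "content".
	fmt.Println(got == "scraped page")
}
```

Because the new key is purely additive, a client written against the current v1 behaviour keeps working without changes, which is the compatibility point made above.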

fguillot added the conflicts detected / rebase required label Aug 11, 2023

danielyrovas force-pushed the dev-keep-rss-content-and-web-content branch from 739e6e2 to 93948d7 on August 13, 2023 02:35

danielyrovas force-pushed the dev-keep-rss-content-and-web-content branch 2 times, most recently from 77b36eb to d00bcfc on August 18, 2023 05:16

fguillot removed the conflicts detected / rebase required label Sep 1, 2023

fguillot reviewed Sep 1, 2023

internal/api/entry.go (outdated)

danielyrovas force-pushed the dev-keep-rss-content-and-web-content branch 3 times, most recently from f397706 to 34f4898 on September 1, 2023 04:53

fguillot reviewed Sep 2, 2023

internal/ui/entry_scraper.go (outdated)

danielyrovas force-pushed the dev-keep-rss-content-and-web-content branch from 4664e37 to 628dc2a on September 8, 2023 01:32

fguillot added the conflicts detected / rebase required label Mar 5, 2024

fguillot mentioned this pull request Mar 18, 2024

RSS summary can be restored when downloading the full text fails #2525

Open

1 task

keep rss & web content | rebase

29ee5ec

danielyrovas force-pushed the dev-keep-rss-content-and-web-content branch from 628dc2a to 29ee5ec on March 28, 2024 10:18

fguillot removed the conflicts detected / rebase required label Mar 29, 2024

fguillot reviewed Apr 4, 2024
