Cleaning and compression of raw html #44

Answered by xhluca
Winsome-A asked this question in Q&A

I would like to ask how I can utilize part of your work to clean and compress a raw html file to get a new compressed html file.

Although you may find the library useful for cleaning HTML files, that is not its primary goal; rather, our goal is to process HTML files so they can be ingested by LLMs for predicting web actions.
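For the plain "clean and compress" part of the question, a minimal sketch using only the Python standard library might look like the following. It is not this library's API: `HtmlCompressor`, `compress_html`, `SKIP_TAGS`, and `KEEP_ATTRS` are hypothetical names chosen for illustration, and the choice of which tags and attributes to drop is an assumption you would tune for your own pages.

```python
import html
import re
from html.parser import HTMLParser

# Assumption: these tags carry no content worth keeping.
SKIP_TAGS = {"script", "style", "noscript"}
# Assumption: only these attributes are worth keeping.
KEEP_ATTRS = {"id", "class", "href", "name", "type", "value"}


class HtmlCompressor(HTMLParser):
    """Drops comments, skipped tags, and unlisted attributes; collapses whitespace."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped tag

    def _attrs(self, attrs):
        return "".join(
            f' {k}="{html.escape(v or "", quote=True)}"'
            for k, v in attrs
            if k in KEEP_ATTRS
        )

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif not self.skip_depth:
            self.parts.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>.
        if tag not in SKIP_TAGS and not self.skip_depth:
            self.parts.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth:
            return
        text = re.sub(r"\s+", " ", data)  # collapse runs of whitespace
        if text.strip():  # drop whitespace-only text nodes
            self.parts.append(html.escape(text, quote=False))

    # Comments are dropped: the default handle_comment does nothing.


def compress_html(raw):
    parser = HtmlCompressor()
    parser.feed(raw)
    parser.close()
    return "".join(parser.parts)


raw = '<div class="a" style="color:red"><script>var x = 1;</script><!-- c --><p>Hello   world</p></div>'
print(compress_html(raw))
```

This kind of static cleaning needs no candidates at all; candidate selection only matters when you want to keep just the elements relevant to a particular action.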

If those Python libraries are used, would this involve selecting candidates, that is, keeping only the content that corresponds to each candidate? Wouldn't that require manually setting the candidates in advance?

You can use the DMR retriever to dynamically find relevant candidates, given a context (action history similar to those of we…
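To illustrate the general idea of picking candidates dynamically from a context rather than fixing them in advance, here is a toy sketch. It is not DMR and does not use this library's API: it swaps in a simple bag-of-words cosine similarity as the scorer, and `tokenize`, `cosine`, and `rank_candidates` are hypothetical helper names. A learned retriever would replace the scoring step.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphanumeric tokens; tag and attribute names count as tokens too.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two token-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_candidates(context, candidates, top_k=2):
    # Score every candidate element against the context and keep the best top_k.
    query = Counter(tokenize(context))
    scored = [(cosine(query, Counter(tokenize(c))), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


candidates = [
    '<button id="search">Search</button>',
    '<a href="/about">About us</a>',
    '<input name="q" type="text">',
]
print(rank_candidates("click the search button", candidates, top_k=1))
```

Because the context changes at every step, the selected candidates change with it, so nothing has to be set by hand beforehand.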

This discussion was converted from issue #42 on September 27, 2024 15:53.