You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I was doing a quick lit review to see if/how the state-of-the-art in web content extraction had changed over the past few years, and came upon a conference paper from last September, Learning Web Content Extraction with DOM Features, that seems interesting, relevant, and performant. There's also code: see learnhtml. Is there any interest in implementing its feature set within dragnet, and evaluating model performance with such features? This could be related to updates proposed in Issue #85.
The text was updated successfully, but these errors were encountered:
I was doing a quick lit review to see if/how the state-of-the-art in web content extraction had changed over the past few years, and came upon a conference paper from last September, Learning Web Content Extraction with DOM Features, that seems interesting, relevant, and performant. There's also code: see learnhtml. Is there any interest in implementing its feature set within dragnet, and evaluating model performance with such features? This could be related to updates proposed in Issue #85.
The text was updated successfully, but these errors were encountered: