Some webpages (such as https://github.com/JeremySkinner/WebMatrix.Data.StronglyTyped/blob/master/License.txt ) detect whether they are loaded from a browser or from another tool and serve a different content type accordingly. The one in the example serves HTML when loaded from a browser, but plain text when fetched via wget.
Thus:
wget -qO- https://github.com/JeremySkinner/WebMatrix.Data.StronglyTyped/blob/master/License.txt | unfluff
will yield no content:
{"title":"","softTitle":"","date":null,"author":[],"publisher":null,"copyright":null,"lang":null,"tags":[],"image":null,"videos":[],"links":[],"text":""}
because the tool fails to detect plain text.
This makes it unreliable for parsing pages fetched with GET requests from command-line tools or scripts rather than from a browser.
As a workaround, I can search the text for HTML tags in my own code before calling unfluff, call it only if I find any, and otherwise assume the input is already plain text. It would be nice if the tool did that automatically, though.
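For reference, the workaround described above could look roughly like the sketch below. It is only an illustration, not part of unfluff: `looksLikeHtml`, `extractText`, and the list of tag names are hypothetical, and the check is a heuristic that will misjudge some inputs. The extractor is passed in as a parameter and is assumed to return an object with a `text` field, as in the JSON output shown above.

```javascript
// Heuristic marker for HTML input: a doctype or one of a few common tags.
// The tag list is an assumption chosen for this sketch; placeholders such as
// "<year>" in a license file are deliberately not treated as HTML.
const HTML_MARKER =
  /<!doctype\s+html|<(html|head|body|div|p|span|a|article|section|title|meta)\b[^>]*>/i;

// Returns true if the text contains something that looks like HTML markup.
function looksLikeHtml(text) {
  return HTML_MARKER.test(text);
}

// Runs the HTML extractor only when the body looks like HTML;
// otherwise the body is assumed to be plain text and returned trimmed.
// extractHtml is any function that takes HTML and returns { text: ... }.
function extractText(body, extractHtml) {
  if (looksLikeHtml(body)) {
    return extractHtml(body).text;
  }
  return body.trim();
}
```

With unfluff installed, this would be called as `extractText(body, require("unfluff"))`. Checking the response's Content-Type header, when it is available, would probably be more robust than sniffing the body.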