Import the DoHaskell content #33
Comments
I wrote a rough proof of concept converter that reads the YAML dump, generates Hakyll files in (an approximation of) the HaskAnything format for some of the entry types in it, and spits out the entries it didn't handle to a new YAML file. I don't know exactly how useful that will turn out to be -- things like cleaning up the 334 tags in the YAML dump and adding summaries to the entries will require manual work anyway -- but with some more polish a script like this might save a fair bit of time. By the way, the linked Gist also contains a modified version of the YAML dump with the necessary fixes so that
Awesome, thanks! This already brings us a lot closer :)
@beerendlauwers You're welcome. I have improved the converter a bit -- now it can deduplicate tags, keep the output fields sorted, and recognise more content types and library tags -- and tweaked a few titles in the YAML dump to avoid file name collisions. I have put that in a proper repository, along with sample output and the full lists of DoHaskell tags and types. If you find the output is looking good, I can begin shaping it into a pull request for HaskAnything.
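For illustration only, here is a minimal Python sketch of the two clean-up steps mentioned above: deduplicating tags and emitting the metadata fields in sorted order. It is not the converter's actual code. The function names, the case-insensitive comparison, the comma-separated list rendering, and the field names used in the example are all assumptions. Only the `---`-delimited metadata header follows the usual Hakyll convention.

```python
def dedupe_tags(tags):
    """Drop repeated tags, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for tag in tags:
        cleaned = tag.strip()
        key = cleaned.lower()  # assumption: "Lens" and "lens" count as the same tag
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


def render_metadata(fields):
    """Render a metadata header with the keys in alphabetical order."""
    lines = ["---"]
    for key in sorted(fields):
        value = fields[key]
        if isinstance(value, list):
            # assumption: list values are written as a comma-separated string
            value = ", ".join(value)
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)


# Hypothetical entry; the real field names may differ.
entry = {"title": "Some talk", "tags": dedupe_tags(["lens", "Lens", "parsing"])}
print(render_metadata(entry))
```

Sorting the keys keeps the generated files stable between runs, so a re-run of the converter only produces a diff where an entry actually changed.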
Thanks a bunch! To answer the questions you pose in the README.md:
Good question. I think we'll have to look at this on a case-by-case basis. We can tag them with "Functional Pearl" in any case.
I'll have to go over the remaining stuff in dhTypes.txt. Will get back to you on this.
No, we have a metadata field for tags and libraries, already. The concept can be added as a tag, the library as a library. The reason for splitting them up is that when we start with linking up all the Hackage libraries, we can identify which content refers to them.
Capitalisation of?
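The tag/library split described above could be sketched roughly as follows. This is a hedged Python illustration, not HaskAnything's implementation: the `split_labels` helper, the `KNOWN_LIBRARIES` set, and the exact field names `tags` and `libraries` are assumptions made for the example.

```python
# Hypothetical list of labels that should be treated as libraries.
KNOWN_LIBRARIES = {"lens", "reflection"}


def split_labels(labels, known_libraries=KNOWN_LIBRARIES):
    """Partition DoHaskell-style labels into concept tags and library names."""
    tags, libraries = [], []
    for label in labels:
        if label.lower() in known_libraries:
            libraries.append(label)
        else:
            tags.append(label)
    # assumption: the metadata fields are called "tags" and "libraries"
    return {"tags": tags, "libraries": libraries}


print(split_labels(["lens", "optics", "reflection"]))
```

Keeping libraries in their own field means a later pass can match them against Hackage package names without having to guess which tags are packages.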
Yup. That's in line with why I slipped in a
In that case, I will figure out a sensible way to add "lens" and "reflection" library labels to the entries that need them. In hindsight, I didn't phrase that very clearly. When I said "tags", I really meant "values in either the
Oops, I dropped a part of the sentence :) I meant capitalisation of tags. Unlike HaskAnything, DoHaskell had mostly lowercase tags, and so we have e.g. "cryptography" in the YAML dump versus "Cryptography" in the HaskAnything site. In any case, that's a very minor issue, as it would be simple to solve any such inconsistencies even after the import is done.
They're processed with the
Yeah, we can do some naïve capitalisation ("if it's one word, capitalize the first letter"), but we'll have to go through those manually, probably.
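A minimal Python sketch of that naïve rule, assuming "one word" means "contains no whitespace". The helper name `capitalise_tag` is made up for the example, and multi-word tags are deliberately left alone for manual review.

```python
def capitalise_tag(tag):
    """Capitalise the first letter of single-word tags; leave the rest untouched."""
    if len(tag.split()) == 1:
        # Only the first character changes, so "QuickCheck" keeps its inner capitals.
        return tag[0].upper() + tag[1:]
    return tag


for tag in ["cryptography", "type classes", "QuickCheck"]:
    print(tag, "->", capitalise_tag(tag))
```

Using `tag[0].upper() + tag[1:]` rather than `str.capitalize()` matters here, because `capitalize()` would also lowercase the rest of the word.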
@mitchellwrosen has kindly provided me with a database dump of his dohaskell.com website. He has also generated a bunch of YAML metadata for the content, available here: https://github.com/mitchellwrosen/dohaskell/blob/master/resources-dump.yaml
Scraper source is here: https://gist.github.com/beerendlauwers/102c833c7a98babede60fe05e4dc789b (for my own reference).
There are still a few categories that have to be added to accommodate most of the DoHaskell content, notably #10, #4, #2 and an "Article" category for some of the more generic ones (blog posts, Medium articles, etc.).