Import the DoHaskell content #33

beerendlauwers · 2016-08-15T09:35:20Z

@mitchellwrosen has kindly provided me with a database dump of his dohaskell.com website. He has also generated a bunch of YAML metadata for the content, available here: https://github.com/mitchellwrosen/dohaskell/blob/master/resources-dump.yaml

Scraper source is here: https://gist.github.com/beerendlauwers/102c833c7a98babede60fe05e4dc789b (for my own reference).

There are still a few categories that have to be added to accommodate most of the DoHaskell content, notably #10, #4, #2 and an "Article" category for some of the more generic ones (blog posts, Medium articles, etc.).

beerendlauwers · 2017-01-16T07:33:13Z

WIP here: https://github.com/mitchellwrosen/dohaskell/blob/666eaa575a4e63ef2a9e41c66a20ad3f5950d989/resources-dump-wip.yaml

duplode · 2017-02-28T05:44:47Z

I wrote a rough proof of concept converter that reads the YAML dump, generates Hakyll files in (an approximation of) the HaskAnything format for some of the entry types in it, and spits out the entries it didn't handle to a new YAML file. I don't know exactly how useful that will turn out to be -- things like cleaning up the 334 tags in the YAML dump and adding summaries to the entries will require manual work anyway -- but with some more polish a script like this might save a fair bit of time. By the way, the linked Gist also contains a modified version of the YAML dump with the necessary fixes so that Data.Yaml is able to parse it.

beerendlauwers · 2017-02-28T10:11:40Z

Awesome, thanks! This already brings us a lot closer :)

duplode · 2017-03-01T00:05:30Z

@beerendlauwers You're welcome. I have improved the converter a bit -- now it can deduplicate tags, keep the output fields sorted, and recognise more content types and library tags -- and tweaked a few titles in the YAML dump to avoid file name collisions. I have put that in a proper repository, along with sample output and the full lists of DoHaskell tags and types. If you find the output is looking good, I can begin shaping it into a pull request for HaskAnything.
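
For readers following along, here is a minimal sketch of what "deduplicate tags" and "keep the output fields sorted" could look like. It is not the converter's actual code: `dedupTags` and `sortFields` are hypothetical names, and comparing tags case-insensitively is an assumption made for illustration.

```haskell
import Data.Char (toLower)
import Data.Function (on)
import Data.List (nubBy, sortOn)

-- Hypothetical helper: drop repeated tags, keeping the first spelling seen.
-- Assumption: tags that differ only in letter case count as duplicates.
dedupTags :: [String] -> [String]
dedupTags = nubBy ((==) `on` map toLower)

-- Hypothetical helper: order metadata fields by key so that the
-- generated files have a stable field order.
sortFields :: [(String, String)] -> [(String, String)]
sortFields = sortOn fst

main :: IO ()
main = do
  print (dedupTags ["lens", "Lens", "parsing", "lens"])
  print (sortFields [("title", "Lenses"), ("tags", "lens"), ("libraries", "lens")])
```

`nubBy` is quadratic in the number of tags, which should be fine for a few hundred of them.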

beerendlauwers · 2017-03-03T09:43:11Z

Thanks a bunch!

To answer the questions you pose in the README.md:

Are functional pearls best filed under "articles" or "papers"?

Good question. I think we'll have to look at this on a case-by-case basis. We can tag them with "Functional Pearl" in any case.

How many new content types are necessary to cover the still unprocessed DoHaskell content types?

I'll have to go over the remaining stuff in dhTypes.txt. Will get back to you on this.

Do we need separate tags for reflection (the concept) and reflection (the Edward Kmett library)? What about "lenses" versus "lens"?

No, we have a metadata field for tags and libraries, already. The concept can be added as a tag, the library as a library. The reason for splitting them up is that when we start with linking up all the Hackage libraries, we can identify which content refers to them.
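
To illustrate why the split helps, here is a rough sketch under assumed names. The `Entry` record and `refersTo` are hypothetical and are not HaskAnything's actual types, and the sample entries are invented. The point is only that looking up a package through the libraries field avoids matching a concept tag that happens to share its name.

```haskell
-- Hypothetical record; the field names mirror the "tags" and "libraries"
-- metadata fields discussed above, not HaskAnything's real data types.
data Entry = Entry
  { title     :: String
  , tags      :: [String]  -- concepts, e.g. "reflection" the technique
  , libraries :: [String]  -- Hackage package names, e.g. "reflection"
  } deriving Show

-- Find content that refers to a given package by looking only at the
-- libraries field, so a concept tag with the same name is not matched.
refersTo :: String -> [Entry] -> [Entry]
refersTo pkg = filter (elem pkg . libraries)

-- Invented sample data, for illustration only.
sampleEntries :: [Entry]
sampleEntries =
  [ Entry "Reflection as a concept" ["reflection"] []
  , Entry "Using the reflection package" ["reflection"] ["reflection"]
  ]

main :: IO ()
main = mapM_ (putStrLn . title) (refersTo "reflection" sampleEntries)
```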

How to handle capitalisation, especially relative to the existing HaskAnything content?

Capitalisation of?

duplode · 2017-03-04T02:13:58Z

Good question. I think we'll have to look at this on a case-by-case basis. We can tag them with "Functional Pearl" in any case.

Yup. That's in line with why I slipped in a dohaskell-type field -- with that, even if they are added under a single category at first we can still identify them and review the classification with relative ease.

No, we have a metadata field for tags and libraries, already. The concept can be added as a tag, the library as a library.

In that case, I will figure out a sensible way to add "lens" and "reflection" library labels to the entries that need them. In hindsight, I didn't phrase that very clearly. When I said "tags", I really meant "values in either the library field or the tags one", as in my mental model the library labels are just tags that go to a separate field because they immediately refer to a library. (My mental model might be wrong, though, so please correct me if need be!)

Capitalisation of?

Oops, I dropped a part of the sentence :) I meant capitalisation of tags. Unlike HaskAnything, DoHaskell had mostly lowercase tags, and so we have e.g. "cryptography" in the YAML dump versus "Cryptography" in the HaskAnything site. In any case, that's a very minor issue, as it would be simple to solve any such inconsistencies even after the import is done.

beerendlauwers · 2017-03-04T12:04:09Z

When I said "tags", I really meant "values in either the library field or the tags one", as in my mental model the library labels are just tags that go to a separate field because they immediately refer to a library. (My mental model might be wrong, though, so please correct me if need be!)

They're processed with the Tags datatype in Hakyll, but apart from that they're kept different: they are different facets (see http://haskanything.com/filter.html) and, of course, libraries will be linked to the actual Hackage packages.

capitalisation of tags

Yeah, we can do some naïve capitalisation ("if it's one word, capitalize the first letter"), but we'll have to go through those manually, probably.
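
A minimal sketch of that naive rule, assuming a hypothetical helper named `capitaliseTag` (nothing like it is confirmed to exist in HaskAnything): single-word tags get their first letter capitalised, and anything else is returned unchanged for manual review.

```haskell
import Data.Char (toUpper)

-- Hypothetical helper: if the tag is exactly one word, return that word
-- with its first letter upper-cased (surrounding whitespace is dropped);
-- otherwise return the tag unchanged so it can be reviewed by hand.
capitaliseTag :: String -> String
capitaliseTag tag = case words tag of
  [c : cs] -> toUpper c : cs
  _        -> tag

main :: IO ()
main = mapM_ (putStrLn . capitaliseTag) ["cryptography", "type classes"]
```

Acronyms and mixed-case names would still need manual attention, since the rule only touches the first letter.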

beerendlauwers added the newcomer label Aug 15, 2016

beerendlauwers added the type: content label Feb 27, 2017

beerendlauwers mentioned this issue Apr 19, 2017

website down? mitchellwrosen/dohaskell#5

Closed

beerendlauwers removed help wanted labels Oct 8, 2018
