
ComicsRSS

Source code for the site generator and RSS feed generator for comicsrss.com.

All of the site's content is also in this repository, since the site is hosted on GitHub Pages.

Support Me

If you'd like to help keep this site going, you can send me a few bucks using Patreon. I'd really appreciate it!

Technical Details

I have received many requests to add more comic series to the site. However, my time is limited. So if you want to help out, you can make a scraper!

To be able to add comic series to Comics RSS, it is helpful to understand the basics of what is going on.

Comics RSS has two parts: the scrapers and the site generator. Each scraper parses a different comic website and writes a temporary JSON file to disk. The site generator reads those temporary JSON files and writes static HTML and RSS files to disk.

How scrapers work

The scrapers make HTTPS requests to a website (for example, gocomics.com), parse the responses, and write temporary JSON files to disk.

On a site like gocomics.com, a scraper first requests the list of comic series (for example, gocomics.com/comics/a-to-z).

Then, for each comic series it finds, it looks up the latest day's comic strip. If it has not seen that strip before, it saves it and looks up the previous day's strip. When it reaches a strip it has already seen, it moves on to the next comic series, repeating until it has gone through the whole website.

Finally, it writes the list of comic series, each with its list of strips, to a temporary JSON file on disk.
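The scrape-backwards-until-seen loop above can be sketched as follows. This is not the actual scraper code; the helper names (`lookUpStrip`, `scrapeSeries`) and the in-memory stand-in for the HTTPS fetch are assumptions made for illustration.

```javascript
// Strips already saved by a previous run, keyed by series slug.
const seen = { garfield: new Set(["2024-01-01"]) };

// Stand-in for an HTTPS request + parse; a real scraper would fetch
// the comic page for the given date and extract the strip from it.
function lookUpStrip(slug, date) {
  const site = {
    garfield: ["2024-01-03", "2024-01-02", "2024-01-01"],
    peanuts: ["2024-01-03"],
  };
  return site[slug].includes(date) ? { slug, date } : null;
}

// Returns the date string for the day before `date` (YYYY-MM-DD).
function previousDay(date) {
  const d = new Date(date + "T00:00:00Z");
  d.setUTCDate(d.getUTCDate() - 1);
  return d.toISOString().slice(0, 10);
}

// For one series: collect new strips, walking backwards one day at a
// time, and stop at the first strip we have already seen.
function scrapeSeries(slug, latestDate) {
  const newStrips = [];
  let date = latestDate;
  let strip = lookUpStrip(slug, date);
  while (strip && !(seen[slug] && seen[slug].has(strip.date))) {
    newStrips.push(strip);
    date = previousDay(date);
    strip = lookUpStrip(slug, date);
  }
  return newStrips;
}
```

Here, scraping `garfield` from 2024-01-03 would save two new strips and stop at the already-seen 2024-01-01 strip; the real scraper then moves on to the next series.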

How the site generator works

The site generator finds the temporary JSON files made by the scrapers. Each file contains an array of seriesObjects. These arrays are concatenated into one big list of comic series, each with its list of comic strips. The generator then uses templates to generate an index.html file and an rss/{comic}.rss file for each series.
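That concatenate-then-template step can be sketched like this. The seriesObject shape (`slug`, `title`, `strips`) and the template functions are assumptions for illustration, not the repository's real API; a real run would read the JSON from disk and write the rendered files out.

```javascript
// Pretend these arrays were parsed from two scrapers' temporary JSON files.
const gocomicsSeries = [
  { slug: "garfield", title: "Garfield",
    strips: [{ date: "2024-01-03", imageUrl: "https://example.com/g.png" }] },
];
const arcamaxSeries = [
  { slug: "peanuts", title: "Peanuts",
    strips: [{ date: "2024-01-03", imageUrl: "https://example.com/p.png" }] },
];

// Concatenate every file's array into one big list of series.
const allSeries = [].concat(gocomicsSeries, arcamaxSeries);

// Minimal stand-ins for the HTML and RSS templates.
function renderIndex(seriesList) {
  const items = seriesList
    .map((s) => `<li><a href="rss/${s.slug}.rss">${s.title}</a></li>`)
    .join("\n");
  return `<ul>\n${items}\n</ul>`;
}

function renderRss(series) {
  const items = series.strips
    .map((st) =>
      `<item><title>${series.title} ${st.date}</title>` +
      `<enclosure url="${st.imageUrl}" type="image/png"/></item>`)
    .join("");
  return `<rss version="2.0"><channel><title>${series.title}</title>${items}</channel></rss>`;
}

// A real run would write these strings to index.html and rss/{slug}.rss.
const indexHtml = renderIndex(allSeries);
const feeds = Object.fromEntries(
  allSeries.map((s) => [`rss/${s.slug}.rss`, renderRss(s)])
);
```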

When these updated and new files are committed and pushed to this repository, they are served by GitHub Pages, which is how you view the site today.

Install for yourself

  1. Fork the repository
  2. Create a GitHub deploy key and add it to both GitHub and CircleCI
  3. In .circleci/config.yml, change my username, email, and key fingerprint to your own
  4. I think that's it? Make a PR if you attempt the above steps and I missed something!

Scraper API

The scraper API changed recently and I haven't updated the documentation. The new API is simpler than before, but unfortunately there are fewer docs for now.

See the code for the arcamax scraper to see that making a scraper for a large website is not very difficult; it takes just a little bit of string manipulation.
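As a taste of the kind of string manipulation involved, here is a hedged example of pulling strip image URLs out of raw HTML. It is not taken from the arcamax scraper; the markup and the `extractImageUrls` helper are made up for illustration.

```javascript
// Example HTML a scraper might receive from a comic page.
const html = `
  <div class="comic"><img src="https://example.com/strips/2024-01-03.gif"></div>
  <div class="comic"><img src="https://example.com/strips/2024-01-02.gif"></div>
`;

// Collect every img src attribute with a simple global regex.
function extractImageUrls(page) {
  const urls = [];
  const re = /<img src="([^"]+)"/g;
  let match;
  while ((match = re.exec(page)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}
```

A real scraper would pair each URL with its date and series before writing the temporary JSON file.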

License

MIT
