Where to book 🎾 court in Raincouver?

Where can I book a freaking tennis court in Raincouver? -- Tennis Buddha

This project answers that universal question in one place: a webpage that displays tennis court vacancies in Vancouver.

Currently supported venues are

TODO

  • Parse booking page from Bnb - BTC
  • Parse booking page from Coq - TTC
  • Parse booking page from Surrey - TTC
  • Parse booking page from Langley - TTC
  • Parse booking page from Rmd - Hub
  • Parse booking page from Van - UBC
  • Parse booking page from North Van
  • Define a data storage format (JSON)
  • Build an HTML page to load and render the data (mobile friendly)
  • Deploy somewhere and get HTML exposed
  • Add a background job to update data every five minutes
  • Error handling
  • Enable network retry
  • Collapsed views by default
  • Build a query interface to ask for vacancies on specific dates

Implementation notes

High level

  • bin/run is the script that scrapes venue websites and stores the data in runner-data.json. GitHub Actions is configured to run it every 5 minutes.
  • index.html is the static page that loads and renders the data. It's hosted on GitHub Pages.

Details

  • Scraping strategy: Prefer a simple data endpoint request over login and page parsing. First try to find a data endpoint; if there is none, parse the page with mechanize; as a last resort, use Selenium WebDriver for pages that need JavaScript to run.
  • Data is actually stored in runner-data.js instead of runner-data.json. A script file can be loaded with a plain script tag without hitting the CORS check, whereas loading a JSON file requires it to be served from a server.
  • Data is mostly massaged in the back end and served as-is to the front end.
  • Data is sorted by (date, start_time, end_time, court_info)
  • mechanize is the main scraping framework used.
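
As a rough sketch of the storage and sorting points above, the snippet below sorts slot records by (date, start_time, end_time, court_info) and wraps the JSON in a JavaScript assignment so a page can load it with a plain script tag. The string formats of the fields, the sample data, the helper names sort_slots and to_js, and the global name runnerData are all assumptions for illustration, not taken from the project.

```ruby
require "json"

# Hypothetical slot records; only the four field names come from the notes above.
SAMPLE_SLOTS = [
  { date: "2024-05-02", start_time: "09:00", end_time: "10:00", court_info: "Court 2" },
  { date: "2024-05-01", start_time: "18:00", end_time: "19:00", court_info: "Court 1" },
  { date: "2024-05-01", start_time: "09:00", end_time: "10:00", court_info: "Court 3" },
  { date: "2024-05-01", start_time: "09:00", end_time: "10:00", court_info: "Court 1" },
].freeze

# Sort by the composite key. Ruby compares the key arrays element by element,
# so ties on date fall through to start_time, then end_time, then court_info.
def sort_slots(slots)
  slots.sort_by { |s| s.values_at(:date, :start_time, :end_time, :court_info) }
end

# Wrap the sorted data in a JavaScript assignment instead of emitting bare JSON.
# The global name is an assumption; the real file may use a different one.
def to_js(slots, global_name: "runnerData")
  "window.#{global_name} = #{JSON.generate(sort_slots(slots))};\n"
end

puts to_js(SAMPLE_SLOTS)
```

With ISO dates and zero-padded 24-hour times, plain string comparison gives chronological order. Writing the result of to_js to runner-data.js would let index.html include it with a script tag and read the data from window.runnerData.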

Others

  • Secrets are managed by direnv.
  • Use record: :new_episodes to have VCR record requests that are not yet in the cassette.

How to add a new venue

  1. Add a new entry to venues.json
  2. Add a new scraper in lib/new_venue_scraper.rb with a test in test/new_venue_scraper_test.rb
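
For step 2, a scraper built on a data endpoint might look roughly like the sketch below. The class interface (a fetch callable and a vacancies method), the payload keys (date, start, end, court), and the sample body are all hypothetical; the real scrapers in lib/ may be structured differently. Injecting the fetcher keeps the parsing logic testable without touching the network.

```ruby
require "json"

# Hypothetical scraper for a venue that exposes a JSON data endpoint.
class NewVenueScraper
  # fetch: any callable that returns the raw JSON body as a String.
  # In real use this would wrap an HTTP request to the venue.
  def initialize(fetch:)
    @fetch = fetch
  end

  # Map the venue's payload onto the common slot fields.
  # The payload keys used here are assumptions.
  def vacancies
    JSON.parse(@fetch.call).map do |row|
      {
        date: row["date"],
        start_time: row["start"],
        end_time: row["end"],
        court_info: row["court"],
      }
    end
  end
end

# Stubbed response standing in for the venue's endpoint.
SAMPLE_BODY = '[{"date":"2024-05-01","start":"09:00","end":"10:00","court":"Court 1"}]'

scraper = NewVenueScraper.new(fetch: -> { SAMPLE_BODY })
p scraper.vacancies
```

A test in test/new_venue_scraper_test.rb could pass a stubbed fetcher in the same way and assert on the returned slots.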

Known Issues

GitHub Actions

The shortest interval at which GitHub Actions can run scheduled workflows is once every 5 minutes, but the frequency is not guaranteed. In practice, I noticed runs happen about every 10 minutes on average.