BitTigerKaggle/Pikachu

 
 

Pikachu

Description

Our goal is to build a recipe crawler and search system that discovers interesting facts from recipes and provides optimized search results. The system consists of two major components: a web crawler and a search engine.
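To make the search component more concrete, here is a minimal sketch of a keyword search over crawled recipes using an in-memory inverted index. It is an illustration only and not the project's implementation: the sample records and the names `tokenize`, `buildIndex`, and `search` are all assumptions, and a real deployment would more likely delegate this work to a search server.

```javascript
// Hypothetical recipe records, standing in for what a crawler might extract.
const recipes = [
  { id: 1, title: "Garlic Butter Shrimp", ingredients: ["shrimp", "garlic", "butter"] },
  { id: 2, title: "Garlic Bread", ingredients: ["bread", "garlic", "butter"] },
  { id: 3, title: "Tomato Soup", ingredients: ["tomato", "onion", "cream"] },
];

// Split text into lowercase alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build an inverted index: token -> Set of ids of the recipes containing it.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    const text = [doc.title, ...doc.ingredients].join(" ");
    for (const token of tokenize(text)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(doc.id);
    }
  }
  return index;
}

// Return the ids of recipes that contain every query token, sorted ascending.
function search(index, query) {
  const tokens = tokenize(query);
  if (tokens.length === 0) return [];
  let result = null;
  for (const token of tokens) {
    const ids = index.get(token) || new Set();
    result = result === null
      ? new Set(ids)
      : new Set([...result].filter((id) => ids.has(id)));
  }
  return [...result].sort((a, b) => a - b);
}

const index = buildIndex(recipes);
console.log(search(index, "garlic butter")); // [ 1, 2 ]
console.log(search(index, "tomato"));        // [ 3 ]
```

The sketch requires every query word to match (AND semantics) and does no ranking; both are simplifications chosen to keep the example short.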

Plan

Drawing on our experience in web development and data science, and on the description above, we treat February 2016 as the first stage, with the primary goal of prototyping the application according to the development guidelines below. Here is the tentative timeline.

  • [2016/02/08 - 2016/02/12] Project Selection, Plan Discussion, Proposal Draft Writing, Resource Discovery
  • [2016/02/13 - 2016/03/07] System Design, Project Implementation
    • Web Crawler
    • Search
    • Exploratory Analyzer / Recommender
  • [2016/03/08 - 2016/03/15] Document Writing, User Manual Writing and Video Presentation Making

Project management

Pikachu@Trello

Development Guidelines

  • Modularity. Following the principle "loose coupling and high cohesion", each module should be standalone.
  • Minimalism. Each module should be kept short, simple, and concise. Every piece of code should be transparent upon first reading.
  • Easy extensibility. New modules (as new classes and functions) should be simple to add, and existing modules should be easy to extend.
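One way to read the modularity and extensibility guidelines is a small registry in which each site-specific parser is a standalone module. The sketch below is an assumption about how that could look, not code from this repository; `registerParser`, `parsePage`, and the hostname `recipes.example.com` are hypothetical.

```javascript
// Hypothetical registry mapping a site hostname to its parser function.
const parsers = new Map();

// Register a parser for one site; adding a new site never touches existing ones.
function registerParser(hostname, parse) {
  parsers.set(hostname, parse);
}

// Dispatch a fetched page to the parser registered for its hostname.
function parsePage(url, html) {
  const { hostname } = new URL(url);
  const parse = parsers.get(hostname);
  if (!parse) throw new Error(`No parser registered for ${hostname}`);
  return parse(html);
}

// Example module: extracts the <title> text for a made-up site.
registerParser("recipes.example.com", (html) => {
  const match = html.match(/<title>(.*?)<\/title>/i);
  return { title: match ? match[1].trim() : null };
});

const page = "<html><head><title> Garlic Bread </title></head></html>";
console.log(parsePage("https://recipes.example.com/42", page)); // { title: 'Garlic Bread' }
```

Supporting another site then means writing one more `registerParser` call, which keeps each parser short and independent of the others.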

Language & Framework & Tech Stack

  • JavaScript: Node.js, Express.js, AngularJS
  • Database: MongoDB
  • Cloud Platform: Cloud Foundry

System Diagram

System Architecture

Resource

BitTiger Project: AppStore - Website

Web Crawler

MEAN Stack

MEAN is an acronym for MongoDB, Express.js, AngularJS, and Node.js.

A very good online course about MEAN stack on edX:

MongoDB: MongoDB is an open-source, document database (NoSQL) designed for ease of development and scaling.

Express.js: Fast, unopinionated, minimalist web framework for Node.js.

Angular.js: Angular is a development platform for building mobile and desktop web applications.

Node.js: Node.js is a JavaScript runtime built on Chrome's V8 JavaScript engine.

Hybrid Mobile App

Ionic: Ionic is an HTML5 hybrid mobile app framework for building interactive mobile apps with HTML5 and AngularJS.

Search

ElasticSearch: Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
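Because Elasticsearch is queried over HTTP with JSON, a search request is just a JSON body. The sketch below builds such a body with the Query DSL's `multi_match` query. The helper name `buildRecipeQuery`, the field names `title` and `ingredients`, and the index name `recipes` are assumptions for illustration; nothing in this README specifies the actual mapping.

```javascript
// Build the JSON body for an Elasticsearch search request.
// The field names "title" and "ingredients" are assumed for illustration.
function buildRecipeQuery(text, size = 10) {
  return {
    size, // maximum number of hits to return
    query: {
      multi_match: {
        query: text,
        fields: ["title^2", "ingredients"], // "^2" boosts matches in the title
      },
    },
  };
}

const body = buildRecipeQuery("garlic shrimp", 5);
console.log(JSON.stringify(body, null, 2));
// The body would be sent as: POST /recipes/_search
// (the index name "recipes" is hypothetical)
```

Keeping query construction in a pure function like this makes it testable without a running cluster; sending the request can be left to whatever HTTP client the application already uses.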

Miscellaneous

FAQ

MongoDB & ElasticSearch For Full Text Search In Chinese

Anti Anti-scraping Strategy

Owner

@Pikachu

About

Yummy Recipe Crawler and Search

Languages

  • JavaScript 78.2%
  • CSS 13.7%
  • Objective-C 4.6%
  • C++ 2.6%
  • Java 0.5%
  • C# 0.2%
  • Other 0.2%