Real-time social media data analytics with Apache Spark, Python, Kafka, Pandas, etc.
This project uses Apache Spark (Spark SQL, Spark Streaming, MLlib) to build machine learning models in a slow batch-processing step, and then applies those models with fast Spark Streaming to predict output for new data.
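The split between a slow batch step that trains a model and a fast streaming step that only applies it can be illustrated without a cluster. The sketch below is a plain-Python stand-in, not Spark code. It assumes a multinomial Naive Bayes text classifier (chosen here for illustration; the project does not say which MLlib model it uses), hypothetical helper names `train_batch`, `predict` and `score_stream`, and lists of strings standing in for Spark Streaming micro-batches.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a Spark pipeline would use a proper tokenizer.
    return text.lower().split()


def train_batch(labeled_posts):
    """Batch (slow) step: fit a multinomial Naive Bayes model on historical posts.

    labeled_posts is an iterable of (text, label) pairs.
    """
    word_counts = {}          # label -> Counter of word occurrences
    label_counts = Counter()  # label -> number of training posts
    vocab = set()
    for text, label in labeled_posts:
        words = tokenize(text)
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    total_docs = sum(label_counts.values())
    model = {"vocab_size": len(vocab), "labels": {}}
    for label, n_docs in label_counts.items():
        counts = word_counts[label]
        model["labels"][label] = {
            "log_prior": math.log(n_docs / float(total_docs)),
            "counts": counts,
            "total_words": sum(counts.values()),
        }
    return model


def predict(model, text):
    """Return the label with the highest posterior log-probability."""
    best_label, best_score = None, float("-inf")
    v = model["vocab_size"]
    for label, params in model["labels"].items():
        score = params["log_prior"]
        for word in tokenize(text):
            # Laplace (add-one) smoothing keeps unseen words from zeroing the score.
            score += math.log((params["counts"][word] + 1.0) /
                              (params["total_words"] + v))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def score_stream(model, micro_batches):
    """Streaming (fast) step: apply the already-trained model to each micro-batch."""
    for batch in micro_batches:
        yield [(text, predict(model, text)) for text in batch]


history = [
    ("spark streaming is great", "tech"),
    ("new kafka release today", "tech"),
    ("great match last night", "sports"),
    ("the team won the final", "sports"),
]
model = train_batch(history)
# Each inner list stands in for one streaming micro-batch.
stream = [["kafka and spark meetup"], ["who won the match"]]
for scored_batch in score_stream(model, stream):
    print(scored_batch)
```

The expensive counting happens once in `train_batch`; the per-record work in `score_stream` is only a lookup and a sum, which is what makes the streaming side fast.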
We use historical and streaming data from several social networks, retrieved through each network's own API:
- Twitter - https://apps.twitter.com/
- Meetup - https://secure.meetup.com/meetup_api
- GitHub - Guides: https://developer.github.com/v3/, API calls: https://api.github.com/, API keys: https://github.com/settings/developers, Tokens: https://github.com/settings/tokens
- Databricks Community Edition
- Anaconda Python 2.7 distribution (Pandas, etc.)
- Apache Spark (Spark SQL, Spark Streaming, Spark MLlib, GraphX)
- Apache Kafka (real-time distributed messaging system)
- Persistent data store (RDBMS: MySQL; flat file: CSV; wide-column: Cassandra; document: MongoDB)
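The persistent-store entry can be made concrete for its flat-file and relational options. This is a minimal Python 3 sketch under stated assumptions: `sqlite3` stands in for MySQL so the example runs without a server, the record fields in `FIELDS` are a hypothetical schema, and Cassandra and MongoDB are not covered.

```python
import csv
import io
import sqlite3

# Hypothetical record shape for a scored post; adjust to the real schema.
FIELDS = ("source", "post_id", "text", "label")


def save_to_csv(records, stream):
    """Write records, preceded by a header row, to a CSV file-like object."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


def save_to_sql(records, conn):
    """Upsert records into a relational table (sqlite3 standing in for MySQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts "
        "(source TEXT, post_id TEXT PRIMARY KEY, text TEXT, label TEXT)"
    )
    # Named placeholders map directly onto the dict keys of each record.
    # INSERT OR REPLACE is SQLite syntax; MySQL would use REPLACE INTO
    # or INSERT ... ON DUPLICATE KEY UPDATE.
    conn.executemany(
        "INSERT OR REPLACE INTO posts VALUES (:source, :post_id, :text, :label)",
        records,
    )
    conn.commit()


records = [
    {"source": "twitter", "post_id": "1", "text": "spark meetup tonight", "label": "tech"},
    {"source": "github", "post_id": "2", "text": "new kafka client", "label": "tech"},
]
buf = io.StringIO()          # in practice: open("posts.csv", "w", newline="")
save_to_csv(records, buf)
conn = sqlite3.connect(":memory:")
save_to_sql(records, conn)
print(conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0])
```

Keying the table on `post_id` makes re-ingesting the same post idempotent, which matters when a streaming job replays a micro-batch.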
pip install Twitter
pip install PyGithub
pip install
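The GitHub entry above lists a tokens page and the `https://api.github.com/` root. As a rough illustration of how a personal access token is attached to a REST v3 call, the sketch below uses only the standard library rather than PyGithub. The helper names `build_search_request` and `search_repositories` are hypothetical, and the `/search/repositories` endpoint is just one example query; sending the request requires network access and a real token.

```python
try:
    from urllib.request import Request, urlopen  # Python 3
    from urllib.parse import urlencode
except ImportError:
    from urllib2 import Request, urlopen  # Python 2.7
    from urllib import urlencode
import json

API_ROOT = "https://api.github.com"


def build_search_request(query, token):
    """Build (but do not send) a GitHub repository-search request.

    token is a personal access token created at https://github.com/settings/tokens.
    """
    url = API_ROOT + "/search/repositories?" + urlencode({"q": query})
    return Request(url, headers={
        "Authorization": "token " + token,
        "Accept": "application/vnd.github.v3+json",
    })


def search_repositories(query, token):
    """Send the request and return the decoded JSON body (needs network access)."""
    response = urlopen(build_search_request(query, token))
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()


req = build_search_request("spark streaming", "YOUR_TOKEN")
print(req.get_full_url())
```

Separating request construction from sending keeps the authentication logic testable offline; the returned JSON from the search endpoint holds the matching repositories under its `items` key.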
Discovering what everyone is whispering about on social media: a tool for finding what is really trending across social networks and surfacing hot topics.
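One common way to decide what is "trending" is to count hashtags over a sliding window of recent micro-batches and report the most frequent ones. The class below is a plain-Python sketch of that idea, assuming hashtags are the topic signal and a window measured in batches; `TrendingTopics` is a hypothetical name. In Spark Streaming the same pattern is usually expressed with a windowed reduction such as `reduceByKeyAndWindow`.

```python
import re
from collections import Counter, deque

HASHTAG = re.compile(r"#(\w+)")


class TrendingTopics(object):
    """Count hashtags over the last `window` micro-batches and report the top ones."""

    def __init__(self, window=3):
        self.window = window
        self.batches = deque()   # per-batch Counters currently inside the window
        self.totals = Counter()  # running sum of the Counters in self.batches

    def add_batch(self, posts):
        """Add one micro-batch of post texts and return the current top topics."""
        counts = Counter(tag.lower() for post in posts
                         for tag in HASHTAG.findall(post))
        self.batches.append(counts)
        self.totals.update(counts)
        if len(self.batches) > self.window:
            # Slide the window: subtract the oldest batch instead of recounting.
            self.totals.subtract(self.batches.popleft())
        return self.top()

    def top(self, k=3):
        # subtract() can leave zero counts behind, so drop them.
        return [(tag, n) for tag, n in self.totals.most_common() if n > 0][:k]


trends = TrendingTopics(window=2)
trends.add_batch(["Loving #Spark", "#kafka in production"])
trends.add_batch(["#spark streaming demo", "#spark tips"])
result = trends.add_batch(["#kafka connect", "#Kafka rocks", "#kafka streams"])
print(result)
```

Subtracting the expired batch keeps each update proportional to the batch size rather than the window size, the same trade-off an inverse-reduce function buys in a windowed stream.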
- Deliver real-time news, events, and alerts tailored to users' needs and interests.
- Search Twitter, Facebook, and Google+ for keywords.
- Batch-process the collected data with Spark.
- Present results on web pages, send alerts, and push notifications to users.
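Tailoring alerts to each user's interests amounts to matching incoming posts against per-user keyword lists and routing the hits. The sketch below assumes a hypothetical profile format (a dict from user id to keywords) and whole-word, case-insensitive matching via `\b` boundaries, which suits plain-word keywords but not ones that begin or end with punctuation (for example "C++").

```python
import re


def build_matchers(interests):
    """Compile one case-insensitive, whole-word regex per user.

    interests maps a user id to a list of keywords (hypothetical profile format).
    """
    matchers = {}
    for user, keywords in interests.items():
        if not keywords:
            continue  # an empty alternation would match everywhere
        pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
        matchers[user] = re.compile(pattern, re.IGNORECASE)
    return matchers


def route_alerts(posts, matchers):
    """Return {user: [matching posts]} so each user only gets relevant alerts."""
    alerts = {}
    for post in posts:
        for user, matcher in matchers.items():
            if matcher.search(post):
                alerts.setdefault(user, []).append(post)
    return alerts


matchers = build_matchers({"alice": ["spark", "kafka"], "bob": ["meetup"], "carol": []})
posts = ["Kafka 2.0 released", "Spark meetup in Berlin", "Sparkling water sale"]
alerts = route_alerts(posts, matchers)
print(alerts)
```

Compiling one alternation per user means each post is scanned once per user rather than once per keyword; for very large user bases an inverted index from keyword to users would scale better.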