GitHub - gauthier-schweitzer/PySpark-ENSAE: Full ML pipeline using PySpark: Preprocessing, Data Analysis, Prediction

Distributed Computing with Spark
Description: Full ML pipeline using PySpark: Preprocessing, Data Analysis, Prediction
Language: PySpark
Team: Gauthier Schweitzer & Cyril Verluise
Date: May 2018
Data: Accident Data, French OpenData. Data can be accessed at the following link

Commits: 4
Files:
README.md
Section_1BuildDataset.ipynb
Section_2DataExploration.ipynb
Section_3ProcessingPrediction.ipynb
