data-science-tutorials

This repository contains Jupyter notebooks that accompany the tutorials for our data science lectures. The following topics are covered, each in a separate folder.

  1. Dataset Visualization (Boston Housing without the linear regression; also other datasets such as Flower, MNIST digits, 20newsgroups): working with and visualizing one dataset (incl. Matplotlib; the .describe() method; box plot; min-max normalization; Boston Housing; linear regression c/o dsP)
  2. Clustering
  3. Association Rule Learning (dataset yet to be determined; preferably from scikit-learn)
  4. Regression (linear regression from Boston Housing and Car Prices)
  5. Bayes Learning (for spam filtering/text classification)
  6. Classification with Decision Trees (start with small 5-line dataset)
  7. Neural Networks (use keras.io to build a neural network for MNIST digit classification): Keras (for MNIST classification); OPTIONAL gensim (for word2vec; pick a dataset from TensorFlow); then an auto-encoder for representation learning
  8. OPTIONAL MapReduce
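
As a rough illustration of the min-max normalization mentioned under topic 1, here is a minimal sketch. It assumes a plain Python list of numbers and uses only the standard library; the function name `min_max_normalize` and the sample values are made up for this example and are not taken from the notebooks, which may use pandas or scikit-learn instead.

```python
def min_max_normalize(values):
    """Rescale a sequence of numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample values, not drawn from any dataset in this repository.
prices = [10.0, 20.0, 25.0, 40.0]
scaled = min_max_normalize(prices)
print(scaled)
```

The smallest input maps to 0.0 and the largest to 1.0; everything in between keeps its relative position, which makes features with different units comparable before plotting or clustering.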

Packages

See our python-tutorials for instructions on how to set this up on your machine.

required

optional

  • Pandas; [documentation] also as pdf

Table of contents

  • 0-Intro

    • Scikit-learn-overview.ipynb
    • Web Mining Project .ipynb
  • 1-Datasets_Visualization_and_preprocessing

    • 1-IRIS.ipynb
    • 2-Boston_house_dataset.ipynb
    • 3-MNIST.ipynb
    • 4-UCI_CAR.ipynb
    • 5-20newsgroups.ipynb
    • 6-KDD_cup_2000_data_set.ipynb
    • Crawling_twitter_with_python.ipynb
    • MDS_projection.ipynb (IRIS)
    • PCA_projection.ipynb (IRIS)
    • scikit-learn-overview-and-preprocessing.ipynb (IRIS)
    • VA-InformationVisualisation-with-JavaScript-and-3DJs.ipynb
    • TODO try visualization with Orange (available through the conda-forge channel)
  • 2-Clustering

    • Clustering_overview.ipynb (IRIS) (MNIST)
    • Tutorial_clustering_for_outlier_detection_3D.ipynb (Kddcup 1999)
    • Tutorial_clustering_for_outlier_detection.ipynb (Kddcup 1999)
  • 3-Association-Rules

    • Apriori_asaini.ipynb (MBE_dataset)
    • Apriori.ipynb (Boston house)
    • Apriori_server.ipynb (Mango_dataset)
    • Assignment_Association_rule_learning.ipynb
    • Tutorial_association_rule_learning_shopping_basket.ipynb (KDDcup 2000)
  • 4-Linear_regression_and_logistic_regression

    • Assignment_Linear_Regression.ipynb
    • Assignment_Logistic_regression.ipynb (UCI_car)
    • Boston_house_Linear_Regression.ipynb (Boston house)
    • Linear_regression_diabetes_dataset.ipynb
    • Linear-Regression.ipynb (Boston house)
    • Logistic_regression.ipynb (IRIS)
    • Small_scale_linear_regression.ipynb (KDDcup)
    • Supervised_Learning_with_Linear_Models.ipynb (Boston house)
  • 5-KNN_classification

    • KNN_classification.ipynb (IRIS)
    • Metrics.ipynb (IRIS)
  • 6-Bayes-Learning

  • 7-Decision-Trees.ipynb (UCI_car)

  • 8-Neural-Networks

    • keras-mnist.ipynb (MNIST)
    • Simple-NN.ipynb (make_moons)
    • Stacked-Denoising-Autoencoders.ipynb
    • INFO Software Comparison
      • keras.io (high-level, running on top of TensorFlow (default) or Theano) c/o Francois Chollet (written in Python)
      • Theano c/o Universite de Montreal (written in Python; tightly integrated with NumPy)
      • TensorFlow c/o Google Brain (written in Python/C++)
  • 9-SVM

    • Assignment_SVM_for_OCR.ipynb (MNIST)
    • Support_Vector_Machines.ipynb (IRIS)
  • A-Advanced_modules

    • NLP-with-NLTK-Short-Intro.ipynb
  • B-Scripts

Links

Cheat Sheets

Other Collections

Module Specific

(should be listed at the module)