Skip to content

Latest commit

 

History

History
87 lines (73 loc) · 6.51 KB

ws17.md

File metadata and controls

87 lines (73 loc) · 6.51 KB

Data Science Course 2017 September

0 - Intro

0.1 Anaconda

Install Anaconda

Download and install Anaconda by choosing the installer corresponding to your OS. Select the Python 3.x Version.

After the installation is finished, you can check your installation by typing conda list in your terminal. If Anaconda is successfully installed, this command will display a list of the installed packages.

You may need to set the environment variables by running
export PATH="~/anaconda3/bin:$PATH"
in your terminal and adding
export PATH=~/anaconda3/bin:$PATH
to your .bashrc (to edit it e.g. with nano, type nano ~/.bashrc in your terminal).

Environments

In Python, it is good practice to work with environments, in order to keep the dependencies clean and the required modules separated. To create a new environment for the Data Science Course, type
conda create -n datsci python=3 scikit-learn jupyter matplotlib pandas
in your terminal.

This will create a new environment named datsci and also install some of the packages required during the course (scikit-learn, jupyter, ...). Those packages have further dependencies, e.g. on numpy (which we will need anyhow) which are installed during the process as well.
Finally, you need to activate the environment with:
activate datsci (Windows)
source activate datsci (OSX & Linux)
The command prompt indicates the activated environment by prepending (datsci). Further information about environments can be found in the documentation.

Package management

To install further packages, use conda install. For example, to install the scipy package, simply type conda install scipy.
By default, packages will be installed in the currently active environment, so make sure to activate your desired environment before. Alternatively, you can explicitly specify the environment by typing conda install --name datsci scipy (assuming you want to install into the environment datsci).
In case you stumble across a missing module in one of our notebooks, you can install it via this procedure.

IMPORTANT: The keras module does not seem to be available on the official conda channels, therefore install it by typing
pip install keras
in your terminal (make sure, your desired environment is active). Further information about package management can be found in the documentation.

0.2 Git

Git is a versioning system, which we use in this course merely to distribute the course material. All the (probably) relevant commands are listed below, but you may also want to take a look at the Quick Guide or the full documentation.

Install Git

apt-get install git (Linux)
download and install (Windows)
download and install (OS X)

Clone Repository

Windows: Create a terminal with git support via right click on the mouse or the start menu entry (Github -> Git Shell)
To clone a repository, run git clone. During the Data Science Course, we will use two repositories, so you have to clone both of them by typing:
git clone https://github.com/zieglerk/python-tutorials.git
git clone https://github.com/zieglerk/data-science-tutorials.git
in your terminal. These commands will clone the repositories into your current folder and create the folders python-tutorials and data-science-tutorials. You might need to register a Github account in order to be able to clone the repositories.

Getting Updates

In case a notebook is updated during the course, you can get the latest version by running git pull in the respective directory (either python-tutorials or data-science-tutorials).
If you already edited a notebook, you might be required to either commit or stash your changes. When you commit your changes and pull the latest version, Git will try to merge the changes automatically.

Run the notebooks

After you have completed all the previous steps you can run the notebooks by navigating to one of the repositories (e.g. data-science-tutorials) and typing
jupyter notebook in your terminal. This will open the notebooks in a browser. In order to execute a cell in a notebook hit Shift+Enter.

Local Jupytherhub

For those who cannot install the required tools, we provide a ready-to-use web-environment. You find the link in the Ilias course. In order to use the notebooks, you need to download them from Github and upload them to the Jupyterhub. Github offers an option to download the whole repository as zip-file.

0.2 Python Programming Basics DSiP-2-Python-Programing-Basics

0.3 Numpy DSiP-3-NumPy

0.4 Matplotlib DSiP-5-Matplotlib

1 - Datasets, Preprocessing, Visualization

Datasets

3 - Classification

4 - Decision Trees

5 - Neural Networks

5.1. Simple NN

5.2. MNIST with Keras

6 - Clustering

7 Association Rules