INFO 490: Foundations of Data Science uses a project-based approach to introduce students to the tools and technologies necessary for working with large data.
Upon completion of this course, students will be expected to understand the basic concepts of data science. Students will learn how to work at a Unix prompt, how to use the Python programming language to process, visualize, and persist large data sets, and how to use database technologies including SQL.
There are no prerequisites for this course other than an interest in learning the basic skills necessary for being a data scientist and access to a computer for participating in the course lectures and completing the required course assignments.
Note: At present, students are required to use their own computer system to perform many of the requirements for this course. We hope to eventually enable at least some students to use a cloud computing approach to interact with the course material. Until that time, however, we recommend the following process to prepare your computer for this course:
1. We have made a Docker image for this course (similar to what we have done when teaching this course previously). To use this course image, you need to download and install Docker Machine. Docker provides a Unix shell with all of our required course software on nearly all modern computing platforms. To get it, go to the Docker website and click the Get Started with Docker button, which will give you instructions for downloading and installing Docker Machine on your computer. Note that this will only work on fairly modern computers that support hardware virtualization. On some computers, hardware virtualization must be enabled in the computer's BIOS. If you have questions about this process, including BIOS changes, please consult the course assistants or instructor. Once the course opens, you will receive more detailed instructions on how to pull our course Docker image, start a running container from the image, and download and install the course GitHub repository in your running Docker container.
2. If your computer is unable to support hardware virtualization, you will most likely be unable to run Docker. Your options depend on the type of computer you are using:
2a. If you are running Mac OS X or Linux, you can instead use a free Python package manager to install the required software (more information will be forthcoming). You can download Anaconda from Continuum Analytics; you need the version that supports Python 3.4. In this case, you will simply use a Bash shell on your computer to learn the Unix commands. If possible, however, you should still use Docker to (a) have an isolated environment where you can’t accidentally delete or change important system files and (b) learn about virtualization technologies.
2b. If you have an older Windows laptop, you have several options, none of them fully satisfactory, because Windows is not Unix. We can offer only limited support for these options since they are beyond the scope of this course.
- You can install and use Cygwin. It will create a Unix-like environment, but there will be differences.
- You could use (potentially free) cloud resources from Google, Amazon, Microsoft, Rackspace, Cloudera, etc., to complete the course material.
Once you have downloaded and installed Docker Machine, you will need to pull our course Docker image in order to have an effective working environment. Instructions for doing this are included in Lesson 2 of Week 1 of the course (on virtualization and Docker). Once you have the course Docker container running, you will need to use git to clone the course repository. At this point you will have a full working course setup. Lesson 2 of Week 1 also covers how to do this.
There are no required textbooks for this course. Instead, we will use Internet-accessible websites and documentation as supplemental material to the lesson content.
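Before starting the setup steps, it can help to confirm that the command-line tools they rely on are installed. The following is a minimal Python sketch, not part of the official course instructions; the helper name `missing_tools` is hypothetical, and it assumes the tools are invoked as `docker` and `git` in whichever environment you run it.

```python
import shutil


def missing_tools(tools=("docker", "git")):
    """Return the names of command-line tools that are not found on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("docker and git are both available.")
```

This only checks that the executables can be found; it does not verify that hardware virtualization is enabled or that Docker can actually start a container.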
Each week will provide learning objectives and an outline of the activities for that week with a list of all deadlines and corresponding point values for assignments.
Each week there will be at least one video that will offer a broader context for the new week, explain key concepts, and demonstrate important tasks. To view the videos, you will need to log in to Illinois Mediaspace (links are embedded in the relevant weekly overview and occasionally in a lesson). You will earn twenty points for viewing all of the videos for a week. In case you are wondering, Illinois Mediaspace tracks the viewing of course videos.
Readings will consist of articles and excerpts from books and websites and, in some cases, IPython Notebooks that can be viewed statically on the GitHub website or (the preferred approach) used interactively on the course JupyterHub server. You will be required to read and be familiar with the content of these documents. Readings are contextualized as part of the weekly lesson content and are located in the "Readings" section of each lesson.
Lessons will expand upon or clarify key concepts in the reading assignments, or supplement the readings. All lessons for a given week must be completed by 6:00 PM Central on Thursday of that week.
Each week will contain three lesson modules (except for the last week, which will contain only one). A lesson module will include a Moodle quiz designed to be taken after completing the readings and carefully reviewing the lesson material. Lesson quizzes will allow two attempts, to ensure students have mastered the relevant material before advancing to the next lesson module. The lesson assessments must all be completed by 6:00 PM Central on Thursday of that week.
Every week but the first and last will contain an assignment that will involve one or more computational tasks related to the focus for that given week.
Instructions for submitting assignments will be forthcoming. To receive full credit from instructor grading, your assignment must be submitted prior to the deadline. There will be a 24-hour grace period, in which an assignment can be submitted, albeit with an automatic 50% reduction in the maximum possible score. After this grace period, no assignments will be accepted. The full credit assignment deadline is 6:00 PM Central on Saturday of the relevant week.
Weekly assignments will be reviewed by your course peers and also graded automatically by the instructor. 40% of the grade for each weekly assignment submission will derive from peer review and 60% from instructor review. You will receive 50 pts each week simply for viewing and grading your peers' assignments. Note that you can (and should) still grade your peers even if you miss an assignment submission. Peer review of an assignment must be completed by 6:00 PM Central on Tuesday of the following week (i.e., you submit your assignment on a Saturday and then must peer assess other students' assignments by the following Tuesday). Assignments to grade will be assigned to you approximately one hour after the late assignment deadline, that is, around 7:00 PM Central on Sunday of each week.
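As an illustration of how the 60/40 weighting and the grace-period penalty might combine, here is a minimal Python sketch. The function name `assignment_score` is hypothetical, and two points are assumptions rather than stated policy: that the weighting applies to the 150 points listed for each assignment, and that the late penalty halves the earned score (which also halves the maximum possible score).

```python
def assignment_score(instructor_fraction, peer_fraction, max_points=150, late=False):
    """Estimate an assignment score from instructor and peer marks.

    Both fractions are between 0 and 1 (for example, 0.9 means 90%).
    """
    # 60% of the grade derives from instructor review, 40% from peer review.
    combined = 0.6 * instructor_fraction + 0.4 * peer_fraction
    # Assumption: a grace-period submission is scaled by one half.
    penalty = 0.5 if late else 1.0
    return max_points * combined * penalty
```

For example, `assignment_score(1.0, 0.5)` gives roughly 120 points for an on-time submission, and the same marks submitted during the grace period give roughly 60.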
In addition to the lesson quizzes, each week will conclude with a weekly quiz. The weekly quiz is designed to test your overall mastery of the content for each given week. Unlike the lesson quizzes, weekly quizzes will be timed and will not allow multiple attempts. The quiz must be completed by 6:00 PM Central on Friday of that week.
This course is project-based in its use of assignments that build progressively on content mastery, application, and peer review; there are no exams in this course.
While you are still strongly encouraged to complete all activities in the course, we will drop your three lowest weekly grades from the second to the fourteenth weeks. Since later topics build on earlier topics, however, it is in your best interest to still complete all readings, even if after the relevant deadline.
Assignment | Points | Occurrences | Total Points |
---|---|---|---|
Pre-Class Activity: Introduce Yourself | 60 | 1 | 60 |
Orientation Quiz | 70 | 1 | 70 |
Lesson Assessments | 60 | 14 (Week 15 is only 20 points) | 860 |
Weekly Quizzes | 70 | 14 (No quiz in Week 15) | 980 |
Weekly Videos | 20 | 16 (including the Orientation Week video) | 320 |
Assignments (Weeks 2-14) | 150 | 13 | 1950 |
Total | | | 4240 |
Note that after the lowest three weekly scores are dropped from weeks 2-14, the maximum total score for the class is 3340.
Final grades will be curved, if necessary. The letter grade cutoffs will be set at the traditional 90%, 80%, and 70% limits, and a plus or minus will be added if you are within two points of the traditional cutoffs (so 98–100 is an A+ and 90–92 is an A-).
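The totals above can be checked with a few lines of Python. This is only a sketch of the arithmetic; it assumes that a full week in weeks 2-14 is worth the sum of that week's lesson assessments, weekly quiz, video, and assignment, which is an inference from the table rather than a stated rule.

```python
# One-time activities from the grading table.
one_time = 60 + 70                  # Introduce Yourself + Orientation Quiz

# Recurring activities from the grading table.
lesson_assessments = 60 * 14 + 20   # Week 15 is worth only 20 points
weekly_quizzes = 70 * 14            # no quiz in Week 15
weekly_videos = 20 * 16             # includes the Orientation Week video
assignments = 150 * 13              # Weeks 2-14

total = one_time + lesson_assessments + weekly_quizzes + weekly_videos + assignments

# Assumption: one full week = lessons + quiz + video + assignment.
full_week = 60 + 70 + 20 + 150
max_after_drops = total - 3 * full_week

print(total, max_after_drops)       # prints: 4240 3340
```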
Percentage | Letter Grade |
---|---|
98-100 | A+ |
92-98 | A |
90-92 | A- |
88-90 | B+ |
82-88 | B |
80-82 | B- |
78-80 | C+ |
72-78 | C |
70-72 | C- |
68-70 | D+ |
62-68 | D |
60-62 | D- |
Below 60 | F |
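
The table can be read as a simple lookup from percentage to letter grade. The Python sketch below is one way to express it; the name `letter_grade` is hypothetical, and because adjacent ranges in the table share their endpoints, it assumes that a score exactly on a boundary receives the higher grade. It also ignores any curve.

```python
# Lower bound of each band in the table, highest first.
CUTOFFS = [
    (98, "A+"), (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(percentage):
    """Map a final percentage to a letter grade using the table's bands."""
    # Assumption: a score exactly on a boundary gets the higher grade.
    for lower_bound, letter in CUTOFFS:
        if percentage >= lower_bound:
            return letter
    return "F"
```

Under that assumption, `letter_grade(91.5)` returns `"A-"` and `letter_grade(59.9)` returns `"F"`.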