generated from rstudio/bookdown-demo
---
output:
  pdf_document: default
  html_document: default
---
# Overview
What is data science used for?
Data science uses data for three purposes: descriptive, in which we report, visualize, and display the data; explanatory, in which we try to understand how the different variables affect one another; and predictive, in which the aim is to predict the values of variables.
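
As a rough illustration of these three uses, here is a minimal sketch in base R. It assumes the built-in `mtcars` dataset, and the choice of `mpg` and `wt` as the variables of interest is made up for the example; it is not a recommended analysis.

```r
# Illustrative only: mtcars ships with base R; the variables chosen are an assumption.
data(mtcars)

# Descriptive: report the data as it is.
desc <- summary(mtcars$mpg)                                # five-number summary plus mean
print(desc)
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)                                          # mean mpg per cylinder count

# Explanatory: estimate how one variable relates to another.
fit <- lm(mpg ~ wt, data = mtcars)                         # simple linear regression
print(coef(fit))                                           # the slope describes the mpg/weight relationship

# Predictive: use the fitted model on a value it has not seen.
new_car <- data.frame(wt = 3.0)                            # hypothetical car weighing 3000 lbs
pred <- predict(fit, newdata = new_car)
print(pred)
```

The same fitted model serves both the explanatory and the predictive step here; in practice the two goals often lead to different modeling choices.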
Further, it is helpful to understand that data science draws on four main fields: Software Development, Programming, Statistics, and DevOps.
The technical tools you need to learn to be able to work as a data scientist are:
1. __Project Management, Team Collaboration, and Communication Technologies__
    1. Slack
    2. Jira (or Trello, Shortcuts, etc.)
    3. GitHub
2. __DevOps__
    1. Basics of the command line or the Linux operating system
    2. Git Bash
    3. GitHub
    4.
    5. Cloud Computing
    6. Cloud Storage
    7. Docker
    8. Hosting Docker containers online
3. __Statistical Computing Language: R (in our case, but in general it could be Python, Julia, etc.)__
    1. R
    2. RStudio
    3. RMarkdown
4. __Databases__
    1. SQL
    2. SQLite
5. __Applications and Interactive Dashboards in R__
    1. Flexdashboard
    2. Shiny
    3. Shinydashboard
The things you will need to learn how to do as a data scientist are:
1. __Practical Step 0:__ Looking at and understanding the raw data.
    1. How to look at raw data.
    2. How to evaluate the raw data.
    3. How to understand the raw data.
2. __Theoretical Step 0:__ Knowing what format the data should be in for data science.
    0. This simply means the format the data needs to be in to correctly visualize it, create tables of its variables, and apply predictive and inferential (explanatory) modeling.
    1. What format the raw data should be transformed into to visualize it.
    2. What format the raw data should be transformed into to make tables to understand the data.
    3. What format the raw data needs to be transformed into so that statistical, machine learning, predictive, and inferential models can be applied to the data.
    4.
3. __Practical Step 1:__ How to transform the raw data using SQL and a statistical programming language such as R.
    1.
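
The steps above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the `raw` data frame, its column names, and the choice of a "long" layout (one row per observation) as the target format are all hypothetical, and the SQL side of the transformation is left out.

```r
# Hypothetical raw data in a "wide" layout: one row per student, one column per exam.
raw <- data.frame(
  student = c("A", "B", "C"),
  exam1   = c(90, 75, NA),
  exam2   = c(85, 80, 70),
  stringsAsFactors = FALSE
)

# Practical Step 0: look at, evaluate, and understand the raw data.
str(raw)                            # column types and dimensions
print(head(raw))                    # first rows
print(summary(raw))                 # ranges, and NA counts for numeric columns
n_missing <- colSums(is.na(raw))    # missing values per column
print(n_missing)

# Theoretical Step 0 / Practical Step 1: reshape to a long layout
# (student, exam, score), assumed here to be the target format.
long <- reshape(
  raw,
  direction = "long",
  varying   = c("exam1", "exam2"),  # columns to stack
  v.names   = "score",              # name of the stacked value column
  timevar   = "exam",               # name of the column holding the source column label
  times     = c("exam1", "exam2"),
  idvar     = "student"
)
rownames(long) <- NULL
print(long)

# The long layout feeds directly into tables and models.
mean_by_exam <- aggregate(score ~ exam, data = long, FUN = mean)  # rows with NA scores are dropped
print(mean_by_exam)
```

Whether a long or a wide layout is the right target depends on the table, plot, or model you want to produce; the long layout is used here only because many R modeling and plotting functions expect one row per observation.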