cleanData

Getting and Cleaning Data course project

Read Me for R script "run_analysis.R"

Summary

The script produces a tidy data set, "averages.txt", from sensor measurements collected by the accelerometers of the Samsung Galaxy S smartphone. A full description is available at the site where the data was obtained: http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

The data source (including its own README):

https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

Specifically, "averages.txt" contains the average of each mean and standard deviation variable of the sensor measurements, computed for each activity (such as walking, standing, sitting) and each subject.

Specifics

The necessary source files (to be extracted from the zip archive at the above URL and copied to the working directory):

"activity_labels.txt"
"features.txt"
"subject_test.txt"
"subject_train.txt"
"X_test.txt"
"X_train.txt"
"y_test.txt"
"y_train.txt"
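
Before running the script, it may help to confirm that all of the files listed above are present. The following is a minimal sketch, not part of "run_analysis.R"; the variable names `required_files` and `missing_files` are illustrative, and it assumes the files sit directly in the working directory.

```r
# File names as listed above; assumed to sit directly in the working directory.
required_files <- c("activity_labels.txt", "features.txt",
                    "subject_test.txt", "subject_train.txt",
                    "X_test.txt", "X_train.txt",
                    "y_test.txt", "y_train.txt")

# Keep only the names for which no file exists in the working directory.
missing_files <- required_files[!file.exists(required_files)]

if (length(missing_files) > 0) {
  message("Missing source files: ", paste(missing_files, collapse = ", "))
} else {
  message("All source files found.")
}
```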

The necessary libraries:

dplyr
the basic R packages

Script synopsis

1. Merges the training and the test sets to create one data set.
2. Extracts only the measurements on the mean and standard deviation for each measurement. See Note [1] below.
3. Uses descriptive activity names to name the activities in the data set. See Note [2] below.
4. Appropriately labels the data set with descriptive variable names. See Note [3] below.
5. From the data set in step 4, creates a second, independent tidy data set "averages.txt" with the average of each variable for each activity and each subject.

The steps in this section are taken from the course instructions at https://class.coursera.org/getdata-014/human_grading/view/courses/973501/assessments/3/submissions
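
The five steps above can be illustrated on a tiny made-up data set. This is only a sketch under stated assumptions: it is not the code in "run_analysis.R", the numbers are invented, it uses base R (`aggregate`) rather than dplyr so that it runs without extra packages, and every object name (`x_all`, `combined`, `averages`, and so on) is hypothetical.

```r
# Toy stand-ins for the source files (invented values, not real measurements).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(id = 1:2, activity = c("WALKING", "SITTING"))

x_train <- matrix(c(1, 3, 0.1, 0.3, 9, 9), nrow = 2)  # 2 rows, 3 feature columns
y_train <- c(1, 1)                                    # activity ids
subject_train <- c(1, 1)

x_test <- matrix(c(5, 7, 0.5, 0.7, 9, 9), nrow = 2)
y_test <- c(2, 2)
subject_test <- c(2, 2)

# Step 1: merge the training and test sets.
x_all <- as.data.frame(rbind(x_train, x_test))
y_all <- c(y_train, y_test)
subject_all <- c(subject_train, subject_test)

# Step 4 (done early here): label the columns with the feature names.
names(x_all) <- features

# Step 2: keep only columns whose name contains "mean" or "std".
keep <- grepl("mean|std", features)
x_sel <- x_all[, keep, drop = FALSE]

# Step 3: replace activity ids with descriptive names.
activity <- activity_labels$activity[match(y_all, activity_labels$id)]

combined <- data.frame(activity = activity, subject = subject_all, x_sel,
                       check.names = FALSE)

# Step 5: average every remaining variable per activity and subject.
averages <- aggregate(combined[, names(x_sel), drop = FALSE],
                      by = list(activity = combined$activity,
                                subject = combined$subject),
                      FUN = mean)
print(averages)
```

In this toy example each subject performs a single activity, so `averages` has one row per subject; with the real data there would be one row for every activity and subject combination.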

Notes

[1] The code that forms the variables 'meanTable' and 'stdTable' selects each column of the combined (train and test) table whose name contains the substring "mean" or "std". After reviewing the files features_info.txt and features.txt from the source data package, the user can modify these lines of the R script to select the specific mean and standard deviation variants (or any other variables) of interest for a given purpose.
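
As an illustration of how such a selection could be narrowed, the sketch below compares a broad substring match with a stricter pattern. It is an assumption-laden example, not the script's actual code: the four feature names are a small sample in the style of features.txt, and `broad` and `strict` are hypothetical variable names.

```r
# A small sample of feature names in the style of features.txt.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "fBodyAcc-meanFreq()-X", "tBodyAcc-max()-X")

# Broad match: any name containing "mean" or "std" (this includes meanFreq()).
broad <- grep("mean|std", features, value = TRUE)

# Stricter match: only names containing exactly "-mean()" or "-std()".
strict <- grep("-(mean|std)\\(\\)", features, value = TRUE)

print(broad)
print(strict)
```

Which pattern is appropriate depends on whether variants such as meanFreq() count as mean variables for the purpose at hand.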

[2] For the activity names, see the first column, "activity", in "averages.txt" when the file is read with read.table("averages.txt", header=TRUE). The names match those in "activity_labels.txt" from the source data package.

[3] Besides the first two columns (the activity, as explained in note [2] above, and the subject), the remaining column names in "averages.txt" match the corresponding original variable names given in features_info.txt and features.txt from the source data package. In the "averages.txt" output file, however, the values in these columns are the averages of those variables for the activity and subject identified by the first two columns of a given row.
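
When reading the output back into R, note that read.table() by default rewrites column names that are not syntactically valid R names. The sketch below shows the effect on a made-up file; it assumes, hypothetically, that the header contains a name such as "tBodyAcc-mean()-X", which may not be how "averages.txt" actually stores its names. The names `toy`, `path`, `default_read`, and `exact_read` are illustrative.

```r
# Build a tiny stand-in for "averages.txt" in a temporary file (invented values).
toy <- data.frame(activity = c("WALKING", "SITTING"), subject = c(1L, 2L))
toy[["tBodyAcc-mean()-X"]] <- c(2, 6)
path <- tempfile(fileext = ".txt")
write.table(toy, path, row.names = FALSE)

# Default: check.names = TRUE turns invalid characters into dots.
default_read <- read.table(path, header = TRUE)

# check.names = FALSE keeps the header exactly as written in the file.
exact_read <- read.table(path, header = TRUE, check.names = FALSE)

print(names(default_read))
print(names(exact_read))
```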
