README_toc.md


1 Array, Matrix, Dataframe

1.1 List

  1. Multi-dimensional Named Lists: rmd | r | pdf | html
    • Initialize an empty list. Named one- and two-dimensional lists. List of dataframes.
    • Collapse named and unnamed lists to strings and print the input code.
    • r: deparse(substitute()) + vector(mode = "list", length = it_N) + names(list) <- paste0('e',seq()) + dimnames(ls2d)[[1]] <- paste0('r',seq()) + dimnames(ls2d)[[2]] <- paste0('c',seq())
    • tidyr: unnest()
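A minimal base-R sketch of the list operations above: an empty named list, a 2 by 3 list-matrix with row and column names, collapsing to a string, and printing the input expression with `deparse(substitute())`. The object names and the helper `ffi_print_input` are illustrative, not the tutorial's own code.

```r
# Named one-dimensional list, initialized empty and then filled.
ls1d <- vector(mode = "list", length = 3)
names(ls1d) <- paste0("e", seq_len(3))
ls1d[["e1"]] <- 1.5
ls1d[["e2"]] <- "a"
ls1d[["e3"]] <- TRUE

# Collapse the named list to a single "name=value" string.
st_collapsed <- paste(names(ls1d), ls1d, sep = "=", collapse = ", ")

# Two-dimensional list (a list with a dim attribute) with row and column names.
it_rows <- 2
it_cols <- 3
ls2d <- vector(mode = "list", length = it_rows * it_cols)
dim(ls2d) <- c(it_rows, it_cols)
dimnames(ls2d) <- list(paste0("r", seq_len(it_rows)), paste0("c", seq_len(it_cols)))
for (it_r in seq_len(it_rows)) {
  for (it_c in seq_len(it_cols)) {
    # Each cell can hold any R object; here a short integer vector.
    ls2d[[it_r, it_c]] <- c(it_r, it_c)
  }
}

# Return the code expression that was passed in, not its value.
ffi_print_input <- function(x) deparse(substitute(x))
st_input <- ffi_print_input(ls1d[["e1"]])
```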

1.2 Array

  1. Basic Arrays Operations in R: rmd | r | pdf | html
    • Generate N-dimensional array of NA values, label dimension elements.
    • Basic array operations in R, rep, head, tail, na, etc.
    • E notation.
    • Get N cuts from M points.
    • r: sum() + prod() + rep() + array(NA, dim=c(3, 3)) + array(NA, dim=c(3, 3, 3)) + dimnames(mn)[[3]] = paste0('k=', 0:4) + head() + tail() + na_if() + Re()
    • purrr: reduce()
  2. Generate Special Arrays: rmd | r | pdf | html
    • Generate equidistant and specially log-spaced arrays.
    • Generate probability mass function with non-unique and non-sorted value and probability arrays.
    • Generate a set of integer sequences, with gaps in between, e.g., (1,2,3), (5), (10,11).
    • r: seq() + sort() + runif() + ceiling() + sample() + apply() + do.call()
    • stats: aggregate()
  3. String Operations: rmd | r | pdf | html
    • Split, concatenate, subset, replace, and substring strings.
    • Convert number to string without decimal and negative sign.
    • Concatenate numeric and string arrays as a single string.
    • Regular expressions.
    • r: paste0() + paste0(round(runif(3),3), collapse=',') + sub() + gsub() + grepl() + sprintf()
  4. Meshgrid Matrices, Arrays and Scalars: rmd | r | pdf | html
    • Meshgrid Matrices, Arrays and Scalars to form all combination dataframe.
    • tidyr: expand_grid()
    • r: expand.grid()
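The array items above can be sketched in base R as follows, assuming illustrative sizes and values: an NA array with a labeled third dimension, a log-spaced grid, integer runs with gaps built from start and end points, and an `expand.grid()` mesh.

```r
# Three-dimensional array of NA values with a labeled third dimension.
mn <- array(NA_real_, dim = c(3, 3, 5))
dimnames(mn) <- list(NULL, NULL, paste0("k=", 0:4))

# Log-spaced grid: equidistant in logs, so points cluster near the lower end.
ar_log <- exp(seq(log(1), log(100), length.out = 5))

# Integer runs with gaps, e.g. (1,2,3), (5), (10,11), from start and end points.
ar_start <- c(1, 5, 10)
ar_end <- c(3, 5, 11)
ar_gaps <- unlist(mapply(seq, ar_start, ar_end, SIMPLIFY = FALSE))

# All combinations of two arrays (a "meshgrid"); the first column varies fastest.
df_mesh <- expand.grid(fl_x = c(0.1, 0.2), st_y = c("a", "b", "c"),
                       stringsAsFactors = FALSE)
```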

1.3 Matrix

  1. Matrix Basics: rmd | r | pdf | html
    • Generate and combine NA, fixed and random matrices. Name columns and rows.
    • Sort all rows and all columns of a matrix.
    • Replace values outside min and max in matrix by NA values.
    • r: rep() + rbind() + matrix(NA) + matrix(NA_real_) + matrix(NA_integer_) + colnames() + rownames() + t(apply(mt, 1, sort)) + apply(mt, 2, sort) + colMeans() + rowMeans() + which()
  2. Linear Algebra Operations: rmd | r | pdf | html
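A short base-R sketch of the matrix basics listed above, using a random 3 by 4 matrix and hypothetical bounds 0.2 and 0.8: sorting within rows and columns, replacing out-of-bound values by NA, and naming dimensions.

```r
set.seed(123)
mt <- matrix(runif(12), nrow = 3, ncol = 4)

# Sort within each row (apply returns results as columns, so transpose back)
# and within each column.
mt_row_sorted <- t(apply(mt, 1, sort))
mt_col_sorted <- apply(mt, 2, sort)

# Replace values outside [fl_min, fl_max] by NA.
fl_min <- 0.2
fl_max <- 0.8
mt_bounded <- mt
mt_bounded[mt < fl_min | mt > fl_max] <- NA

# Name rows and columns, then take column means ignoring NA.
rownames(mt_bounded) <- paste0("r", seq_len(nrow(mt_bounded)))
colnames(mt_bounded) <- paste0("c", seq_len(ncol(mt_bounded)))
ar_col_means <- colMeans(mt_bounded, na.rm = TRUE)
```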

1.4 Regular Expression, Date, etc.

  1. R String Regular Expression (Regex): rmd | r | pdf | html
    • Regular expression.
    • Find strings that do or do not contain certain substrings, numbers, and symbols.
    • r: grepl()
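A minimal `grepl()` sketch for the contain / do-not-contain checks above; the example strings and patterns are illustrative.

```r
ar_st <- c("abc123", "xyz", "a_b", "42", "no digits")
# Contains at least one digit.
ar_bl_digit <- grepl("[0-9]", ar_st)
# Consists only of digits (anchored at both ends).
ar_bl_all_digit <- grepl("^[0-9]+$", ar_st)
# Contains a literal underscore; fixed = TRUE skips regex interpretation.
ar_bl_underscore <- grepl("_", ar_st, fixed = TRUE)
# Strings that do NOT contain any digit.
ar_st_no_digit <- ar_st[!ar_bl_digit]
```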

2 Manipulate and Summarize Dataframes

2.1 Variables in Dataframes

  1. Generate Tibble Dataframes from Matrix and List: rmd | r | pdf | html
    • Generate tibble data from two-dimensional named lists; unlist for exporting.
    • Generate tibble dataframe, rename tibble variables, generate tibble row and column names.
    • Export tibble table to csv file with date and time stamp in file name.
    • Rename numeric sequential columns with string prefix and suffix.
    • base: Sys.time() + format() + sample(LETTERS, 5, replace = TRUE) + is.list()
    • dplyr: as_tibble(mt) + rename_all(~c(ar_names)) + rename_at(vars(starts_with("xx")), funs(str_replace(., "yy", "yyyy"))) + rename_at(vars(num_range('',ar_it)), funs(paste0(st,.))) + rowid_to_column() + row_number() + min_rank() + dense_rank() + mutate_if()
    • base: colnames + rownames
  2. Interact and Cut Variables to Generate Categorical Variables: rmd | r | pdf | html
    • Convert rowname to variable name.
    • Generate categorical variable from a continuous variable.
    • Convert numeric variables to factor variables, generate interaction variables (joint factors), and label factors with descriptive words.
    • Graph MPG and 1/4 Miles Time (qsec) from the mtcars dataset over joint shift-type (am) and engine-type (vs) categories.
    • r: cut(breaks = ar, labels = ar, right = FALSE)
    • tibble: rownames_to_column()
    • forcats: as_factor() + fct_recode() + fct_cross()
  3. Randomly Draw Subsets of Rows from Matrix: rmd | r | pdf | html
    • Given matrix, randomly sample rows, or select if random value is below threshold.
    • r: rnorm() + sample() + df[sample(dim(df)[1], it_M, replace=FALSE),]
    • dplyr: case_when() + mutate(var = case_when(rnorm(n(),mean=0,sd=1) < 0 ~ 1, TRUE ~ 0)) %>% filter(var == 1)
  4. Generate Variables Conditional on Other Variables, Categorical from Continuous: rmd | r | pdf | html
    • Use case_when() to generate if-else-if conditional variables: NA values, approximate differences, etc.
    • Generate Categorical Variables from Continuous Variables.
    • dplyr: case_when() + na_if() + mutate(var = na_if(case_when(rnorm(n())< 0 ~ -99, TRUE ~ mpg), -99))
    • r: e-notation + all.equal() + isTRUE(all.equal(a,b,tol)) + is.na() + NA_real_ + NA_character_ + NA_integer_
  5. R Tibble Dataframe String Manipulations: rmd | r | pdf | html
    • There are multiple CSV files, each with the same file structure but simulated with different parameters. Gather a subset of columns from the different files, and give them the correct attributes based on the CSV file names.
    • r: cbind(ls_st, ls_st) + as_tibble(mt_st)
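The categorical-variable and random-draw items in this subsection can be sketched in base R on `mtcars`; the break points, labels, and threshold below are illustrative choices, and `interaction()` stands in for `forcats::fct_cross()`.

```r
df <- mtcars[, c("mpg", "am", "vs")]

# Continuous to categorical: intervals [10,20), [20,30), [30,40).
df$mpg_cate <- cut(df$mpg, breaks = c(10, 20, 30, 40),
                   labels = c("low", "mid", "high"), right = FALSE)

# Label numeric codes, then cross them into one joint factor.
df$am_f <- factor(df$am, levels = c(0, 1), labels = c("auto", "manual"))
df$vs_f <- factor(df$vs, levels = c(0, 1), labels = c("vshaped", "straight"))
df$am_vs <- interaction(df$am_f, df$vs_f, sep = "_")

# Randomly draw 5 rows without replacement.
set.seed(1)
df_draw <- df[sample(nrow(df), 5, replace = FALSE), ]

# Keep rows whose standard normal draw falls below a threshold (0 here).
df_keep <- df[rnorm(nrow(df)) < 0, ]
```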

2.2 Counting Observation

  1. R Example Counting, Tabulation, and Cross Tabulation: rmd | r | pdf | html
    • Uncount to generate panel skeleton from years in survey
    • dplyr: tally() + spread() + distinct() + uncount(yr_n) + group_by() + mutate(yr = row_number() + start_yr)
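A base-R analogue of `uncount(yr_n)` for building a panel skeleton, plus simple tabulations. The survey data are hypothetical, and here the first panel year is set equal to `start_yr` (an assumption; the listed `row_number() + start_yr` would start one year later).

```r
df_survey <- data.frame(id = c(1, 2, 3),
                        start_yr = c(2000, 2002, 2001),
                        yr_n = c(3, 1, 2))
# Repeat each row yr_n times.
df_panel <- df_survey[rep(seq_len(nrow(df_survey)), times = df_survey$yr_n), ]
# Within-id counter 1, 2, ...; the first year equals start_yr.
df_panel$yr <- df_panel$start_yr + ave(df_panel$id, df_panel$id, FUN = seq_along) - 1
rownames(df_panel) <- NULL
# Count observations per id, and cross-tabulate id by year.
tb_count <- table(df_panel$id)
tb_cross <- table(df_panel$id, df_panel$yr)
```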

2.3 Sorting, Indexing, Slicing

  1. Sorted Index, Interval Index and Expand Value from One Row: rmd | r | pdf | html
    • Sort and generate index for rows
    • Generate negative and positive index based on deviations
    • Populate Values from one row to other rows
    • dplyr: arrange() + row_number() + mutate(lowest = min(Sepal.Length)) + case_when(row_number()==x ~ Sepal.Length) + mutate(Sepal.New = Sepal.Length[Sepal.Index == 1])
  2. R Within-group Ascending and Descending Sort, Selection, and Differencing: rmd | r | pdf | html
    • Sort a dataframe by multiple variables, some in descending order.
    • Select observations with the highest M values from within N groups (top scoring students from each class).
    • dplyr: arrange(a, b, desc(c)) + group_by() + lag() + lead() + slice_head(n=1)
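The within-group sort, lag, and top-M selection described above, sketched in base R with a small hypothetical class roster (`order()` with a negated key plays the role of `desc()`).

```r
df <- data.frame(class = c("a", "a", "a", "b", "b", "c"),
                 student = paste0("s", 1:6),
                 score = c(90, 75, 88, 60, 95, 70),
                 stringsAsFactors = FALSE)
# Sort by class ascending, then score descending (negate the numeric key).
df_sorted <- df[order(df$class, -df$score), ]
# Within-class rank after sorting, and the previous (lagged) score.
df_sorted$rank_in_class <- ave(df_sorted$score, df_sorted$class, FUN = seq_along)
df_sorted$score_lag <- ave(df_sorted$score, df_sorted$class,
                           FUN = function(x) c(NA, head(x, -1)))
# Top M = 2 students within each class.
df_top2 <- df_sorted[df_sorted$rank_in_class <= 2, ]
```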

2.4 Advanced Group Aggregation

  1. Cummean Test, Cumulative Mean within Group: rmd | r | pdf | html
    • There is a dataframe with a grouping variable and some statistic sorted by another within-group variable; calculate the cumulative mean of that statistic within each group.
    • dplyr: cummean() + group_by(id, isna = is.na(val)) + mutate(val_cummean = ifelse(isna, NA, cummean(val)))
  2. Count Unique Groups and Mean within Groups: rmd | r | pdf | html
    • Unique groups defined by multiple values and count obs within group.
    • Mean, sd, observation count for non-NA within unique groups.
    • dplyr: group_by() + summarise(n()) + summarise_if(is.numeric, funs(mean = mean(., na.rm = TRUE), n = sum(is.na(.)==0)))
  3. By Groups, One Variable All Statistics: rmd | r | pdf | html
    • Pick stats, overall, and by multiple groups, stats as matrix or wide row with name=(ctsvar + catevar + catelabel).
    • tidyr: group_by() + summarize_at(, funs()) + rename(!!var := !!sym(var)) + mutate(!!var := paste0(var,'str',!!!syms(vars))) + gather() + unite() + spread(varcates, value)
  4. By within Individual Groups Variables, Averages: rmd | r | pdf | html
    • By Multiple within Individual Groups Variables.
    • Averages for all numeric variables within all groups of all group variables. Long to Wide to very Wide.
    • tidyr: gather() + group_by() + summarise_if(is.numeric, funs(mean(., na.rm = TRUE))) + mutate(all_m_cate = paste0(variable, '_c', value)) + unite() + spread()
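A base-R sketch of the cumulative mean within group that skips NA values (NA rows stay NA, mirroring the `group_by(id, isna)` approach listed above), followed by group means and non-NA counts. The data and the helper `ffi_cummean_na` are hypothetical.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2), val = c(2, NA, 4, 10, 20))
# Cumulative mean over the non-NA values of one group; NA positions stay NA.
ffi_cummean_na <- function(x) {
  out <- rep(NA_real_, length(x))
  bl_ok <- !is.na(x)
  out[bl_ok] <- cumsum(x[bl_ok]) / seq_len(sum(bl_ok))
  out
}
df$val_cummean <- ave(df$val, df$id, FUN = ffi_cummean_na)
# Group mean and count of non-NA observations.
ar_mean <- tapply(df$val, df$id, mean, na.rm = TRUE)
ar_n_nonna <- tapply(!is.na(df$val), df$id, sum)
```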

2.5 Distributional Statistics

  1. Tibble Basics: rmd | r | pdf | html
    • Input multiple variables with comma-separated text strings.
    • Quantitative/continuous and categorical/discrete variables.
    • Histogram and summary statistics.
    • tibble: ar_one <- c(107.72,101.28) + ar_two <- c(101.72,101.28) + mt_data <- cbind(ar_one, ar_two) + as_tibble(mt_data)

2.6 Summarize Multiple Variables

  1. Apply the Same Function over Columns and Row Groups: rmd | r | pdf | html
    • Compute row-specific quantiles, based on values across columns within each row.
    • Sum values within-row across multiple columns, ignoring NA.
    • Sum values within-group across multiple rows for matched columns, ignoring NA.
    • Replace NA values in selected columns by alternative values.
    • r: rowSums() + cumsum() + gsub() + mutate_at(vars(matches()), .funs = list(gs = ~sum(.))) + mutate_at(vars(contains()), .funs = list(cumu = ~cumsum(.))) + rename_at(vars(contains()), list(~gsub("M", "", .)))
    • dplyr: group_by(across(one_of(ar_st_vars))) + mutate(across(matches(), func)) + rename_at() + mutate_at() + rename_at(vars(starts_with()), funs(str_replace(., "v", "var"))) + mutate_at(vars(one_of()), list(~replace_na(., 99)))
    • purrr: reduce()
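The row-wise and group-wise summaries above can be sketched with base R alone, on a small hypothetical matrix: row sums ignoring NA, row-specific quantiles, within-group sums across rows via `rowsum()`, and replacing NA by an alternative value (99 here).

```r
mt <- matrix(c(1, NA, 3,
               4,  5, 6,
               NA, 8, 9), nrow = 3, byrow = TRUE)
# Within-row sum across columns, ignoring NA.
ar_row_sum <- rowSums(mt, na.rm = TRUE)
# Row-specific median (0.5 quantile) across columns.
ar_row_median <- apply(mt, 1, quantile, probs = 0.5, na.rm = TRUE)
# Within-group sum across rows, column by column, ignoring NA.
ar_group <- c("g1", "g1", "g2")
mt_group_sum <- rowsum(mt, ar_group, na.rm = TRUE)
# Replace NA with an alternative value.
mt_filled <- mt
mt_filled[is.na(mt_filled)] <- 99
```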

3 Functions

3.1 Dataframe Mutate

  1. Nonlinear Function of Scalars and Arrays over Rows: rmd | r | pdf | html
    • Five methods to evaluate scalar nonlinear function over matrix.
    • Evaluate non-linear function with scalar from rows and arrays as constants.
    • r: .$fl_A + fl_A = `$`(., 'fl_A') + .[[svr_fl_A]]
    • dplyr: rowwise() + mutate(out = funct(inputs))
  2. Evaluate Functions over Rows of Meshes Matrices: rmd | r | pdf | html
    • Mesh states and choices together and evaluate many matrices row by row.
    • Cumulative sum over multiple variables.
    • Rename various variables with a common prefix and suffix appended.
    • r: ffi <- function(fl_A, ar_B)
    • tidyr: expand_grid() + rowwise() + df %>% rowwise() %>% mutate(var = ffi(fl_A, ar_B))
    • ggplot2: geom_line() + facet_wrap() + geom_hline() + facet_wrap(. ~ var_id, scales = 'free') + geom_hline(yintercept=0, linetype="dashed", color="red", size=1)
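A base-R analogue of `expand_grid()` followed by `rowwise() %>% mutate()`: mesh two scalar states, then evaluate a nonlinear function row by row with a fixed array as a constant. The function `ffi` and all values are hypothetical.

```r
# Hypothetical nonlinear function: scalar fl_A from each row, fixed array ar_B.
ffi <- function(fl_A, ar_B) sum(log(fl_A + ar_B))
ar_B <- c(1, 2, 3)
# Mesh two scalar states into all combinations, one combination per row.
df_mesh <- expand.grid(fl_A = c(0.5, 1), fl_C = c(1, 2))
# Evaluate row by row.
df_mesh$out <- mapply(function(fl_a, fl_c) fl_c * ffi(fl_a, ar_B),
                      df_mesh$fl_A, df_mesh$fl_C)
# Cumulative sum of the output within each fl_C group.
df_mesh$out_cumsum <- ave(df_mesh$out, df_mesh$fl_C, FUN = cumsum)
```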

3.2 Dataframe Do Anything

  1. Dataframe Row to Array (Mx1 by N) to (MxQ by N+1): rmd | r | pdf | html
    • Generate row-value-specific arrays of varying length, and stack the expanded dataframe.
    • Given row-specific information, generate row-specific arrays that expand matrix.
    • dplyr: do() + unnest() + left_join() + df %>% group_by(ID) %>% do(inc = rnorm(.$Q, mean=.$mean, sd=.$sd)) %>% unnest(c(inc))
  2. Simulate country-specific wage draws and compute country wage GINIs: Dataframe (Mx1 by N) to (MxQ by N+1) to (Mx1 by N+1): rmd | r | pdf | html
    • Define attributes for M groups across N variables, simulate up to Q observations for each of the M Groups, then compute M-specific statistics based on the sample of observations within each M.
    • Start with a matrix that is (Mx1 by N); expand this to (MxQ by N+1), where the additional column contains the MxQ-specific variable; compute statistics for each M based on the Q observations within M, and then present an (Mx1 by N+1) dataframe.
    • dplyr: group_by(ID) + do(inc = rnorm(.$N, mean=.$mn, sd=.$sd)) + unnest(c(inc)) + left_join(df, by="ID")
  3. Dataframe Subset to Dataframe (MxP by N) to (MxQ by N+Z-1): rmd | r | pdf | html
    • Group by to form mini dataframes as inputs for a function. Stack the output dataframes with a group id.
    • dplyr: group_by() + do() + unnest()
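The (Mx1 by N) to (MxQ by N+1) to (Mx1 by N+1) pattern above, sketched in base R rather than with `do()` and `unnest()`. The group parameters are hypothetical, and the group mean stands in for any group-level statistic such as a GINI.

```r
set.seed(8888)
# M = 3 groups, each with its own draw count Q and normal parameters.
df_groups <- data.frame(ID = 1:3, Q = c(4, 2, 3),
                        mn = c(10, 20, 30), sd = c(1, 2, 3))
# (Mx1 by N) -> (MxQ by N+1): one row per draw, with the new column inc.
ls_df <- lapply(seq_len(nrow(df_groups)), function(i) {
  data.frame(ID = df_groups$ID[i],
             inc = rnorm(df_groups$Q[i], mean = df_groups$mn[i], sd = df_groups$sd[i]))
})
df_long <- merge(do.call(rbind, ls_df), df_groups, by = "ID")
# Back to (Mx1 by N+1): one group-level statistic (here the mean) per ID.
df_stats <- merge(df_groups, aggregate(inc ~ ID, data = df_long, FUN = mean), by = "ID")
```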

3.3 Apply and pmap

  1. Apply and Sapply function over arrays and rows: rmd | r | pdf | html
    • Evaluate function f(x_i,y_i,c), where c is a constant and x and y vary over each row of a matrix, with index i indicating rows.
    • Get same results using apply and sapply with defined and anonymous functions.
    • Convert list of list to table.
    • r: do.call() + as_tibble(do.call(rbind,ls)) + apply(mt, 1, func) + sapply(ls_ar, func, ar1, ar2)
  2. Mutate rowwise, mutate pmap, and rowwise do unnest: rmd | r | pdf | html
    • Evaluate function f(x_i,y_i,c), where c is a constant and x and y vary over each row of a matrix, with index i indicating rows.
    • Get same results using various types of mutate rowwise, mutate pmap and rowwise do unnest.
    • dplyr: rowwise() + do() + unnest()
    • purrr: pmap(func)
    • r: unlist()
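For f(x_i, y_i, c) with a constant c, the `apply`, `sapply`, and `mapply` routes listed above should agree; this sketch uses a hypothetical function `ffi_xyc` and also converts a list of lists to a table with `do.call(rbind, ...)`.

```r
# f(x_i, y_i, c): x and y vary by row, c is a constant.
ffi_xyc <- function(fl_x, fl_y, fl_c) fl_x^2 + fl_y * fl_c
mt_xy <- cbind(x = c(1, 2, 3), y = c(4, 5, 6))
fl_c <- 10
# apply() passes each row as a named vector.
ar_apply <- apply(mt_xy, 1, function(ar_row) ffi_xyc(ar_row[["x"]], ar_row[["y"]], fl_c))
# sapply() over row indices.
ar_sapply <- sapply(seq_len(nrow(mt_xy)),
                    function(i) ffi_xyc(mt_xy[i, "x"], mt_xy[i, "y"], fl_c))
# mapply() walks x and y in parallel; the constant goes in MoreArgs.
ar_mapply <- mapply(ffi_xyc, mt_xy[, "x"], mt_xy[, "y"], MoreArgs = list(fl_c = fl_c))
# List of lists to a table: one dataframe row per inner list.
ls_ls <- list(list(x = 1, y = 4), list(x = 2, y = 5))
df_from_ls <- do.call(rbind, lapply(ls_ls, as.data.frame))
```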

4 Multi-dimensional Data Structures

4.1 Generate, Gather, Bind and Join

  1. R dplyr Group by Index and Generate Panel Data Structure: rmd | r | pdf | html
    • Build skeleton panel frame with N observations and T periods with gender and height.
    • Generate group Index based on a list of grouping variables.
    • r: runif() + rnorm() + rbinom(n(), 1, 0.5) + cumsum()
    • dplyr: group_by() + row_number() + ungroup() + one_of() + mutate(var = (row_number()==1)*1)
    • tidyr: uncount()
  2. R DPLYR Join Multiple Dataframes Together: rmd | r | pdf | html
    • Join dataframes together with one or multiple keys. Stack dataframes together.
    • dplyr: filter() + rename(!!sym(vsta) := !!sym(vstb)) + mutate(var = rnorm(n())) + left_join(df, by=(c('id'='id', 'vt'='vt'))) + left_join(df, by=setNames(c('id', 'vt'), c('id', 'vt'))) + bind_rows()
  3. R Gather Data Columns from Multiple CSV Files: rmd | r | pdf | html
    • There are multiple CSV files, each with the same file structure but simulated with different parameters. Gather a subset of columns from the different files, and give them the correct attributes based on the CSV file names.
    • Separate numeric and string components of a string variable value apart.
    • r: file() + writeLines() + readLines() + close() + gsub() + read.csv() + do.call(bind_rows, ls_df) + apply()
    • tidyr: separate()
    • regex: (?<=[A-Za-z])(?=[-0-9])
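A base-R counterpart to the two-key `left_join()`, row stacking, and group-index steps above; `merge(..., all.x = TRUE)` is used in place of dplyr, and the dataframes are hypothetical.

```r
df_a <- data.frame(id = c(1, 1, 2), vt = c(1, 2, 1), x = c(0.1, 0.2, 0.3))
df_b <- data.frame(id = c(1, 2, 2), vt = c(2, 1, 2), y = c("p", "q", "r"),
                   stringsAsFactors = FALSE)
# Left join on two keys; merge() sorts the result by the keys.
df_left <- merge(df_a, df_b, by = c("id", "vt"), all.x = TRUE)
# Stack the key columns of both dataframes.
df_keys <- rbind(df_a[, c("id", "vt")], df_b[, c("id", "vt")])
# Group index: one integer per unique (id, vt) combination.
df_keys$grp <- as.integer(interaction(df_keys$id, df_keys$vt, drop = TRUE))
```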

4.2 Wide and Long

  1. Convert Table from Long to Wide with dplyr: rmd | r | pdf | html
    • Long attendance roster to wide roster and calculate cumulative attendance by each day for students.
    • Convert long roster with attendance and test-scores to wide.
    • tidyr: pivot_wider(id_cols = c(v1), names_from = v2, names_prefix = "id", names_sep = "_", values_from = c(v3, v4))
    • dplyr: mutate(var = case_when(rnorm(n()) < 0 ~ 1, TRUE ~ 0)) + rename_at(vars(num_range('', ar_it)), list(~paste0(st_prefix, . , ''))) + mutate_at(vars(contains(str)), list(~replace_na(., 0))) + mutate_at(vars(contains(str)), list(~cumsum(.)))
  2. Convert Table from Wide to Long with dplyr: rmd | r | pdf | html
    • Given a matrix of values with row and column labels, create a table where the units of observation are the row and column categories, and the matrix values are stored in a single variable.
    • Reshape two sets of variables from wide to long, with two categorical variables added to the wide table.
    • tidyr: pivot_longer(cols = starts_with('zi'), names_to = c('zi'), names_pattern = paste0("zi(.)"), values_to = "ev") + pivot_longer(cols = matches('a line b'), names_to = c('va', 'vb'), names_pattern = paste0("(.)_(.)"), values_to = "ev")
    • dplyr: left_join()
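For the labeled-matrix case above, base R offers a compact wide-to-long route through `as.table()` and a long-to-wide route through `xtabs()`; this sketch is an alternative to `pivot_longer()`/`pivot_wider()`, with illustrative labels.

```r
# Matrix with labeled rows and columns.
mt <- matrix(1:6, nrow = 2,
             dimnames = list(row = c("r1", "r2"), col = c("c1", "c2", "c3")))
# Wide to long: one observation per (row, col) cell, values in one column.
df_long <- as.data.frame(as.table(mt), responseName = "value",
                         stringsAsFactors = FALSE)
# Long back to wide (xtabs sums the values falling in each cell).
mt_wide <- xtabs(value ~ row + col, data = df_long)
```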

4.3 Within Panel Comparisons and Statistics

  1. Find Closest Values Along Grids: rmd | r | pdf | html
    • There is an array (matrix) of values, find the index of the values closest to another value.
    • r: do.call(bind_rows, ls_df)
    • dplyr: left_join(tb, by=(c('vr_a'='vr_a', 'vr_b'='vr_b')))
  2. Cross-group Within-time and Cross-time Within-group Statistics: rmd | r | pdf | html
    • Compute relative values across countries at each time, and relative values within country across time.
    • dplyr: arrange(v1, v2) %>% group_by(v1) %>% mutate(stats := v3/first(v3))
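A sketch of both items above in base R: the index of the grid point closest to each target, and relative values within country across time and across countries within time. Grid, targets, and GDP values are hypothetical.

```r
# Index of the grid point closest to each target value.
ar_grid <- seq(0, 1, length.out = 11)
ar_target <- c(0.33, 0.71, 0.97)
ar_idx <- sapply(ar_target, function(fl_t) which.min(abs(ar_grid - fl_t)))
ar_closest <- ar_grid[ar_idx]

# Relative values: within country across time, and across countries within time.
df <- data.frame(country = rep(c("A", "B"), each = 3), year = rep(1:3, 2),
                 gdp = c(100, 110, 121, 50, 55, 60), stringsAsFactors = FALSE)
df <- df[order(df$country, df$year), ]
df$rel_first_year <- ave(df$gdp, df$country, FUN = function(x) x / x[1])
# Within each year, the first row after sorting is country A.
df$rel_country_a <- ave(df$gdp, df$year, FUN = function(x) x / x[1])
```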

4.4 Join and Merge Files Together by Keys

  1. Mesh join: rmd | r | pdf | html
    • Full join: expand multiple rows of a dataframe with the same set of expansion rows and columns.
    • dplyr: full_join()

5 Linear Regression

5.1 Linear and Polynomial Fitting

  1. Find Best Fit of Curves Through Points: rmd | r | pdf | html
    • There are three x and y points, find the quadratic curve that fits through them exactly.
    • There are N sets of x and y points, find the Mth order polynomial fit by regressing y on poly(x, M).
    • stats: lm(y ~ poly(x, 2), data=df) + summary.lm(rs) + predict(rs)
  2. Fit a Time Series with Polynomial and Analytical Expressions for Coefficients: rmd | r | pdf | html
    • Given a time series of data points from a polynomial data generating process, solve for the polynomial coefficients.
    • The Mth derivative of an Mth-order polynomial is time-invariant; use functions of repeated differences (differences of differences) to identify the polynomial coefficients analytically.
    • r: matrix multiplication
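With exactly three points, a quadratic passes through all of them, so solving the 3x3 Vandermonde system and running `lm()` on raw polynomial terms should give the same coefficients. The points below are hypothetical, generated from y = 1 + x + x^2.

```r
# Three points generated by y = 1 + x + x^2.
df <- data.frame(x = c(1, 2, 4), y = c(3, 7, 21))
# Exact fit: solve for (a, b, c) in y = a + b x + c x^2.
ar_coef_exact <- solve(cbind(1, df$x, df$x^2), df$y)
# Same coefficients from lm() with raw (not orthogonal) polynomial terms.
rs <- lm(y ~ poly(x, 2, raw = TRUE), data = df)
ar_coef_lm <- unname(coef(rs))
ar_fit <- predict(rs)
```

With the default orthogonal `poly(x, 2)`, the fitted values are identical but the coefficients are on a different basis.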

5.2 OLS and IV

  1. IV/OLS Regression: rmd | r | pdf | html
    • R instrumental variables and ordinary least squares regressions, storing all coefficients and diagnostics as a dataframe row.
    • AER: library(AER) + ivreg(as.formula()) + summary(rs, diagnostics = TRUE)
  2. M Outcomes and N RHS Alternatives: rmd | r | pdf | html
    • There are M outcome variables and N alternative explanatory variables. Regress all M outcome variables on N endogenous/independent right hand side variables one by one, with controls and/or IVs, collect coefficients.
    • dplyr: bind_rows(lapply(listx, function(x)(bind_rows(lapply(listy, regf.iv))))) + starts_with() + ends_with() + reduce(full_join)
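To show why IV differs from OLS, here is a matrix-algebra sketch on simulated data with one endogenous regressor and one instrument (just-identified). It computes point estimates only; `AER::ivreg()` additionally reports standard errors and diagnostics. The data-generating process is an assumption for illustration.

```r
set.seed(2024)
it_n <- 10000
ar_z <- rnorm(it_n)                        # instrument
ar_u <- rnorm(it_n)                        # unobserved confounder
ar_x <- ar_z + ar_u + rnorm(it_n)          # endogenous regressor
ar_y <- 1 + 2 * ar_x + ar_u + rnorm(it_n)  # true slope is 2
mt_X <- cbind(1, ar_x)
mt_Z <- cbind(1, ar_z)
# OLS: (X'X)^{-1} X'y, biased upward here because x and u are correlated.
ar_b_ols <- drop(solve(crossprod(mt_X), crossprod(mt_X, ar_y)))
# Just-identified IV: (Z'X)^{-1} Z'y.
ar_b_iv <- drop(solve(crossprod(mt_Z, mt_X), crossprod(mt_Z, ar_y)))
```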

5.3 Decomposition

  1. Regression Decomposition: rmd | r | pdf | html
    • Post multiple regressions, fraction of outcome variables' variances explained by multiple subsets of right hand side variables.
    • dplyr: gather() + group_by(var) + mutate_at(vars, funs(mean = mean(.))) + rowSums(matmat) + mutate_if(is.numeric, funs(frac = (./value_var)))

6 Nonlinear and Other Regressions

6.1 Logit Regression

  1. Logit Regression: rmd | r | pdf | html
    • Logit regression testing and prediction.
    • stats: glm(as.formula(), data, family='binomial') + predict(rs, newdata, type = "response")
  2. Estimate Logistic Choice Model with Aggregate Shares: rmd | r | pdf | html
    • Aggregate share logistic OLS with K worker types, T time periods and M occupations.
    • Estimate logistic choice model with aggregate shares, allowing for occupation-specific wages and occupation-specific intercepts.
    • Estimate allowing for K and M specific intercepts, K and M specific coefficients, and homogeneous coefficients.
    • Create input matrix data structures for logistic aggregate share estimation.
    • stats: lm(y ~ . -1)
  3. Fit Prices Given Quantities Logistic Choice with Aggregate Data: rmd | r | pdf | html
    • A multinomial logistic choice problem generates choice probabilities across alternatives, find the prices that explain aggregate shares.
    • stats: lm(y ~ . -1)
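A minimal logit sketch with `glm(..., family = binomial)` and `predict(..., type = "response")` on simulated data; the true intercept (-0.5) and slope (1.5) are illustrative assumptions.

```r
set.seed(42)
it_n <- 2000
ar_x <- rnorm(it_n)
# Simulated binary outcome from a logistic model.
ar_p <- 1 / (1 + exp(-(-0.5 + 1.5 * ar_x)))
df_logit <- data.frame(x = ar_x, y = rbinom(it_n, size = 1, prob = ar_p))
rs <- glm(y ~ x, data = df_logit, family = binomial)
# Predicted probabilities at new x values.
ar_p_hat <- predict(rs, newdata = data.frame(x = c(-1, 0, 1)), type = "response")
```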

6.2 Quantile Regression

  1. Quantile Regressions with Quantreg: rmd | r | pdf | html
    • Quantile regression with continuous outcomes. Estimates and tests quantile coefficients.
    • quantreg: rq(mpg ~ disp + hp + factor(am), tau = c(0.25, 0.50, 0.75), data = mtcars) + anova(rq(), test = "Wald", joint=TRUE) + anova(rq(), test = "Wald", joint=FALSE)

7 Optimization

7.1 Grid Based Optimization

  1. Find the Maximizing or Minimizing Point Given Some Objective Function: rmd | r | pdf | html
    • Find the maximizing or minimizing point given some objective function.
    • base: while + min + which.min + sapply
  2. Concurrent Bisection over Dataframe Rows: rmd | r | pdf | html
    • Use bisection to solve row-specific nonlinear equations concurrently across all rows of a dataframe.
    • tidyr: pivot_longer(cols = starts_with('abc'), names_to = c('a', 'b'), names_pattern = paste0('prefix', "(.)_(.)"), values_to = val) + pivot_wider(names_from = !!sym(name), values_from = val) + mutate(!!sym(abc) := case_when(efg < 0 ~ !!sym(opq), TRUE ~ iso))
    • ggplot2: geom_line() + facet_wrap() + geom_hline()
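A base-R sketch of both ideas in this subsection: a grid search with `which.min()`, and a vectorized bisection that updates every row's bracket at once. The objective and the equations a_i - x^3 = 0 are hypothetical; the tutorial itself works on dataframe columns.

```r
# Grid search: evaluate the objective on a grid and take the minimizer.
ar_grid <- seq(-2, 2, by = 0.01)
ar_obj <- (ar_grid - 0.5)^2 + 1
fl_argmin <- ar_grid[which.min(ar_obj)]

# Concurrent bisection: solve a_i - x^3 = 0 for every row i at once.
ar_a <- c(1, 8, 27)
ar_lo <- rep(0, length(ar_a))
ar_hi <- rep(10, length(ar_a))
for (it_iter in 1:60) {
  ar_mid <- (ar_lo + ar_hi) / 2
  # f is decreasing in x, so f(mid) > 0 means the root lies above mid.
  bl_root_above <- (ar_a - ar_mid^3) > 0
  ar_lo[bl_root_above] <- ar_mid[bl_root_above]
  ar_hi[!bl_root_above] <- ar_mid[!bl_root_above]
}
ar_root <- (ar_lo + ar_hi) / 2
```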

8 Mathematics

8.1 Basics

  1. Analytical Formula Fit Curves Through Points: rmd | r | pdf | html
    • Given three pairs of points, derive formulas for the exact quadratic curve that passes through them.
    • Given three pairs of points where only differences in y values are observed, derive formulas for the linear and quadratic parameters.
    • Given three pairs of points, derive formulas for the linear best-fit line through them.
    • stats: lm(y ~ x + I(x^2), data=df) + lm(y ~ poly(x, 2), data=df) + summary.lm(rs) + predict(rs)
  2. Quadratic and Ratio Rescaling of Parameters with Fixed Min and Max: rmd | r | pdf | html
    • For 0<theta<1, generate 0 < thetaHat(theta, lambda) < 1, where lambda is between positive and negative infinity, used to rescale theta.
    • Fit a quadratic function for three points, where the starting and ending points are along the 45 degree line.
    • r: sort(unique()) + sapply(ar, func, param=val)
    • ggplot2: geom_line() + geom_vline() + labs(title, subtitle, x, y, caption) + scale_y_continuous(breaks, limits)
  3. Rescaling Bounded Parameter to be Unbounded and Positive and Negative Exponents with Different Bases: rmd | r | pdf | html
    • Log of alternative bases, bases that are not e, 10 or 2.
    • A parameter is constrained between 1 and negative infinity, use exponentials of different bases to scale the bounded parameter to an unbounded parameter.
    • Positive exponentials are strictly increasing. Negative exponentials are strictly decreasing.
    • A positive number below 1 raised to a negative exponent is above 1, and a positive number above 1 raised to a negative exponent is below 1.
    • graphics: plot(x, y) + title() + legend()
  4. Find the Closest Point Along a Line to Another Point: rmd | r | pdf | html
    • A line crosses through the origin, what is the closest point along this line to another point.
    • Graph several functions jointly with points and axis.
    • graphics: par(mfrow = c(1, 1)) + curve(fc) + points(x, y) + abline(v=0, h=0)
  5. Linear Solve x with f(x) = 0: rmd | r | pdf | html
    • Evaluate and solve statistically relevant problems with one equation and one unknown that permit analytical solutions.
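Three of the closed-form results above in a few lines of base R: the closest point on a line through the origin (orthogonal projection), logs in an arbitrary base, and the root of a linear f(x). The specific numbers are illustrative.

```r
# Closest point on the line {t * ar_dir} to ar_pt: the orthogonal projection.
ar_dir <- c(1, 2)
ar_pt <- c(3, 1)
fl_t <- sum(ar_pt * ar_dir) / sum(ar_dir^2)
ar_closest <- fl_t * ar_dir
# The residual from the closest point is orthogonal to the line.
fl_dot <- sum((ar_pt - ar_closest) * ar_dir)
# Logs in base b: log_b(v) = log(v) / log(b), also available as log(v, base = b).
fl_log_b <- log(81, base = 3)
# Linear f(x) = a + b x = 0 has the analytical solution x = -a / b.
fl_a <- 4
fl_b <- -2
fl_x_root <- -fl_a / fl_b
```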

8.2 Production Functions

  1. Nested Constant Elasticity of Substitution Production Function: rmd | r | pdf | html
    • A nested-CES production function with nest-specific elasticities.
    • Re-state the nested-CES problem as several sub-problems.
    • Marginal products and their relationship to prices in expenditure minimization.
  2. Latent Dynamic Health Production Function: rmd | r | pdf | html
    • A model of latent health given lagged latent health and health inputs.
    • Find individual-specific production function coefficient given self-rated discrete health status probabilities.
    • Persistence of latent health status given observed discrete current and lagged outcomes.
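A sketch of a standard CES aggregator Y = (sum_i a_i x_i^rho)^(1/rho) and its marginal product dY/dx_i = a_i x_i^(rho-1) Y^(1-rho), with a two-level nest built by feeding one CES output into another. The share-parameter notation `a_i` and all values are assumptions, not necessarily the tutorial's parameterization.

```r
# Single-nest CES, rho != 0; the elasticity of substitution is 1 / (1 - rho).
ffi_ces <- function(ar_x, ar_a, fl_rho) sum(ar_a * ar_x^fl_rho)^(1 / fl_rho)
# Analytical marginal products dY/dx_i = a_i * x_i^(rho - 1) * Y^(1 - rho).
ffi_ces_mp <- function(ar_x, ar_a, fl_rho) {
  fl_Y <- ffi_ces(ar_x, ar_a, fl_rho)
  ar_a * ar_x^(fl_rho - 1) * fl_Y^(1 - fl_rho)
}
ar_a <- c(0.4, 0.6)
fl_rho <- 0.5
ar_x <- c(2, 3)
# Nested CES: an inner nest (rho = -1) produces an input for the outer nest.
fl_inner <- ffi_ces(c(1, 4), c(0.5, 0.5), -1)
fl_nested <- ffi_ces(c(fl_inner, 3), ar_a, fl_rho)
```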

8.3 Inequality Models

  1. GINI for Discrete Samples or Discrete Random Variable: rmd | r | pdf | html
    • Given sample of data points that are discrete, compute the approximate GINI coefficient.
    • Given a discrete random variable, compute the GINI coefficient.
    • r: sort() + cumsum() + sum()
  2. CES and Atkinson Inequality Index: rmd | r | pdf | html
    • Analyze how changing individual outcomes shift utility given inequality preference parameters.
    • Discretize a continuous normal random variable with a discrete binomial random variable.
    • Draw Cobb-Douglas, Utilitarian and Leontief indifference curves.
    • r: apply(mt, 1, function(x){}) + do.call(rbind, ls_mt)
    • tidyr: expand_grid()
    • ggplot2: geom_line() + facet_wrap()
    • econ: Atkinson (JET, 1970)
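A minimal sketch of a sample GINI using `sort()`, `cumsum()`, and `sum()`, G = 1 + 1/n - (2/n) * sum(cumsum(x)/sum(x)), together with the Atkinson index for eps != 1. Normalizations of the sample GINI vary (n versus n - 1), so this is one common variant, not necessarily the tutorial's.

```r
# Sample GINI from sorted cumulative income shares.
ffi_gini <- function(ar_x) {
  ar_x <- sort(ar_x)
  it_n <- length(ar_x)
  1 + 1 / it_n - (2 / it_n) * sum(cumsum(ar_x) / sum(ar_x))
}
# Atkinson index with inequality aversion fl_eps (fl_eps != 1).
ffi_atkinson <- function(ar_x, fl_eps) {
  # Equally distributed equivalent outcome.
  fl_ede <- mean(ar_x^(1 - fl_eps))^(1 / (1 - fl_eps))
  1 - fl_ede / mean(ar_x)
}
```

Equal outcomes give 0 for both indices; with one person holding everything out of n, this GINI variant gives (n - 1)/n.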

9 Statistics

9.1 Random Draws

  1. Randomly Perturb Some Parameter Value with Varying Magnitudes: rmd | r | pdf | html
    • Given some existing parameter value, with an intensity value between 0 and 1, decide how to perturb the value.
    • r: matrix
    • stats: qlnorm()
    • graphics: par() + hist() + abline()

9.2 Distributions

  1. Integrate Normal Shocks: rmd | r | pdf | html
    • Integrate shocks by random sampling (Monte Carlo).
    • Integrate normal shocks with the trapezoidal rule (symmetric rectangles).
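Both integration approaches above can be checked against a known answer: for eps ~ N(0, sd^2), E[exp(eps)] = exp(sd^2 / 2). The standard deviation, sample size, and grid below are illustrative choices.

```r
fl_sd <- 0.5
fl_true <- exp(fl_sd^2 / 2)
# Monte Carlo: average the integrand over random draws.
set.seed(1)
fl_mc <- mean(exp(rnorm(1e5, mean = 0, sd = fl_sd)))
# Trapezoidal rule on an evenly spaced grid over +/- 6 standard deviations.
ar_e <- seq(-6 * fl_sd, 6 * fl_sd, length.out = 2001)
ar_f <- exp(ar_e) * dnorm(ar_e, mean = 0, sd = fl_sd)
fl_h <- ar_e[2] - ar_e[1]
fl_trap <- fl_h * (sum(ar_f) - (ar_f[1] + ar_f[length(ar_f)]) / 2)
```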

9.3 Discrete Random Variable

  1. Binomial Approximation of Normal: rmd | r | pdf | html
    • Approximate a continuous normal random variable with a discrete binomial random variable.
    • r: hist() + plot()
    • stats: dbinom() + rnorm()
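One way to build the binomial approximation: place n + 1 points at mu + sd * (2k - n)/sqrt(n) with Binomial(n, 0.5) probabilities. Because (2k - n)/sqrt(n) has mean 0 and variance 1, the discrete variable matches the normal's mean and variance exactly. The values of mu, sd, and n are illustrative.

```r
fl_mu <- 1
fl_sd <- 2
it_n <- 20
ar_k <- 0:it_n
ar_prob <- dbinom(ar_k, size = it_n, prob = 0.5)
# Standardized binomial support, rescaled to the target mean and sd.
ar_val <- fl_mu + fl_sd * (2 * ar_k - it_n) / sqrt(it_n)
fl_mean <- sum(ar_prob * ar_val)
fl_var <- sum(ar_prob * (ar_val - fl_mean)^2)
```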

10 Tables and Graphs

10.1 R Base Plots

  1. R Base Plot Line with Curves and Scatter: rmd | r | pdf | html
    • Plot scatter points, line plot and functional curve graphs together.
    • Set margins for legend to be outside of graph area, change line, point, label and legend sizes.
    • Add lines to a plot successively, record the plot after each step, and replay either all steps or only the results of the initial steps.
    • r: plot() + curve() + legend() + title() + axis() + par() + recordPlot()

10.2 ggplot Line Related Plots

  1. ggplot2 Basic Line Plot for Multiple Time Series: rmd | r | pdf | html
    • Given three time series, present them in levels, in log levels, and as ratios.
    • ggplot: ggplot() + geom_line()
  2. ggplot Line Plot Multiple Categorical Variables With Continuous Variable: rmd | r | pdf | html
    • One category is subplot, one category is line-color, one category is line-type.
    • One category is subplot, one category is differentiated by line-color, line-type and scatter-shapes.
    • One category defines separate plots, two categories define subplot rows and columns, and one category is differentiated by line-color, line-type and scatter-shapes.
    • ggplot: ggplot() + facet_wrap() + facet_grid() + geom_line() + geom_point() + geom_smooth() + geom_hline() + scale_colour_manual() + scale_shape_manual() + scale_shape_discrete() + scale_linetype_manual() + scale_x_continuous() + scale_y_continuous() + theme_bw() + theme() + guides() + theme() + ggsave()
    • dplyr: filter(vara %in% c(1, 2) & varb == "val") + mutate_if() + !any(is.na(suppressWarnings(as.numeric(na.omit(x))))) & is.character(x)
  3. Time Series with Shaded Regions, plot GDP with recessions: rmd | r | pdf | html
    • Plot several time series with multiple shaded windows.
    • Plot GDP with shaded recession window, and differentially shaded pre- and post-recession windows.
    • r: sample + pmin + diff + which
    • ggplot: ggplot() + geom_line() + geom_rect(aes(xmin, xmax, ymin, ymax)) + theme_light() + scale_colour_manual() + scale_shape_discrete() + scale_linetype_manual() + scale_fill_manual()

10.3 ggplot Scatter Related Plots

  1. ggplot Scatter Plot Grouped or Unique Patterns and Colors: rmd | r | pdf | html
    • Scatter Plot Three Continuous Variables and Multiple Categorical Variables
    • Two continuous variables for the x-axis and the y-axis, another continuous variable for size of scatter, other categorical variables for scatter shape and size.
    • Scatter plot with unique pattern and color for each scatter point.
    • Y and X label axis with two layers of text in levels and deviation from some mid-point values.
    • tibble: rownames_to_column()
    • ggplot: ggplot() + geom_jitter() + geom_smooth() + geom_point(size=1, stroke=1) + scale_colour_manual() + scale_shape_discrete() + scale_linetype_manual() + scale_x_continuous() + scale_y_continuous() + theme_bw() + theme()
  2. ggplot Multiple Scatter-Lines and Facet Wrap Over Categories: rmd | r | pdf | html
    • ggplot multiple lines with scatter as points and connecting lines.
    • Facet wrap to generate subfigures for sub-categories.
    • Generate separate plots from data saved separately.
    • r: apply
    • ggplot: facet_wrap() + geom_smooth() + geom_point() + facet_wrap() + scale_colour_manual() + scale_shape_manual() + scale_linetype_manual()

10.4 Write and Read Plots

  1. Base R Save Images At Different Sizes: rmd | r | pdf | html
    • In base R, store the core image, then add legends, titles, labels, and axes of different sizes to save figures of different sizes.
    • r: png() + setEPS() + postscript() + dev.off()
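A base-R sketch of drawing the same core figure into files of different sizes, with text scaled per output. It writes an EPS via `setEPS()` + `postscript()` and a PDF via `pdf()` (used here instead of `png()`, which needs a bitmap backend); the drawing helper `ffi_draw` and file names are hypothetical.

```r
spt_eps <- file.path(tempdir(), "fig_small.eps")
spt_pdf <- file.path(tempdir(), "fig_large.pdf")
# Core figure; fl_cex scales labels and legend for each output size.
ffi_draw <- function(fl_cex) {
  ar_x <- seq(0, 2 * pi, length.out = 50)
  plot(ar_x, sin(ar_x), type = "l", xlab = "x", ylab = "sin(x)", cex.lab = fl_cex)
  legend("topright", legend = "sin(x)", lty = 1, cex = fl_cex)
}
# Small EPS figure (inches).
setEPS()
postscript(spt_eps, width = 4, height = 3)
ffi_draw(0.8)
invisible(dev.off())
# Larger PDF figure with larger text.
pdf(spt_pdf, width = 8, height = 6)
ffi_draw(1.4)
invisible(dev.off())
```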

11 Get Data

11.1 Environmental Data

  1. CDS ECMWF Global Environmental Data Download: rmd | r | pdf | html
    • Use the Python API to get ECMWF ERA5 data.
    • Dynamically modify a python API file, run python inside a Conda virtual environment with R-reticulate.
    • r: file() + writeLines() + unzip() + list.files() + unlink()
    • r-reticulate: use_python() + Sys.setenv(RETICULATE_PYTHON = spth_conda_env)

12 Coding and Development

12.1 Installation and Packages

  1. R, RTools, RStudio Installation and Update with VSCode: rmd | r | pdf | html
    • Install and update R, RTools, and RStudio.
    • Set-up R inside VSCode.
    • installr: updateR()
  2. Handling R Packages: rmd | r | pdf | html
    • Resolve conflicts between two packages with identically named functions.
    • tidyverse: tidyverse_conflicts
    • dplyr: filter
    • stats: filter
    • conflicted: conflict_prefer()
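The simplest way around a masked function is the `pkg::` prefix. This sketch calls `stats::filter()` explicitly (a centered moving average), which works the same whether or not `dplyr` is attached; `conflicted::conflict_prefer()` is the session-wide alternative listed above.

```r
# stats::filter: linear (moving-average) filtering of a series.
# sides = 2 centers the 3-point window, so the two end points are NA.
ar_ma <- as.numeric(stats::filter(1:5, rep(1 / 3, 3), sides = 2))
# dplyr::filter() would instead subset dataframe rows; the prefix makes the
# intended function explicit regardless of package attach order.
```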

12.2 Files In and Out

  1. Decompose File Paths to Get Folder and Files Names: rmd | r | pdf | html
    • Decompose file path and get file path folder names and file name.
    • Check if file name exists.
    • r: .Platform$file.sep + tail() + strsplit() + basename() + dirname() + substring() + dir.exists() + file.exists()
  2. Save Text to File, Read Text from File, Replace Text in File: rmd | r | pdf | html
    • Save data to file, read text from file, replace text in file.
    • r: kable() + file() + writeLines() + readLines() + close() + gsub()
  3. Convert R Markdown File to R, PDF and HTML: rmd | r | pdf | html
    • Find all files in a folder with a particular suffix, with exclusions.
    • Convert R Markdown files to R, PDF and HTML.
    • Modify the markdown heading (pound sign) hierarchy.
    • r: file() + writeLines() + readLines() + close() + gsub()
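A base-R sketch of the file items in this subsection: decompose a path into file and folder names, check existence, then save, read, and replace text. It writes under `tempdir()`; the folder and file names are hypothetical.

```r
# Build a path inside the session's temporary directory.
spt_folder <- file.path(tempdir(), "sub_folder")
spt_file <- file.path(spt_folder, "example_file.txt")
dir.create(spt_folder, showWarnings = FALSE, recursive = TRUE)
# Decompose the path: file name, parent folder name, name without extension.
st_file_name <- basename(spt_file)
st_folder_name <- basename(dirname(spt_file))
st_stem <- sub("\\.[^.]*$", "", st_file_name)
# Save text, read it back, replace a word, and write it again.
writeLines(c("alpha beta", "beta gamma"), spt_file)
ar_st_lines <- gsub("beta", "BETA", readLines(spt_file), fixed = TRUE)
writeLines(ar_st_lines, spt_file)
bl_exists <- file.exists(spt_file) && dir.exists(spt_folder)
```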

12.3 Python with R

  1. Python in R with Reticulate: rmd | r | pdf | html
    • Use Python in R with Reticulate
    • reticulate: py_config() + use_condaenv() + py_run_string() + Sys.which('python')

12.4 Command Line

  1. System and Shell Commands in R: rmd | r | pdf | html
    • Run system executable and shell commands.
    • Activate conda environment with shell script.
    • r: system() + shell()

12.5 Run Code in Parallel in R

  1. Run Code in Parallel in R: rmd | r | pdf | html
    • Running parallel code in R
    • parallel: detectCores() + makeCluster()
    • doParallel: registerDoParallel()
    • foreach: %dopar%