1 Array, Matrix, Dataframe

1.1 List

Multi-dimensional Named Lists: rmd | r | pdf | html
- Initiate Empty List. Named one and two dimensional lists. List of Dataframes.
- Collapse named and unnamed list to string and print input code.
- r: deparse(substitute()) + vector(mode = "list", length = it_N) + names(list) <- paste0('e',seq()) + dimnames(ls2d)[[1]] <- paste0('r',seq()) + dimnames(ls2d)[[2]] <- paste0('c',seq())
- tidyr: unnest()
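A minimal sketch of the list idioms named above, in base R only. The object names (`ls_named`, `ls2d`) and sizes are illustrative assumptions, not taken from the linked page.

```r
# Empty list with a fixed number of slots, then name the slots
it_N <- 3
ls_named <- vector(mode = "list", length = it_N)
names(ls_named) <- paste0("e", seq(1, it_N))

# Two-dimensional list: a list with a dim attribute plus row and column names
it_row <- 2
it_col <- 3
ls2d <- vector(mode = "list", length = it_row * it_col)
dim(ls2d) <- c(it_row, it_col)
dimnames(ls2d) <- list(paste0("r", seq(1, it_row)),
                       paste0("c", seq(1, it_col)))

# Each cell can hold an arbitrary object, here a numeric vector
ls2d[["r1", "c2"]] <- c(10, 20)

print(names(ls_named))
print(dim(ls2d))
```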

1.2 Array

Basic Arrays Operations in R: rmd | r | pdf | html
- Generate N-dimensional array of NA values, label dimension elements.
- Basic array operations in R, rep, head, tail, na, etc.
- E notation.
- Get N cuts from M points.
- r: sum() + prod() + rep() + array(NA, dim=c(3, 3)) + array(NA, dim=c(3, 3, 3)) + dimnames(mn)[[3]] = paste0('k=', 0:4) + head() + tail() + na_if() + Re()
- purrr: reduce()
Generate Special Arrays: rmd | r | pdf | html
- Generate equi-distance, special log spaced array.
- Generate probability mass function with non-unique and non-sorted value and probability arrays.
- Generate a set of integer sequences, with gaps in between, e.g., (1,2,3), (5), (10,11).
- r: seq() + sort() + runif() + ceiling() + sample() + apply() + do.call()
- stats: aggregate()
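As a rough illustration of the probability mass function item, the sketch below sums probabilities over repeated, unsorted values with `aggregate()`, and builds an evenly spaced and a log-spaced grid. All values and names are made up for the example.

```r
# Values are neither unique nor sorted; probabilities sum to one
ar_val <- c(3, 1, 3, 2, 1)
ar_prb <- c(0.10, 0.20, 0.30, 0.25, 0.15)

# Sum probabilities within each unique value; groups come back sorted
df_pmf <- aggregate(ar_prb, by = list(value = ar_val), FUN = sum)
names(df_pmf) <- c("value", "prob")
print(df_pmf)

# Equi-distance and log-spaced grids between the same end points
ar_lin <- seq(1, 100, length.out = 5)
ar_log <- exp(seq(log(1), log(100), length.out = 5))
print(ar_lin)
print(ar_log)
```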
String Operations: rmd | r | pdf | html
- Split, concatenate, subset, replace, and substring strings.
- Convert number to string without decimal and negative sign.
- Concatenate numeric and string arrays as a single string.
- Regular expression
- r: paste0() + paste0(round(runif(3),3), collapse=',') + sub() + gsub() + grepl() + sprintf()
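A short sketch of the string operations listed above, using base R functions only. The input strings are arbitrary examples.

```r
# Concatenate a numeric array into a single comma-separated string
ar_fl <- c(0.123, -4.5, 67)
st_joined <- paste0(ar_fl, collapse = ",")

# Convert a number to a string without decimal point and negative sign
st_clean <- gsub("[-.]", "", as.character(-12.34))

# Replace the first match (sub) or all matches (gsub)
st_first <- sub("a", "A", "banana")
st_all <- gsub("a", "A", "banana")

# Split, subset, test a pattern, and format
ar_split <- strsplit("a,b,c", ",")[[1]]
st_sub <- substring("banana", 1, 3)
bl_start <- grepl("^ban", "banana")
st_fmt <- sprintf("%s_%03d", "file", 7L)

print(c(st_joined, st_clean, st_first, st_all, st_sub, st_fmt))
```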
Meshgrid Matrices, Arrays and Scalars: rmd | r | pdf | html
- Meshgrid Matrices, Arrays and Scalars to form all combination dataframe.
- tidyr: expand_grid() + expand.grid()
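A minimal sketch of meshing two arrays and a scalar into an all-combinations dataframe. It uses base `expand.grid()` so that it runs without tidyr; the inputs are illustrative.

```r
ar_a <- c(1, 2, 3)
ar_b <- c("x", "y")
fl_z <- 0.5

# Every combination of a and b, with the scalar repeated on each row
df_mesh <- expand.grid(a = ar_a, b = ar_b, z = fl_z,
                       stringsAsFactors = FALSE)
print(df_mesh)
```

In `expand.grid()` the first variable varies fastest, so `a` cycles through 1, 2, 3 within each value of `b`.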

1.3 Matrix

Matrix Basics: rmd | r | pdf | html
- Generate and combine NA, fixed and random matrices. Name columns and rows.
- Sort all rows and all columns of a matrix.
- Replace values outside min and max in matrix by NA values.
- R: rep() + rbind() + matrix(NA) + matrix(NA_real_) + matrix(NA_integer_) + colnames() + rownames() + t(apply(mt, 1, sort)) + apply(mt, 2, sort) + colMeans + rowMeans + which()
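The matrix items above can be sketched as follows. The matrix values and the bounds 2 and 8 are arbitrary assumptions for illustration.

```r
# NA matrix with named rows and columns
mt_na <- matrix(NA_real_, nrow = 2, ncol = 3)
rownames(mt_na) <- paste0("r", 1:2)
colnames(mt_na) <- paste0("c", 1:3)

# A small fixed matrix
mt <- matrix(c(3, 1, 2,
               9, 7, 8), nrow = 2, byrow = TRUE)

# Sort within each row (apply returns results as columns, hence t())
mt_row_sorted <- t(apply(mt, 1, sort))
# Sort within each column
mt_col_sorted <- apply(mt, 2, sort)

# Replace values outside [2, 8] by NA
mt_bounded <- mt
mt_bounded[which(mt < 2 | mt > 8)] <- NA

print(mt_row_sorted)
print(mt_bounded)
```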
Linear Algebra Operations: rmd | r | pdf | html

1.4 Regular Expression, Date, etc.

R String Regular Expression (Regex): rmd | r | pdf | html
- Regular expression.
- Find character strings that contain or do not contain certain strings, numbers, and symbols.
- r: grepl()
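A brief sketch of `grepl()` for testing whether strings contain digits or symbols. The test strings are made up.

```r
ar_st <- c("abc", "a1c", "x-y", "123")

# Contains at least one digit
bl_has_digit <- grepl("[0-9]", ar_st)
# Consists only of digits
bl_all_digit <- grepl("^[0-9]+$", ar_st)
# Does not contain any punctuation symbol
bl_no_punct <- !grepl("[[:punct:]]", ar_st)

print(bl_has_digit)
print(bl_all_digit)
print(bl_no_punct)
```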

2 Manipulate and Summarize Dataframes

2.1 Variables in Dataframes

Generate Tibble Dataframes from Matrix and List: rmd | r | pdf | html
- Generate tibble data from two dimensional named lists, unlist for exporting.
- Generate tibble dataframe, rename tibble variables, generate tibble row and column names.
- Export tibble table to csv file with date and time stamp in file name.
- Rename numeric sequential columns with string prefix and suffix.
- base: Sys.time() + format() + sample(LETTERS, 5, replace = TRUE) + is.list
- dplyr: as_tibble(mt) + rename_all(~c(ar_names)) + rename_at(vars(starts_with("xx")), funs(str_replace(., "yy", "yyyy"))) + rename_at(vars(num_range('',ar_it)), funs(paste0(st,.))) + rowid_to_column() + row_number() + min_rank() + dense_rank() + mutate_if()
- base: colnames + rownames
Interact and Cut Variables to Generate Categorical Variables: rmd | r | pdf | html
- Convert rowname to variable name.
- Generate categorical variable from a continuous variable.
- Convert numeric variables to factor variables, generate interaction variables (joint factors), and label factors with descriptive words.
- Graph MPG and 1/4 Miles Time (qsec) from the mtcars dataset over joint shift-type (am) and engine-type (vs) categories.
- r: cut(breaks = ar, labels = ar, right = FALSE)
- tibble: rownames_to_column()
- forcats: as_factor() + fct_recode() + fct_cross()
Randomly Draw Subsets of Rows from Matrix: rmd | r | pdf | html
- Given matrix, randomly sample rows, or select if random value is below threshold.
- r: rnorm() + sample() + df[sample(dim(df)[1], it_M, replace=FALSE),]
- dplyr: case_when() + mutate(var = case_when(rnorm(n(),mean=0,sd=1) < 0 ~ 1, TRUE ~ 0)) %>% filter(var == 1)
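A minimal sketch of both row-selection approaches in base R, using the built-in `mtcars` data as a stand-in. The seed, sample size, and threshold are assumptions for the example.

```r
set.seed(123)
df <- mtcars

# Randomly draw it_M rows without replacement
it_M <- 5
df_sub <- df[sample(dim(df)[1], it_M, replace = FALSE), ]

# Alternatively, keep a row when its random draw falls below a threshold
fl_thres <- 0.5
ar_draw <- runif(nrow(df))
df_thres <- df[ar_draw < fl_thres, ]

print(rownames(df_sub))
print(nrow(df_thres))
```

The first approach fixes the number of rows selected; the second fixes only the selection probability, so the number of rows kept varies with the draws.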
Generate Variables Conditional on Other Variables, Categorical from Continuous: rmd | r | pdf | html
- Use case_when to generate elseif conditional variables: NA, approximate difference, etc.
- Generate Categorical Variables from Continuous Variables.
- dplyr: case_when() + na_if() + mutate(var = na_if(case_when(rnorm(n())< 0 ~ -99, TRUE ~ mpg), -99))
- r: e-notation + all.equal() + isTRUE(all.equal(a,b,tol)) + is.na() + NA_real_ + NA_character_ + NA_integer_
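The approximate-equality and NA handling items can be sketched in base R as below. Nested `ifelse()` stands in for dplyr's `case_when()` so the block has no package dependency; the values are illustrative.

```r
# E-notation and exact comparison
bl_enote <- (1e-3 == 0.001)

# Floating point sums are often not exactly equal
fl_a <- 0.1 + 0.2
fl_b <- 0.3
bl_exact <- (fl_a == fl_b)
# all.equal returns TRUE or a message, so wrap it in isTRUE
bl_approx <- isTRUE(all.equal(fl_a, fl_b, tolerance = 1e-8))

# Categorical from continuous, keeping NA as a typed missing value
ar_x <- c(-1.5, 0.2, NA, 3)
ar_cate <- ifelse(is.na(ar_x), NA_character_,
                  ifelse(ar_x < 0, "neg", "nonneg"))

print(c(bl_enote, bl_exact, bl_approx))
print(ar_cate)
```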
R Tibble Dataframe String Manipulations: rmd | r | pdf | html
- There are multiple CSV files, each containing the same file structure but simulated with different parameters. Gather a subset of columns from different files, and provide them with correct attributes based on CSV file names.
- r: cbind(ls_st, ls_st) + as_tibble(mt_st)

2.2 Counting Observation

R Example Counting, Tabulation, and Cross Tabulation: rmd | r | pdf | html
- Uncount to generate panel skeleton from years in survey
- dplyr: tally() + spread() + distinct() + uncount(yr_n) + group_by() + mutate(yr = row_number() + start_yr)

2.3 Sorting, Indexing, Slicing

Sorted Index, Interval Index and Expand Value from One Row: rmd | r | pdf | html
- Sort and generate index for rows
- Generate negative and positive index based on deviations
- Populate Values from one row to other rows
- dplyr: arrange() + row_number() + mutate(lowest = min(Sepal.Length)) + case_when(row_number()==x ~ Sepal.Length) + mutate(Sepal.New = Sepal.Length[Sepal.Index == 1])
R Within-group Ascending and Descending Sort, Selection, and Differencing: rmd | r | pdf | html
- Sort a dataframe by multiple variables, some in descending order.
- Select observations with the highest M values from within N groups (top scoring students from each class).
- dplyr: arrange(a, b, desc(c)) + group_by() + lag() + lead() + slice_head(n=1)

2.4 Advanced Group Aggregation

Cummean Test, Cumulative Mean within Group: rmd | r | pdf | html
- Given a dataframe with a grouping variable and a statistic sorted by another within-group variable, calculate the cumulative mean of that statistic within each group.
- dplyr: cummean() + group_by(id, isna = is.na(val)) + mutate(val_cummean = ifelse(isna, NA, cummean(val)))
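A base R sketch of a within-group cumulative mean, written with `ave()` instead of the dplyr functions listed above so that it runs without extra packages. The data are made up.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2),
                 val = c(2, 4, 6, 10, 20))

# Cumulative mean = cumulative sum divided by the running count, by group
df$val_cummean <- ave(df$val, df$id,
                      FUN = function(x) cumsum(x) / seq_along(x))
print(df)
```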
Count Unique Groups and Mean within Groups: rmd | r | pdf | html
- Unique groups defined by multiple values and count obs within group.
- Mean, sd, observation count for non-NA within unique groups.
- dplyr: group_by() + summarise(n()) + summarise_if(is.numeric, funs(mean = mean(., na.rm = TRUE), n = sum(is.na(.)==0)))
By Groups, One Variable All Statistics: rmd | r | pdf | html
- Pick stats, overall, and by multiple groups, stats as matrix or wide row with name=(ctsvar + catevar + catelabel).
- tidyr: group_by() + summarize_at(, funs()) + rename(!!var := !!sym(var)) + mutate(!!var := paste0(var,'str',!!!syms(vars))) + gather() + unite() + spread(varcates, value)
By within Individual Groups Variables, Averages: rmd | r | pdf | html
- By Multiple within Individual Groups Variables.
- Averages for all numeric variables within all groups of all group variables. Long to Wide to very Wide.
- tidyr: gather() + group_by() + summarise_if(is.numeric, funs(mean(., na.rm = TRUE))) + mutate(all_m_cate = paste0(variable, '_c', value)) + unite() + spread()

2.5 Distributional Statistics

Tibble Basics: rmd | r | pdf | html
- input multiple variables with comma separated text strings
- quantitative/continuous and categorical/discrete variables
- histogram and summary statistics
- tibble: ar_one <- c(107.72,101.28) + ar_two <- c(101.72,101.28) + mt_data <- cbind(ar_one, ar_two) + as_tibble(mt_data)

2.6 Summarize Multiple Variables

Apply the Same Function over Columns and Row Groups: rmd | r | pdf | html
- Compute row-specific quantiles, based on values across columns within each row.
- Sum values within-row across multiple columns, ignoring NA.
- Sum values within-group across multiple rows for matched columns, ignoring NA.
- Replace NA values in selected columns by alternative values.
- r: rowSums() + cumsum() + gsub() + mutate_at(vars(matches()), .funs = list(gs = ~sum(.))) + mutate_at(vars(contains()), .funs = list(cumu = ~cumsum(.))) + rename_at(vars(contains()), list(~gsub("M", "", .)))
- dplyr: group_by(across(one_of(ar_st_vars))) + mutate(across(matches(), func)) + rename_at() + mutate_at() + rename_at(vars(starts_with()), funs(str_replace(., "v", "var"))) + mutate_at(vars(one_of()), list(~replace_na(., 99)))
- purrr: reduce()

3 Functions

3.1 Dataframe Mutate

Nonlinear Function of Scalars and Arrays over Rows: rmd | r | pdf | html
- Five methods to evaluate scalar nonlinear function over matrix.
- Evaluate non-linear function with scalar from rows and arrays as constants.
- r: .$fl_A + fl_A = `$`(., 'fl_A') + .[[svr_fl_A]]
- dplyr: rowwise() + mutate(out = funct(inputs))
Evaluate Functions over Rows of Meshes Matrices: rmd | r | pdf | html
- Mesh states and choices together and rowwise evaluate many matrices.
- Cumulative sum over multiple variables.
- Rename various variables with common prefix and suffix appended.
- r: ffi <- function(fl_A, ar_B)
- tidyr: expand_grid() + rowwise() + df %>% rowwise() %>% mutate(var = ffi(fl_A, ar_B))
- ggplot2: geom_line() + facet_wrap() + geom_hline() + facet_wrap(. ~ var_id, scales = 'free') + geom_hline(yintercept=0, linetype="dashed", color="red", size=1)

3.2 Dataframe Do Anything

Dataframe Row to Array (Mx1 by N) to (MxQ by N+1): rmd | r | pdf | html
- Generate row value specific arrays of varying Length, and stack expanded dataframe.
- Given row-specific information, generate row-specific arrays that expand matrix.
- dplyr: do() + unnest() + left_join() + df %>% group_by(ID) %>% do(inc = rnorm(.$Q, mean=.$mean, sd=.$sd)) %>% unnest(c(inc))
Simulate country-specific wage draws and compute country wage GINIs: Dataframe (Mx1 by N) to (MxQ by N+1) to (Mx1 by N: rmd | r | pdf | html
- Define attributes for M groups across N variables, simulate up to Q observations for each of the M Groups, then compute M-specific statistics based on the sample of observations within each M.
- Start with a matrix that is (Mx1 by N); expand this to (MxQ by N+1), where the additional column contains the MxQ specific variable; compute statistics for each M based on the Q observations within M, and then present the (Mx1 by N+1) dataframe.
- dplyr: group_by(ID) + do(inc = rnorm(.$N, mean=.$mn, sd=.$sd)) + unnest(c(inc)) + left_join(df, by="ID")
Dataframe Subset to Dataframe (MxP by N) to (MxQ by N+Z-1): rmd | r | pdf | html
- Group by mini dataframes as inputs for function. Stack output dataframes with group id.
- dplyr: group_by() + do() + unnest()

3.3 Apply and pmap

Apply and Sapply function over arrays and rows: rmd | r | pdf | html
- Evaluate function f(x_i,y_i,c), where c is a constant and x and y vary over each row of a matrix, with index i indicating rows.
- Get same results using apply and sapply with defined and anonymous functions.
- Convert list of list to table.
- r: do.call() + as_tibble(do.call(rbind,ls)) + apply(mt, 1, func) + sapply(ls_ar, func, ar1, ar2)
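A minimal sketch of evaluating f(x_i, y_i, c) over matrix rows with `apply()` and `sapply()`, then stacking a list of results with `do.call(rbind, ...)`. The function `ffi` and its form are hypothetical.

```r
# Hypothetical function of row-specific x, y and a constant c
ffi <- function(fl_x, fl_y, fl_c) {
  fl_c + fl_x * fl_y
}

mt <- cbind(x = c(1, 2, 3), y = c(4, 5, 6))
fl_c <- 10

# apply over rows with an anonymous function
ar_apply <- apply(mt, 1, function(row) ffi(row[["x"]], row[["y"]], fl_c))

# sapply over row indexes gives the same values
ar_sapply <- sapply(seq_len(nrow(mt)),
                    function(i) ffi(mt[i, "x"], mt[i, "y"], fl_c))

# Convert a list of one-row dataframes to a single table
ls_df <- lapply(seq_len(nrow(mt)),
                function(i) data.frame(row = i, out = ar_apply[i]))
df_out <- do.call(rbind, ls_df)
print(df_out)
```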
Mutate rowwise, mutate pmap, and rowwise do unnest: rmd | r | pdf | html
- Evaluate function f(x_i,y_i,c), where c is a constant and x and y vary over each row of a matrix, with index i indicating rows.
- Get same results using various types of mutate rowwise, mutate pmap and rowwise do unnest.
- dplyr: rowwise() + do() + unnest()
- purrr: pmap(func)
- tidyr: unlist()

4 Multi-dimensional Data Structures

4.1 Generate, Gather, Bind and Join

R dplyr Group by Index and Generate Panel Data Structure: rmd | r | pdf | html
- Build skeleton panel frame with N observations and T periods with gender and height.
- Generate group Index based on a list of grouping variables.
- r: runif() + rnorm() + rbinom(n(), 1, 0.5) + cumsum()
- dplyr: group_by() + row_number() + ungroup() + one_of() + mutate(var = (row_number()==1)*1)
- tidyr: uncount()
R DPLYR Join Multiple Dataframes Together: rmd | r | pdf | html
- Join dataframes together with one or multiple keys. Stack dataframes together.
- dplyr: filter() + rename(!!sym(vsta) := !!sym(vstb)) + mutate(var = rnorm(n())) + left_join(df, by=(c('id'='id', 'vt'='vt'))) + left_join(df, by=setNames(c('id', 'vt'), c('id', 'vt'))) + bind_rows()
R Gather Data Columns from Multiple CSV Files: rmd | r | pdf | html
- There are multiple CSV files, each containing the same file structure but simulated with different parameters. Gather a subset of columns from different files, and provide them with correct attributes based on CSV file names.
- Separate numeric and string components of a string variable value apart.
- r: file() + writeLines() + readLines() + close() + gsub() + read.csv() + do.call(bind_rows, ls_df) + apply()
- tidyr: separate()
- regex: (?<=[A-Za-z])(?=[-0-9])

4.2 Wide and Long

Convert Table from Long to Wide with dplyr: rmd | r | pdf | html
- Long attendance roster to wide roster and calculate cumulative attendance by each day for students.
- Convert long roster with attendance and test-scores to wide.
- tidyr: pivot_wider(id_cols = c(v1), names_from = v2, names_prefix = "id", names_sep = "_", values_from = c(v3, v4))
- dplyr: mutate(var = case_when(rnorm(n()) < 0 ~ 1, TRUE ~ 0)) + rename_at(vars(num_range('', ar_it)), list(~paste0(st_prefix, . , ''))) + mutate_at(vars(contains(str)), list(~replace_na(., 0))) + mutate_at(vars(contains(str)), list(~cumsum(.)))
Convert Table from Wide to Long with dplyr: rmd | r | pdf | html
- Given a matrix of values with row and column labels, create a table where the unit of observation is a row and column category pair, and the values in the matrix are stored in a single variable.
- Reshape wide to long two sets of variables, two categorical variables added to wide table.
- tidyr: pivot_longer(cols = starts_with('zi'), names_to = c('zi'), names_pattern = paste0("zi(.)"), values_to = "ev") + pivot_longer(cols = matches('a line b'), names_to = c('va', 'vb'), names_pattern = paste0("(.)_(.)"), values_to = "ev")
- dplyr: left_join()

4.3 Within Panel Comparisons and Statistics

Find Closest Values Along Grids: rmd | r | pdf | html
- There is an array (matrix) of values, find the index of the values closest to another value.
- r: do.call(bind_rows, ls_df)
- dplyr: left_join(tb, by=(c('vr_a'='vr_a', 'vr_b'='vr_b')))
Cross-group Within-time and Cross-time Within-group Statistics: rmd | r | pdf | html
- Compute relative values across countries at each time, and relative values within country across time.
- dplyr: arrange(v1, v2) %>% group_by(v1) %>% mutate(stats := v3/first(v3))

4.4 Join and Merge Files Together by Keys

Mesh join: rmd | r | pdf | html
- Full join, expand multiple-rows of data-frame with the same set of expansion rows and columns
- dplyr: full_join()

5 Linear Regression

5.1 Linear and Polynomial Fitting

Find Best Fit of Curves Through Points: rmd | r | pdf | html
- There are three x and y points, find the quadratic curve that fits through them exactly.
- There are N sets of x and y points, find the Mth order polynomial fit by regressing y on poly(x, M).
- stats: lm(y ~ poly(x, 2), data=df) + summary.lm(rs) + predict(rs)
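A small sketch of fitting a quadratic exactly through three points with `lm()`. The three points are generated from the assumed polynomial y = 1 + 2x + 0.5x^2 so that the recovered coefficients can be checked.

```r
# Three points on y = 1 + 2*x + 0.5*x^2
df <- data.frame(x = c(1, 2, 3), y = c(3.5, 7, 11.5))

# Raw polynomial terms return the coefficients of 1, x and x^2 directly
rs_raw <- lm(y ~ x + I(x^2), data = df)
# Orthogonal polynomials give the same fitted values
rs_poly <- lm(y ~ poly(x, 2), data = df)

print(coef(rs_raw))
print(predict(rs_poly))
```

With three points and three parameters the fit has zero residual degrees of freedom, so the fitted values reproduce y exactly.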
Fit a Time Series with Polynomial and Analytical Expressions for Coefficients: rmd | r | pdf | html
- Given a time series of data points from a polynomial data generating process, solve for the polynomial coefficients.
- Mth derivative of Mth order polynomial is time invariant, use functions of differences of differences of differences to identify polynomial coefficients analytically.
- R: matrix multiplication
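To illustrate the differences-of-differences idea for the quadratic case, the sketch below recovers the coefficients of an assumed polynomial y_t = a0 + a1 t + a2 t^2 observed at unit time steps. The coefficient values are made up, and `diff()` is used in place of explicit matrix multiplication.

```r
# Assumed data generating process, observed at t = 1, ..., 5
ar_coef <- c(a0 = 1, a1 = 2, a2 = 0.5)
ar_t <- 1:5
ar_y <- ar_coef[["a0"]] + ar_coef[["a1"]] * ar_t + ar_coef[["a2"]] * ar_t^2

# Second difference of a quadratic is constant and equals 2! * a2
ar_d2 <- diff(ar_y, differences = 2)
fl_a2 <- ar_d2[1] / factorial(2)

# First difference: y_2 - y_1 = a1 + a2 * (t_1 + t_2) with unit steps
fl_a1 <- (ar_y[2] - ar_y[1]) - fl_a2 * (ar_t[1] + ar_t[2])

# Level: back out the intercept from the first observation
fl_a0 <- ar_y[1] - fl_a1 * ar_t[1] - fl_a2 * ar_t[1]^2

print(c(a0 = fl_a0, a1 = fl_a1, a2 = fl_a2))
```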

5.2 OLS and IV

IV/OLS Regression: rmd | r | pdf | html
- R Instrumental Variables and Ordinary Least Square Regression store all Coefficients and Diagnostics as Dataframe Row.
- AER: library(AER) + ivreg(as.formula, diagnostics = TRUE)
M Outcomes and N RHS Alternatives: rmd | r | pdf | html
- There are M outcome variables and N alternative explanatory variables. Regress all M outcome variables on N endogenous/independent right hand side variables one by one, with controls and/or IVs, collect coefficients.
- dplyr: bind_rows(lapply(listx, function(x)(bind_rows(lapply(listy, regf.iv))))) + starts_with() + ends_with() + reduce(full_join)

5.3 Decomposition

Regression Decomposition: rmd | r | pdf | html
- Post multiple regressions, fraction of outcome variables' variances explained by multiple subsets of right hand side variables.
- dplyr: gather() + group_by(var) + mutate_at(vars, funs(mean = mean(.))) + rowSums(matmat) + mutate_if(is.numeric, funs(frac = (./value_var)))

6 Nonlinear and Other Regressions

6.1 Logit Regression

Logit Regression: rmd | r | pdf | html
- Logit regression testing and prediction.
- stats: glm(as.formula(), data, family='binomial') + predict(rs, newdata, type = "response")
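A minimal logit sketch with `glm()` and `predict()`, using the built-in `mtcars` data. The choice of `am` as outcome and `mpg` as regressor is an assumption for illustration only.

```r
# Logit of transmission type (am) on miles per gallon (mpg)
rs_logit <- glm(as.formula("am ~ mpg"), data = mtcars, family = "binomial")

# Predicted probabilities at two hypothetical mpg values
df_new <- data.frame(mpg = c(15, 25))
ar_prob <- predict(rs_logit, newdata = df_new, type = "response")

print(coef(rs_logit))
print(ar_prob)
```

With `type = "response"` the predictions are probabilities rather than values of the linear index.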
Estimate Logistic Choice Model with Aggregate Shares: rmd | r | pdf | html
- Aggregate share logistic OLS with K worker types, T time periods and M occupations.
- Estimate logistic choice model with aggregate shares, allowing for occupation-specific wages and occupation-specific intercepts.
- Estimate allowing for K and M specific intercepts, K and M specific coefficients, and homogeneous coefficients.
- Create input matrix data structures for logistic aggregate share estimation.
- stats: lm(y ~ . -1)
Fit Prices Given Quantities Logistic Choice with Aggregate Data: rmd | r | pdf | html
- A multinomial logistic choice problem generates choice probabilities across alternatives, find the prices that explain aggregate shares.
- stats: lm(y ~ . -1)

6.2 Quantile Regression

Quantile Regressions with Quantreg: rmd | r | pdf | html
- Quantile regression with continuous outcomes. Estimates and tests quantile coefficients.
- quantreg: rq(mpg ~ disp + hp + factor(am), tau = c(0.25, 0.50, 0.75), data = mtcars) + anova(rq(), test = "Wald", joint=TRUE) + anova(rq(), test = "Wald", joint=FALSE)

7 Optimization

7.1 Grid Based Optimization

Find the Maximizing or Minimizing Point Given Some Objective Function: rmd | r | pdf | html
- Find the maximizing or minimizing point given some objective function.
- base: while + min + which.min + sapply
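One way to combine `while`, `sapply`, and `which.min` is an iteratively zooming grid search, sketched below. It assumes the objective has a single interior minimum on the search interval; the objective function, grid size, and tolerance are all hypothetical.

```r
# Hypothetical objective with its minimum at x = 2
ffi_obj <- function(fl_x) {
  (fl_x - 2)^2 + 1
}

fl_lo <- 0
fl_hi <- 4
it_iter <- 0
while ((fl_hi - fl_lo) > 1e-6 && it_iter < 50) {
  # Evaluate the objective along the current grid
  ar_grid <- seq(fl_lo, fl_hi, length.out = 11)
  ar_obj <- sapply(ar_grid, ffi_obj)
  it_min <- which.min(ar_obj)
  # Zoom in to the grid points on either side of the current best point
  fl_lo <- ar_grid[max(it_min - 1, 1)]
  fl_hi <- ar_grid[min(it_min + 1, length(ar_grid))]
  it_iter <- it_iter + 1
}
fl_x_min <- (fl_lo + fl_hi) / 2
fl_obj_min <- ffi_obj(fl_x_min)

print(c(x = fl_x_min, obj = fl_obj_min, iter = it_iter))
```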
Concurrent Bisection over Dataframe Rows: rmd | r | pdf | html
- Post multiple regressions, fraction of outcome variables' variances explained by multiple subsets of right hand side variables.
- tidyr: pivot_longer(cols = starts_with('abc'), names_to = c('a', 'b'), names_pattern = paste0('prefix', "(.)_(.)"), values_to = val) + pivot_wider(names_from = !!sym(name), values_from = val) + mutate(!!sym(abc) := case_when(efg < 0 ~ !!sym(opq), TRUE ~ iso))
- ggplot2: geom_line() + facet_wrap() + geom_hline()

8 Mathematics

8.1 Basics

Analytical Formula Fit Curves Through Points: rmd | r | pdf | html
- There are three pairs of points, formulas for the exact quadratic curve that fits through the points.
- There are three pairs of points, we observe only differences in y values, formulas for the linear and quadratic parameters.
- There are three pairs of points, formulas for the linear best fit line through the points.
- stats: lm(y ~ x + I(x^2), data=df) + lm(y ~ poly(x, 2), data=df) + summary.lm(rs) + predict(rs)
Quadratic and Ratio Rescaling of Parameters with Fixed Min and Max: rmd | r | pdf | html
- For 0<theta<1, generate 0 < thetaHat(theta, lambda) < 1, where lambda is between positive and negative infinity, used to rescale theta.
- Fit a quadratic function for three points, where the starting and ending points are along the 45 degree line.
- r: sort(unique()) + sapply(ar, func, param=val)
- ggplot2: geom_line() + geom_vline() + labs(title, subtitle, x, y, caption) + scale_y_continuous(breaks, limits)
Rescaling Bounded Parameter to be Unbounded and Positive and Negative Exponents with Different Bases: rmd | r | pdf | html
- Log of alternative bases, bases that are not e, 10 or 2.
- A parameter is constrained between 1 and negative infinity, use exponentials of different bases to scale the bounded parameter to an unbounded parameter.
- Positive exponentials are strictly increasing. Negative exponentials are strictly decreasing.
- A positive number below 1 raised to a negative exponent is above 1, and a positive number above 1 raised to a negative exponent is below 1.
- graphics: plot(x, y) + title() + legend()
Find the Closest Point Along a Line to Another Point: rmd | r | pdf | html
- A line crosses through the origin, what is the closest point along this line to another point.
- Graph several functions jointly with points and axis.
- graphics: par(mfrow = c(1, 1)) + curve(fc) + points(x, y) + abline(v=0, h=0)
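For a line y = b*x through the origin, the closest point to (x0, y0) is the orthogonal projection, x* = (x0 + b*y0) / (1 + b^2) and y* = b*x*. A short numeric check follows; the slope and point are arbitrary, and plotting is left out.

```r
fl_slope <- 0.5
ar_pt <- c(x = 3, y = 4)

# Orthogonal projection of the point onto the line y = slope * x
fl_x_close <- (ar_pt[["x"]] + fl_slope * ar_pt[["y"]]) / (1 + fl_slope^2)
fl_y_close <- fl_slope * fl_x_close

# The connecting segment should be orthogonal to the line direction (1, slope)
fl_dot <- (ar_pt[["x"]] - fl_x_close) * 1 +
  (ar_pt[["y"]] - fl_y_close) * fl_slope

print(c(x = fl_x_close, y = fl_y_close, dot = fl_dot))
```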
linear solve x with f(x) = 0: rmd | r | pdf | html
- Evaluate and solve statistically relevant problems with one equation and one unknown that permit analytical solutions.

8.2 Production Functions

Nested Constant Elasticity of Substitution Production Function: rmd | r | pdf | html
- A nested-CES production function with nest-specific elasticities.
- Re-state the nested-CES problem as several sub-problems.
- Marginal products and their relationship to prices in expenditure minimization.
Latent Dynamic Health Production Function: rmd | r | pdf | html
- A model of latent health given lagged latent health and health inputs.
- Find individual-specific production function coefficient given self-rated discrete health status probabilities.
- Persistence of latent health status given observed discrete current and lagged outcomes.

8.3 Inequality Models

GINI for Discrete Samples or Discrete Random Variable: rmd | r | pdf | html
- Given sample of data points that are discrete, compute the approximate GINI coefficient.
- Given a discrete random variable, compute the GINI coefficient.
- r: sort() + cumsum() + sum()
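A sketch of one common sample GINI estimator built from `sort()`, `cumsum()`, and `sum()`. The normalization on the linked page may differ; here equal outcomes give 0 and a sample where one of n observations holds everything gives (n-1)/n.

```r
# GINI = (n+1)/n - 2 * sum of cumulative sorted values / (n * total)
ffi_gini <- function(ar_x) {
  ar_sorted <- sort(ar_x)
  it_n <- length(ar_sorted)
  (it_n + 1) / it_n - 2 * sum(cumsum(ar_sorted)) / (it_n * sum(ar_sorted))
}

fl_gini_equal <- ffi_gini(c(1, 1, 1, 1))
fl_gini_unequal <- ffi_gini(c(0, 0, 0, 1))
print(c(equal = fl_gini_equal, unequal = fl_gini_unequal))
```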
CES and Atkinson Inequality Index: rmd | r | pdf | html
- Analyze how changing individual outcomes shift utility given inequality preference parameters.
- Discretize a continuous normal random variable with a discrete binomial random variable.
- Draw Cobb-Douglas, Utilitarian and Leontief indifference curve.
- r: apply(mt, 1, function(x){}) + do.call(rbind, ls_mt)
- tidyr: expand_grid()
- ggplot2: geom_line() + facet_wrap()
- econ: Atkinson (JET, 1970)

9 Statistics

9.1 Random Draws

Randomly Perturb Some Parameter Value with Varying Magnitudes: rmd | r | pdf | html
- Given some existing parameter value, with an intensity value between 0 and 1, decide how to perturb the value.
- r: matrix
- stats: qlnorm()
- graphics: par() + hist() + abline()

9.2 Distributions

Integrate Normal Shocks: rmd | r | pdf | html
- Random Sampling (Monte Carlo) integrate shocks.
- Trapezoidal rule (symmetric rectangles) integrate normal shock.
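Both integration approaches can be sketched on a case with a known answer: for a normal shock with mean zero and standard deviation sd, E[exp(shock)] = exp(sd^2 / 2). The integrand, seed, number of draws, and grid are assumptions for this example.

```r
set.seed(123)
fl_sd <- 0.5
fl_true <- exp(fl_sd^2 / 2)

# Monte Carlo: average the integrand over random draws
ar_draws <- rnorm(100000, mean = 0, sd = fl_sd)
fl_mc <- mean(exp(ar_draws))

# Trapezoidal rule: integrand times normal density along an even grid
ar_eps <- seq(-6 * fl_sd, 6 * fl_sd, length.out = 1001)
ar_f <- exp(ar_eps) * dnorm(ar_eps, mean = 0, sd = fl_sd)
fl_step <- ar_eps[2] - ar_eps[1]
fl_trapz <- sum((ar_f[-1] + ar_f[-length(ar_f)]) / 2) * fl_step

print(c(true = fl_true, mc = fl_mc, trapz = fl_trapz))
```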

9.3 Discrete Random Variable

Binomial Approximation of Normal: rmd | r | pdf | html
- Approximate a continuous normal random variable with a discrete binomial random variable.
- r: hist() + plot()
- stats: dbinom() + rnorm()
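A minimal sketch of one way to do this approximation: rescale the binomial support so that the discrete random variable matches the mean and standard deviation of the target normal. The target parameters and the number of binomial trials are illustrative assumptions.

```r
# Target normal and binomial settings
fl_mu <- 1
fl_sigma <- 2
it_n <- 20
fl_p <- 0.5

# Binomial probabilities over 0, ..., n
ar_k <- 0:it_n
ar_prob <- dbinom(ar_k, size = it_n, prob = fl_p)

# Standardize the binomial support, then rescale to the target normal
ar_x <- fl_mu + fl_sigma * (ar_k - it_n * fl_p) / sqrt(it_n * fl_p * (1 - fl_p))

# Moments of the discrete random variable
fl_mean <- sum(ar_prob * ar_x)
fl_sd <- sqrt(sum(ar_prob * (ar_x - fl_mean)^2))
print(c(mean = fl_mean, sd = fl_sd))
```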

10 Tables and Graphs

10.1 R Base Plots

R Base Plot Line with Curves and Scatter: rmd | r | pdf | html
- Plot scatter points, line plot and functional curve graphs together.
- Set margins for legend to be outside of graph area, change line, point, label and legend sizes.
- Generate additional lines for plots successively, record successively, and plot all steps, or initial steps results.
- r: plot() + curve() + legend() + title() + axis() + par() + recordPlot()

10.2 ggplot Line Related Plots

ggplot2 Basic Line Plot for Multiple Time Series: rmd | r | pdf | html
- Given three time series, present them in levels, in log levels, and as ratios.
- ggplot: ggplot() + geom_line()
ggplot Line Plot Multiple Categorical Variables With Continuous Variable: rmd | r | pdf | html
- One category is subplot, one category is line-color, one category is line-type.
- One category is subplot, one category is differentiated by line-color, line-type and scatter-shapes.
- One category is shown as separate plots, two categories are subplot rows and columns, one category is differentiated by line-color, line-type and scatter-shapes.
- ggplot: ggplot() + facet_wrap() + facet_grid() + geom_line() + geom_point() + geom_smooth() + geom_hline() + scale_colour_manual() + scale_shape_manual() + scale_shape_discrete() + scale_linetype_manual() + scale_x_continuous() + scale_y_continuous() + theme_bw() + theme() + guides() + theme() + ggsave()
- dplyr: filter(vara %in% c(1, 2) & varb == "val") + mutate_if() + !any(is.na(suppressWarnings(as.numeric(na.omit(x))))) & is.character(x)
Time Series with Shaded Regions, plot GDP with recessions: rmd | r | pdf | html
- Plot several time series with multiple shaded windows.
- Plot GDP with shaded recession window, and differentially shaded pre- and post-recession windows.
- r: sample + pmin + diff + which
- ggplot: ggplot() + geom_line() + geom_rect(aes(xmin, xmax, ymin, ymax)) + theme_light() + scale_colour_manual() + scale_shape_discrete() + scale_linetype_manual() + scale_fill_manual()

10.3 ggplot Scatter Related Plots

ggplot Scatter Plot Grouped or Unique Patterns and Colors: rmd | r | pdf | html
- Scatter Plot Three Continuous Variables and Multiple Categorical Variables
- Two continuous variables for the x-axis and the y-axis, another continuous variable for size of scatter, other categorical variables for scatter shape and size.
- Scatter plot with unique pattern and color for each scatter point.
- Y and X label axis with two layers of text in levels and deviation from some mid-point values.
- tibble: rownames_to_column()
- ggplot: ggplot() + geom_jitter() + geom_smooth() + geom_point(size=1, stroke=1) + scale_colour_manual() + scale_shape_discrete() + scale_linetype_manual() + scale_x_continuous() + scale_y_continuous() + theme_bw() + theme()
ggplot Multiple Scatter-Lines and Facet Wrap Over Categories: rmd | r | pdf | html
- ggplot multiple lines with scatter as points and connecting lines.
- Facet wrap to generate subfigures for sub-categories.
- Generate separate plots from data saved separately.
- r: apply
- ggplot: facet_wrap() + geom_smooth() + geom_point() + facet_wrap() + scale_colour_manual() + scale_shape_manual() + scale_linetype_manual()

10.4 Write and Read Plots

Base R Save Images At Different Sizes: rmd | r | pdf | html
- In base R, store the core image, then add legends, titles, labels, and axes of different sizes to save figures at different sizes.
- r: png() + setEPS() + postscript() + dev.off()

11 Get Data

11.1 Environmental Data

CDS ECMWF Global Environmental Data Download: rmd | r | pdf | html
- Use the Python API to get ECMWF ERA5 data.
- Dynamically modify a python API file, run python inside a Conda virtual environment with R-reticulate.
- r: file() + writeLines() + unzip() + list.files() + unlink()
- r-reticulate: use_python() + Sys.setenv(RETICULATE_PYTHON = spth_conda_env)

12 Coding and Development

12.1 Installation and Packages

R, RTools, Rstudio Installation and Update with VSCode: rmd | r | pdf | html
- Install and update R, RTools, and Rstudio.
- Set-up R inside VSCode.
- installr: updateR()
Handling R Packages: rmd | r | pdf | html
- Resolve conflicts between two packages with identically named functions.
- tidyverse: tidyverse_conflicts
- dplyr: filter
- stats: filter
- conflicted: conflict_prefer()

12.2 Files In and Out

Decompose File Paths to Get Folder and Files Names: rmd | r | pdf | html
- Decompose file path and get file path folder names and file name.
- Check if file name exists.
- r: .Platform$file.sep + tail() + strsplit() + basename() + dirname() + substring() + dir.exists() + file.exists()
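A short sketch of decomposing a file path with base R. The path is hypothetical and is not expected to exist on disk.

```r
# Hypothetical path, used only as a string
spt_file <- "/home/user/project/data/results.csv"

# File name and containing folder
st_file <- basename(spt_file)
spt_folder <- dirname(spt_file)

# Split on the separator and keep the last two components
ar_parts <- strsplit(spt_file, "/", fixed = TRUE)[[1]]
ar_last_two <- tail(ar_parts, 2)

# File name without its four-character ".csv" suffix
st_stem <- substring(st_file, 1, nchar(st_file) - 4)

# Existence checks: the hypothetical file, and the session temp folder
bl_file <- file.exists(spt_file)
bl_tmp <- dir.exists(tempdir())

print(c(st_file, spt_folder, st_stem))
print(ar_last_two)
```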
Save Text to File, Read Text from File, Replace Text in File: rmd | r | pdf | html
- Save data to file, read text from file, replace text in file.
- r: kable() + file() + writeLines() + readLines() + close() + gsub()
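The save, read, and replace cycle can be sketched with a temporary file as below. The file contents and the replaced word are made up.

```r
# Write two lines of text to a temporary file
spn_file <- tempfile(fileext = ".txt")
fl_con <- file(spn_file, open = "w")
writeLines(c("alpha = 1", "beta = 2"), fl_con)
close(fl_con)

# Read the text back and replace a word
ar_lines <- readLines(spn_file)
ar_new <- gsub("alpha", "gamma", ar_lines)

# Overwrite the file with the modified text
fl_con <- file(spn_file, open = "w")
writeLines(ar_new, fl_con)
close(fl_con)

# Confirm the change, then clean up
ar_check <- readLines(spn_file)
unlink(spn_file)
print(ar_check)
```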
Convert R Markdown File to R, PDF and HTML: rmd | r | pdf | html
- Find all files in a folder with a particular suffix, with exclusion.
- Convert R Markdown File to R, PDF and HTML.
- Modify markdown pounds hierarchy.
- r: file() + writeLines() + readLines() + close() + gsub()

12.3 Python with R

Python in R with Reticulate: rmd | r | pdf | html
- Use Python in R with Reticulate
- reticulate: py_config() + use_condaenv() + py_run_string() + Sys.which('python')

12.4 Command Line

System and Shell Commands in R: rmd | r | pdf | html
- Run system executable and shell commands.
- Activate conda environment with shell script.
- r: system() + shell()

12.5 Run Code in Parallel in R

Run Code in Parallel in R: rmd | r | pdf | html
- Running parallel code in R
- parallel: detectCores() + makeCluster()
- doParallel: registerDoParallel()
- foreach: %dopar%
