---
title: "README"
format: gfm
---
# Developer Notes
## 2024-08-10
- Cleaned up documentation
- Made unit tests for yoy
- Cleaned up consistency of syntax
- Fixed `make_aggregation_tbl()` assertthat formula
## 2024-08-11
- Cleaned up group by logic in time intelligence functions
## 2024-08-13
- translated `make_aggregation_tbl()` to SQL (without group logic)
- fixed how to add a subquery that contains all previous steps (sketched below)
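A sketch of the subquery-wrapping idea using dbplyr's `sql_render()`; the table name `sales_db` and its columns are assumptions, and the package's own helpers may work differently.

```{r}
#| eval: false
# wrap all previous dplyr steps as a subquery -- a sketch, not the package's
# actual implementation; `sales_db` is an assumed database-backed table
prior_steps <- sales_db |>
  dplyr::filter(year == 2024) |>
  dplyr::group_by(date) |>
  dplyr::summarise(sales = sum(sales, na.rm = TRUE))

inner_sql <- dbplyr::sql_render(prior_steps)  # the SELECT built so far
outer_sql <- paste0("SELECT * FROM (", inner_sql, ") AS prior_steps")
```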
## 2024-08-15
- fixed `with()` and `cte()` functions to build queries quicker
## 2024-08-31
- I figured out a template for factor analysis; now I just need to figure out the methodology and modeling approach
- will use a formula approach (`target ~ factor1 + factor2 + factor3 + ...`); see the sketch after this entry
- do pop calculations first with the ti functions
- then sequence the attribution
- documented two methodologies for calculating sales price/mix
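A rough sketch of what that formula interface could look like. The collector name `pvm()` only shows up in a later entry, and every argument name here is an assumption, not the final API.

```{r}
#| eval: false
# hypothetical formula interface for the factor analysis -- all names here
# are assumptions at this stage
pvm(
  data    = sales_tbl,                  # transaction-level data (assumed)
  formula = revenue ~ price + quantity, # target ~ factor1 + factor2 + ...
  period  = "year",                     # period-over-period granularity
  agg     = "sum"                       # aggregation applied before attribution
)
```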
## 2024-09-01
- updated formula calls and the structure of the methodology (period + agg args)
- established a possible framework to execute
## 2024-09-07
- Finished `with()` and `cte()` helper functions for time intelligence conversions (see the sketch after this entry)
- pushed SQL helper functions to the package:
  - `sql_query_select()`
  - `date_seq_sql()`
  - `capture_original_query()`
  - `with()`
  - `cte()`
- started rewriting queries to take advantage of the new SQL helper functions
- created first draft of `make_aggregation_tbl()` in SQL; still need to work through the groups version
  - error in referencing the column names (coming in quoted)
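The notes don't record the helper bodies, so here is a minimal sketch of what a `cte()`-style pair could do. The second helper is named `with_sql()` here only to avoid masking `base::with()`; the packaged versions may differ.

```{r}
#| eval: false
# a minimal sketch of CTE helpers -- not the package's actual implementation
cte <- function(name, query) {
  # wrap a SELECT statement as a named common table expression
  paste0(name, " AS (", query, ")")
}

with_sql <- function(..., final_query) {
  # chain one or more CTEs ahead of the final SELECT
  paste0("WITH ", paste(..., sep = ", "), " ", final_query)
}

with_sql(
  cte("daily", "SELECT date, SUM(sales) AS sales FROM fact GROUP BY date"),
  final_query = "SELECT * FROM daily"
)
#> [1] "WITH daily AS (SELECT date, SUM(sales) AS sales FROM fact GROUP BY date) SELECT * FROM daily"
```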
## 2024-09-10
- figured out the quoting issue; it can be solved with `DBI::Id("test", "species")` (example below)
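A quick illustration of that fix; `DBI::ANSI()` stands in for a live connection here.

```{r}
#| eval: false
# DBI::Id() builds a qualified identifier; dbQuoteIdentifier() then applies
# the connection's quoting rules instead of hand-built quoted strings
id <- DBI::Id("test", "species")
DBI::dbQuoteIdentifier(DBI::ANSI(), id)
#> <SQL> "test"."species"
```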
## 2024-09-11
- Figured out the unique groups issue, so now I just need to optimize for multiple groups and clean up
- Then I can move on to the other time intelligence functions
## 2024-09-14
- solved the multiple group issue (finally!); need to clean it up, layer it into the R package, and document it
- solved logic for one group var vs. multiple group vars
- created DBI equivalents for all w, m, y, a to-date functions
- remaining functions are wow, mom, and yoy
## 2024-09-15
- cleaned up window_order / group_by for DBI *td expressions (see the sketch after this entry)
- added dod, wow, mom, and yoy DBI args
- added dod tibble args
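For reference, a minimal sketch of a grouped to-date (running total) expression against a database-backed table; `sales_db` and its columns are assumptions.

```{r}
#| eval: false
# window_order() fixes the frame ordering so cumsum() translates to
# SUM(...) OVER (PARTITION BY year ORDER BY date)
sales_db |>
  dplyr::group_by(year) |>
  dbplyr::window_order(date) |>
  dplyr::mutate(sales_ytd = cumsum(sales)) |>
  dplyr::ungroup()
```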
## 2024-09-21
- added a working factor() formula where you can pass in args of pp(), cp(), and d() to create new column names
- need to figure out how to transform the dataset with the required columns, and add in column checks
- perhaps the dataset is passed through first and transformed, then checked to see whether the transformation has already happened (if yes, do nothing), then checked that the columns exist with the right types; that would let us remove the checks from the pp(), d(), and cp() formulas
- only open question is whether we need some weighted average
- also, the original idea was to put the target value and various formulas against it to automate checks and reporting?
- need pp() to reduce multiple arguments, one of which is a mutate that creates the lag and then also creates the delta
## 2024-09-26
- discovered a pattern to keep the formula object (see the illustration after this entry)
- formula > terms() > attr(, "variables") gets a list of call objects (B); executing them gets their output (C)
- take the original formula and subset as needed, then quo_name() to turn it into a string, and then use str_replace(formula_chr, B[[2]], C[[1]]) to keep the operators
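A small illustration of that pattern; `pp()` and `cp()` are the planned helpers and stay unevaluated here, so nothing needs to be defined yet.

```{r}
#| eval: false
f <- sales ~ pp(price) + cp(quantity)

# the "variables" attribute of the terms object is a single unevaluated call:
# list(sales, pp(price), cp(quantity))
vars <- attr(terms(f), "variables")

vars[[3]]        # the call pp(price)
class(vars[[3]]) # "call" -- can be evaluated for (C) or deparsed to a string
```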
## 2024-09-28
- have cp(), d(), and pp() just produce columns with names and capture the operations
- e.g. (lag_quantity * price) + (price * quantity_prop)
- then have a collector function, e.g. pvm() or something else, take the column names that are produced
- create the required columns based on the column logic, e.g. lag_quantity means mutate(lag_quantity = lag(quantity, 1))
- easier to capture the controls and create the columns all at once
- now need to figure out how to pass lists onto each other: pp() to cp() should augment the column names with each, and then the same from the data object
- so either we transform the data frame as we go along, i.e. each factor step mutates as we go, creating the columns (route A)
- or the factor step just passes along a list of formulas and formula names to transform (route B)
- I went down route B but am realizing it's easier with route A (sketched below)
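A sketch of route A under assumed column names: each step mutates in the columns the later steps expect.

```{r}
#| eval: false
# route A: transform the data frame as we go -- a sketch, not the final API;
# `sales_tbl` and its columns are assumptions
sales_tbl |>
  dplyr::arrange(date) |>
  dplyr::mutate(
    lag_quantity = dplyr::lag(quantity, 1),   # pp(): prior-period value
    d_quantity   = quantity - lag_quantity,   # d(): period delta
    price_effect = lag_quantity * (price - dplyr::lag(price, 1))  # one factor term
  )
```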
## To-do list
- finalize time intelligence functions
- create consistent pattern for labels
- for pop, create a return granularity arg
- create unified framework for dbi or tibble objects?
- create factor analysis
- new products introduced within the period
- discontinued products within the period
- discounts and returns for products that result in zero or negative revenue over a period
- update examples with the Contoso dataset
- convert lm to SQL equivalents
- design and create robust unit tests
- make 5-5-4 calendar date table
- create documentation with examples
# Introduction to fpaR
This is a collection of business intelligence tools and patterns to help with common data queries and insights. It is based on numerous blogs, books, posts, and work experience, covering what I have seen as the most frequent analysis and work efforts.
The package functions can be split between three focus areas:

- Segmentation strategies
  - UMAP
  - K-means
  - ABC
  - New vs. returning customers/vendors
- Time intelligence functions
  - List of standard time intelligence functions, such as month-to-date and year-over-year, for standard and non-standard calendars
  - See the table below for the complete list
  - Non-standard calendars such as a 5-5-4 calendar
  - DAX equivalents
- Variance / factor analysis
  - Applied statistical techniques to understand variance analysis, including graphing techniques
  - Like-for-like comparisons
  - Calculating components such as price, customer turnover, and high-margin vs. low-margin segmentations
  - Price, quantity, and mix factor analysis with variations

The package is designed to work with regular tibble data and databases by translating the functions to their SQL variants.
## Installation
You can install the development version of fpaR from [GitHub](https://github.com/alejandrohagan/fpaR) with:
```{r}
#| message: false
#| warning: false
#| error: false
#| eval: false
# install.packages("devtools")
devtools::install_github("alejandrohagan/fpaR")
```
## Package Components
This package is currently under development and in its early stages!
This package is heavily inspired by DAX Patterns for the time intelligence functions.
**Segmentation helpers:**
- **make_segmentation()** will use k-means and UMAP to create a segmentation
- **make_cohort_tbl()** will segment your data into cohorts based on their first transaction date
- **abc()** will create a three-category segmentation ('A', 'B', 'C') based on the cut points that are submitted (see the hypothetical example below)
- **abc_graph()** will plot your `abc()` segmentation
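A hypothetical `abc()` call; the argument names below are assumptions, not the settled API.

```{r}
#| eval: false
# hypothetical usage -- every argument name here is an assumption
sales_tbl |>
  fpaR::abc(
    category = customer_id,      # unit to segment (assumed arg name)
    value    = revenue,          # measure to rank by (assumed arg name)
    a = 0.70, b = 0.26, c = 0.04 # cumulative-share cut points (assumed)
  )
```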
**Calculation Helpers**
- **count_plus()** will replicate the handy dplyr::count() function but augment it with proportions and the cumulative sum of proportions
- **show_excel()** will open your tibble or data frame in Excel
- **divide()** will return a default NA, or an alternative value of your choosing, when dividing by zero instead of `Inf` (modeled after [DAX DIVIDE](https://learn.microsoft.com/en-us/dax/divide-function-dax)); a sketch of the idea follows
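A minimal sketch of the `divide()` idea; the packaged version may differ.

```{r}
#| eval: false
# return `alternative` instead of Inf/NaN on division by zero -- a sketch
divide <- function(numerator, denominator, alternative = NA_real_) {
  dplyr::if_else(denominator == 0, alternative, numerator / denominator)
}

divide(c(10, 5), c(2, 0))
#> [1]  5 NA
```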
**Time Intelligence Functions**
- Standard time intelligence functions^[See table below]
  - These are time intelligence functions that are consistent with a standard calendar (e.g. the calendar on your computer or phone)
- Non-standard time intelligence functions
  - These are calendars that may be more common in retail businesses (e.g. a 5-4-5 quarter calculation)
  - These calendars aim to control for weekends, as weekends may be a larger driver of sales

Time intelligence functions aren't always as straightforward as you want them to be, because comparison periods can have missing dates. The functions first ensure you have a complete date field and then run the common time intelligence calculations against the augmented data, as sketched below.
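A toy illustration of the date-padding step using `tidyr::complete()`; the package handles this internally and may do it differently.

```{r}
#| eval: false
# pad the gap days so lagged comparisons line up row-for-row
sales_tbl <- tibble::tibble(
  date  = as.Date(c("2024-01-01", "2024-01-03")),  # 2024-01-02 is missing
  sales = c(10, 5)
)

sales_tbl |>
  tidyr::complete(
    date = seq(min(date), max(date), by = "day"),
    fill = list(sales = 0)
  )
# the missing day now exists with sales = 0
```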
**Variation Analysis**
- Variance / factor analysis
**Practice Datasets**
- Microsoft's Contoso dataset, to help you practice with transaction data in both tibble and duckdb formats
```{r}
#| echo: false
#| warning: false
#| message: false
#| error: false
tibble::tribble(
  ~"Short Name", ~"Description",
  "YTD",    "Year-to-date",
  "QTD",    "Quarter-to-date",
  "MAT",    "Moving annual total",
  "PYC",    "Previous year complete",
  "PQC",    "Previous quarter complete",
  "PMC",    "Previous month complete",
  "PP",     "Previous period; automatically selects year, quarter, or month",
  "PYMAT",  "Previous year moving annual total",
  "MATG",   "Moving annual total growth",
  "POP",    "Period-over-period; automatically selects year, quarter, or month",
  "PYTD",   "Previous year-to-date",
  "PQTD",   "Previous quarter-to-date",
  "YOYTD",  "Year-over-year-to-date",
  "QOQTD",  "Quarter-over-quarter-to-date",
  "YTDOPY", "Year-to-date-over-previous-year",
  "QTDOPQ", "Quarter-to-date-over-previous-quarter"
)
```