diff --git a/docs/index.html b/docs/index.html
index d20764c..cc0b04e 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -295,7 +295,6 @@

Tablecloth documentation

Introduction

-

….

tech.ml.dataset is a great and fast library which brings columnar datasets to Clojure. Chris Nuernberger has been working on this library for the last year as a part of the bigger tech.ml stack.

I’ve started to test the library and help to fix uncovered bugs. My main goal was to compare its functionality with the standards from other platforms. I focused on R solutions: dplyr, tidyr and data.table.

While converting the examples, I worked out how to reorganize existing tech.ml.dataset functions into a simple-to-use API. The main goals were:

@@ -862,7 +861,7 @@

Dataset creation

:z -tech.v3.datatype.unary_op$eval18246$fn$reify__18257@4fa8745b +tech.v3.datatype.unary_op$eval18246$fn$reify__18257@293351d2 @@ -1610,13 +1609,13 @@

Columns and rows

(take 2 (tc/rows ds))
-
([#object[java.time.LocalDate 0x1c5f3a8 "2012-01-01"]
+
([#object[java.time.LocalDate 0x43abe9c4 "2012-01-01"]
   0.0
   12.8
   5.0
   4.7
   "drizzle"]
- [#object[java.time.LocalDate 0x5805d953 "2012-01-02"]
+ [#object[java.time.LocalDate 0x7a714db8 "2012-01-02"]
   10.9
   10.6
   2.8
@@ -4578,7 +4577,7 @@ 

Rename

v1 v2 [1 2 3] -java.lang.Object@396269aa +java.lang.Object@88b0370 @@ -4818,7 +4817,7 @@

Rename

v1 v2 [1 2 3] -java.lang.Object@75a749d5 +java.lang.Object@23c1653c @@ -4878,7 +4877,7 @@

Rename

v1 v2 [1 2 3] -java.lang.Object@75a749d5 +java.lang.Object@23c1653c @@ -5033,55 +5032,55 @@

Add or update

-0.86401689 +0.07257872 1 0.5 A -0.14043806 +0.45689050 2 1.0 B -0.68097777 +0.52503725 3 1.5 C -0.91875674 +0.11928382 4 0.5 A -0.24475970 +0.37822512 5 1.0 B -0.06825930 +0.35239845 6 1.5 C -0.04138129 +0.17997252 7 0.5 A -0.87095448 +0.11440262 8 1.0 B -0.73301107 +0.78541957 9 1.5 C @@ -5972,7 +5971,7 @@

Update

1 -7 +9 0.5 A @@ -5984,43 +5983,43 @@

Update

1 -5 +1 1.5 C 2 -9 +4 0.5 A 1 -6 +7 1.0 B 2 -3 +5 1.5 C 1 -8 +3 0.5 A 2 -4 +8 1.0 B 1 -1 +6 1.5 C @@ -7508,10 +7507,10 @@

Other

-1 -9 -1.5 -C +2 +2 +1.0 +B @@ -7561,23 +7560,23 @@

Other

-2 -4 -0.5 -A - - 1 -1 -0.5 -A +9 +1.5 +C - -1 -3 + +2 +6 1.5 C + +2 +4 +0.5 +A + 1 5 @@ -7592,27 +7591,27 @@

Other

1 -9 +3 1.5 C -1 -7 +2 +4 0.5 A -1 -9 -1.5 -C +2 +4 +0.5 +A -2 -2 -1.0 -B +1 +1 +0.5 +A @@ -7636,34 +7635,34 @@

Other

-2 -8 -1.0 -B +1 +1 +0.5 +A -2 -8 -1.0 -B +1 +9 +1.5 +C 2 -2 -1.0 -B - - -2 6 1.5 C - + 2 +4 +0.5 +A + + 2 -1.0 -B +6 +1.5 +C @@ -7694,27 +7693,27 @@

Other

2 -2 -1.0 -B +6 +1.5 +C -1 -1 -0.5 -A - - 2 4 0.5 A + +1 +9 +1.5 +C + -2 -8 -1.0 -B +1 +1 +0.5 +A @@ -7789,34 +7788,34 @@

Other

1 -7 +1 0.5 A -2 -6 +1 +9 1.5 C +2 +8 +1.0 +B + + 1 3 1.5 C - -1 + 1 +7 0.5 A - -2 -8 -1.0 -B - 2 2 @@ -7824,23 +7823,23 @@

Other

B -1 -5 -1.0 -B - - 2 4 0.5 A - -1 -9 + +2 +6 1.5 C + +1 +5 +1.0 +B +
@@ -8151,8 +8150,8 @@

Other

-2 -4 +1 +7 0.5 A @@ -8163,14 +8162,14 @@

Other

A -1 -1 +2 +4 0.5 A -1 -7 +2 +4 0.5 A @@ -8182,7 +8181,7 @@

Other

2 -2 +8 1.0 B @@ -8193,8 +8192,8 @@

Other

B -1 -5 +2 +8 1.0 B @@ -8211,26 +8210,26 @@

Other

B -1 -9 +2 +6 1.5 C -1 -3 +2 +6 1.5 C -1 -3 +2 +6 1.5 C 1 -3 +9 1.5 C @@ -9790,16 +9789,16 @@

Strategies

-2 -4 +1 +1 0.5 A -1 -9 -1.5 -C +2 +4 +0.5 +A @@ -13549,15 +13548,15 @@

Array column conve :a -[D@4f39fc33 +[D@30b08cf4 :b -[D@26d55d8a +[D@1e074828 :c -[D@6168dfaf +[D@5b5773c1 @@ -16551,8 +16550,8 @@

Longer

1 1 0 -0.94816679 -0.87288772 +0.30910644 +0.48917879 3 -2 @@ -16560,8 +16559,8 @@

Longer

2 1 1 -0.06125362 -0.11137475 +0.54225370 +0.27471082 3 -2 @@ -16569,8 +16568,8 @@

Longer

3 0 1 -0.90309774 -0.84951496 +0.90517086 +0.55928184 3 -2 @@ -16578,8 +16577,8 @@

Longer

4 0 1 -0.55768353 -0.29332929 +0.15731113 +0.44552143 3 -2 @@ -16610,7 +16609,7 @@

Longer

1 0 1 -0.94816679 +0.30910644 3 @@ -16618,7 +16617,7 @@

Longer

1 1 1 -0.06125362 +0.54225370 3 @@ -16626,7 +16625,7 @@

Longer

0 1 1 -0.90309774 +0.90517086 3 @@ -16634,7 +16633,7 @@

Longer

0 1 1 -0.55768353 +0.15731113 3 @@ -16642,7 +16641,7 @@

Longer

1 0 2 -0.87288772 +0.48917879 -2 @@ -16650,7 +16649,7 @@

Longer

1 1 2 -0.11137475 +0.27471082 -2 @@ -16658,7 +16657,7 @@

Longer

0 1 2 -0.84951496 +0.55928184 -2 @@ -16666,7 +16665,7 @@

Longer

0 1 2 -0.29332929 +0.44552143 -2 @@ -25447,25 +25446,13 @@

Concat

1 -1 -0.5 -A - - -2 -8 +5 1.0 B - -2 -6 -1.5 -C - -2 -6 +1 +3 1.5 C @@ -25476,16 +25463,16 @@

Concat

B -1 -7 +2 +4 0.5 A -1 -5 -1.0 -B +2 +6 +1.5 +C 2 @@ -25494,17 +25481,29 @@

Concat

A -2 -8 +1 +5 1.0 B 1 -9 +7 +0.5 +A + + +2 +6 1.5 C + +1 +1 +0.5 +A + … … @@ -25513,39 +25512,27 @@

Concat

1 -1 -0.5 -A +5 +1.0 +B 1 -1 -0.5 -A - - -1 9 1.5 C - -2 -2 -1.0 -B - -2 -8 -1.0 -B +1 +1 +0.5 +A -2 -6 -1.5 -C +1 +7 +0.5 +A 1 @@ -25555,16 +25542,28 @@

Concat

1 -9 -1.5 -C +7 +0.5 +A +1 +1 +0.5 +A + + 2 2 1.0 B + +1 +7 +0.5 +A + 1 5 @@ -25786,53 +25785,53 @@

Union

A -1 -7 -0.5 -A - - 2 2 1.0 B - + 1 -9 +3 +1.5 +C + + +2 +6 1.5 C +1 +7 +0.5 +A + + 2 8 1.0 B - -2 -4 -0.5 -A - 1 -3 +9 1.5 C -2 -6 -1.5 -C - - 1 5 1.0 B + +2 +4 +0.5 +A +
@@ -26387,7 +26386,7 @@

Split into train/test 0 :a -:g1 +:g2 1 @@ -26402,27 +26401,27 @@

Split into train/test 3 :a -:g3 +:g1 4 :a -:g1 +:g2 5 :a -:g2 +:g3 6 :a -:g3 +:g2 7 :a -:g1 +:g3 8 @@ -26432,7 +26431,7 @@

Split into train/test 9 :a -:g2 +:g1 … @@ -26442,17 +26441,17 @@

Split into train/test 14 :a -:g1 +:g2 15 :a -:g3 +:g2 16 :a -:g3 +:g2 17 @@ -26462,7 +26461,7 @@

Split into train/test 18 :a -:g3 +:g1 19 @@ -26472,17 +26471,17 @@

Split into train/test 20 :b -:g2 +:g3 21 :b -:g1 +:g3 22 :b -:g2 +:g3 23 @@ -26492,7 +26491,7 @@

Split into train/test 24 :b -:g2 +:g3 @@ -26520,212 +26519,212 @@

k-Fold

-13 -:a -:g1 +21 +:b +:g3 :train 0 -21 +23 :b -:g1 +:g3 :train 0 -22 -:b -:g2 +1 +:a +:g3 :train 0 -24 +20 :b -:g2 +:g3 :train 0 -2 +18 :a :g1 :train 0 -4 +9 :a :g1 :train 0 -20 -:b +6 +:a :g2 :train 0 -23 -:b +7 +:a :g3 :train 0 -7 +19 :a -:g1 +:g2 :train 0 -12 +5 :a -:g2 +:g3 :train 0 -1 +13 :a :g3 :train 0 -8 +16 :a :g2 :train 0 -15 +10 :a -:g3 +:g1 :train 0 -6 +3 :a -:g3 +:g1 :train 0 -18 +8 :a -:g3 +:g2 :train 0 -3 +14 :a -:g3 +:g2 :train 0 -0 -:a -:g1 +24 +:b +:g3 :train 0 -19 +4 :a :g2 :train 0 -16 +2 :a -:g3 +:g1 :train 0 -11 +0 :a -:g3 +:g2 :train 0 -10 +12 :a :g1 :test 0 -14 +15 :a -:g1 +:g2 :test 0 -5 +11 :a :g2 :test 0 -17 -:a +22 +:b :g3 :test 0 -9 +17 :a -:g2 +:g3 :test 0 -10 +12 :a :g1 :train 1 -14 +15 :a -:g1 +:g2 :train 1 -5 +11 :a :g2 :train 1 -17 -:a +22 +:b :g3 :train 1 -9 +17 :a -:g2 +:g3 :train 1 @@ -26753,212 +26752,212 @@

k-Fold

-8 +4 :a :g2 :train 0 -7 +9 :a :g1 :train 0 -13 +16 :a -:g1 +:g2 :train 0 -14 +19 :a -:g1 +:g2 :train 0 -9 +10 :a -:g2 +:g1 :train 0 -11 +3 :a -:g3 +:g1 :train 0 -18 +5 :a :g3 :train 0 -6 +12 :a -:g3 +:g1 :train 0 -10 +13 :a -:g1 +:g3 :train 0 -1 +14 :a -:g3 +:g2 :train 0 -5 +18 :a -:g2 +:g1 :train 0 -19 +11 :a :g2 :train 0 -15 +6 :a -:g3 +:g2 :train 0 -2 +8 :a -:g1 +:g2 :train 0 -17 +0 :a -:g3 +:g2 :train 0 -12 +17 :a -:g2 +:g3 :train 0 -4 +1 :a -:g1 +:g3 :test 0 -0 +2 :a :g1 :test 0 -16 +15 :a -:g3 +:g2 :test 0 -3 +7 :a :g3 :test 0 -4 +1 :a -:g1 +:g3 :train 1 -0 +2 :a :g1 :train 1 -16 +15 :a -:g3 +:g2 :train 1 -3 +7 :a :g3 :train 1 -9 +10 :a -:g2 +:g1 :train 1 -11 +3 :a -:g3 +:g1 :train 1 -18 +5 :a :g3 :train 1 -6 +12 :a -:g3 +:g1 :train 1 -10 +13 :a -:g1 +:g3 :train 1 -1 +14 :a -:g3 +:g2 :train 1 @@ -26973,7 +26972,7 @@

Bootstrap

(tc/split for-splitting :bootstrap)
-

_unnamed, (splitted) [38 5]:

+

_unnamed, (splitted) [34 5]:

@@ -26986,70 +26985,70 @@

Bootstrap

- + - + - + - - - + + + - + - + - + - - + + - + - + - + - + - + - + @@ -27063,79 +27062,79 @@

Bootstrap

- + - - + + - - - - + + + + - + - + - + - + - + - + - + - + - + - + - + - + - + - - + + - + - + @@ -27176,72 +27175,72 @@

Holdout

- - + + - - - + + + - + - + - + - - + + - + - + - + - + - - - + + + - - + + - + - + @@ -27253,28 +27252,28 @@

Holdout

- + - - - + + + - - - + + + - + @@ -27283,49 +27282,49 @@

Holdout

- + - + - + - + - + - - - + + + - - + + - + - + @@ -27351,72 +27350,72 @@

Holdout

- + - + - + - + - + - + - - + + - - - + + + - + - + - + - - - + + + - + - + - + - + @@ -27428,79 +27427,79 @@

Holdout

- + - + - + - + - + - + - - + + - - - + + + - + - + - + - + - - - + + + - + - + - - + + - + - + @@ -27527,72 +27526,72 @@

Holdout

- + - + - + - + - + - + - + - + - - + + - - + + - - + + - - + + - + - + - + - + @@ -27604,78 +27603,78 @@

Holdout

- - - + + + - - - + + + - + - + - + - - + + - + - + - + - + - + - + - - + + - + - - + + @@ -27845,212 +27844,212 @@

Leave One Out

- + - + - + - + - + - + - + - + - - - + + + - + - + - - + + - - - + + + - - + + - + - + - + - - + + - + - + - + - + - - - + + + - + - + - + - + - + - - + + - + - + - + - + - + - - - + + + - + - + - + - + - + - + - + - + - + - + @@ -28087,19 +28086,19 @@

Grouped

- + - + - + - + - +
1312 :a :g1 :train 0
517 :a:g2:g3 :train 0
23:b:g312:a:g1 :train 0
25 :a:g1:g3 :train 0
519 :a :g2 :train 0
3:a21:b :g3 :train 0
111 :a:g3:g2 :train 0
117 :a :g3 :train 0
2120 :b:g1:g3 :train 0
115 :a :g3 :train
618 :a:g3:test:g1:train 0
8:a:g2:test23:b:g3:train 0
100 :a:g1:g2 :test 0
142 :a :g1 :test 0
153 :a:g3:g1 :test 0
164 :a:g3:g2 :test 0
176 :a:g3:g2 :test 0
188 :a:g3:g2 :test 0
1913 :a:g2:g3 :test 0
22:b14:a :g2 :test 0
2422 :b:g2:g3 :test 0
1:a22:b :g3 :train 0
21:b:g114:a:g2 :train 0
1719 :a:g3:g2 :train 0
198 :a :g2 :train 0
20:b6:a :g2 :train 0
1118 :a:g3:g1 :test 0
711 :a:g1:g2 :test 0
14:a:g124:b:g3 :test 0
22:b15:a :g2 :test 0
82 :a:g2:g1 :test 0
613 :a :g3 :test 0
9:a:g223:b:g3 :test 0
24:b:g210:a:g1 :test 0
109 :a :g1 :test
12 :a:g2:g1 :test 0
50 :a :g2 :test 0
35 :a :g3 :test 0
47 :a:g1:g3 :test 0
13:a:g120:b:g3 :test 0
16:a21:b :g3 :test 0
21 :a:g1:g3 :test 0
1611 :a:g3:g2 :train 0
89 :a:g2:g1 :train 0
1716 :a:g3:g2 :test 0
18:a23:b :g3 :test 0
22:b:g23:a:g1 :test 0
510 :a:g2:g1 :test 0
17 :a :g3 :test 0
23:b:g315:a:g2 :split-2 0
141 :a:g1:g3 :split-2 0
413 :a:g1:g3 :split-2 0
74 :a:g1:g2 :split-3 0
105 :a:g1:g3 :split-3 0
28 :a:g1:g2 :split-3 0
6:a24:b :g3 :split-4 0
9:a:g221:b:g3 :split-4 0
1519 :a:g3:g2 :split-4 0
1917 :a:g2:g3 :split-4 0
20:b:g212:a:g1 :split-4 0
1314 :a:g1:g2 :split-4 0
21:b18:a :g1 :split-4 0
2420 :b:g2:g3 :split-4 0
163 :a:g3:g1 small 0
215 :a:g1:g2 small 0
019 :a:g1:g2 small 0
1316 :a:g1:g2 small 0
20:b11:a :g2 small 0
23:b13:a :g3 smaller 0
1:a22:b :g3 smaller 0
21:b2:a :g1 smaller 0
91 :a:g2:g3 big 0
1712 :a:g3:g1 big 0
7:a:g121:b:g3 big 0
4:a:g124:b:g3 big 0
1417 :a:g1:g3 big 0
198 :a :g2 big 0
24:b6:a :g2 big 0
1014 :a:g1:g2 big 0
318 :a:g3:g1 big 0
189 :a:g3:g1 the rest 0
22:b0:a :g2 the rest 0
124 :a :g2 the rest 0
6:a23:b :g3 the rest 0
314 :a:g3:g2 :train 0
1210 :a:g2:g1 :train 0
139 :a :g1 :train 0
815 :a :g2 :train 0
1711 :a:g3:g2 :train 0
22:b:g212:a:g1 :train 0
195 :a:g2:g3 :train 0
18:a20:b :g3 :train 0
23:b:g319:a:g2 :train 0
24:b6:a :g2 :train 0
142 :a :g1 :train 0
116 :a:g3:g2 :train 0
16:a22:b :g3 :train 0
48 :a:g1:g2 :train 0
913 :a:g2:g3 :train 0
21:b:g117:a:g3 :train 0
50 :a :g2 :train 0
154 :a:g3:g2 :train 0
2023 :b:g2:g3 :train 0
11:a21:b :g3 :train 0
63 :a:g3:g1 :train 0
101 :a:g1:g3 :train 0
718 :a :g1 :train 0
2:a:g124:b:g3 :train 0
07 :a:g1:g3 :test 0
07 :a:g1:g3 :train 1
1210 :a:g2:g1 :train 1
139 :a :g1 :train 1
815 :a :g2 :train 1
1711 :a:g3:g2 :train 1
:g1:g2 0Group: :g1, (splitted) [9 5]:Group: :g2, (splitted) [9 5]:
:g3 1Group: :g3, (splitted) [10 5]:Group: :g3, (splitted) [12 5]:
:g2:g1 2Group: :g2, (splitted) [9 5]:Group: :g1, (splitted) [8 5]:
@@ -28138,104 +28137,104 @@

Split as a sequence -13 +1 :a -:g1 +:g3 -15 +19 :a -:g3 +:g2 -17 +16 :a -:g3 +:g2 -11 +13 :a :g3 -0 +15 :a -:g1 +:g2 -19 +9 :a -:g2 +:g1 -10 +11 :a -:g1 +:g2 -9 +2 :a -:g2 +:g1 -7 +10 :a :g1 -6 +7 :a :g3 -3 +4 :a -:g3 +:g2 -4 +0 :a -:g1 +:g2 -12 +6 :a :g2 -5 +14 :a :g2 -14 +18 :a :g1 -16 +8 :a -:g3 +:g2 -23 +22 :b :g3 -21 +23 :b -:g1 +:g3 -22 +24 :b -:g2 +:g3 20 :b -:g2 +:g3 @@ -28267,29 +28266,29 @@

Split as a sequence -2 +17 :a -:g1 +:g3 -18 +12 :a -:g3 +:g1 -1 +5 :a :g3 -8 +3 :a -:g2 +:g1 -24 +21 :b -:g2 +:g3 @@ -28306,7 +28305,7 @@

Split as a sequence (tc/split->seq :bootstrap {:partition-selector :partition :seed 11 :ratio 0.8 :repeats 2}) (first))

-

[

:g1
+

[

:g2
 

(

{

@@ -28320,7 +28319,7 @@

Split as a sequence

- + - + - + - +
-

Group: 0 [7 3]:

+

Group: 0 [8 3]:

@@ -28331,39 +28330,44 @@

Split as a sequence

- + - + - + - + - + - + - + - + - + - + - + - - - + + + + + + + +
1415 :a:g1:g2
0 :a:g1:g2
411 :a:g1:g2
1419 :a:g1:g2
74 :a:g1:g2
138 :a:g1:g2
21:b:g116:a:g2
14:a:g2
@@ -28384,7 +28388,7 @@

Split as a sequence

-

Group: 0 [2 3]:

+

Group: 0 [1 3]:

@@ -28395,14 +28399,9 @@

Split as a sequence

- - - - - - + - +
2:a:g1
106 :a:g1:g2
@@ -28424,7 +28423,7 @@

Split as a sequence

-

Group: 1 [7 3]:

+

Group: 1 [8 3]:

@@ -28435,39 +28434,44 @@

Split as a sequence

- + - + - + - + - + - + - + - + - + - + - + - + - - - + + + + + + + +
08 :a:g1:g2
1419 :a:g1:g2
1314 :a:g1:g2
70 :a:g1:g2
46 :a:g1:g2
1419 :a:g1:g2
21:b:g14:a:g2
15:a:g2
@@ -28499,14 +28503,14 @@

Split as a sequence

211 :a:g1:g2
1016 :a:g1:g2
@@ -33144,19 +33148,19 @@
1 -7 -0.5 -A +5 +1.0 +B 2 -8 +2 1.0 B -2 -6 +1 +9 1.5 C @@ -33182,31 +33186,31 @@
1 -7 -0.5 -A - - -1 3 1.5 C + +2 +8 +1.0 +B + 1 -9 -1.5 -C +5 +1.0 +B 1 -9 -1.5 -C +7 +0.5 +A -2 -6 +1 +3 1.5 C diff --git a/notebooks/index.clj b/notebooks/index.clj index f4e2cdd..fc05749 100644 --- a/notebooks/index.clj +++ b/notebooks/index.clj @@ -27,8 +27,6 @@ tablecloth-version ## Introduction -.... - [tech.ml.dataset](https://github.com/techascent/tech.ml.dataset) is a great and fast library which brings columnar dataset to the Clojure. Chris Nuernberger has been working on this library for last year as a part of bigger `tech.ml` stack. I've started to test the library and help to fix uncovered bugs. My main goal was to compare functionalities with the other standards from other platforms. I focused on R solutions: [dplyr](https://dplyr.tidyverse.org/), [tidyr](https://tidyr.tidyverse.org/) and [data.table](https://rdatatable.gitlab.io/data.table/).