<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title>Binary Search</title>
<url>/2021/04/29/%E4%BA%8C%E5%88%86%E6%9F%A5%E6%89%BE/</url>
<content><![CDATA[<p> Binary search is one of the most fundamental algorithms. Its time complexity is <span class="math inline">\(O(\log_{2} n)\)</span>.</p>
<a id="more"></a>
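<h1 id="基本思想">Basic Idea</h1>
<p> The basic binary search problem is: given a sorted (ascending) integer array nums of n elements and a target value target, write a function that searches nums for target. If the target exists, return its index; otherwise return -1.</p>
<p> Binary search generally consists of three main parts:</p>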
<ul>
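<li><strong><em>Preprocessing</em></strong>: sort the collection if it is not already sorted.</li>
<li><strong><em>Binary search</em></strong>: use a loop or recursion to split the search space in half after each comparison.</li>
<li><strong><em>Post-processing</em></strong>: determine the viable candidates in the remaining space.</li>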
</ul>
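<p> The three parts above can be sketched as follows. This is a minimal illustration, not the template used in this post: the function name <code>search_unsorted</code> is hypothetical, and the search step is delegated to <code>bisect_left</code> from Python's standard library.</p>

```python
from bisect import bisect_left


def search_unsorted(values, target):
    """Return True if target occurs in values, following the three parts above."""
    # Preprocessing: binary search needs sorted input, so sort a copy first.
    ordered = sorted(values)
    # Binary search: bisect_left halves the search space on every comparison
    # and returns the leftmost position where target could be inserted.
    pos = bisect_left(ordered, target)
    # Post-processing: the only remaining candidate is ordered[pos]; check that
    # it exists and actually equals the target.
    return pos < len(ordered) and ordered[pos] == target
```

<p> For example, <code>search_unsorted([5, 1, 9, 3], 9)</code> finds the value, while searching the same list for 4 does not.</p>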
<h1 id="模板-1">Template 1</h1>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">binarySearch</span><span class="params">(nums, target)</span>:</span></span><br><span class="line">    <span class="string">"""</span></span><br><span class="line"><span class="string">    :type nums: List[int]</span></span><br><span class="line"><span class="string">    :type target: int</span></span><br><span class="line"><span class="string">    :rtype: int</span></span><br><span class="line"><span class="string">    """</span></span><br><span class="line">    <span class="keyword">if</span> len(nums) == <span class="number">0</span>:</span><br><span class="line">        <span class="keyword">return</span> <span class="number">-1</span></span><br><span class="line">    left, right = <span class="number">0</span>, len(nums) - <span class="number">1</span></span><br><span class="line">    <span class="comment"># End Condition: left > right</span></span><br><span class="line">    <span class="keyword">while</span> left <= right:</span><br><span class="line">        mid = (left + right) // <span class="number">2</span></span><br><span class="line">        <span class="keyword">if</span> nums[mid] == target:</span><br><span class="line">            <span class="keyword">return</span> mid</span><br><span class="line">        <span class="keyword">elif</span> nums[mid] < target:</span><br><span class="line">            left = mid + <span class="number">1</span></span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            right = mid - <span class="number">1</span></span><br><span class="line">    <span class="keyword">return</span> <span class="number">-1</span></span><br></pre></td></tr></table></figure>
<p> Implement the function <code>int sqrt(int x)</code>: compute and return the square root of <em>x</em>, where <em>x</em> is a non-negative integer.</p>
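<p> One possible way to approach this exercise with the same <code>left <= right</code> loop is sketched below. It assumes the result should be truncated to its integer part, which the problem statement here does not spell out; the name <code>int_sqrt</code> is chosen for illustration only.</p>

```python
def int_sqrt(x):
    """Return the integer part of the square root of a non-negative integer x."""
    if x < 0:
        raise ValueError("x must be non-negative")
    left, right = 0, x
    answer = 0
    # End condition: left > right
    while left <= right:
        mid = (left + right) // 2
        if mid * mid <= x:
            answer = mid       # mid is a valid candidate; try a larger one
            left = mid + 1
        else:
            right = mid - 1    # mid is too large
    return answer
```

<p> Unlike the exact-match search above, the loop keeps the last valid candidate instead of returning as soon as a match is found, because most inputs have no exact integer square root.</p>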
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/[email protected]/dist/kity.min.js"></script><script type="text/javascript" src="https://cdn.jsdelivr.net/npm/[email protected]/dist/kityminder.core.min.js"></script><script defer="true" type="text/javascript" src="https://cdn.jsdelivr.net/npm/[email protected]/dist/mindmap.min.js"></script><link rel="stylesheet" type="text/css" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/mindmap.min.css">]]></content>
<categories>
<category>Algorithms</category>
</categories>
<tags>
<tag>Algorithms</tag>
</tags>
</entry>
<entry>
<title>Data Mining Interview Questions</title>
<url>/2021/01/17/%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98%E9%9D%A2%E8%AF%95%E9%A2%98/</url>
<content><![CDATA[<div id="hbe-security">
<div class="hbe-input-container">
<input type="password" class="hbe-form-control" id="pass" placeholder="Please enter the password to read the blog." />
<label for="pass">Please enter the password to read the blog.</label>
<div class="bottom-line"></div>
</div>
</div>
<div id="decryptionError" style="display: none;">Incorrect Password!</div>
<div id="noContentError" style="display: none;">No content to display!</div>
<div id="encrypt-blog" style="display:none">
U2FsdGVkX19DBr+bmM70MVVg0wUmf4csjHNDG2jhvPyc9qBWCZkNpIghxUK4Gyd8Gk3P0QBKIF8W3fqLSAhG39nWjP3Br03zb8tA49b0HDj7c1CIc0owwY3dmGJrJPL9YQnI5axw5ykIcQicu6Xfgehs4LeicK5hKefCpIph5G80a+UsWAyW/R3NddzhwicByycaHfqD519cEHi3U3VDr6xN5o6FHvRrb8yewqCXdSpKsn6hmrlYmaT+fw7RktElBmW0tKiv12GDbiBvQMfKsTCPUxzhLBA5spzNROBItvNMOE/lOuclQwgwD2QWWVFZW0KTIZ330iXpADLXa6zYERxu/1UPC9ugmHugatYVVs7NvfLYkjoK3ERMx25X9G1CWPpaE/3vI+iTymXjvo2Cher0FtvzC3zvDVirrGpvgZeOLc+y9v37UgD1x+sJJsX/7zRCSkTOVkEZMRCwmfIVt2XUae+P6O1KGmIEBYqzcfFcIQb4w4fls6SniyUFKkrQ/4f27X7JPWY1JGW2BwPdFU21lhvpEpSWQYnV9LPRHINcefS8B4l8FPIVtA9ZNp44YFfjufQp4ncrcwK+ZxhS0nFCo8ruD1q9wrkVdPt0Pg12FwMu95lMpPa+bDFDdcuX8ZYWQ5/hOauQNxuxpJ3ZIoPyVJjL2Im72swTAlhzwCD/n9lxDMnhQDQUKEyN9QDSQzP/l43PQQ32LAaHjrnnFgy4cMSzrklphLmP0rbzXMFEOsioeGxdw0d88Zoctjlf0u+Y3qNOaSORArVWhMGMsasoJJgXFbQ/OV9h56tR0ICER9ZeTZ2Mw9pmi0BOgxoNW/yPG8xFB0ocT/Im4OgMd8V7HTTjxdLLflfbKHQnvQCvgVfnXSo40tgoAKJ5O8ztEBCT5QJFrT6KVN52/gBRlzQHpUSi775ZRA4Am/ndVzwElYnqAwV5fDHTPPMmX587hqieb9vJxFT6LttRKniJMoQVbbTfJVHPgDFw/h0BErPSZS9pgbLl9dH648SONrhY7Wj0fJ9ZnO2QcSB7zcMVynxiTRRAYL7dWoJrgwjvJXA2O40F1n/u+0hyupQFf+hbNcz63f1t7JJMz8rzsPgO5kpHNnxXy7PiVsYneeMP/Izg+cMVl2bNqKZq/CdxwAO8c7I/UqJ4qizwcJv1ICxrSFYJpA86QWpERDCG4Za+gfr/UmMQ9glpWFC2VrCYfVcIPwkS7Pdw0KNSigHT8dz+qTlgsx5+xVS7dUE0JmCbGIg7mCA/WEH0itMolnPq1kfHBrC7bhas8AP6/xZSxPT28pEKuFW09QspGsmagnF5EpiUbb5ABLLWu/Z2fOxaQxZyf/zzoqJNZKN060LNRoqGCZLY5kvxgvBEB1nx2d+qUBKexAsqfVNXnW7GaMT7BkbRbfyOeN7GPSUXUBz9f+aGqPzs+xmH8Y0o/BmoAhTLihjZypC/DRyZWP1yJkHBiYX0CoiDi6l+WUsF3PM+c7welQqoZFGuSAM3RR/XftuM2rLxqouuU8M8ZpPEuc1EhI4VEYEU3rYrvzBdjiZP99V5nDSP4FG1/xnsWJEd3Q0+mwB95po31jB3yZIU+fJyf6+oaBVtrz4NeeiyZYTEBFCADiSBXgzERjPjLz+PLqq+DRtOaaKhDYrXfvr1Vc6dPYi/T5LgzuVKzGhX6u0wOMp2kZU73xi48osFieVPeX6r66gnLcxzWJoDr9KOl2tfs5FaBVOo6xBlrrqUpf/druV+pE3n32p3MYsMhikA05w/yH26q0dA/2qvZl56fW46TSp1G/xKoynl2gbNN4g2JbK3g2b5AWdk4yC8c7CUHXLbpOvUh/R7+DIeAX8gGIrH+0h4LmuSGOX0kwHSLgErTZSb5dSyGYn3x1PnUHmrdjI0vDamtMqCp8K0QeAyvEX9vkQuRMiVR/tQT8HiIMmULbldG4o5oPrv4VFlF3gRL6Iu5XWtPXnHLUVCBOQJJhR7SCbaPbKQXBRpGHhtZ62e
En1iVSS6o4NQ8OoPBt2zZ2HxO/BaIm+6U0j/MB2Je6y7Q11Sug1AL3uQCp8c8hot++97MgitGOgn2Su0MCDcQLrwj4URvqkPnY5178xamB+HPElp7nK4U9VMFAmk2IreMrfGA2cO1xfN/erY5H+WppYx3ekPo/QCk/HYVia8f6Q1jKXylik3dgHwE6qemq+Rwrsv8I4VsQ5cyeg/1PrhR+jQAC9sg1BZ71h7youduJLlSlTCoz6RvCSwvVmccylUYth/eeLDtfjle68q5kSWXqXh5HcSSc/q8QMw+WBwILhzDJyBCxD8sm3YFyyEr2Amfdx3+2Zt+tdlBG0eCineJEBLL04ZKpWAhm71Tp0TjeWr1CpN8J0rndNpYNcJShSw1d/QZ6Pa47QQyuFYOfF4OHQCzdnE9e2sCtF9fvalPVUrrhXl2CQeVs+EG6aajbreuKawk3sjizB9gY00pd/0ri/ci/Lt8mcgw5uQLwiJxrcFXngMKN80L2vmkU2g4H6RTguAP9ZDzn9ubcG13lYUFOeqZEhbRUJ53b3dw1/KrTd0LxAHJLBLBtvQOXd6/dLxSLxWAx3e/Bh9GSt9kvPX2VUDpcMPoizclOKdqcm3L6AVqzZRngNSBF2lAP4zvCAj5KgyLluMa475gPKFeYQf1BwEMt1JeTiQuhukb0RFdTinbOjyj/PwdbFUeGiD1pBUoj8pkXh57k37pg80PyZvN+jFL63Icth8h6eVZU+jT0kVmR5i9239KxoVXLB50fMykjD0wip6qIjkvXOhzTKqW1IbjPMCUvAzqOkEAuvIMy8dW6jSuwJneX86vONWlDXEG6nb2CizzW15Yk7eli/yDN7HHlB9iwDmAqjmg/JllJdnbiPClTGVofZz0gPkelNJanW8G+JqIXC0k+6KzBrZxAML8l/RhHTlP+rh3M8WT7khCycq0UHrkWT67IMSo/J6Rcm6RLwONKvP11vf+pKx7QJH9pn6St7DWYpJ9aWbZ+YFk/orExxhp8vRT0sK7/zIaonBMwfK1qWemaL557FY1/R7pknP6W1EpFQxTWc+dig4S6rwkm6FOJUh/7/WMSTIM3qSiLrizDyVLPZIanp9WhTP/k5jA6ffoa232cHXQ9Kc/UEGE4Uvj8sQYQ5isAtirYNq08WjXRrl8CXk8DjPA0/KiO86o85scK4ac6M4lVp6PULn8AkfXZ78CRGLYohpxpiIZSam+WXxGIG7VvWd80l+yZ+cJBqTUE85p85Jq3lD/6S9J3EIibWt7ryFV3hjgQjuwmU6/NRM/WmyJxZfR6K3WyfylxoJFYFL5n+f/6vmVnMS2VOV1wVlLqm3PgGSqNfhbLi2VFj4rb9UAyc8Wh7JoXWLzj9z9vkZh6svJxV3dAtoNjv3ZakrCC8BEhdBy6tvRGq8ADjYLUJtda770ukSgdtC1b4Dq9ya0sCWf5dVtoo67uDusXRqAwTE2iUg6fm77SyfesH2yplPTmZI8bRW09A2eHeGG2rbR6eyvkLUfjvOKhujvYvZQ4HqWdPGXGC92FrFIfeMOomc6Yd5QCNBKOw3fea9OjLo4cASitIVEas2n0tgTqYYlZwTTtiS/Z8Rrcc4kB/cpiCd65ur8vILIp+boDPw3YWbFh+uIXMwOwV2C1aEN2WUnkVs5ab8Vc69lGnlgi85oQ0x1SuZlfayYVuWV62hDSC7XGi6diMnPib2+uryos/8jufj7X+PR9+WvSIPpQXH1uBfqMBzxxL6zScUdHxmt2VYEFoHE7E5gX4wuFPI22W4+D3n7gyUBybw4pmRReSWDCHEpbL9HisP90Gy5f7+Ei+TgCFytqnbt4rsh8h6OCizRBjTMifjj1GOamgM5jLJcRrScq0e1K9PeBe+vvQo6l7kYxvyhTlWHgJDwFMERrgdrJBIary/9ISTdB6z+B49cjjbEKICgFh0N/USl4HaSK2O7bRanWDuIfbiHkcWjYQVwHsHgrBcIyDvgInGITwhTHJ4
GnDBPdyEyhZzVr527aJwZCz8xZPUv6h0ma3C93Rc02AVgQNtK3zag0TvojQJJcl/rkN+WqNQmpyPXsr59gZcOsebzoIUczP66DqSY2NvcxK/pzoTe9htbxNnli3U7dT+0k6ROYXeWqKJGBbWCmbE4DH69agS71QK+zXRGnJEFpxLUP516TneBm5Bf5xuJM3XDrl4q8f3uTE7Gm98W81eBrXh+gJJD/7E0wuGNXpryXnC64WSHlwT2jnS8bGrfyCg4YdF6e0fN+Gn1jkwp2o+ODNt4GykrzbSi42zz/4th0DabDxCbpJkPTO476438nRl7hbLKIMfb/9oSkswSgPW79pI98jzkXN1szXGH7yAQG9dAnuDQeDPoDRJ4npCdyp13Wn7qUKMboFVv+kVbra6iH/S3BP5HvRc8bzSjRV1KfSYE799NUdQfOvryDpyEUwxvxLqR2Ypc9ixpZTaQEm3Pxj3juRFoViLZxcDnPsloK1arPUiOP/nUkzBAGD/sAo+ISPHnKSd4DvPpOtlhvvqLc/nPg9/0pwYr/US5HgV2MFLIZtC1K03XNUZI7o53LI4Ql1+99xGBo+hU8qH1HObLz/Ld2AQhKhH3F8dlULuacLRy7tzs33AMlIZKQv7Vn2s41g7kOIZXksd7gG+uMAeS4q9vcY8ORbxwu4rerS4E8Pwg+a3aPGxh3YE8gufeFd+Jstpep5Qf2GtHXldko9h6T6P4em+Ziqm0UiaQJYuE4rseERF0BHaGqBGs55tZIlBLobK55HjJ8EmxqAabR3PS2fzfozv9jEhEcMyzBjhyVSudil+Bn7b1TqOh6aEWIWYpOhElqyDedinfx7qfF4Xbs5Az6L5N4Ua8j1wVL1c/JGGEGuRjTkPPnt7bnTWd6Ania8ARkXD2pepDbly213EGbHwrh7L7jaqvzo+DqEhcGCFkdMiyk8ujuhBFSt04EgzTOCvmh4AQfrBdKDWxHSMMhO8+7kakmzAZ66/FKFmbE+h5FIWEkINJlm78D7TTbGMAXXR39HNj2FtPV/Ak/C5fjJ/RHobL2uX0dN45fcU+dZkLllkYKDoYUgINpkSLHLTUEJNtabWI8MDyk4W6chhGaMUQPIe+KhSdAGUyMoGTY1BUbYSkCJQciyZf+L2isjZ5Yn363BgXSsC7BU7pVpuMjaJgXMGeOLG4a7vCHlPm4S673Q0CkU4vaStOZgYuthQpaw/VK9+T2VDnjQuTb95+FWg6BiRaRoueTjGzRwVaWIPm7ncfkvNBITDkaCHTfU7keXJkNF1bE5sOaDfidyfZO6zKmkgmyHMW8tsatMpRcukP3lal/wRjQAzxaUN2+3BsXQkvSjbX51pzb/VyESVnoMYchqSQQdv2XPO17RQsIvIl3NPLdxp8/rwrbYGTQTykblQ4QGXDUFXNmwFCpQfnFDGZWeNF0DSQB8TBZK0x/hW6hn70OoR4eXa0W4FWoMc8SRzwKyq9/DROYK3U4yLvzIxLFe9dREW9xdkVofmFUzPObuhpV03jhPPzy31dCzkq+orF39YzVsZ8siyIuuw2Dwt3sCThrLeNxmgC1mYdunjBNiAyl63GJwKCpO7ACHHOFrcq3n1gQ9Dc/hZG2CUKd7htqnU5dakSxsHrrY2K5aN1l+vimOIUyvFv68R9WiH6cxfCFS2KP53L2an/uV4ufY4seipdVAH9VN92Oc+Tk0/fN8+nDnBhAqpzc76ZVYdYIwtqQzdZaiSbKDFUn4QBm5KegNKn1Q7mkv1yonzqOh/pc1+5CvJ+Hyjy8s+jHRotKtpeFobMtB2jD49BWhS7okKmg4j82CkVLkE+D/4LWQ0RYeCILyeVl9+c8A4f6W5KKnzw8m19vBhmM0+j3afVRL8XQe0+4j+iC3GJahHThA4VsHZHkg/P5aobre40nsedtlRMeLPUXCpRYWWL/iKeDFEPBxR4idoYTkKUAp0iJLUlClPoL3vXC/NUM9VXQb5AWWJQTB8AwgvLjjK
XRx3Il1cTZiDaPNI2mKttms8vPLCVX0Ls7t3P8nXwblYm1QV22bO0PTyQXCNaG181BpH3N2szmc1vN1IVA6/TM38Oc4RQm/43CtWIzAQDepCXTQwIFaemQOxQmR1UjVztLjDFnrKx6w2NreU3VEsK+4pc7eyqDLyMmbwlyFTAPy62gT2HFkBXLMO3PVc/x7/00MW5hXIDDTpslgFlbg5YLl3jfI722o9/l8UWJHj3gms9sw80zF82j0xDNZilcRDn0ttftNNMQZGKa/n7Gb7dj0HKhT2kS3MTVJNUO4CkSiPaPMpP9CuYENyx5yGUFpjmxvTDXORi+DAGQMQOfao+j2m9oFN6/dC1Z6fdYcBlDy50qpI6PwTmtafI6Gk/qZ/bPVmgRDpXeG3TdyPV0B8Z4XTseOdrbCTRmhVJP1EAsqTebi/l++7jJtREgBtlbX2/5/vLe2jkoiMpnoRT3zFQvn5260VH0wQLNZf9x/xTtmP8fXDwUO2hKXdo3JXf17LxmnKJ7gtc9rnqAgHHeEBlXLqkvKC9eCPDVY8U/3pleGh8JxqcwkGB89FKYaQCbZ9BzhDjxB41lot1ryumwrV0M4PLKjazkxeFAy2L4bPinU+tOZyncPd5mjUtchMJ0w6GmI0g305lHv3BM36g0gPQlyXD5hDtKuMZKtRyFrizFZOJdbyzkSIZC9Eebf7WWtmZLZbSRO6UPlke13Z/sWQyApWOZYJhQNTx3r5fBKvwPrP7QwWhG177jZo0d6HsadBjylAoxE7f13Kvx3k+jL39DeWchaN5Pp/cvQeoETvBPYxPPW7LcRpzjQUJhYahLsS6y1sxTFjWDWCkKhzP7UIVuFdTT43uzeTkAR2imqvyNbu/gEiS6QkLIh2gHZLccCxupgatAjHNNWTqmRC/g4ppRFyXXXdFSVTSklYPrjQSBG+WG7ClOhz75SspXH4FlIcRUCrcCuYs2PizhDMM7gkXDdywCrK9u3NaCp8Id0hbi/l4YXDKeVk37Lnm8Jxn10nNeMgHexHM+gdDa7zonM4TJl14QllMTbkRz1m6kizHffTZBgwXvETTvc3rFvfacGSYqsh2Sb5hZsL0tG7eagcxg0dmQ0jAKikErOZg8FIj4ZTAdLUy2Dzbz3x1p1GqtO/6hMikv8nKElKRtoz6uJUQumfKC17apE+POL8i/lf1e3itinN+nhqRNUruAFayBpKqoV636YWJJ8xh5ZnDC9v0ohk9AEKj1G/GbroDtCMvHoBrK8oJ9JTYchP8Cma4dZCkdFY0m3rIkWh0AeVcDHTAZ41jdxNTXwfvN0hofEHFaApKWqCoxi4/P49GjYm4ph8Sxmf+gDbMYJpsJO6EUiuU8D9NceNXkgssgM6WaMy4q4ovHXJ2sEfPwOxYKImSekWtiIVyY1eCPyrXpEuDJNvM94EKSlHuMYNM7mqkvgbVIaAgH6krlqqM4CExnnzJb+LWdn0+n2AMlqVZBby3ENYgUxGj7AGLslHj7Hjgirt+v2S1w15Y6/GuUWrIJaXH0GK/hhKefTTNy0IaSjiEUFzVBVSU67gXQ8K1bUspN+ezvWbhAa6jG5V1OpxjnFt+Pc8pCWKbynvHxSkKS7IL65PUurNywiVD3Ozq2xoLdNKlWXcb9wAHmGkG3OwcQFsj6t5/ydvZ0dmqL0UKDLO4h9/FKYzmp87FXV3hqKc4epUPPvP6pZeI1GKJGtVt9Bkx6ZM0jgW61nDVv5oEpqb5Yx0kxSvw1b9w26yMIzjW5NZpoYuHSNgrssKyv3zQFls16kwXWkZ6yc7cbGcC62pQg/H+njK6cMOvn66kryUbsoTPXb3khjkk45dy5lWzvd/TQwc48wD0fVDBj58cokQ7icM2X8UxWg/ypMEVwdtZ6fQKguyiyFkmwDUo/e0Z+2C/41hHVZXzO5ibv4YbwETqnnq+HuJi66iSR8xQ/I7VGRn6FEdgeo3exJT9k6j5lS3j0uF/lXb+9sM18stUCw+
I/7ibWkwQRjf6Kixek2J3w4e0SRSScfArJY4LbZkS62Ty1S93cwHl/SjpLrFlUvif4QKS8YISLHrUk1cRdDEEdHa/ol85qZnYfgAMOuz3bGCV+ar7N4La4ftu09l8tlvwq7809moIhr2ApWmkBz8s9VftJU4w1nuyz/8a+YsznY+iEqijKOAAicAsXcvMDuBkM1vwNyLXUH5CslM/zqJOB+ioJqNoyLHWC10N0K6xGwGzK6/FDYnUkYAqNLfXfaGwrnw3wtUBqK599R0r67kuw+q57TQU0lvaRmY2ys+Im0ObQ9iAG085HAQyVh4c2XWC6k3rFJZkStMcnh78KzDWMk9AzDXvGuWmW4GKv+kvLTIiEDm50noFgsB4Q3Xy6kvoGnjB3Kn0Uj2f/BR1ah+jAJwxD/Oi1hmvkEkWfpC2jQSdy8Hbwu2gSi83qI8GI0HQXqFESFTH3L1MO880OpZHLjcINgfORIbIC5IweNMNdBmY+sSEbr9lo54rWUeq+lSDNTnVfFRVRWpB4Ue3+Uyfyn1HfHYTkuaOiXNZkxqAV4csRCaY2vM8tZpnO4U7frYPq3NZQXbROQng5nSppmXCc+YNdCCyo6J2Tt0LwH2rnm5hZN6YFy/+nwpkK6v3Xg/GbjhhTz3kBBxXKqBln9i7pP6qNbPlx9cvBj43P1Gm6+4JT/h1M5CCR8A20/CM/+OnDmsA575HDEUvQuUSgKiGiYs2rMovaTit679p57t/SEA9Jmo5nTzEHXXhuB3G5gaBM1K4p9OGZxY7KPqelvzWNdHUhgyTJdN2SAGXf4TNnun4ZnpmTaAXqtIf6BHWf64s5sdKxaxQq7ktKOk/NMivRLFfvOoPS9R9BK+vtbhRZ23ybHQ5zdp9+3j8UYxDbKAIns0VNQZIDyirHhlRCiizBY0bn6Nf2P4euzZf5pMpg28Mc+Lxg7y+/nAbbaJlxoFoNGBjt4UVwXK8ygCa14k3gqZseO/yhXdePOo4rcDiu/saQ6/6d3aFbAKfDw7xQUxZUuidEEoEtqg37mjUEnuMUMdYYXPfHzL3fjRuIcfLMvKX3K4AgiY1QxPlEJYhXSE8ZL0pbEX3tpq/PXjuQelvqLekuC7i5Bn44oVKX0ig8yqLKbzTzzLOICVD9J2OIW4hQW7KDGyhVvRPUsHHnMA+/duNe7+Q08e8vN8iEWGokLKoQyz9PxHr/XHjhqVLGDxwy/M9IyuHEOTfyqcwmOjjcr4WIAXumRdUF7pA8uFlCvUfoY/y3T4w6yJJAHa+jJhEPwfLH9zHI63g2hN73xvRZaG6BZgw48VIXQPtmXhdMoI+f//1oiNPELsO4B5k8anvIyXPem13Kd0/FNudwNh3Gmpb/BIie6fFlOk3Jlw6wPFZNywO9b85vOSD4FYqUkiXMlG0z4kJMfRiDPk6+Fbz2ItyBzfMCyJxunSqfUdLOZ9cWTkboLsj59ta3N0WCYi
</div>
<script src="/lib/crypto-js.js"></script><script src="/lib/blog-encrypt.js"></script><link href="/css/blog-encrypt.css" rel="stylesheet" type="text/css"><script type="text/javascript" src="https://cdn.jsdelivr.net/npm/[email protected]/dist/kity.min.js"></script><script type="text/javascript" src="https://cdn.jsdelivr.net/npm/[email protected]/dist/kityminder.core.min.js"></script><script defer="true" type="text/javascript" src="https://cdn.jsdelivr.net/npm/[email protected]/dist/mindmap.min.js"></script><link rel="stylesheet" type="text/css" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/mindmap.min.css">]]></content>
</entry>
<entry>
<title>Interview Question: What Is Your Average Salary?</title>
<url>/2021/01/12/%E9%9D%A2%E8%AF%95%E9%A2%98%EF%BC%9A%E4%BD%A0%E7%9A%84%E5%B9%B3%E5%9D%87%E8%96%AA%E6%B0%B4%E6%98%AF%E5%A4%9A%E5%B0%91%EF%BC%9F/</url>
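<content><![CDATA[<p> Interview question: for each department, find the average salary after excluding the highest and lowest salaries, rounded to an integer.</p>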
<a id="more"></a>
<table>
<thead>
<tr class="header">
<th style="text-align: center;">Employee ID</th>
<th style="text-align: center;">Department ID</th>
<th style="text-align: center;">Salary</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">10001</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">60117</td>
</tr>
<tr class="even">
<td style="text-align: center;">10002</td>
<td style="text-align: center;">2</td>
<td style="text-align: center;">92102</td>
</tr>
<tr class="odd">
<td style="text-align: center;">10003</td>
<td style="text-align: center;">2</td>
<td style="text-align: center;">86074</td>
</tr>
<tr class="even">
<td style="text-align: center;">10004</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">66596</td>
</tr>
<tr class="odd">
<td style="text-align: center;">10005</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">66961</td>
</tr>
<tr class="even">
<td style="text-align: center;">10006</td>
<td style="text-align: center;">2</td>
<td style="text-align: center;">81046</td>
</tr>
<tr class="odd">
<td style="text-align: center;">10007</td>
<td style="text-align: center;">2</td>
<td style="text-align: center;">94333</td>
</tr>
<tr class="even">
<td style="text-align: center;">10008</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">75286</td>
</tr>
<tr class="odd">
<td style="text-align: center;">10009</td>
<td style="text-align: center;">2</td>
<td style="text-align: center;">85994</td>
</tr>
<tr class="even">
<td style="text-align: center;">10010</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">76884</td>
</tr>
</tbody>
</table>
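<p> Approach: find the highest and lowest salaries within each department, then exclude them.</p>
<h1 id="查询最高薪水和最低薪水">Finding the Highest and Lowest Salaries</h1>
<p> The highest and lowest salaries can be found with the window function rank():</p>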
<figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">SELECT</span></span><br><span class="line"> *,</span><br><span class="line"> <span class="keyword">rank</span> () <span class="keyword">over</span> ( <span class="keyword">PARTITION</span> <span class="keyword">BY</span> department_id <span class="keyword">ORDER</span> <span class="keyword">BY</span> salary ) rank_1,</span><br><span class="line"> <span class="keyword">rank</span> () <span class="keyword">over</span> ( <span class="keyword">PARTITION</span> <span class="keyword">BY</span> department_id <span class="keyword">ORDER</span> <span class="keyword">BY</span> salary <span class="keyword">DESC</span> ) rank_2 </span><br><span class="line"><span class="keyword">FROM</span></span><br><span class="line"> data_test.salary_table</span><br></pre></td></tr></table></figure>
<table>
<thead>
<tr class="header">
<th style="text-align: center;">employee_id</th>
<th style="text-align: center;">department_id</th>
<th style="text-align: center;">salary</th>
<th style="text-align: center;">rank_1</th>
<th style="text-align: center;">rank_2</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">10010</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">76884</td>
<td style="text-align: center;">5</td>
<td style="text-align: center;">1</td>
</tr>
<tr class="even">
<td style="text-align: center;">10008</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">75286</td>
<td style="text-align: center;">4</td>
<td style="text-align: center;">2</td>
</tr>
<tr class="odd">
<td style="text-align: center;">10005</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">66961</td>
<td style="text-align: center;">3</td>
<td style="text-align: center;">3</td>
</tr>
<tr class="even">
<td style="text-align: center;">10004</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">66596</td>
<td style="text-align: center;">2</td>
<td style="text-align: center;">4</td>
</tr>
<tr class="odd">
<td style="text-align: center;">10001</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">60117</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">5</td>
</tr>
<tr class="even">
<td style="text-align: center;">10007</td>
<td style="text-align: center;">2</td>
<td style="text-align: center;">94333</td>
<td style="text-align: center;">5</td>
<td style="text-align: center;">1</td>
</tr>
<tr class="odd">
<td style="text-align: center;">10002</td>
<td style="text-align: center;">2</td>
<td style="text-align: center;">92102</td>
<td style="text-align: center;">4</td>
<td style="text-align: center;">2</td>
</tr>
<tr class="even">
<td style="text-align: center;">10003</td>
<td style="text-align: center;">2</td>
<td style="text-align: center;">86074</td>
<td style="text-align: center;">3</td>
<td style="text-align: center;">3</td>
</tr>
<tr class="odd">
<td style="text-align: center;">10009</td>
<td style="text-align: center;">2</td>
<td style="text-align: center;">85994</td>
<td style="text-align: center;">2</td>
<td style="text-align: center;">4</td>
</tr>
<tr class="even">
<td style="text-align: center;">10006</td>
<td style="text-align: center;">2</td>
<td style="text-align: center;">81046</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">5</td>
</tr>
</tbody>
</table>
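<p> To check the two rank columns by hand, the ranking can be reproduced outside the database. The sketch below is plain Python, not the SQL engine: <code>ROWS</code> is copied from the salary table in this post, and <code>rank_within_department</code> is a hypothetical helper that follows the usual definition of RANK() (one plus the number of rows that sort strictly before the current row).</p>

```python
from collections import defaultdict

# (employee_id, department_id, salary) rows copied from the table in this post.
ROWS = [
    (10001, 1, 60117), (10002, 2, 92102), (10003, 2, 86074),
    (10004, 1, 66596), (10005, 1, 66961), (10006, 2, 81046),
    (10007, 2, 94333), (10008, 1, 75286), (10009, 2, 85994),
    (10010, 1, 76884),
]


def rank_within_department(rows):
    """Return {employee_id: (rank_1, rank_2)}, mimicking RANK() per department."""
    by_dept = defaultdict(list)
    for _, dept, salary in rows:
        by_dept[dept].append(salary)
    ranks = {}
    for emp, dept, salary in rows:
        peers = by_dept[dept]
        # RANK() = 1 + number of rows that sort strictly before this one.
        rank_1 = 1 + sum(s < salary for s in peers)  # ascending salary
        rank_2 = 1 + sum(s > salary for s in peers)  # descending salary
        ranks[emp] = (rank_1, rank_2)
    return ranks
```

<p> With this definition, the lowest salary in a department has rank_1 = 1 and the highest has rank_2 = 1, which is what the filter in the next step relies on.</p>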
<h1 id="去除最高薪水和最低薪水">Removing the Highest and Lowest Salaries</h1>
<p> Use a subquery plus a WHERE clause:</p>
<figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">SELECT</span></span><br><span class="line"> department_id,</span><br><span class="line"> <span class="keyword">AVG</span>(salary) avg_salary</span><br><span class="line"><span class="keyword">FROM</span></span><br><span class="line"> (<span class="keyword">SELECT</span></span><br><span class="line"> *,</span><br><span class="line"> <span class="keyword">rank</span> () <span class="keyword">over</span> ( <span class="keyword">PARTITION</span> <span class="keyword">BY</span> department_id <span class="keyword">ORDER</span> <span class="keyword">BY</span> salary ) rank_1,</span><br><span class="line"> <span class="keyword">rank</span> () <span class="keyword">over</span> ( <span class="keyword">PARTITION</span> <span class="keyword">BY</span> department_id <span class="keyword">ORDER</span> <span class="keyword">BY</span> salary <span class="keyword">DESC</span> ) rank_2 </span><br><span class="line"> <span class="keyword">FROM</span></span><br><span class="line"> data_test.salary_table) t</span><br><span class="line"><span class="keyword">WHERE</span></span><br><span class="line"> t.rank_1 > <span class="number">1</span> <span class="keyword">and</span> t.rank_2 > <span class="number">1</span></span><br><span class="line"><span class="keyword">GROUP</span> <span class="keyword">BY</span></span><br><span class="line"> department_id</span><br></pre></td></tr></table></figure>
<table>
<thead>
<tr class="header">
<th style="text-align: center;">department_id</th>
<th style="text-align: center;">avg_salary</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">1</td>
<td style="text-align: center;">69614.3333</td>
</tr>
<tr class="even">
<td style="text-align: center;">2</td>
<td style="text-align: center;">88056.6667</td>
</tr>
</tbody>
</table>
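<p> The averages in the table above can be cross-checked with a short Python sketch. It is an illustration under the same data, not a replacement for the query: <code>ROWS</code> is copied from the salary table, and <code>trimmed_average_by_department</code> is a hypothetical name.</p>

```python
from collections import defaultdict

# (employee_id, department_id, salary) rows copied from the table in this post.
ROWS = [
    (10001, 1, 60117), (10002, 2, 92102), (10003, 2, 86074),
    (10004, 1, 66596), (10005, 1, 66961), (10006, 2, 81046),
    (10007, 2, 94333), (10008, 1, 75286), (10009, 2, 85994),
    (10010, 1, 76884),
]


def trimmed_average_by_department(rows):
    """Average salary per department after dropping the lowest and highest."""
    by_dept = defaultdict(list)
    for _, dept, salary in rows:
        by_dept[dept].append(salary)
    result = {}
    for dept, salaries in by_dept.items():
        low, high = min(salaries), max(salaries)
        # Same effect as rank_1 > 1 AND rank_2 > 1: every row tied for the
        # lowest or the highest salary is dropped.
        kept = [s for s in salaries if low < s < high]
        if kept:  # a department with nothing left yields no row
            result[dept] = sum(kept) / len(kept)
    return result
```

<p> For department 1 the kept salaries are 66596, 66961 and 75286, whose mean is about 69614.33; for department 2 they are 85994, 86074 and 92102, with a mean of about 88056.67.</p>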
<p> Finally, the result needs to be formatted as an integer with the FORMAT function:</p>
<figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">SELECT</span></span><br><span class="line"> department_id,</span><br><span class="line"> <span class="keyword">FORMAT</span>(<span class="keyword">AVG</span>(salary), <span class="number">0</span>) avg_salary</span><br><span class="line"><span class="keyword">FROM</span></span><br><span class="line"> (<span class="keyword">SELECT</span></span><br><span class="line"> *,</span><br><span class="line"> <span class="keyword">rank</span> () <span class="keyword">over</span> ( <span class="keyword">PARTITION</span> <span class="keyword">BY</span> department_id <span class="keyword">ORDER</span> <span class="keyword">BY</span> salary ) rank_1,</span><br><span class="line"> <span class="keyword">rank</span> () <span class="keyword">over</span> ( <span class="keyword">PARTITION</span> <span class="keyword">BY</span> department_id <span class="keyword">ORDER</span> <span class="keyword">BY</span> salary <span class="keyword">DESC</span> ) rank_2 </span><br><span class="line"> <span class="keyword">FROM</span></span><br><span class="line"> data_test.salary_table) t</span><br><span class="line"><span class="keyword">WHERE</span></span><br><span class="line"> t.rank_1 > <span class="number">1</span> <span class="keyword">and</span> t.rank_2 > <span class="number">1</span></span><br><span class="line"><span class="keyword">GROUP</span> <span class="keyword">BY</span></span><br><span class="line"> department_id</span><br></pre></td></tr></table></figure>
<table>
<thead>
<tr class="header">
<th style="text-align: center;">department_id</th>
<th style="text-align: center;">avg_salary</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">1</td>
<td style="text-align: center;">69614</td>
</tr>
<tr class="even">
<td style="text-align: center;">2</td>
<td style="text-align: center;">88057</td>
</tr>
</tbody>
</table>
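<p> Rounding a number and formatting it for display are different operations, and the difference can matter for this step. The post does not name the database engine; if it is MySQL, FORMAT(x, 0) returns a string with thousands separators, whereas ROUND keeps a numeric value. The Python sketch below only illustrates that distinction by analogy and does not reproduce any SQL engine's behavior.</p>

```python
# Sum of the kept salaries divided by their count, per department.
averages = {1: 208843 / 3, 2: 264170 / 3}

# Numeric rounding keeps each value a number.
rounded = {dept: round(avg) for dept, avg in averages.items()}

# Display formatting produces text; the "," option adds thousands separators.
formatted = {dept: f"{avg:,.0f}" for dept, avg in averages.items()}

print(rounded)
print(formatted)
```

<p> If the integers are needed for further calculation, the numeric form is usually the safer choice.</p>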
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/[email protected]/dist/kity.min.js"></script><script type="text/javascript" src="https://cdn.jsdelivr.net/npm/[email protected]/dist/kityminder.core.min.js"></script><script defer="true" type="text/javascript" src="https://cdn.jsdelivr.net/npm/[email protected]/dist/mindmap.min.js"></script><link rel="stylesheet" type="text/css" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/mindmap.min.css">]]></content>
<categories>
<category>SQL Interview Questions</category>
</categories>
<tags>
<tag>Interview Questions</tag>
<tag>sql</tag>
</tags>
</entry>
<entry>
<title>How to Analyze the Cause of an Increase in Transaction Volume?</title>
<url>/2021/01/11/%E5%A6%82%E4%BD%95%E5%88%86%E6%9E%90%E4%BA%A4%E6%98%93%E9%87%8F%E5%A2%9E%E5%8A%A0%E7%9A%84%E5%8E%9F%E5%9B%A0%EF%BC%9F/</url>
<content><![CDATA[<p> Interview question: analyze the business data below and draw conclusions.</p>
<table>
<thead>
<tr class="header">
<th style="text-align: center;">Date</th>
<th style="text-align: center;">Transaction volume</th>
<th style="text-align: center;">Number of transactions</th>
<th style="text-align: center;">Customers</th>
<th style="text-align: center;">New customers</th>
<th style="text-align: center;">New-customer transactions</th>
<th style="text-align: center;">New-customer transaction volume</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">2020/4/1</td>
<td style="text-align: center;">594.7</td>
<td style="text-align: center;">16.8</td>
<td style="text-align: center;">13.5</td>
<td style="text-align: center;">1.9</td>
<td style="text-align: center;">2.2</td>
<td style="text-align: center;">65.9</td>
</tr>
<tr class="even">
<td style="text-align: center;">2020/4/2</td>
<td style="text-align: center;">601.9</td>
<td style="text-align: center;">17.0</td>
<td style="text-align: center;">13.5</td>
<td style="text-align: center;">4.0</td>
<td style="text-align: center;">4.7</td>
<td style="text-align: center;">133.8</td>
</tr>
<tr class="odd">
<td style="text-align: center;">2020/4/3</td>
<td style="text-align: center;">607.2</td>
<td style="text-align: center;">17.4</td>
<td style="text-align: center;">13.8</td>
<td style="text-align: center;">4.4</td>
<td style="text-align: center;">4.4</td>
<td style="text-align: center;">132.8</td>
</tr>
<tr class="even">
<td style="text-align: center;">2020/4/4</td>
<td style="text-align: center;">632.1</td>
<td style="text-align: center;">17.9</td>
<td style="text-align: center;">14.1</td>
<td style="text-align: center;">4.8</td>
<td style="text-align: center;">4.8</td>
<td style="text-align: center;">162.5</td>
</tr>
<tr class="odd">
<td style="text-align: center;">2020/4/5</td>
<td style="text-align: center;">685.4</td>
<td style="text-align: center;">19.1</td>
<td style="text-align: center;">15.0</td>
<td style="text-align: center;">6.1</td>
<td style="text-align: center;">6.1</td>
<td style="text-align: center;">192.8</td>
</tr>
<tr class="even">
<td style="text-align: center;">2020/4/6</td>
<td style="text-align: center;">756.6</td>
<td style="text-align: center;">18.7</td>
<td style="text-align: center;">14.9</td>
<td style="text-align: center;">5.3</td>
<td style="text-align: center;">5.3</td>
<td style="text-align: center;">217.5</td>
</tr>
<tr class="odd">
<td style="text-align: center;">2020/4/7</td>
<td style="text-align: center;">753.4</td>
<td style="text-align: center;">18.2</td>
<td style="text-align: center;">14.5</td>
<td style="text-align: center;">4.1</td>
<td style="text-align: center;">4.1</td>
<td style="text-align: center;">164.7</td>
</tr>
<tr class="even">
<td style="text-align: center;">2020/4/8</td>
<td style="text-align: center;">640.3</td>
<td style="text-align: center;">18.8</td>
<td style="text-align: center;">14.6</td>
<td style="text-align: center;">4.7</td>
<td style="text-align: center;">4.7</td>
<td style="text-align: center;">164.8</td>
</tr>
<tr class="odd">
<td style="text-align: center;">2020/4/9</td>
<td style="text-align: center;">1236.2</td>
<td style="text-align: center;">39.6</td>
<td style="text-align: center;">23.9</td>
<td style="text-align: center;">18.8</td>
<td style="text-align: center;">18.8</td>
<td style="text-align: center;">412.2</td>
</tr>
<tr class="even">
<td style="text-align: center;">2020/4/10</td>
<td style="text-align: center;">664.6</td>
<td style="text-align: center;">19.7</td>
<td style="text-align: center;">15.3</td>
<td style="text-align: center;">4.4</td>
<td style="text-align: center;">4.4</td>
<td style="text-align: center;">145.8</td>
</tr>
</tbody>
</table>
<a id="more"></a>
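<h1 id="明确要解决的问题">Defining the Problem to Solve</h1>
<h2 id="第一步明确数据的来源和验证其准确性">Step 1: Identify the Data Source and Verify Its Accuracy</h2>
<p> The accuracy of the data can be verified along three dimensions: time, place, and data source.</p>
<ul>
<li>Time: 2020.4.1~2020.4.10</li>
<li>Place: network-wide sales</li>
<li>Data source: exported from the back end; the data is accurate.</li>
</ul>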
<h2 id="第二步可视化相关数据">Step 2: Visualize the Relevant Data</h2>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210111222241.png"></p>
<p> The line chart shows that transaction volume spikes on the 9th, while the other days differ little and remain fairly stable.</p>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210111222620.png"></p>
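<p> Once data errors have been ruled out, the next step is to pull the transaction volume for the same period last month and the same period last year:</p>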
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210111222931.png"></p>
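<h1 id="分析问题出现的原因">Analyze the cause of the problem</h1>
<p> To find the cause, break the metric down along several dimensions:</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">graph LR</span><br><span class="line">A(Transaction volume) --> B(Number of transactions)</span><br><span class="line">A(Transaction volume) --> C(Average amount per transaction)</span><br><span class="line">B(Number of transactions) --> D1(Transactions by new customers)</span><br><span class="line">B(Number of transactions) --> D2(Transactions by existing customers)</span><br></pre></td></tr></table></figure>
<p> Based on this breakdown, we can form the following hypotheses:</p>
<ul>
<li>Hypothesis 1: an increase in the average amount per transaction drove the volume increase on the 9th.</li>
<li>Hypothesis 2: an increase in transactions by new customers drove the volume increase on the 9th.</li>
<li>Hypothesis 3: an increase in transactions by existing customers drove the volume increase on the 9th.</li>
</ul>
<p> Average amount per transaction = transaction volume / number of transactions</p>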
<figure>
<img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210111225431.png" alt="20210111225431"><figcaption aria-hidden="true">20210111225431</figcaption>
</figure>
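<p> As the chart shows, the average amount per transaction dropped sharply on the 9th, the opposite of the spike in transaction volume, so hypothesis 1 does not hold.</p>
<p> Number of transactions = transactions by new customers + transactions by existing customers</p>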
<figure>
<img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210111230048.png" alt="20210111230048"><figcaption aria-hidden="true">20210111230048</figcaption>
</figure>
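<p> As the chart shows, transactions by new customers and by existing customers both jumped on the 9th, in line with the trend in transaction volume, which supports hypotheses 2 and 3.</p>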
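<p> The breakdown above can be sketched in a few lines of Python. This is only an illustration: the two daily records and the field names (<code>volume</code>, <code>new_orders</code>, <code>old_orders</code>) are hypothetical and are not taken from the table in this post.</p>

```python
# Hypothetical daily figures (assumed for illustration, not the post's data).
days = [
    {"date": "4/8", "volume": 6000.0, "new_orders": 40, "old_orders": 60},
    {"date": "4/9", "volume": 9000.0, "new_orders": 90, "old_orders": 110},
]


def decompose(day):
    """Split volume into the number of transactions and the average amount per transaction."""
    orders = day["new_orders"] + day["old_orders"]
    return {"date": day["date"], "orders": orders, "avg_amount": day["volume"] / orders}


def pct_change(before, after):
    """Relative change from `before` to `after`."""
    return (after - before) / before


prev, curr = (decompose(d) for d in days)
changes = {
    "orders": pct_change(prev["orders"], curr["orders"]),
    "avg_amount": pct_change(prev["avg_amount"], curr["avg_amount"]),
}
print(changes)
```

<p> In this made-up case the number of transactions doubles while the average amount falls, which is the same pattern used above to reject hypothesis 1 and keep hypotheses 2 and 3.</p>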
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/[email protected]/dist/kity.min.js"></script><script type="text/javascript" src="https://cdn.jsdelivr.net/npm/[email protected]/dist/kityminder.core.min.js"></script><script defer="true" type="text/javascript" src="https://cdn.jsdelivr.net/npm/[email protected]/dist/mindmap.min.js"></script><link rel="stylesheet" type="text/css" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/mindmap.min.css">]]></content>
<categories>
<category>数据分析</category>
</categories>
<tags>
<tag>面试题</tag>
<tag>数据分析</tag>
</tags>
</entry>
<entry>
<title>Excel: Complex Business Lookups</title>
<url>/2021/01/09/Excel-%E5%A4%8D%E6%9D%82%E4%B8%9A%E5%8A%A1%E6%9F%A5%E8%AF%A2/</url>
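<content><![CDATA[<p> Interview question: use drop-down lists to select conditions such as institution name, interest-rate tier, and term, and look up the corresponding fee.</p>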
<a id="more"></a>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210109225303.png"></p>
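<h1 id="制作联动下拉列表">Build cascading drop-down lists</h1>
<p> Drop-down lists can be built with Data Validation; the steps are shown below:</p>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210109225732.png"></p>
<p> In this case, first find the distinct values of institution name and interest-rate tier, which can be done with Advanced Filter as shown below:</p>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210109230109.png"></p>
<p> The filtered result looks like this:</p>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210109230409.png"></p>
<p> It needs to be rearranged into the following layout:</p>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210109230729.png"></p>
<p> Then use Formulas - Defined Names to give the range a name, as shown below:</p>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210109233052.png"></p>
<p> In the same way, define the interest-rate tiers of each institution as the named ranges A机构, B机构, and C机构.</p>
<p> To make the <a href="https://jingyan.baidu.com/article/7c6fb42802525a80642c9034.html" target="_blank" rel="noopener">drop-down lists cascade</a>, the name that the list refers to must be supplied by a formula:</p>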
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210109233319.png"></p>
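<h1 id="查询费用">Look up the fee</h1>
<p> Looking up the fee means matching a value in a range based on conditions, which can be done with INDEX + MATCH. INDEX returns the value at a given position within a range; its parameters are shown below:</p>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210109233534.png"></p>
<p> The range is fixed; what varies is the row number and the column number, which MATCH supplies:</p>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210109233654.png"></p>
<p> First the column number, which is simple: MATCH(I3&"期费用",A1:E1,0).</p>
<p> Then the row number, which is slightly more involved: there are two conditions and two corresponding ranges, so join condition to condition and range to range with &: MATCH(I1&I2,A:A&B:B,0).</p>
<p> Putting it together, the formula is INDEX(A:E,MATCH(I1&I2,A:A&B:B,0),MATCH(I3&"期费用",A1:E1,0)). Note that this is an array formula: after typing it, press Ctrl+Shift+Enter for it to take effect.</p>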
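<p> For readers who think in code, the same two-key lookup can be sketched in Python. The fee table, the tier names, and the function <code>lookup_fee</code> are hypothetical and only mirror the logic of the formula: a combined row key (institution and tier) and a column key (term).</p>

```python
# Hypothetical fee table: (institution, rate tier) -> {term: fee}.
fee_table = {
    ("A", "tier 1"): {12: 100.0, 24: 180.0, 36: 250.0},
    ("A", "tier 2"): {12: 120.0, 24: 210.0, 36: 290.0},
    ("B", "tier 1"): {12: 90.0, 24: 170.0, 36: 240.0},
}


def lookup_fee(institution, tier, term):
    """Return the fee for the combined key, or None when no row or column matches."""
    # The tuple key plays the role of MATCH(I1&I2, A:A&B:B, 0): both conditions must match.
    row = fee_table.get((institution, tier))
    if row is None:
        return None
    # The term key plays the role of the column MATCH against the header row.
    return row.get(term)


print(lookup_fee("A", "tier 2", 24))
```

<p> Using a tuple as the key avoids one pitfall of joining strings with &: two different pairs of values can produce the same concatenated text.</p>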
]]></content>
<categories>
<category>Excel</category>
</categories>
<tags>
<tag>Excel</tag>
<tag>面试题</tag>
</tags>
</entry>
<entry>
<title>Member Value Analysis</title>
<url>/2021/01/03/%E4%BC%9A%E5%91%98%E4%BB%B7%E5%80%BC%E5%88%86%E6%9E%90/</url>
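<content><![CDATA[<p> The following data is available: member card information (member Id, card status, gender, age, HYKTYPE (equivalent to the member card type code), etc.), member card types (card discount rate, card type code, and card type name), a store-brand table (brand code, brand name, category code, category name, store code, store name, etc.), and a sales detail table (member card type code, sales amount, sales quantity, member card type, member card Id, etc.).</p>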
<a id="more"></a>
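<h1 id="漏斗图分析会员结构的合理性">Funnel chart: is the membership structure reasonable?</h1>
<p> How can member value be analyzed? One approach is a funnel chart, which shows conversion and drop-off at each step of a process:</p>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210103224153.png"></p>
<p> The chart shows that member conversion does not follow the expected inverted-pyramid shape: there are more VIP card holders than points card holders. Rapid expansion in the early stage is a possible cause; if so, the upgrade rate from points cards should be controlled going forward to protect the value of the membership tiers.</p>
<p> Member churn is closely tied to membership structure, and churn here is clearly severe, above the typical industry average of 10%.</p>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210103225037.png"></p>
<p> Finally, the total number of members is also an important metric to watch.</p>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210103225922.png"></p>
<h1 id="rfm-模型用户价值分类">RFM model: classifying users by value</h1>
<p> The RFM model (Recency, Frequency, Monetary) classifies users and measures the value and profitability of each class:</p>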
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210103231359.png"></p>
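<p> As a rough illustration of how an RFM classification can be computed, here is a minimal Python sketch. The order records, the reference date, and the three thresholds (<code>r_max</code>, <code>f_min</code>, <code>m_min</code>) are all assumptions made for this example; they are not the values behind the charts in this post.</p>

```python
from datetime import date

# Hypothetical orders: (member_id, order_date, amount).
orders = [
    ("m1", date(2020, 12, 20), 500.0),
    ("m1", date(2020, 12, 28), 700.0),
    ("m2", date(2020, 6, 1), 80.0),
    ("m3", date(2020, 12, 30), 60.0),
]
today = date(2021, 1, 1)  # assumed reference date


def rfm(orders, today):
    """Aggregate recency (days since last order), frequency and monetary value per member."""
    stats = {}
    for member, day, amount in orders:
        r, f, m = stats.get(member, (None, 0, 0.0))
        days_ago = (today - day).days
        r = days_ago if r is None else min(r, days_ago)
        stats[member] = (r, f + 1, m + amount)
    return stats


def segment(r, f, m, r_max=30, f_min=2, m_min=1000.0):
    """Mark each dimension as high (1) or low (0); the thresholds are assumptions."""
    return (int(r <= r_max), int(f >= f_min), int(m >= m_min))


segments = {member: segment(*values) for member, values in rfm(orders, today).items()}
print(segments)
```

<p> Each of the eight possible (R, F, M) combinations can then be mapped to a named class such as high-value or general-retention customers.</p>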
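<p> Next, analyze the spending of each user class:</p>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210103231735.png"></p>
<p> More specifically, the RFM classification can be broken down by member type:</p>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210103232122.png"></p>
<p> Conclusions:</p>
<ul>
<li><p>There are too many low-value "general retention" customers, an after-effect of rapid member growth early on that was not followed by timely member maintenance;</p></li>
<li><p>A relatively high share of the top-tier Diamond and Black Diamond members are only average-value users. Cards may be shared by several people, or card benefits may be resold; this should be investigated as a priority.</p></li>
</ul>
<h1 id="会员消费偏好">Member spending preferences</h1>
<p> Break down member spending preferences, from category to brand:</p>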
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210103233314.png"></p>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210103233346.png"></p>
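<p> The spending range of each user class can also be analyzed to check whether the member tiers are reasonable:</p>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20210103233545.png"></p>
<p> Conclusions:</p>
<ul>
<li><p>Push promotions that match each member type's preferences. For example, for points card users, make cosmetics brands such as Lancôme and Chanel the headline items of major promotions;</p></li>
<li><p>Spending by points card and VIP card members falls mainly in the 0-300 and 500-1000 yuan ranges, so each major promotion could send these two groups "spend 300, save xx" and "spend 500, save xx" coupons; Diamond card members could be sent coupons with 300 and 2000 yuan thresholds, and Black Diamond card members coupons with 1000 and 2000 yuan thresholds.</p></li>
</ul>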
]]></content>
</entry>
<entry>
<title>Become a Data Analyst in 7 Weeks: SQL Exercises</title>
<url>/2020/12/29/7%E5%91%A8%E7%A7%B0%E4%B8%BA%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90%E5%B8%88%EF%BC%9ASQL-%E7%BB%83%E4%B9%A0%E9%A2%98/</url>
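<content><![CDATA[<p> The SQL tutorial in Qin Lu's "Become a Data Analyst in 7 Weeks" (《7 周成为数据分析师》) poses several questions:</p>
<ul>
<li>Count the number of users who placed orders in each month</li>
<li>Compute the repurchase rate and the repeat-purchase rate for March</li>
<li>Does purchase frequency differ between male and female users?</li>
<li>For users with multiple purchases, what is the interval between the first and the last purchase?</li>
<li>Does spending differ across age groups?</li>
<li>The 80/20 rule: how much of total spending do the top 20% of users contribute?</li>
</ul>
<a id="more"></a>
<h1 id="统计不同月份的下单人数">Count the number of users who placed orders in each month</h1>
<p> First, exclude orders that were not paid; second, group by year and month; third, count distinct users; finally, filter out dirty data (rows whose year is 0000).</p>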
<figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">SELECT</span></span><br><span class="line">    <span class="keyword">YEAR</span>(paid_time) <span class="keyword">year</span>,</span><br><span class="line">    <span class="keyword">MONTH</span>(paid_time) <span class="keyword">month</span>,</span><br><span class="line">    <span class="keyword">count</span>(<span class="keyword">distinct</span> userId) <span class="keyword">num</span> <span class="comment">-- distinct paying users</span></span><br><span class="line"><span class="keyword">FROM</span></span><br><span class="line">    data_test.order_info</span><br><span class="line"><span class="keyword">WHERE</span></span><br><span class="line">    is_paid = <span class="string">'已支付'</span> <span class="keyword">AND</span> <span class="comment">-- paid orders only</span></span><br><span class="line">    <span class="keyword">year</span>(paid_time) > <span class="number">0</span> <span class="comment">-- drop dirty rows whose year is 0000</span></span><br><span class="line"><span class="keyword">GROUP</span> <span class="keyword">BY</span></span><br><span class="line">    <span class="keyword">YEAR</span>(paid_time),</span><br><span class="line">    <span class="keyword">MONTH</span>(paid_time)</span><br></pre></td></tr></table></figure>
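<h1 id="统计用户三月份的回购率和复购率">Compute the repurchase rate and the repeat-purchase rate for March</h1>
<p> The repurchase rate is the share of all buyers who are returning buyers, where a returning buyer is a user who buys again some time after an earlier purchase. The length of that interval should depend on the kind of product; here it is provisionally set to one month.</p>
<p> The most convenient way to compute the repurchase rate in SQL is a self-join: join the table to itself, with the join key set one month apart, so that February rows are matched with March rows.</p>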
<figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">SELECT</span></span><br><span class="line">    <span class="keyword">DATE_FORMAT</span>(t1.m, <span class="string">"%Y-%m"</span>) <span class="keyword">month</span>,</span><br><span class="line">    <span class="keyword">count</span>(t2.m)/<span class="keyword">count</span>(t1.m) repurchase_rate</span><br><span class="line"><span class="keyword">FROM</span></span><br><span class="line">    <span class="comment">-- t1: one row per user and month with a paid order</span></span><br><span class="line">    (<span class="keyword">SELECT</span></span><br><span class="line">        userId,</span><br><span class="line">        <span class="keyword">DATE_FORMAT</span>(paid_time, <span class="string">"%Y-%m-01"</span>) m</span><br><span class="line">    <span class="keyword">FROM</span></span><br><span class="line">        data_test.order_info</span><br><span class="line">    <span class="keyword">WHERE</span></span><br><span class="line">        is_paid = <span class="string">'已支付'</span></span><br><span class="line">    <span class="keyword">GROUP</span> <span class="keyword">BY</span></span><br><span class="line">        userId,</span><br><span class="line">        <span class="keyword">DATE_FORMAT</span>(paid_time, <span class="string">"%Y-%m-01"</span>)) t1</span><br><span class="line"><span class="comment">-- t2: the same user in the following month, NULL if the user did not return</span></span><br><span class="line"><span class="keyword">LEFT</span> <span class="keyword">JOIN</span> (<span class="keyword">SELECT</span></span><br><span class="line">        userId,</span><br><span class="line">        <span class="keyword">DATE_FORMAT</span>(paid_time, <span class="string">"%Y-%m-01"</span>) m</span><br><span class="line">    <span class="keyword">FROM</span></span><br><span class="line">        data_test.order_info</span><br><span class="line">    <span class="keyword">WHERE</span></span><br><span class="line">        is_paid = <span class="string">'已支付'</span></span><br><span class="line">    <span class="keyword">GROUP</span> <span class="keyword">BY</span></span><br><span class="line">        userId,</span><br><span class="line">        <span class="keyword">DATE_FORMAT</span>(paid_time, <span class="string">"%Y-%m-01"</span>)) t2</span><br><span class="line"><span class="keyword">ON</span></span><br><span class="line">    t1.userId = t2.userId <span class="keyword">AND</span></span><br><span class="line">    t1.m = <span class="keyword">DATE_SUB</span>(t2.m,<span class="built_in">INTERVAL</span> <span class="number">1</span> <span class="keyword">MONTH</span>)</span><br><span class="line"><span class="keyword">WHERE</span></span><br><span class="line">    t1.m > <span class="number">0</span> <span class="keyword">or</span></span><br><span class="line">    t2.m > <span class="number">0</span></span><br><span class="line"><span class="keyword">GROUP</span> <span class="keyword">BY</span></span><br><span class="line">    t1.m</span><br></pre></td></tr></table></figure>
<p> The repeat-purchase rate is the share of buyers who purchased more than once in the period. It can be computed by nesting an IF inside COUNT:</p>
<figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">SELECT</span></span><br><span class="line">    d.order_month <span class="keyword">month</span>,</span><br><span class="line">    <span class="comment">-- IF returns NULL for single-order users, and COUNT skips NULLs</span></span><br><span class="line">    <span class="keyword">count</span>(<span class="keyword">if</span>(d.c > <span class="number">1</span>, <span class="number">1</span>, <span class="literal">null</span>))/<span class="keyword">count</span>(d.c) repurchase_rate</span><br><span class="line"><span class="keyword">FROM</span></span><br><span class="line">    (<span class="keyword">SELECT</span></span><br><span class="line">        userId,</span><br><span class="line">        <span class="keyword">DATE_FORMAT</span>(paid_time, <span class="string">"%Y-%m"</span>) order_month,</span><br><span class="line">        <span class="keyword">COUNT</span>(userId) c</span><br><span class="line">    <span class="keyword">FROM</span></span><br><span class="line">        data_test.order_info</span><br><span class="line">    <span class="keyword">WHERE</span></span><br><span class="line">        is_paid = <span class="string">'已支付'</span></span><br><span class="line">    <span class="keyword">GROUP</span> <span class="keyword">BY</span></span><br><span class="line">        userId,</span><br><span class="line">        order_month</span><br><span class="line">    ) d</span><br><span class="line"><span class="keyword">GROUP</span> <span class="keyword">BY</span></span><br><span class="line">    d.order_month</span><br></pre></td></tr></table></figure>
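<p> To make the difference between the two rates concrete, here is a small Python sketch that computes both from a list of paid orders. The sample records and the helper names (<code>next_month</code>, <code>repeat_rate</code>, <code>return_rate</code>) are hypothetical; they follow the definitions given above rather than any particular library.</p>

```python
from collections import Counter

# Hypothetical paid orders: (user_id, "YYYY-MM").
paid = [("u1", "2016-02"), ("u1", "2016-03"), ("u1", "2016-03"),
        ("u2", "2016-03"), ("u3", "2016-03"), ("u3", "2016-04")]


def next_month(ym):
    """Return the month after `ym`, rolling over the year after December."""
    year, month = map(int, ym.split("-"))
    return f"{year + month // 12}-{month % 12 + 1:02d}"


def repeat_rate(paid, ym):
    """Repeat-purchase rate: share of the month's buyers with more than one order that month."""
    counts = Counter(user for user, m in paid if m == ym)
    return sum(1 for c in counts.values() if c > 1) / len(counts)


def return_rate(paid, ym):
    """Repurchase rate: share of the month's buyers who also ordered in the following month."""
    buyers = {user for user, m in paid if m == ym}
    returned = {user for user, m in paid if m == next_month(ym)}
    return len(buyers & returned) / len(buyers)


print(repeat_rate(paid, "2016-03"), return_rate(paid, "2016-03"))
```

<p> Both functions divide by the number of buyers in the month, so they assume that the month has at least one paid order.</p>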
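<h1 id="统计男女用户的消费频次是否有差异">Does purchase frequency differ between male and female users?</h1>
<p> First, join the order table to the user table to bring the needed fields together; next, count orders per user and gender; finally, take the average per gender as the purchase frequency.</p>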
<figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">SELECT</span></span><br><span class="line">    sex,</span><br><span class="line">    <span class="keyword">avg</span>(<span class="keyword">num</span>) fre</span><br><span class="line"><span class="keyword">FROM</span></span><br><span class="line">    (<span class="keyword">SELECT</span></span><br><span class="line">        o.userId, sex, <span class="keyword">count</span>(<span class="number">1</span>) <span class="keyword">num</span></span><br><span class="line">    <span class="keyword">FROM</span></span><br><span class="line">        data_test.order_info o</span><br><span class="line">    <span class="keyword">inner</span> <span class="keyword">JOIN</span> (</span><br><span class="line">        <span class="keyword">SELECT</span></span><br><span class="line">            *</span><br><span class="line">        <span class="keyword">FROM</span></span><br><span class="line">            data_test.user_info</span><br><span class="line">        <span class="keyword">WHERE</span></span><br><span class="line">            sex <> <span class="string">''</span>) u <span class="comment">-- users with a known gender</span></span><br><span class="line">    <span class="keyword">ON</span></span><br><span class="line">        o.userId = u.userId</span><br><span class="line">    <span class="keyword">WHERE</span></span><br><span class="line">        is_paid = <span class="string">'已支付'</span> <span class="comment">-- is_paid belongs to order_info, so filter it here</span></span><br><span class="line">    <span class="keyword">GROUP</span> <span class="keyword">BY</span></span><br><span class="line">        o.userId,</span><br><span class="line">        sex) t</span><br><span class="line"><span class="keyword">GROUP</span> <span class="keyword">BY</span></span><br><span class="line">    sex</span><br></pre></td></tr></table></figure>
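<h1 id="统计多次消费的用户第一次和最后一次消费间隔是多少">For users with multiple purchases, what is the interval between the first and the last purchase?</h1>
<p> Group by user, compute the interval between the earliest and latest payment, and keep only users with more than one purchase.</p>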
<figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">SELECT</span></span><br><span class="line">    userId,</span><br><span class="line">    <span class="keyword">DATEDIFF</span>(<span class="keyword">max</span>(paid_time), <span class="keyword">min</span>(paid_time)) intervals <span class="comment">-- days between first and last paid order</span></span><br><span class="line"><span class="keyword">FROM</span></span><br><span class="line">    data_test.order_info</span><br><span class="line"><span class="keyword">WHERE</span></span><br><span class="line">    is_paid = <span class="string">'已支付'</span></span><br><span class="line"><span class="keyword">GROUP</span> <span class="keyword">BY</span></span><br><span class="line">    userId</span><br><span class="line"><span class="keyword">HAVING</span></span><br><span class="line">    <span class="keyword">count</span>(<span class="number">1</span>) > <span class="number">1</span></span><br></pre></td></tr></table></figure>
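<h1 id="统计不同年龄段用户的消费金额是否有差异">Does spending differ across age groups?</h1>
<p> If the goal is only to see whether spending differs between age groups, divide the age by the bin width, round up, and then average the order amount per bin:</p>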
<figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">SELECT</span></span><br><span class="line">    age,</span><br><span class="line">    <span class="keyword">AVG</span>(price) consumes</span><br><span class="line"><span class="keyword">FROM</span></span><br><span class="line">    (<span class="keyword">SELECT</span></span><br><span class="line">        o.userId,</span><br><span class="line">        <span class="comment">-- 10-year bins, rounded up</span></span><br><span class="line">        <span class="keyword">CEIL</span>((<span class="keyword">year</span>(<span class="keyword">now</span>()) - <span class="keyword">year</span>(birth_date))/<span class="number">10</span>) age,</span><br><span class="line">        price</span><br><span class="line">    <span class="keyword">FROM</span></span><br><span class="line">        data_test.order_info o</span><br><span class="line">    <span class="keyword">INNER</span> <span class="keyword">JOIN</span></span><br><span class="line">        data_test.user_info u</span><br><span class="line">    <span class="keyword">ON</span></span><br><span class="line">        o.userId = u.userId</span><br><span class="line">    <span class="keyword">WHERE</span></span><br><span class="line">        is_paid = <span class="string">'已支付'</span> <span class="keyword">AND</span></span><br><span class="line">        birth_date > <span class="built_in">date</span>(<span class="string">'1901-00-00'</span>)) t</span><br><span class="line"><span class="keyword">GROUP</span> <span class="keyword">BY</span></span><br><span class="line">    age</span><br></pre></td></tr></table></figure>
<p> To see which age group spends the most or the least, with a readable label for each group, use CASE WHEN:</p>
<figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">SELECT</span></span><br><span class="line">    (<span class="keyword">CASE</span></span><br><span class="line">        <span class="keyword">WHEN</span> age < <span class="number">10</span> <span class="keyword">THEN</span> <span class="string">'under 10'</span></span><br><span class="line">        <span class="keyword">WHEN</span> age <span class="keyword">BETWEEN</span> <span class="number">10</span> <span class="keyword">AND</span> <span class="number">19</span> <span class="keyword">THEN</span> <span class="string">'10~19'</span></span><br><span class="line">        <span class="keyword">WHEN</span> age <span class="keyword">BETWEEN</span> <span class="number">20</span> <span class="keyword">AND</span> <span class="number">29</span> <span class="keyword">THEN</span> <span class="string">'20~29'</span></span><br><span class="line">        <span class="keyword">WHEN</span> age <span class="keyword">BETWEEN</span> <span class="number">30</span> <span class="keyword">AND</span> <span class="number">39</span> <span class="keyword">THEN</span> <span class="string">'30~39'</span></span><br><span class="line">        <span class="keyword">WHEN</span> age <span class="keyword">BETWEEN</span> <span class="number">40</span> <span class="keyword">AND</span> <span class="number">49</span> <span class="keyword">THEN</span> <span class="string">'40~49'</span></span><br><span class="line">        <span class="keyword">WHEN</span> age <span class="keyword">BETWEEN</span> <span class="number">50</span> <span class="keyword">AND</span> <span class="number">59</span> <span class="keyword">THEN</span> <span class="string">'50~59'</span></span><br><span class="line">        <span class="keyword">ELSE</span> <span class="string">'60 and over'</span></span><br><span class="line">    <span class="keyword">END</span>) <span class="keyword">as</span> ages,</span><br><span class="line">    <span class="keyword">AVG</span>(price) price_avg</span><br><span class="line"><span class="keyword">FROM</span></span><br><span class="line">    (<span class="keyword">SELECT</span></span><br><span class="line">        o.userId,</span><br><span class="line">        (<span class="keyword">year</span>(<span class="keyword">now</span>()) - <span class="keyword">year</span>(birth_date)) age,</span><br><span class="line">        price</span><br><span class="line">    <span class="keyword">FROM</span></span><br><span class="line">        data_test.order_info o</span><br><span class="line">    <span class="keyword">INNER</span> <span class="keyword">JOIN</span></span><br><span class="line">        data_test.user_info u</span><br><span class="line">    <span class="keyword">ON</span></span><br><span class="line">        o.userId = u.userId</span><br><span class="line">    <span class="keyword">WHERE</span></span><br><span class="line">        is_paid = <span class="string">'已支付'</span> <span class="keyword">AND</span></span><br><span class="line">        birth_date > <span class="built_in">date</span>(<span class="string">'1901-00-00'</span>)) t</span><br><span class="line"><span class="keyword">GROUP</span> <span class="keyword">BY</span></span><br><span class="line">    ages</span><br></pre></td></tr></table></figure>
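<p> The two queries do not cut the age groups at the same points, which is easy to miss. The Python sketch below mirrors both rules; the function names <code>band_ceil</code> and <code>band_label</code> are made up for this illustration.</p>

```python
import math


def band_ceil(age, width=10):
    """Mirror CEIL(age / 10): band k covers ages from 10*(k-1)+1 up to 10*k."""
    return math.ceil(age / width)


def band_label(age):
    """Mirror the CASE WHEN rule: decades start at 10, 20, 30 and so on."""
    if age < 10:
        return "under 10"
    if age >= 60:
        return "60 and over"
    low = age // 10 * 10
    return f"{low}~{low + 9}"


for age in (19, 20, 21):
    print(age, band_ceil(age), band_label(age))
```

<p> A 20-year-old lands in band 2 (ages 11 to 20) under the rounding-up rule but in the 20~29 group under CASE WHEN, so the averages from the two queries are not directly comparable at the boundaries.</p>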
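<h1 id="统计消费的二八法则消费的-top-20-用户贡献了多少额度">The 80/20 rule: how much of total spending do the top 20% of users contribute?</h1>
<p> MySQL versions before 8.0 have no ranking window function such as row_number, so the ranking is emulated by sorting users by total spending and using LIMIT to keep only the top rows:</p>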
<figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">SELECT</span></span><br><span class="line">    <span class="keyword">COUNT</span>(userId) user_num,</span><br><span class="line">    <span class="keyword">SUM</span>(total) total_num</span><br><span class="line"><span class="keyword">FROM</span></span><br><span class="line">    (<span class="keyword">SELECT</span></span><br><span class="line">        userId,</span><br><span class="line">        <span class="keyword">sum</span>(price) <span class="keyword">as</span> total</span><br><span class="line">    <span class="keyword">FROM</span></span><br><span class="line">        data_test.order_info</span><br><span class="line">    <span class="keyword">WHERE</span></span><br><span class="line">        is_paid = <span class="string">'已支付'</span></span><br><span class="line">    <span class="keyword">GROUP</span> <span class="keyword">BY</span></span><br><span class="line">        userId</span><br><span class="line">    <span class="keyword">ORDER</span> <span class="keyword">BY</span></span><br><span class="line">        total <span class="keyword">DESC</span></span><br><span class="line">    <span class="keyword">LIMIT</span></span><br><span class="line">        <span class="number">17000</span>) t <span class="comment">-- hard-coded row count, presumably 20% of the paying users</span></span><br></pre></td></tr></table></figure>
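<p> The same calculation can be sketched in Python without a hard-coded row count, by deriving the cut-off from the number of users. The spending totals and the function name <code>top_share</code> are hypothetical.</p>

```python
import math


def top_share(totals, percent=20):
    """Return (number of top users, their share of total spending)."""
    ranked = sorted(totals.values(), reverse=True)
    # Derive the cut-off from the data instead of hard-coding it.
    k = math.ceil(len(ranked) * percent / 100)
    return k, sum(ranked[:k]) / sum(ranked)


# Hypothetical total spending per user.
totals = {"u1": 500.0, "u2": 200.0, "u3": 100.0, "u4": 100.0, "u5": 100.0}
k, share = top_share(totals)
print(k, share)
```

<p> In SQL, the equivalent step is to count the paying users first and then use 20% of that count as the LIMIT value.</p>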
]]></content>
<categories>
<category>数据分析</category>
<category>SQL</category>
</categories>
</entry>
<entry>
<title>Installing Superset on Windows 10</title>
<url>/2020/12/18/win10-%E5%AE%89%E8%A3%85-superset/</url>
<content><![CDATA[<p> Before installing superset, create a virtual environment first to avoid package version conflicts:</p>
<a id="more"></a>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line">conda create -n superset python=<span class="number">3.7</span> <span class="comment"># pin the python version to 3.7</span></span><br></pre></td></tr></table></figure>
<p> Then activate the virtual environment so that the packages are installed into it:</p>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line">conda activate superset</span><br></pre></td></tr></table></figure>
<p> Before installing superset itself, install sasl and python-geohash:</p>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line">conda install -c conda-forge python-geohash -y</span><br><span class="line">conda install -c conda-forge sasl -y</span><br></pre></td></tr></table></figure>
<p> Finally, install superset:</p>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line">pip install apache-superset -i https://pypi.douban.com/simple</span><br><span class="line">pip install Pillow -i https://pypi.douban.com/simple</span><br></pre></td></tr></table></figure>
<p> After that, initialize superset:</p>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="comment"># Create an admin account</span></span><br><span class="line">set FLASK_APP=superset</span><br><span class="line">flask fab create-admin</span><br><span class="line"></span><br><span class="line"><span class="comment"># Initialize the database</span></span><br><span class="line">superset db upgrade</span><br><span class="line"></span><br><span class="line"><span class="comment"># Load the example data</span></span><br><span class="line">superset load_examples</span><br><span class="line"></span><br><span class="line"><span class="comment"># Initialize roles and permissions</span></span><br><span class="line">superset init</span><br><span class="line"></span><br><span class="line"><span class="comment"># Start the server on port 8001; use -p to change the port</span></span><br><span class="line">superset run -p <span class="number">8001</span> --<span class="keyword">with</span>-threads --reload --debugger</span><br></pre></td></tr></table></figure>
]]></content>
</entry>
<entry>
<title>AB test</title>
<url>/2020/12/16/AB-test/</url>
<content><![CDATA[<p> An A/B test is a controlled experiment whose purpose is to verify whether one model or version performs better than another.</p>
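<p> One common way to decide whether version B really beats version A on a conversion metric is a two-proportion z-test. The sketch below is a minimal standard-library implementation; the choice of this test, the function name, and the sample counts are assumptions made for illustration and are not part of the original post.</p>

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the null hypothesis that A and B convert equally."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 1000 users per version, 100 vs 130 conversions.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(round(z, 2), round(p, 3))
```

<p> A small p-value suggests that the difference is unlikely to be noise; the significance threshold (0.05 is a common choice) should be fixed before the experiment starts.</p>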
]]></content>
</entry>
<entry>
<title>Data Analysis: Handling Outliers</title>
<url>/2020/12/11/%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90%EF%BC%9A%E5%BC%82%E5%B8%B8%E5%80%BC%E5%A4%84%E7%90%86/</url>
<content><![CDATA[<p> Handling outliers is a basic step in data analysis. An outlier is a value that does not fit the general pattern of the sample as a whole.</p>
<a id="more"></a>
<h1 id="异常值的检测方法">Methods for detecting outliers</h1>
<p> The most common methods for detecting outliers are the 3<span class="math inline">\(\sigma\)</span> rule, the box plot, Grubbs' test, and multivariate outlier detection.</p>
<h2 id="sigma-原则">3<span class="math inline">\(\sigma\)</span> rule</h2>
<p> The 3<span class="math inline">\(\sigma\)</span> rule assumes that the data follow a normal distribution. Under the rule, an outlier is a value that lies more than three standard deviations from the mean; if the data are not normally distributed, three standard deviations can still be used as an approximate threshold.</p>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">ThreeSigmod</span><span class="params">(value)</span>:</span></span><br><span class="line">    <span class="comment"># lower and upper bounds: mean -/+ 3 standard deviations</span></span><br><span class="line">    avg = np.mean(value)</span><br><span class="line">    std = np.std(value)</span><br><span class="line">    threshold_up = avg + <span class="number">3</span>*std</span><br><span class="line">    threshold_down = avg - <span class="number">3</span>*std</span><br><span class="line">    <span class="keyword">return</span> [float(threshold_down), float(threshold_up)]</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">identify_outliers</span><span class="params">(value)</span>:</span></span><br><span class="line">    threshold = ThreeSigmod(value)</span><br><span class="line">    <span class="comment"># keep the values that fall outside the bounds</span></span><br><span class="line">    outliers = list(filter(<span class="keyword">lambda</span> x: x < threshold[<span class="number">0</span>] <span class="keyword">or</span> x > threshold[<span class="number">1</span>], value))</span><br><span class="line">    <span class="keyword">return</span> outliers</span><br></pre></td></tr></table></figure>
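<p> A short worked example shows the rule in action and one of its limits. This sketch uses only the standard library (the function <code>three_sigma_outliers</code> and both samples are hypothetical) and, like <code>np.std</code> with its default settings, uses the population standard deviation.</p>

```python
import statistics


def three_sigma_outliers(values):
    """Return the values lying more than 3 population standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > 3 * std]


large_sample = [10] * 20 + [100]   # 21 values, one extreme point
small_sample = [10] * 8 + [1000]   # 9 values, one far more extreme point

print(three_sigma_outliers(large_sample))
print(three_sigma_outliers(small_sample))
```

<p> The extreme point in the small sample is not flagged: with the population standard deviation, no single value among n values can sit more than sqrt(n - 1) standard deviations from the mean, so the rule can never fire on samples of 10 or fewer values.</p>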
<h2 id="箱型图">Box plot</h2>
<p> A box plot flags values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR as outliers, where Q1 and Q3 are the lower and upper quartiles and IQR = Q3 - Q1 is the interquartile range.</p>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="keyword">import</span> matplotlib.pyplot <span class="keyword">as</span> plt</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment"># data is a pandas DataFrame; fill in the value column and the grouping columns</span></span><br><span class="line">fig, axes = plt.subplots()</span><br><span class="line">data.boxplot(column=<span class="string">''</span>, by=[<span class="string">''</span>, <span class="string">''</span>], ax=axes)</span><br></pre></td></tr></table></figure>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">threshold_value</span><span class="params">(value)</span>:</span></span><br><span class="line">    <span class="comment"># interquartile range: Q3 - Q1</span></span><br><span class="line">    normal_value = np.quantile(value, <span class="number">0.75</span>) - np.quantile(value, <span class="number">0.25</span>)</span><br><span class="line">    quan_down = np.quantile(value, <span class="number">0.25</span>) - <span class="number">1.5</span> * normal_value</span><br><span class="line">    quan_up = np.quantile(value, <span class="number">0.75</span>) + <span class="number">1.5</span> * normal_value</span><br><span class="line">    <span class="keyword">return</span> [float(quan_down), float(quan_up)]</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">identity_outliers</span><span class="params">(value)</span>:</span></span><br><span class="line">    threshold = threshold_value(value)</span><br><span class="line">    <span class="comment"># keep the values that fall outside the whiskers</span></span><br><span class="line">    outliers = list(filter(<span class="keyword">lambda</span> x: x < threshold[<span class="number">0</span>] <span class="keyword">or</span> x > threshold[<span class="number">1</span>], value))</span><br><span class="line">    <span class="keyword">return</span> outliers</span><br></pre></td></tr></table></figure>
<h2 id="格拉布斯法则">Grubbs' test</h2>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="keyword">from</span> outliers <span class="keyword">import</span> smirnov_grubbs <span class="keyword">as</span> grubbs</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">identity_outliers</span><span class="params">(value)</span>:</span></span><br><span class="line">    <span class="comment"># grubbs.test returns the data with outliers removed; the set difference is the outliers</span></span><br><span class="line">    outliers = set(value) - set(list(grubbs.test(value, alpha=<span class="number">0.05</span>)))</span><br><span class="line">    <span class="keyword">return</span> outliers</span><br></pre></td></tr></table></figure>
<h2 id="多维度异常检测">Multivariate outlier detection</h2>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"><span class="keyword">import</span> pandas <span class="keyword">as</span> pd</span><br><span class="line"><span class="keyword">from</span> scipy.spatial <span class="keyword">import</span> distance</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">identity_outliers</span><span class="params">(data, num)</span>:</span></span><br><span class="line">    <span class="comment"># data: numeric DataFrame; num: how many of the most distant rows to return</span></span><br><span class="line">    inv_cov = np.linalg.inv(data.cov().values)</span><br><span class="line">    mean = data.mean().values</span><br><span class="line">    <span class="comment"># squared Mahalanobis distance of every row from the mean</span></span><br><span class="line">    m_dist = pd.Series([distance.mahalanobis(data.iloc[i].values, mean, inv_cov) ** <span class="number">2</span> <span class="keyword">for</span> i <span class="keyword">in</span> range(len(data))])</span><br><span class="line">    m_dist_order = m_dist.sort_values(ascending=<span class="keyword">False</span>).index.tolist()</span><br><span class="line">    outliers_index = m_dist_order[:num]</span><br><span class="line">    outliers = data.iloc[outliers_index]</span><br><span class="line">    <span class="keyword">return</span> outliers</span><br></pre></td></tr></table></figure>
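<h1 id="异常值的处理方法">Methods for handling outliers</h1>
<p> Outliers are usually handled either by replacing them with a moving average or by deleting them outright.</p>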
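<p> Both options can be sketched in a few lines of standard-library Python. The helper names, the sample series, and the detection rule <code>flag</code> are hypothetical; here the moving average is taken over the non-outlier neighbours within a small window, which is one possible reading of the approach.</p>

```python
def drop_outliers(values, is_outlier):
    """Handle outliers by deleting them."""
    return [v for v in values if not is_outlier(v)]


def smooth_outliers(values, is_outlier, window=1):
    """Replace each outlier with the mean of the clean values within `window` positions of it."""
    result = list(values)
    for i, v in enumerate(values):
        if is_outlier(v):
            nearby = values[max(0, i - window): i + window + 1]
            clean = [x for x in nearby if not is_outlier(x)]
            if clean:  # leave the value untouched when no clean neighbour exists
                result[i] = sum(clean) / len(clean)
    return result


def flag(v):
    """Hypothetical detection rule; any detector from the previous section could be used."""
    return v > 50


series = [10, 12, 100, 11, 13]
print(drop_outliers(series, flag))
print(smooth_outliers(series, flag))
```

<p> Deleting changes the length of the series, which matters for time series with a fixed spacing; replacing keeps the length but introduces a value that was never observed.</p>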
]]></content>
</entry>
<entry>
<title>Power BI Study Notes Day1</title>
<url>/2020/12/08/Power-BI-%E5%AD%A6%E4%B9%A0-Day1/</url>
<content><![CDATA[<p>Power BI makes it easy to add hyperlinks to a report.</p>
<a id="more"></a>
<h1 id="向表中插入超链接"><a href="https://docs.microsoft.com/zh-cn/power-bi/create-reports/power-bi-hyperlinks-in-tables" target="_blank" rel="noopener">Adding hyperlinks to a table</a></h1>
<p>If the data already contains URLs, you can create hyperlinks by setting the column's data category to "Web URL", as shown below:</p>
<figure>
<img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20201208213456.png" alt="20201208213456"><figcaption aria-hidden="true">20201208213456</figcaption>
</figure>
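<p>If the data does not yet contain URLs, you can add hyperlinks by creating a custom column.</p>
<p>For long links, the table can show a hyperlink icon instead of the full URL, as shown below:</p>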
<figure>
<img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20201208214252.png" alt="20201208214252"><figcaption aria-hidden="true">20201208214252</figcaption>
</figure>
<h1 id="向文本框插入超链接"><a href="https://docs.microsoft.com/zh-cn/power-bi/create-reports/service-add-hyperlink-to-text-box" target="_blank" rel="noopener">Adding hyperlinks to a text box</a></h1>
<p>After inserting a text box and typing some text, select part of the text and add a hyperlink to it, as shown below:</p>
<figure>
<img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20201208214504.png" alt="20201208214504"><figcaption aria-hidden="true">20201208214504</figcaption>
</figure>
<figure>
<img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20201208214513.png" alt="20201208214513"><figcaption aria-hidden="true">20201208214513</figcaption>
</figure>
]]></content>
<tags>
<tag>Power BI</tag>
<tag>Data analysis</tag>
<tag>Visualization</tag>
</tags>
</entry>
<entry>
<title>LeetCode MySQL Day9</title>
<url>/2020/09/07/%E2%80%9CLeetCode-MySQL-Day9/</url>
<content><![CDATA[<p>Day 9:<br> <a id="more"></a> Write a SQL query to find all numbers that appear at least three times consecutively.</p>
<table>
<thead>
<tr class="header">
<th>Id</th>
<th>Num</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>1</td>
</tr>
<tr class="even">
<td>2</td>
<td>1</td>
</tr>
<tr class="odd">
<td>3</td>
<td>1</td>
</tr>
<tr class="even">
<td>4</td>
<td>2</td>
</tr>
<tr class="odd">
<td>5</td>
<td>1</td>
</tr>
<tr class="even">
<td>6</td>
<td>2</td>
</tr>
<tr class="odd">
<td>7</td>
<td>2</td>
</tr>
</tbody>
</table>
<p>For example, given the above Logs table, 1 is the only number that appears consecutively for at least three times.</p>
<table>
<thead>
<tr class="header">
<th>ConsecutiveNums</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
</tr>
</tbody>
</table>
<p>Approach: if a row's value equals both the previous value and the next value, then three consecutive values are the same.<br></p>
<figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">SELECT</span> <span class="keyword">DISTINCT</span></span><br><span class="line">    a.num <span class="keyword">AS</span> ConsecutiveNums</span><br><span class="line"><span class="keyword">FROM</span></span><br><span class="line">    (</span><br><span class="line">    <span class="keyword">SELECT</span></span><br><span class="line">        id,</span><br><span class="line">        num,</span><br><span class="line">        <span class="comment">-- order the window by id; the default NULL never equals a real value</span></span><br><span class="line">        lag(num) <span class="keyword">OVER</span> (<span class="keyword">ORDER</span> <span class="keyword">BY</span> id) <span class="keyword">AS</span> pre,</span><br><span class="line">        <span class="keyword">lead</span>(num) <span class="keyword">OVER</span> (<span class="keyword">ORDER</span> <span class="keyword">BY</span> id) <span class="keyword">AS</span> nxt</span><br><span class="line">    <span class="keyword">FROM</span></span><br><span class="line">        Logs</span><br><span class="line">    ) a</span><br><span class="line"><span class="keyword">WHERE</span></span><br><span class="line">    a.num = a.pre</span><br><span class="line">    <span class="keyword">AND</span> a.num = a.nxt</span><br></pre></td></tr></table></figure>
<p>Key point: the window functions <a href="https://www.jianshu.com/p/e0d73f8b71ec" target="_blank" rel="noopener">lag() and lead()</a>.</p>
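<p>To make the lag/lead idea concrete, here is a small Python sketch that pairs each row with its predecessor and successor, assuming the <code>Num</code> column has already been read into a list ordered by <code>Id</code>. The function name <code>consecutive_nums</code> is hypothetical.</p>

```python
def consecutive_nums(nums):
    """Return the values that appear at least three times in a row."""
    found = set()
    # pair every row with its predecessor (lag) and successor (lead)
    for pre, cur, nxt in zip(nums, nums[1:], nums[2:]):
        if pre == cur == nxt:
            found.add(cur)
    return sorted(found)


logs = [1, 1, 1, 2, 1, 2, 2]  # the Num column of the example table, ordered by Id
print(consecutive_nums(logs))
```

<p>Rows at the edges have no predecessor or successor, so they never start a match. This mirrors the NULL that lag() and lead() return when no default value is supplied.</p>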
]]></content>
</entry>
<entry>
<title>LeetCode MySQL Day8</title>
<url>/2020/09/06/LeetCode-MySQL-Day8/</url>
<content><![CDATA[<p>Day 8:</p>
<a id="more"></a>
<p> Write a SQL query to rank scores. If there is a tie between two scores, both should have the same ranking. Note that after a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no "holes" between ranks.</p>
<table>
<thead>
<tr class="header">
<th>Id</th>
<th>Score</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>3.50</td>
</tr>
<tr class="even">
<td>2</td>
<td>3.65</td>
</tr>
<tr class="odd">
<td>3</td>
<td>4.00</td>
</tr>
<tr class="even">
<td>4</td>
<td>3.85</td>
</tr>
<tr class="odd">
<td>5</td>
<td>4.00</td>
</tr>
<tr class="even">
<td>6</td>
<td>3.65</td>
</tr>
</tbody>
</table>
<p> For example, given the above Scores table, your query should generate the following report (order by highest score):</p>
<table>
<thead>
<tr class="header">
<th>score</th>
<th>Rank</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>4.00</td>
<td>1</td>
</tr>
<tr class="even">
<td>4.00</td>
<td>1</td>
</tr>
<tr class="odd">
<td>3.85</td>
<td>2</td>
</tr>
<tr class="even">
<td>3.65</td>
<td>3</td>
</tr>
<tr class="odd">
<td>3.65</td>
<td>3</td>
</tr>
<tr class="even">
<td>3.50</td>
<td>4</td>
</tr>
</tbody>
</table>
<p>Important Note: For MySQL solutions, to escape reserved words used as column names, you can wrap the keyword in backticks, for example <code>`Rank`</code>.</p>
<p>Solution:</p>
<figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">select</span> score, <span class="keyword">dense_rank</span>() <span class="keyword">over</span>(<span class="keyword">order</span> <span class="keyword">by</span> score <span class="keyword">desc</span>) <span class="keyword">as</span> <span class="string">'Rank'</span> <span class="keyword">from</span> Scores</span><br></pre></td></tr></table></figure>
<p>Key points:</p>
<p><img src="https://picgo-1258437747.cos.ap-nanjing.myqcloud.com/20200906224250.png"></p>
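<p>As a rough illustration of what <code>dense_rank()</code> computes in the query, the following Python sketch ranks the example scores so that ties share a rank and no rank numbers are skipped. The helper name <code>dense_rank</code> is hypothetical and the scores are taken from the example table.</p>

```python
def dense_rank(scores):
    """Return (score, rank) pairs, highest score first, with no gaps after ties."""
    # each distinct score gets the next consecutive rank
    distinct = sorted(set(scores), reverse=True)
    rank_of = {s: i for i, s in enumerate(distinct, start=1)}
    return [(s, rank_of[s]) for s in sorted(scores, reverse=True)]


scores = [3.50, 3.65, 4.00, 3.85, 4.00, 3.65]
for score, rank in dense_rank(scores):
    print(f"{score:.2f} {rank}")
```

<p>Ranking over the distinct values is what removes the "holes": a plain <code>rank()</code> would instead give 3.85 the rank 3 after the two tied scores of 4.00.</p>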
]]></content>
</entry>
<entry>
<title>Viewing the Domestic and International Dual Circulation Through Thirlwall's Law</title>
<url>/2020/09/04/%E4%BB%8E%E7%91%9F%E5%B0%94%E6%B2%83%E6%B3%95%E5%88%99%E7%9C%8B%E5%86%85%E5%A4%96%E5%8F%8C%E5%BE%AA%E7%8E%AF/</url>
<content><![CDATA[<blockquote>
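<p>In the Harrod model under a balance-of-payments constraint, the dependence of economic growth on the expansion of external demand is known as "Thirlwall's law".<br> Let M be imports, X exports, Y real GDP, and <span class="math inline">\(\mu\)</span> the propensity to import. Given internal equilibrium, external equilibrium requires <span class="math inline">\(X=M=\mu Y\)</span>, so <span class="math inline">\(Y=\frac{X}{\mu}\)</span>, which gives <span class="math inline">\(g = \frac{\Delta Y}{Y} = \frac{\frac{\Delta X}{X}}{\epsilon}\)</span>, where <span class="math inline">\(\epsilon\)</span> is the income elasticity of demand for imports (<span class="math inline">\(\epsilon = 1\)</span> when <span class="math inline">\(\mu\)</span> is constant).<br></p>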
</blockquote>
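<p>Looking at the numerator: growth in external demand corresponds directly to economic growth.<br> Looking at the denominator: achieving "internal circulation" requires reducing the domestic economy's dependence on imports, that is, import substitution for both consumer goods and capital goods.</p>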
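<p>A quick numerical sketch of the relation <span class="math inline">\(g = \frac{\Delta X / X}{\epsilon}\)</span> may help. The export growth rate and the values of the denominator below are hypothetical numbers chosen only for illustration, and the function name is an assumption.</p>

```python
def balance_constrained_growth(export_growth, import_elasticity):
    """Growth rate implied by g = (dX/X) / epsilon."""
    if import_elasticity <= 0:
        raise ValueError("import_elasticity must be positive")
    return export_growth / import_elasticity


# hypothetical numbers: 6% export growth under three different denominators
for eps in (2.0, 1.0, 0.5):
    g = balance_constrained_growth(0.06, eps)
    print(f"epsilon={eps}: g={g:.2%}")
```

<p>With export growth held fixed, a smaller denominator yields a higher growth rate, which is the arithmetic behind the import-substitution argument.</p>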
]]></content>
<categories>
<category>Macroeconomics</category>
</categories>
</entry>
<entry>
<title>LeetCode MySql Day07</title>
<url>/2020/09/02/LeetCode-MySql-Day07/</url>
<content><![CDATA[<p>Day 7<br> <a id="more"></a> Write a SQL query to get the nth highest salary from the Employee table.</p>
<table>
<thead>
<tr class="header">
<th>Id</th>
<th>Salary</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>100</td>
</tr>
<tr class="even">
<td>2</td>
<td>200</td>
</tr>
<tr class="odd">
<td>3</td>
<td>300</td>
</tr>
</tbody>
</table>
<p>For example, given the above Employee table, the nth highest salary where n = 2 is 200. If there is no nth highest salary, then the query should return null.</p>
<table>
<thead>
<tr class="header">
<th>getNthHighestSalary (2)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>200</td>
</tr>
</tbody>
</table>
<p>The solution is as follows:<br></p>
<figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">CREATE</span> <span class="keyword">FUNCTION</span> getNthHighestSalary (N <span class="built_in">INT</span>) <span class="keyword">RETURNS</span> <span class="built_in">INT</span></span><br><span class="line"><span class="keyword">BEGIN</span></span><br><span class="line"><span class="keyword">DECLARE</span> M <span class="built_in">INT</span>;</span><br><span class="line"><span class="keyword">SET</span> M = N<span class="number">-1</span>;</span><br><span class="line"> RETURN (</span><br><span class="line"> # Write your MySQL query statement below.</span><br><span class="line"> <span class="keyword">SELECT</span> <span class="keyword">DISTINCT</span> Salary</span><br><span class="line"> <span class="keyword">FROM</span> Employee</span><br><span class="line"> <span class="keyword">ORDER</span> <span class="keyword">BY</span> Salary <span class="keyword">DESC</span></span><br><span class="line"> <span class="keyword">LIMIT</span> <span class="number">1</span> <span class="keyword">OFFSET</span> M</span><br><span class="line"> );</span><br><span class="line"><span class="keyword">END</span></span><br></pre></td></tr></table></figure>
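<p>The query logic (distinct salaries, descending order, skip N-1 rows, take one) can be sketched in Python as follows, assuming the <code>Salary</code> column is available as a list. The function name <code>nth_highest_salary</code> is hypothetical.</p>

```python
def nth_highest_salary(salaries, n):
    """Return the n-th highest distinct salary, or None if it does not exist."""
    # SELECT DISTINCT Salary ... ORDER BY Salary DESC
    distinct = sorted(set(salaries), reverse=True)
    # LIMIT 1 OFFSET n - 1
    offset = n - 1
    if offset < 0 or offset >= len(distinct):
        return None
    return distinct[offset]


print(nth_highest_salary([100, 200, 300], 2))
print(nth_highest_salary([100, 200, 300], 4))
```

<p>Returning <code>None</code> when the offset runs past the end corresponds to the SQL function returning null for a missing nth salary.</p>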
<p>This involves a user-defined function, whose basic syntax is:<br> <figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">CREATE</span> <span class="keyword">FUNCTION</span> &lt;function_name&gt; ( [ &lt;param1&gt; &lt;type1&gt; [ , &lt;param2&gt; &lt;type2&gt; ] ] … )</span><br><span class="line">    <span class="keyword">RETURNS</span> &lt;type&gt;</span><br><span class="line">    &lt;function_body&gt;</span><br></pre></td></tr></table></figure></p>
<p>The basic syntax for calling a user-defined function is:<br> <figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">SELECT</span> &lt;function_name&gt; ([&lt;argument&gt; [,...]])</span><br></pre></td></tr></table></figure></p>
<p>The basic syntax for dropping a user-defined function is:<br> <figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">DROP</span> <span class="keyword">FUNCTION</span> [ <span class="keyword">IF</span> <span class="keyword">EXISTS</span> ] &lt;function_name&gt;</span><br></pre></td></tr></table></figure></p>
]]></content>
<categories>
<category>LeetCode</category>
</categories>
<tags>
<tag>MySQL</tag>
<tag>Function</tag>
</tags>
</entry>
<entry>
<title>Grey Prediction Model</title>
<url>/2020/09/02/%E7%81%B0%E8%89%B2%E9%A2%84%E6%B5%8B%E6%A8%A1%E5%9E%8B/</url>
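<content><![CDATA[<p>The grey prediction model is a forecasting method for small samples with incomplete information; it forecasts by extracting the underlying pattern in the data.<br> <a id="more"></a> The first step is to compute the first-order accumulated series from the original data series:<br></p>
<figure class="highlight r"><table><tr><td class="code"><pre><span class="line">x0 <- seq(<span class="number">1</span>, <span class="number">9</span>)</span><br><span class="line">x1 <- cumsum(x0)</span><br></pre></td></tr></table></figure>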
<p>Next, construct the matrix <span class="math display">\[\pmb{B}=\left[\begin{array}{cc}
-\frac{1}{2}[x^{(1)}(2)+x^{(1)}(1)]&1\\
\vdots&\vdots\\
-\frac{1}{2}[x^{(1)}(k)+x^{(1)}(k-1)]&1
\end{array}\right]
\]</span> and the vector <span class="math display">\[\pmb{y}=\left[\begin{array}{c}
x^{(0)}(2)\\
\vdots \\
x^{(0)}(k)
\end{array}\right]
\]</span></p>
<figure class="highlight r"><table><tr><td class="code"><pre><span class="line">B = matrix (data = <span class="number">1</span>, nrow = (length (x1) - <span class="number">1</span>), ncol = <span class="number">2</span>)</span><br><span class="line"><span class="keyword">for</span> (i <span class="keyword">in</span> <span class="number">1</span>:(length (x1) - <span class="number">1</span>)) {</span><br><span class="line"> B [i, <span class="number">1</span>] = (x1 [i] + x1 [i+<span class="number">1</span>])*(-<span class="number">1.0</span>)/<span class="number">2</span></span><br><span class="line">}</span><br><span class="line">y = as.matrix (x0 [<span class="number">2</span>:length (x0)])</span><br></pre></td></tr></table></figure>
<p>Then <span class="math inline">\(\hat{a}\)</span> and <span class="math inline">\(\hat{u}\)</span> are obtained by least squares from <span class="math inline">\(\hat{U} = (\pmb{B}^{T}\pmb{B})^{-1}\pmb{B}^{T}\pmb{y}=\left[\begin{array}{c} \hat{a} \\ \hat{u} \end{array}\right]\)</span>:</p>
<figure class="highlight r"><table><tr><td class="code"><pre><span class="line">BT = t(B)</span><br><span class="line">a = solve(BT %*% B) %*% BT %*% y</span><br></pre></td></tr></table></figure>
<p>Finally, substitute <span class="math inline">\(\hat{a}\)</span> and <span class="math inline">\(\hat{u}\)</span> into the time response equation:</p>
<p><span class="math display">\[
\hat{x}^{(1)}(k+1) = \left[x^{(1)}(1) - \frac{\hat{u}}{\hat{a}}\right] e^{-\hat{a} k} + \frac{\hat{u}}{\hat{a}}
\]</span></p>
<figure class="highlight r"><table><tr><td class="code"><pre><span class="line">n <- <span class="number">3</span>  <span class="comment"># number of future periods to forecast (assumed; not defined in the original)</span></span><br><span class="line">u <- a[<span class="number">2</span>] / a[<span class="number">1</span>]</span><br><span class="line">xk <- rep(<span class="number">1</span>, length(x0) + n)</span><br><span class="line"><span class="keyword">for</span> (i <span class="keyword">in</span> <span class="number">1</span>:(length(x0) + n)) {</span><br><span class="line">    <span class="comment"># xk[i] is x^(1)(k+1) with k = i - 1</span></span><br><span class="line">    xk[i] <- (x1[<span class="number">1</span>] - u) * exp(-a[<span class="number">1</span>] * (i - <span class="number">1</span>)) + u</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>The result at this point is the fitted accumulated series <span class="math inline">\(\hat{x}^{(1)}_{i}\)</span>; it has to be restored by successive differencing to obtain the fitted values of the original series, <span class="math inline">\(\hat{x}^{(0)}_{i}\)</span>:</p>
<figure class="highlight r"><table><tr><td class="code"><pre><span class="line">xhat <- rep(<span class="number">1</span>, length(xk))</span><br><span class="line"><span class="keyword">for</span> (i <span class="keyword">in</span> <span class="number">1</span>:length(xk)) {</span><br><span class="line">    <span class="keyword">if</span> (i == <span class="number">1</span>) {</span><br><span class="line">        xhat[i] = x0[<span class="number">1</span>]</span><br><span class="line">    } <span class="keyword">else</span> {</span><br><span class="line">        xhat[i] = xk[i] - xk[i-<span class="number">1</span>]</span><br><span class="line">    }</span><br><span class="line"></span><br><span class="line">}</span><br></pre></td></tr></table></figure>
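<p>The steps above (accumulate, build B and y, least squares, time response, differencing) can be put together in one function. The following is a minimal Python sketch using only the standard library, not a definitive implementation: the name <code>gm11</code>, the <code>n_ahead</code> parameter and the sample series are assumptions, and the 2x2 normal equations are solved directly with Cramer's rule instead of a matrix library. It does not guard against a constant series, for which the estimated coefficient is zero and the division fails.</p>

```python
import math


def gm11(x0, n_ahead=0):
    """Fit GM(1,1) to x0; return (a, u, fitted values plus n_ahead forecasts)."""
    # first-order accumulated series x1
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)

    # first column of B: negated mean of adjacent x1 values; y is x0 without its first item
    z = [-(x1[i] + x1[i + 1]) / 2 for i in range(len(x1) - 1)]
    y = x0[1:]

    # solve (B^T B) [a, u]^T = B^T y with Cramer's rule
    m = len(z)
    szz = sum(v * v for v in z)
    sz = sum(z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    sy = sum(y)
    det = szz * m - sz * sz
    a = (szy * m - sz * sy) / det
    u = (szz * sy - sz * szy) / det

    # time response of the accumulated series for k = 0, 1, ...
    c = u / a
    x1_hat = [(x0[0] - c) * math.exp(-a * k) + c for k in range(len(x0) + n_ahead)]

    # restore the original series by successive differencing
    x0_hat = [x0[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, len(x1_hat))]
    return a, u, x0_hat


series = [2 * 1.1 ** k for k in range(6)]  # hypothetical data growing 10% per step
a, u, fitted = gm11(series, n_ahead=2)
print(round(a, 4), [round(v, 3) for v in fitted])
```

<p>The last <code>n_ahead</code> entries of the returned list are forecasts; the earlier entries are the fitted values that the accuracy checks compare against the observations.</p>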
<p>Once the fitted values are available, the accuracy of the model should be assessed. The main indicators are:</p>
<ul>
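<li>Residual: <span class="math inline">\(e(k)=x^{(0)}_{k}-\hat{x}^{(0)}_{k}\)</span>.</li>
<li>Relative residual: <span class="math inline">\(\Delta(k)=\frac{x^{(0)}_{k}-\hat{x}^{(0)}_{k}}{x^{(0)}_{k}}\)</span>.</li>
<li>Posterior variance ratio: <span class="math inline">\(C=\frac{S_{2}}{S_{1}}=\frac{\sqrt{\frac{1}{N-1}\sum_{k=2}^{N}[e(k)-\bar{e}]^{2}}}{\sqrt{\frac{1}{N}\sum_{k=1}^{N}[x^{(0)}_{k}-\bar{x}]^{2}}}\)</span>, where <span class="math inline">\(S_{1}\)</span> is the standard deviation of the original series and <span class="math inline">\(S_{2}\)</span> that of the residuals; a value below 0.35 indicates relatively good accuracy.</li>
<li>Small-error probability: <span class="math inline">\(P = P\{|e(k)-\bar{e}|<0.6745S_{1}\}\)</span>; a value above 0.95 indicates relatively good accuracy.</li>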
</ul>
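<p>A minimal Python sketch of the last two indicators, taking S1 as the standard deviation of the original series and S2 as that of the residuals (the usual convention, assumed here). The function name <code>evaluate_fit</code> and the sample numbers are hypothetical.</p>

```python
import math


def evaluate_fit(actual, fitted):
    """Return (C, P): posterior variance ratio and small-error probability."""
    n = len(actual)
    mean_x = sum(actual) / n
    # S1: standard deviation of the original series
    s1 = math.sqrt(sum((x - mean_x) ** 2 for x in actual) / n)

    # residuals from the second point on; the first fitted value equals the first observation
    resid = [x - f for x, f in zip(actual[1:], fitted[1:])]
    mean_e = sum(resid) / len(resid)
    # S2: standard deviation of the residuals
    s2 = math.sqrt(sum((e - mean_e) ** 2 for e in resid) / len(resid))

    c = s2 / s1
    # share of residuals whose deviation from the mean residual is below 0.6745 * S1
    p = sum(abs(e - mean_e) < 0.6745 * s1 for e in resid) / len(resid)
    return c, p


actual = [1, 2, 3, 4, 5]            # hypothetical observations
fitted = [1, 2.1, 2.9, 4.1, 4.9]    # hypothetical fitted values
c, p = evaluate_fit(actual, fitted)
print(f"C={c:.4f}, P={p:.2f}")
```

<p>A small C together with a P close to 1 means the residuals are tightly clustered relative to the spread of the data, which is what the thresholds 0.35 and 0.95 check.</p>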
]]></content>
</entry>
<entry>
<title>LeetCode MySql Day6</title>
<url>/2020/05/13/LeetCode-MySql-Day06/</url>
<content><![CDATA[<p>Day 6:</p>
<a id="more"></a>
<h2 id="swap-salary">627. Swap Salary</h2>
<p> Given a table <code>salary</code>, such as the one below, that has m=male and f=female values. Swap all f and m values (i.e., change all f values to m and vice versa) with a <strong>single update statement</strong> and no intermediate temp table.</p>
<p> Note that you must write a single update statement, <strong>DO NOT</strong> write any select statement for this problem.</p>
<p> <strong>Example:</strong></p>
<table>
<thead>
<tr class="header">
<th>id</th>
<th>name</th>
<th>sex</th>
<th>salary</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>A</td>
<td>m</td>
<td>2500</td>
</tr>
<tr class="even">
<td>2</td>
<td>B</td>
<td>f</td>
<td>1500</td>
</tr>
<tr class="odd">
<td>3</td>
<td>C</td>
<td>m</td>
<td>5500</td>
</tr>
<tr class="even">
<td>4</td>
<td>D</td>
<td>f</td>
<td>500</td>
</tr>
</tbody>
</table>
<p> After running your <strong>update</strong> statement, the above salary table should have the following rows:</p>
<table>
<thead>
<tr class="header">
<th>id</th>
<th>name</th>
<th>sex</th>
<th>salary</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>A</td>
<td>f</td>
<td>2500</td>
</tr>
<tr class="even">
<td>2</td>
<td>B</td>
<td>m</td>
<td>1500</td>
</tr>
<tr class="odd">
<td>3</td>
<td>C</td>
<td>f</td>
<td>5500</td>
</tr>
<tr class="even">
<td>4</td>
<td>D</td>
<td>m</td>
<td>500</td>
</tr>
</tbody>
</table>
<p>Solutions:</p>
<figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">update</span> salary <span class="keyword">set</span> sex = (<span class="keyword">case</span> <span class="keyword">when</span> sex = <span class="string">"f"</span> <span class="keyword">then</span> <span class="string">"m"</span> <span class="keyword">else</span> <span class="string">"f"</span> <span class="keyword">end</span>);</span><br></pre></td></tr></table></figure>
<figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">update</span> salary <span class="keyword">set</span> sex = <span class="built_in">char</span>(<span class="keyword">ASCII</span>(<span class="string">'f'</span>) ^ <span class="keyword">ASCII</span>(<span class="string">'m'</span>) ^ <span class="keyword">ASCII</span>(sex));</span><br></pre></td></tr></table></figure>
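<p>Solution 1 uses the conditional expression case when … then … else … end, which works like if … else.</p>
<p>Solution 2 uses the XOR (exclusive-or) operator together with ASCII codes. The ASCII codes of f and m are 102 and 109. XOR (^) compares two numbers bit by bit and yields 1 wherever the bits differ, which is equivalent to binary addition without carry. Concretely: converting to binary gives bin(102) = 0b1100110 and bin(109) = 0b1101101, so 0b1100110 ^ 0b1101101 = 0b1011; converting back to decimal gives int('1011', 2) = 11. Likewise, 11 ^ 102 = 109 and 11 ^ 109 = 102.</p>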
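<p>The arithmetic can be checked with a few lines of Python. This is only an illustration of the XOR trick; the function name <code>swap_sex</code> is hypothetical.</p>

```python
def swap_sex(sex):
    """Map 'f' to 'm' and 'm' to 'f' with XOR, as in the second SQL statement."""
    mask = ord("f") ^ ord("m")  # 102 ^ 109
    return chr(mask ^ ord(sex))


print(bin(ord("f")), bin(ord("m")), bin(ord("f") ^ ord("m")))
print(swap_sex("f"), swap_sex("m"))
```

<p>The trick works because XOR is its own inverse: applying the same mask twice returns the original value, so one expression covers both directions of the swap.</p>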
<h2 id="reformat-department-table">1179. Reformat Department Table</h2>
<p> Table: <code>Department</code></p>
<table>
<thead>
<tr class="header">
<th>Column Name</th>
<th>Type</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>id</td>
<td>int</td>
</tr>
<tr class="even">
<td>revenue</td>
<td>int</td>
</tr>
<tr class="odd">
<td>month</td>
<td>varchar</td>
</tr>
</tbody>
</table>
<p> (id, month) is the primary key of this table.</p>
<p> The table has information about the revenue of each department per month.</p>
<p> The month has values in ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"].</p>
<p> Write an SQL query to reformat the table such that there is a department id column and a revenue column <strong>for each month</strong>.</p>
<p> The query result format is in the following example:</p>
<p> Department table:</p>
<table>
<thead>
<tr class="header">
<th>id</th>
<th>revenue</th>
<th>month</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>8000</td>
<td>Jan</td>
</tr>
<tr class="even">
<td>2</td>
<td>9000</td>
<td>Jan</td>
</tr>
<tr class="odd">
<td>3</td>
<td>10000</td>
<td>Feb</td>
</tr>
<tr class="even">
<td>1</td>
<td>7000</td>
<td>Feb</td>
</tr>
<tr class="odd">
<td>1</td>
<td>6000</td>
<td>Mar</td>
</tr>
</tbody>
</table>
<p> Result table:</p>
<table>
<thead>
<tr class="header">
<th>id</th>
<th>Jan_Revenue</th>
<th>Feb_Revenue</th>
<th>Mar_Revenue</th>
<th>...</th>
<th>Dec_Revenue</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>8000</td>