forked from datumbox/datumbox-framework
CODE IMPROVEMENTS
=================
- Consider dropping all the common.dataobjects classes and using their internalData directly instead.
- Refactor the statistics package and replace all the static methods with proper inheritance.
- Write generic optimizers instead of having optimization methods in the algorithms. Add the optimizers and regularization packages under mathematics.
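
One possible shape for the generic optimizers item above, as a minimal sketch only. All names here (OptimizerSketch, DifferentiableFunction, Optimizer, GradientDescent) are hypothetical and are not part of the framework; plain gradient descent is used purely as the simplest illustration of an optimizer that knows nothing about the algorithm calling it.

```java
/** Hypothetical sketch: optimizers decoupled from the algorithms that use them. */
final class OptimizerSketch {

    /** Assumed contract: an algorithm only exposes the gradient of its objective. */
    interface DifferentiableFunction {
        double[] gradient(double[] point);
    }

    /** Assumed contract: any optimizer can minimize any differentiable objective. */
    interface Optimizer {
        double[] minimize(DifferentiableFunction objective, double[] start);
    }

    /** Plain fixed-step gradient descent, chosen only for brevity. */
    static final class GradientDescent implements Optimizer {
        private final double learningRate;
        private final int iterations;

        GradientDescent(double learningRate, int iterations) {
            this.learningRate = learningRate;
            this.iterations = iterations;
        }

        @Override
        public double[] minimize(DifferentiableFunction objective, double[] start) {
            double[] x = start.clone(); // never mutate the caller's array
            for (int i = 0; i < iterations; i++) {
                double[] g = objective.gradient(x);
                for (int j = 0; j < x.length; j++) {
                    x[j] -= learningRate * g[j]; // step against the gradient
                }
            }
            return x;
        }
    }

    public static void main(String[] args) {
        // f(x, y) = (x - 3)^2 + (y + 1)^2, so the gradient is (2(x - 3), 2(y + 1)).
        Optimizer optimizer = new GradientDescent(0.1, 200);
        double[] min = optimizer.minimize(
                p -> new double[] {2 * (p[0] - 3), 2 * (p[1] + 1)},
                new double[] {0.0, 0.0});
        System.out.println(min[0] + ", " + min[1]);
    }
}
```

With this split, an algorithm would only need to supply its objective's gradient, and regularization could be added by wrapping the DifferentiableFunction rather than by editing each algorithm.
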
NEW FEATURES
============
- Create a storage engine for MapDB 3 once caching, asynchronous writing and compression are supported.
- Create a storage engine for BerkeleyDB.
- Add the ability to call Machine Learning algorithms from the command line or Python:
    - https://pypi.python.org/pypi/javabridge
    - https://github.com/LeeKamentsky/python-javabridge/
    - https://github.com/fracpete/python-weka-wrapper
DOCUMENTATION
=============
- Improve the code documentation.
- Write How-to blog posts on building Text Classification models.
- Update the website and link directly to the latest and previous versions of the documentation.
NEW ALGORITHMS
==============
- Create a PercentileScaler numerical scaler.
- Create the following FeatureSelectors: AnovaSelect, KruskalWallisSelect, SpearmanSelect.
- Speed up LDA: http://www.cs.ucsb.edu/~mingjia/cs240/doc/273811.pdf
- Factorization Machines: http://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf
- Develop FunkSVD and PLSI as probabilistic versions of SVD.
- Collaborative Filtering for Implicit Feedback Datasets: http://yifanhu.net/PUB/cf.pdf
- Write a Mixture of Gaussians clustering method.
- Include an anomaly detection algorithm.
- Provide a wrapper for DBSCANClusterer and NeuralNet implementations of Maths.
- Add the ability to search through the configuration space and find the best performing algorithmic configuration.
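
The configuration-space search in the last item could start as an exhaustive grid search. The following is a rough sketch under stated assumptions: GridSearchSketch and its methods are hypothetical names, parameters are limited to numeric values, and the scoring function stands in for whatever validation metric the framework would actually compute.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch: exhaustive grid search over numeric parameters. */
final class GridSearchSketch {

    /** Expands the grid into every parameter combination (Cartesian product). */
    static List<Map<String, Double>> combinations(Map<String, List<Double>> grid) {
        List<Map<String, Double>> result = new ArrayList<>();
        result.add(new LinkedHashMap<>()); // start from one empty configuration
        for (Map.Entry<String, List<Double>> entry : grid.entrySet()) {
            List<Map<String, Double>> expanded = new ArrayList<>();
            for (Map<String, Double> partial : result) {
                for (Double value : entry.getValue()) {
                    Map<String, Double> next = new LinkedHashMap<>(partial);
                    next.put(entry.getKey(), value);
                    expanded.add(next);
                }
            }
            result = expanded;
        }
        return result;
    }

    /**
     * Returns the configuration with the highest score, or null when the grid
     * yields no combinations (for example, a parameter with no candidate values).
     */
    static Map<String, Double> best(Map<String, List<Double>> grid,
                                    ToDoubleFunction<Map<String, Double>> score) {
        Map<String, Double> best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map<String, Double> candidate : combinations(grid)) {
            double s = score.applyAsDouble(candidate);
            if (s > bestScore) {
                bestScore = s;
                best = candidate;
            }
        }
        return best;
    }
}
```

In practice the score would come from cross-validation of a model trained with each configuration. Grid search grows exponentially with the number of parameters, so random or adaptive search may be a better fit for larger spaces.
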
TO CHECK OUT
============
Linear Algebra
--------------
- JBLAS - Linear Algebra for Java:
    https://github.com/mikiobraun/jblas
    http://jblas.org/
Huge Collection libs, DBs and Storage
-------------------------------------
- Vanilla-java - HugeCollections:
    https://code.google.com/p/vanilla-java/wiki/HugeCollections
- Fastutil:
    http://fastutil.di.unimi.it/#install
- Joafip:
    http://joafip.sourceforge.net/javadoc/net/sf/joafip/java/util/PHashMap.html
- Chronicle Map:
    https://github.com/OpenHFT/Chronicle-Map/
- H2 Database:
    http://www.h2database.com/html/main.html
- ehcache:
    http://www.ehcache.org/
    http://stackoverflow.com/questions/4726370/looking-for-a-drop-in-replacement-for-a-java-util-map
- redisson:
    https://github.com/redisson/redisson