Fast & Simple implementation of GBM
Goals:
- Fast: handle 40M rows * 500 features within 12 hours.
- Simple: the fewer lines of code, the better (<= 3000).
- Modular/extensible for further improvements.
Usage:
- Install folly and thrift.
- Edit Makefile and boosting.sh so that FOLLY and THRIFT point to the right places.
- Run make.
- Run boosting.sh.
Techniques:
- pre-bucketing (data compression)
- bucket sort to build histograms, then a linear scan to find the best split (see the sketch after this list)
- hints and intelligent use of the number of buckets (#buckets)
- stochastic gradient boosting machine
- correctness (model + feature importances, a.k.a. fimps)
- deterministic randomness (seeded RNG, so runs are reproducible)
- easily extensible to a wide variety of similar algorithms: random forest, bagging, and GBM, for both classification and regression (regression takes priority)
- byte/short: two layers of storage (saves both memory and CPU)
- takes hints from previous fimps (top 1/3 of features stored as short, the rest as byte)
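To make the histogram technique concrete, here is a minimal C++ sketch (not the repo's actual code; the function and variable names are invented for illustration) of finding the best split for squared loss on one pre-bucketed feature: a single pass bucket-sorts the node's examples into k histogram bins, then a linear scan over the bins picks the threshold, matching the O(f' * d' + f' * k) cost given under Complexity below.

```cpp
// Minimal sketch, not the repo's actual code: histogram-based best-split
// search for squared loss on one pre-bucketed feature. One pass bucket-
// sorts the node's examples into k histogram bins (O(d')), then a linear
// scan over the bins (O(k)) finds the best threshold.
#include <cstdint>
#include <vector>

struct Split {
  int bucket = -1;     // split between bucket and bucket + 1
  double gain = 0.0;   // variance reduction; 0 means "no useful split"
};

Split bestSplit(const std::vector<uint8_t>& bucketOf,  // byte-layer bucket ids
                const std::vector<double>& residual,   // current residuals/gradients
                const std::vector<int>& nodeExamples,  // examples in this leaf
                int k) {                               // number of buckets
  std::vector<double> sum(k, 0.0);
  std::vector<int> cnt(k, 0);
  double total = 0.0;
  for (int i : nodeExamples) {  // bucket sort: build the histogram
    sum[bucketOf[i]] += residual[i];
    cnt[bucketOf[i]] += 1;
    total += residual[i];
  }
  Split best;
  double leftSum = 0.0;
  int leftCnt = 0;
  const int n = static_cast<int>(nodeExamples.size());
  for (int b = 0; b + 1 < k; ++b) {  // linear scan over bucket boundaries
    leftSum += sum[b];
    leftCnt += cnt[b];
    if (leftCnt == 0 || leftCnt == n) continue;
    const double rightSum = total - leftSum;
    const int rightCnt = n - leftCnt;
    // Variance-reduction gain for squared loss.
    const double gain = leftSum * leftSum / leftCnt +
                        rightSum * rightSum / rightCnt - total * total / n;
    if (gain > best.gain) {
      best.bucket = b;
      best.gain = gain;
    }
  }
  return best;
}
```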
Notation:
m: number of trees
n: number of leaves per tree
r: example sampling rate
s: feature sampling rate
d: number of data points
f: number of features
k: number of buckets
ml: minimum number of data points per leaf
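As an illustration of r, s, and the "deterministic randomness" item above, here is a minimal sketch (invented names, not the repo's API) of seeded subsampling: fixing the seed makes every run reproducible while the boosting remains stochastic.

```cpp
// Minimal sketch, invented names: deterministic subsampling of examples
// (rate r) and features (rate s). A fixed seed gives the same sample on
// every run, which is what "deterministic randomness" buys: reproducible
// training with stochastic boosting.
#include <cstdint>
#include <random>
#include <vector>

std::vector<int> sampleIndices(int total, double rate, uint64_t seed) {
  std::mt19937_64 rng(seed);  // fixed seed => identical sample every run
  std::bernoulli_distribution keep(rate);
  std::vector<int> picked;
  for (int i = 0; i < total; ++i) {
    if (keep(rng)) {
      picked.push_back(i);
    }
  }
  return picked;
}

// Hypothetical per-iteration usage for boosting round t:
//   auto rows  = sampleIndices(d, r, /*seed=*/1000 + t);  // example sampling
//   auto feats = sampleIndices(f, s, /*seed=*/2000 + t);  // feature sampling
```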
Complexity:
Memory: max(f * d1 * 8, [f * d, f * d * 2)) bytes. The first term is the raw doubles (8 bytes each) held for the d1 rows read during bucketization; after compression each value takes 1 byte (byte column) or 2 bytes (short column), hence the half-open range [f * d, f * d * 2). (See the worked example after the complexity list.)
Algorithmic:
- Bucketization (sort the first d1 rows per feature to choose bucket boundaries): O(f * d1 * log(d1))
- Continue reading (binary-search each of the remaining d2 rows into one of the k buckets): O(f * d2 * log(k))
- Single best split (f' = sampled features, d' = data points in the node): O(f' * d' + f' * k)
- Trees (S = cost of a single best-split search, from above)
  - balanced tree of depth t: t * S (each level scans all of the node's data once)
  - single n-leaf tree: #splits = 2n - 3 (the root is searched once, and each performed split triggers a search on both children, except the last), roughly O(S * n * log(n))
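As a back-of-the-envelope check of the memory bound (using the 40M rows * 500 features figures from the goals above): compressed storage is 500 * 40M = 20 GB if every column is a byte column and 40 GB if every column is a short column; with the top 1/3 of features stored as shorts, roughly 20 GB * (1/3 * 2 + 2/3 * 1) ≈ 26.7 GB.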
Example sizes: d: 20M, example sampling: 4M, feature sampling rate:
Components:
- Config: specifies the data format and training parameters
- DataSet: column-wise storage, with self-compression
- Tree: works on both compressed and raw data
- TreeRegressor: k-leaf regression tree
- GbmFun: function interface to extend to different types of loss (sketch below)
- Gbm: gradient boosting machine
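To show how the loss extension point might be used, here is a hedged sketch of a GbmFun-style interface with least squares as one concrete loss; the method names and signatures are guesses for illustration, not the repo's actual API.

```cpp
// Sketch: a pluggable loss interface in the spirit of GbmFun, with
// least-squares regression as one concrete instance. The virtual
// methods shown here are illustrative assumptions, not the real API.
#include <vector>

class GbmFun {
 public:
  virtual ~GbmFun() = default;
  // Negative gradient of the loss; the next tree is fit to this.
  virtual double gradient(double target, double prediction) const = 0;
  // Optimal constant prediction for a leaf's examples.
  virtual double leafValue(const std::vector<double>& residuals) const = 0;
};

class LeastSquaresFun : public GbmFun {
 public:
  double gradient(double target, double prediction) const override {
    return target - prediction;  // residual for squared loss
  }
  double leafValue(const std::vector<double>& residuals) const override {
    double s = 0.0;
    for (double r : residuals) s += r;
    return residuals.empty() ? 0.0 : s / residuals.size();  // mean residual
  }
};
```

A logistic loss for classification would plug in the same way, which is how one driver can cover both regression and classification.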