tqchen 57b5d7873f Squashed 'subtree/rabit/' changes from d4ec037..28ca7be

28ca7be add linear readme
ca4b20f add linear readme
1133628 add linear readme
6a11676 update docs
a607047 Update build.sh
2c1cfd8 complete yarn
4f28e32 change formater
2fbda81 fix stdin input
3258bcf checkin yarn master
67ebf81 allow setup from env variables
9b6bf57 fix hdfs
395d5c2 add make system
88ce767 refactor io, initial hdfs file access need test
19be870 chgs
a1bd3c6 Merge branch 'master' of ssh://github.com/tqchen/rabit
1a573f9 introduce input split
29476f1 fix timer issue

git-subtree-dir: subtree/rabit
git-subtree-split: 28ca7becbdf6503e6b1398588a969efb164c9701

2015-03-09 13:28:38 -07:00

Linear and Logistic Regression

input format: LibSVM
Local Example: run-linear.sh
Running on YARN: run-yarn.sh
- You will need to have YARN
- Modify ../make/config.mk to set USE_HDFS=1 to compile with HDFS support
- Run build.sh in ../../yarn to build the YARN jar file
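
For readers new to the LibSVM input format: each line is a label followed by sparse index:value pairs. The sketch below is only an illustration in Python, not the reader used by this code; the function name parse_libsvm_line is hypothetical.

```python
def parse_libsvm_line(line):
    """Parse one LibSVM-formatted line into (label, {feature_index: value})."""
    fields = line.split()
    label = float(fields[0])
    features = {}
    for item in fields[1:]:
        # Each remaining field has the form index:value
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, features


# A positive example with three non-zero features
label, features = parse_libsvm_line("1 3:0.5 10:1 27:2.25")
print(label, features)
```

Features that do not appear on a line are treated as zero, which is why the format works well for sparse data.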

Multi-Threading Optimization

The code supports multi-threading, and we encourage you to use it
- Simply add nthread=k where k is the number of threads you want to use
If you submit with YARN
- Use --vcores and -mem to request CPU and memory resources
- Some YARN schedulers do not honor CPU requests; in that case you can request more memory to grab working slots
Multi-threading usually improves speed
- You can use fewer workers and assign more resources to each worker
- This usually means less communication overhead and faster running time

Parameters

All the parameters can be set by param=value
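
As an illustration of the param=value convention, here is a minimal Python sketch of how such arguments could be split into names and values. It is not the parser used by this code; parse_params is a hypothetical helper, and the fallback of 1 thread when nthread is absent is an assumption made only for this example.

```python
def parse_params(args):
    """Split a list of 'param=value' strings into a dict of string values."""
    params = {}
    for arg in args:
        name, sep, value = arg.partition("=")
        if not sep or not name:
            raise ValueError("expected param=value, got %r" % arg)
        params[name] = value
    return params


params = parse_params(["objective=logistic", "reg_L2=1", "nthread=4"])
# Assumption for this sketch: fall back to a single thread if nthread is not given
nthread = int(params.get("nthread", "1"))
print(params, nthread)
```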

Important Parameters

objective [default = logistic]
- can be linear or logistic
base_score [default = 0.5]
- global bias; we recommend setting it to the mean value of the label
reg_L1 [default = 0]
- L1 regularization coefficient
reg_L2 [default = 1]
- L2 regularization coefficient
lbfgs_stop_tol [default = 1e-5]
- relative tolerance level of loss reduction with respect to initial loss
max_lbfgs_iter [default = 500]
- maximum number of lbfgs iterations
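
To make the roles of objective, base_score, reg_L1 and reg_L2 concrete, the following Python sketch evaluates a regularized loss on sparse examples. It is a sketch under stated assumptions, not the implementation here: the 0.5 factor on the L2 term, the squared-error form of the linear loss, and the conversion of base_score to log-odds for the logistic objective are all assumptions, and regularized_loss is a hypothetical name.

```python
import math


def regularized_loss(weights, data, objective="logistic",
                     base_score=0.5, reg_L1=0.0, reg_L2=1.0):
    """Sum of per-example losses plus L1 and L2 penalties on the weights.

    weights: dict {feature_index: weight}
    data: list of (label, {feature_index: value}) pairs
    """
    if objective == "logistic":
        # Assumption: base_score is a probability, used as a log-odds bias
        bias = math.log(base_score / (1.0 - base_score))
    else:
        bias = base_score
    total = 0.0
    for label, features in data:
        margin = bias + sum(weights.get(i, 0.0) * v for i, v in features.items())
        if objective == "logistic":
            # log(1 + exp(-s)) computed in a numerically stable way,
            # where s = margin for label 1 and s = -margin for label 0
            s = margin if label > 0.5 else -margin
            total += max(-s, 0.0) + math.log1p(math.exp(-abs(s)))
        else:
            total += 0.5 * (margin - label) ** 2
    l1 = reg_L1 * sum(abs(w) for w in weights.values())
    l2 = 0.5 * reg_L2 * sum(w * w for w in weights.values())  # 0.5 factor is an assumption
    return total + l1 + l2


data = [(1.0, {1: 1.0})]
print(regularized_loss({}, data))
print(regularized_loss({1: 2.0}, data, objective="linear",
                       base_score=0.0, reg_L1=0.5, reg_L2=1.0))
```

With all weights at zero and base_score=0.5, the logistic loss of a single example is log(2), which is a convenient sanity check.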

Optimization Related Parameters

min_lbfgs_iter [default = 5]
- minimum number of lbfgs iterations
max_linesearch_iter [default = 100]
- maximum number of iterations in linesearch
linesearch_c1 [default = 1e-4]
- c1 coefficient in backoff linesearch
linesearch_backoff [default = 0.5]
- backoff ratio in linesearch
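
The line search parameters above can be read as the knobs of a standard backtracking (Armijo) line search. The Python sketch below shows that textbook technique, assuming an initial step of 1.0; it is not taken from this code, and the names backtracking_linesearch, c1, backoff and max_iter are chosen for the example. Here c1 plays the role of linesearch_c1, backoff the backoff ratio, and max_iter the role of max_linesearch_iter.

```python
def backtracking_linesearch(f, x, direction, grad_dot_dir,
                            c1=1e-4, backoff=0.5, max_iter=100):
    """Shrink the step until the sufficient-decrease (Armijo) condition holds.

    f: objective taking a list of floats
    x: current point, direction: search direction
    grad_dot_dir: dot product of the gradient at x with direction (negative for descent)
    Returns (step, new_point); step is 0.0 if no acceptable step was found.
    """
    f0 = f(x)
    step = 1.0  # assumption: start from a unit step
    for _ in range(max_iter):
        candidate = [xi + step * di for xi, di in zip(x, direction)]
        # Accept once the loss drops by at least c1 * step * |grad . direction|
        if f(candidate) <= f0 + c1 * step * grad_dot_dir:
            return step, candidate
        step *= backoff  # multiply the step by the backoff ratio and retry
    return 0.0, x


# Minimize f(x) = x^2 from x = 1 along the negative gradient (gradient is 2)
step, new_x = backtracking_linesearch(lambda v: v[0] ** 2, [1.0], [-2.0], -4.0)
print(step, new_x)
```

A smaller backoff ratio shrinks the step faster and needs fewer loss evaluations per search, at the cost of possibly accepting a shorter step than necessary.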
