Linear and Logistic Regression
- Input format: LibSVM
- Local example: run-linear.sh
- Running on YARN: run-yarn.sh
  - You will need to have YARN available
  - Modify ../make/config.mk to set USE_HDFS=1 to compile with HDFS support
  - Run build.sh in ../../yarn to build the YARN jar file
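
For readers unfamiliar with the LibSVM input format: each line holds one example, written as a label followed by sparse `index:value` pairs. The sketch below is a minimal Python illustration of that layout. The function name `parse_libsvm_line` and the sample line are made up for this example and are not part of this project.

```python
def parse_libsvm_line(line):
    """Parse one LibSVM-format line into (label, {feature_index: value})."""
    # Drop a trailing "# comment", if any, then split on whitespace.
    tokens = line.split("#", 1)[0].split()
    if not tokens:
        raise ValueError("empty line")
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


# Illustrative line: label 1, three non-zero features.
example = "1 3:0.5 7:1.25 10:-2"
label, features = parse_libsvm_line(example)
print(label, features)
```

Features that do not appear on a line are treated as zero, which is what keeps the format compact for sparse data.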
Multi-Threading Optimization
- The code can run multi-threaded, and we encourage you to use this
  - Simply add nthread=k, where k is the number of threads you want to use
- If you submit with YARN
  - Use --vcores and -mem to request CPU and memory resources
  - Some YARN schedulers do not honor CPU requests; in that case you can request more memory to grab working slots
- Multi-threading usually improves speed
  - You can use fewer workers and assign more resources to each worker
  - This usually means less communication overhead and a shorter running time
Parameters
All parameters can be set in the form param=value
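
As a rough illustration of the param=value convention, the Python sketch below turns a list of such strings into a dictionary. It is not how the program itself parses its arguments (that code is not shown in this README). The helper `parse_params` is hypothetical, and the values are only examples.

```python
def parse_params(args):
    """Turn ['name=value', ...] into a dict, keeping values as strings."""
    params = {}
    for arg in args:
        name, sep, value = arg.partition("=")
        if not sep or not name:
            raise ValueError(f"expected param=value, got {arg!r}")
        params[name] = value
    return params


# Parameter names come from this README; the values are illustrative.
params = parse_params(["objective=logistic", "reg_L2=1", "nthread=4"])
print(params)
```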
Important Parameters
- objective [default = logistic]
  - can be linear or logistic
- base_score [default = 0.5]
  - global bias; setting it to the mean value of the label is recommended
- reg_L1 [default = 0]
  - L1 regularization coefficient
- reg_L2 [default = 1]
  - L2 regularization coefficient
- lbfgs_stop_tol [default = 1e-5]
  - relative tolerance on the loss reduction, measured with respect to the initial loss
- max_lbfgs_iter [default = 500]
  - maximum number of L-BFGS iterations
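
To make the roles of reg_L1, reg_L2, lbfgs_stop_tol and max_lbfgs_iter more concrete, here is a small Python sketch of a regularized logistic objective and a relative stopping test. Several details are assumptions and are not confirmed by this README: the exact form of the objective (a sum over examples, with a factor of 0.5 on the L2 term), labels in {0, 1}, and the function names. Only the parameter meanings are taken from the list above.

```python
import math


def regularized_logistic_loss(weights, bias, rows, labels, reg_l1=0.0, reg_l2=1.0):
    """Sum of logistic losses plus L1 and L2 penalties (assumed form)."""
    loss = 0.0
    for features, y in zip(rows, labels):
        margin = bias + sum(weights[i] * v for i, v in features.items())
        # log(1 + exp(margin)) - y * margin, written to stay stable for large |margin|.
        loss += max(margin, 0.0) + math.log1p(math.exp(-abs(margin))) - y * margin
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return loss + reg_l1 * l1 + 0.5 * reg_l2 * l2


def should_stop(prev_loss, new_loss, init_loss, iteration, stop_tol=1e-5, max_iter=500):
    """Stop at the iteration cap, or when the loss reduction is small relative to the initial loss."""
    if iteration >= max_iter:
        return True
    return prev_loss - new_loss < stop_tol * init_loss


# Two sparse examples with two features; all weights start at zero.
rows = [{0: 1.0}, {1: 2.0}]
labels = [1, 0]
initial = regularized_logistic_loss([0.0, 0.0], 0.0, rows, labels)
print(initial)
```

With all weights and the bias at zero, every example contributes log(2) to the loss, so the initial value gives a convenient scale for the relative tolerance.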
Optimization-Related Parameters
- min_lbfgs_iter [default = 5]
  - minimum number of L-BFGS iterations
- max_linesearch_iter [default = 100]
  - maximum number of iterations in the line search
- linesearch_c1 [default = 1e-4]
  - c1 coefficient in the backoff line search
- linesearch_backoff [default = 0.5]
  - backoff ratio in the line search
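
The line search parameters above map naturally onto a textbook backtracking line search with a sufficient-decrease (Armijo) test. The Python sketch below shows that generic technique under that assumption. It is not taken from this project's source, which may differ (for example, in how it handles the L1 term). The function name and the quadratic test function are illustrative only.

```python
def backtracking_linesearch(f, x, direction, grad, c1=1e-4, backoff=0.5, max_iter=100):
    """Return a step size alpha that satisfies the sufficient-decrease condition."""
    f0 = f(x)
    # Directional derivative of f at x along the search direction.
    slope = sum(g * d for g, d in zip(grad, direction))
    if slope >= 0:
        raise ValueError("direction is not a descent direction")
    alpha = 1.0
    for _ in range(max_iter):
        candidate = [xi + alpha * di for xi, di in zip(x, direction)]
        # Accept the step once the loss drops by at least c1 * alpha * |slope|.
        if f(candidate) <= f0 + c1 * alpha * slope:
            return alpha
        alpha *= backoff  # shrink the step by the backoff ratio and retry
    raise RuntimeError("line search did not find an acceptable step")


def quadratic(x):
    """Simple test function: sum of squares."""
    return sum(xi * xi for xi in x)


x = [3.0, -4.0]
grad = [2.0 * xi for xi in x]       # gradient of the sum of squares
direction = [-g for g in grad]      # steepest-descent direction
alpha = backtracking_linesearch(quadratic, x, direction, grad)
print(alpha)
```

A smaller backoff ratio shrinks the step faster and needs fewer trials, while a ratio closer to 1 tries more step sizes and may accept a larger one.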