Linear and Logistic Regression
- input format: LibSVM
- Local Example: run-linear.sh
- Running on YARN: run-yarn.sh
- You will need to have YARN
- Modify ../make/config.mk to set USE_HDFS=1 to compile with HDFS support
- Run build.sh in ../../yarn to build the YARN jar file
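For readers unfamiliar with the LibSVM input format listed above: each line holds one example as a label followed by sparse index:value pairs. The following is a minimal Python sketch of reading one such line. The function name parse_libsvm_line is hypothetical and is not part of this project; the project's own reader is not shown here.

```python
def parse_libsvm_line(line):
    """Parse one LibSVM-style line: '<label> <index>:<value> <index>:<value> ...'.

    Returns (label, features), where features maps feature index -> value.
    Hypothetical helper for illustration only.
    """
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


# One positive example with two non-zero features.
label, features = parse_libsvm_line("1 3:0.5 10:1.25")
print(label, features)
```

Features that do not appear on a line are treated as zero, which is what makes the format compact for sparse data.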
Multi-Threading Optimization
- The code can be multi-threaded, and we encourage you to use it
- Simply add nthread=k, where k is the number of threads you want to use
- If you submit with YARN
- Use --vcores and -mem to request CPU and memory resources
- Some schedulers in YARN do not honor CPU requests; you can request more memory to grab working slots
- Multi-threading usually improves speed
- You can use fewer workers and assign more resources to each worker
- This usually means less communication overhead and faster running time
Parameters
All the parameters can be set by param=value
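As an illustration of the param=value convention, here is a small Python sketch that splits such arguments into a dictionary. The helper parse_params is hypothetical and only mirrors the convention described above; it is not how the program itself parses its arguments.

```python
def parse_params(args):
    """Split 'name=value' strings into a dict (hypothetical helper)."""
    params = {}
    for arg in args:
        name, sep, value = arg.partition("=")
        if not sep or not name:
            raise ValueError("expected param=value, got %r" % arg)
        params[name] = value
    return params


params = parse_params(["objective=logistic", "reg_L2=0.1", "nthread=4"])
print(params)
```

Values are kept as strings in this sketch; converting them to numbers is left to whoever consumes each parameter.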
Important Parameters
- objective [default = logistic]
- can be linear or logistic
- base_score [default = 0.5]
- global bias; setting it to the mean value of the label is recommended
- reg_L1 [default = 0]
- L1 regularization coefficient
- reg_L2 [default = 1]
- L2 regularization coefficient
- lbfgs_stop_tol [default = 1e-5]
- relative tolerance level of loss reduction with respect to initial loss
- max_lbfgs_iter [default = 500]
- maximum number of lbfgs iterations
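To make the roles of the regularization and stopping parameters concrete, the sketch below computes a regularized logistic loss and checks a relative stopping rule in Python. It rests on several assumptions that this README does not confirm: labels are in {0, 1}, the loss is summed over examples, the L2 term carries a factor of 0.5, and optimization stops once the loss reduction in one iteration drops below lbfgs_stop_tol times the initial loss. All function names are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def regularized_logistic_loss(weights, bias, data, reg_l1=0.0, reg_l2=1.0):
    """Sum of logistic losses plus L1 and L2 penalties (assumed form).

    weights: dict of feature index -> weight
    data: iterable of (label, features), label in {0, 1}
    """
    loss = 0.0
    for label, features in data:
        margin = bias + sum(weights.get(i, 0.0) * v for i, v in features.items())
        loss += softplus(margin) - label * margin
    l1 = sum(abs(w) for w in weights.values())
    l2 = sum(w * w for w in weights.values())
    return loss + reg_l1 * l1 + 0.5 * reg_l2 * l2


def should_stop(old_loss, new_loss, init_loss, stop_tol=1e-5):
    """Assumed rule: stop when the reduction is small relative to the initial loss."""
    return (old_loss - new_loss) < stop_tol * init_loss


data = [(1.0, {3: 0.5}), (0.0, {10: 1.25})]
init_loss = regularized_logistic_loss({}, 0.0, data)
print(init_loss)
print(should_stop(1.0, 0.999999, 10.0))
```

With all weights at zero, each example contributes log(2) to the loss, which is a convenient sanity check for an implementation.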
Optimization Related Parameters
- min_lbfgs_iter [default = 5]
- minimum number of lbfgs iterations
- max_linesearch_iter [default = 100]
- maximum number of iterations in linesearch
- linesearch_c1 [default = 1e-4]
- c1 coefficient in backoff linesearch
- linesearch_backoff [default = 0.5]
- backoff ratio in linesearch
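The line search parameters above fit the usual backtracking scheme: start from a full step, test a sufficient-decrease (Armijo) condition scaled by c1, and multiply the step by the backoff ratio on failure. The Python sketch below shows that generic scheme with the defaults listed above. It is an assumption that the project's line search has this shape, and the function name is hypothetical.

```python
def backtracking_linesearch(f, x, fx, grad, direction,
                            c1=1e-4, backoff=0.5, max_iter=100):
    """Generic backtracking line search (illustrative sketch).

    Accepts a step t when f(x + t*d) <= f(x) + c1 * t * <grad, d>,
    otherwise shrinks t by the backoff ratio, up to max_iter attempts.
    """
    slope = sum(g * d for g, d in zip(grad, direction))
    if slope >= 0:
        raise ValueError("direction is not a descent direction")
    step = 1.0
    for _ in range(max_iter):
        candidate = [xi + step * di for xi, di in zip(x, direction)]
        f_new = f(candidate)
        if f_new <= fx + c1 * step * slope:
            return step, candidate, f_new
        step *= backoff
    raise RuntimeError("line search did not find an acceptable step")


def square(v):
    return v[0] * v[0]


# Minimize f(x) = x^2 from x = 1 along the negative gradient.
step, x_new, f_new = backtracking_linesearch(
    square, x=[1.0], fx=1.0, grad=[2.0], direction=[-2.0])
print(step, x_new, f_new)
```

In this example the full step overshoots to x = -1 and gives no decrease, so the step is halved once and the search lands on the minimum at x = 0.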