Linear and Logistic Regression
- input format: LibSVM
- Local Example: run-linear.sh
- Running on YARN: run-yarn.sh
  - You will need to have YARN
  - Modify ../make/config.mk to set USE_HDFS=1 to compile with HDFS support
  - Run build.sh in ../../yarn to build the YARN jar file
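
For reference, the LibSVM input format listed above stores one example per line as a label followed by sparse index:value pairs. The snippet below is a minimal sketch of reading such a line; the helper name parse_libsvm_line and the sample line are illustrative and not part of this project.

```python
def parse_libsvm_line(line):
    """Parse one LibSVM-format line into (label, {feature_index: value})."""
    # Drop a trailing comment, if any, then split on whitespace.
    tokens = line.split("#", 1)[0].split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


# Hypothetical sample line: label 1 with three non-zero features.
sample = "1 3:0.5 10:1 42:2.25"
label, features = parse_libsvm_line(sample)
print(label, features)
```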
Multi-Threading Optimization
- The code supports multi-threading, and we encourage you to use it
  - Simply add nthread=k, where k is the number of threads you want to use
- If you submit with YARN
  - Use --vcores and -mem to request CPU and memory resources
  - Some YARN schedulers do not honor CPU requests; in that case you can request more memory to grab working slots
- Multi-threading usually improves speed
  - You can use fewer workers and assign more resources to each worker
  - This usually means less communication overhead and faster running time
Parameters
All the parameters can be set by param=value
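
As an illustration of the param=value convention, the sketch below shows one way such arguments could be collected and layered over defaults. It is not the program's actual argument handling; parse_params is a hypothetical helper, and the defaults dictionary simply copies a few of the defaults documented in this README.

```python
def parse_params(args):
    """Turn ['objective=linear', 'reg_L2=0.1'] into a dict of strings."""
    params = {}
    for arg in args:
        if "=" not in arg:
            continue  # not a param=value pair; skipped in this sketch
        name, value = arg.split("=", 1)
        params[name.strip()] = value.strip()
    return params


# A few documented defaults, kept as strings for simplicity.
defaults = {"objective": "logistic", "base_score": "0.5",
            "reg_L1": "0", "reg_L2": "1"}
overrides = parse_params(["objective=linear", "reg_L2=0.1", "nthread=4"])
settings = {**defaults, **overrides}
print(settings)
```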
Important Parameters
- objective [default = logistic]
  - can be linear or logistic
- base_score [default = 0.5]
  - global bias; setting it to the mean value of the labels is recommended
- reg_L1 [default = 0]
  - L1 regularization coefficient
- reg_L2 [default = 1]
  - L2 regularization coefficient
- lbfgs_stop_tol [default = 1e-5]
  - relative tolerance for the loss reduction, measured with respect to the initial loss
- max_lbfgs_iter [default = 500]
  - maximum number of L-BFGS iterations
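
To make the roles of these parameters concrete, here is a rough sketch of a regularized objective and a relative stopping test. It rests on several assumptions that this README does not confirm: squared error scaled by 0.5 for the linear objective, 0/1 labels for the logistic objective, an L2 penalty of 0.5 * reg_L2 * ||w||^2, and a stopping rule that compares the latest loss reduction against lbfgs_stop_tol times the initial loss. The function names are hypothetical, and min_lbfgs_iter refers to the parameter listed under the optimization related parameters.

```python
import math


def regularized_objective(weights, bias, rows, labels, objective="logistic",
                          reg_L1=0.0, reg_L2=1.0):
    """Sum of per-example losses plus L1 and L2 penalties on the weights."""
    total = 0.0
    for features, y in zip(rows, labels):
        margin = bias + sum(weights[i] * v for i, v in features.items())
        if objective == "linear":
            total += 0.5 * (margin - y) ** 2  # assumed squared-error scaling
        else:
            # Logistic loss for y in {0, 1}: log(1 + exp(margin)) - y * margin,
            # written in a numerically stable form.
            total += max(margin, 0.0) + math.log1p(math.exp(-abs(margin))) - y * margin
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return total + reg_L1 * l1 + 0.5 * reg_L2 * l2  # assumed penalty scaling


def should_stop(iteration, old_loss, new_loss, init_loss,
                lbfgs_stop_tol=1e-5, min_lbfgs_iter=5, max_lbfgs_iter=500):
    """Assumed rule: stop when the reduction is small relative to the initial loss."""
    if iteration >= max_lbfgs_iter:
        return True
    if iteration < min_lbfgs_iter:
        return False
    return (old_loss - new_loss) < lbfgs_stop_tol * init_loss


rows = [{0: 1.0}, {1: 1.0}]   # two sparse examples over two features
labels = [1, 0]
loss = regularized_objective([0.0, 0.0], 0.0, rows, labels)
print(loss, should_stop(10, 100.0, 99.9999, 100.0))
```

With the defaults above, a reduction smaller than 1e-5 of the initial loss ends the run once the minimum iteration count has been reached.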
Optimization Related Parameters
- min_lbfgs_iter [default = 5]
  - minimum number of L-BFGS iterations
- max_linesearch_iter [default = 100]
  - maximum number of iterations in the line search
- linesearch_c1 [default = 1e-4]
  - c1 coefficient in the backoff line search
- linesearch_backoff [default = 0.5]
  - backoff ratio in the line search
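
The line search parameters above map naturally onto a standard backtracking (Armijo) line search. The sketch below is a generic textbook version under that assumption, not the solver's own code: the step starts at 1 and is multiplied by the backoff ratio until the loss drops by at least c1 times the step times the directional derivative. Any special handling of the L1 term is not covered, and the function name and the quadratic test function are illustrative.

```python
def backtracking_linesearch(f, x, fx, grad, direction,
                            linesearch_c1=1e-4, linesearch_backoff=0.5,
                            max_linesearch_iter=100):
    """Shrink the step until the sufficient-decrease (Armijo) condition holds."""
    # Directional derivative of f at x along the search direction.
    slope = sum(g * d for g, d in zip(grad, direction))
    if slope >= 0:
        raise ValueError("direction is not a descent direction")
    step = 1.0
    for _ in range(max_linesearch_iter):
        candidate = [xi + step * di for xi, di in zip(x, direction)]
        f_new = f(candidate)
        if f_new <= fx + linesearch_c1 * step * slope:
            return step, candidate, f_new
        step *= linesearch_backoff  # back off and try a shorter step
    raise RuntimeError("line search did not find an acceptable step")


def quadratic(v):
    """Illustrative objective: f(v) = sum of squares."""
    return sum(t * t for t in v)


x = [3.0, -4.0]
grad = [2 * t for t in x]
direction = [-g for g in grad]  # steepest descent direction
step, x_new, f_new = backtracking_linesearch(quadratic, x, quadratic(x), grad, direction)
print(step, x_new, f_new)
```

A smaller backoff ratio shrinks the step more aggressively per trial, while a larger c1 demands a bigger decrease before a step is accepted.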