Linear and Logistic Regression
====
* Input format: LibSVM
* Local example: [run-linear.sh](run-linear.sh)
* Running on YARN: [run-yarn.sh](run-yarn.sh)
  - You will need a working YARN installation
  - Modify ```../make/config.mk``` to set USE_HDFS=1 to compile with HDFS support
  - Run build.sh in [../../yarn](../../yarn) to build the YARN jar file

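
For readers new to the LibSVM input format: each line holds a label followed by sparse `index:value` pairs. The sketch below is a minimal illustration of that layout only. The helper name `parse_libsvm_line` is hypothetical and not part of this project, and the real loader may handle details (comments, index base, extra fields) differently.

```python
def parse_libsvm_line(line):
    """Parse one LibSVM-format line into (label, {feature_index: value})."""
    # Drop an optional trailing '#' comment, then split on whitespace.
    tokens = line.split("#", 1)[0].split()
    if not tokens:
        raise ValueError("empty line")
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


# Illustrative line: label 1 with three non-zero features.
example = parse_libsvm_line("1 3:0.5 10:1 42:-2.25")
print(example)
```

Features that do not appear on a line are implicitly zero, which is why a dictionary is used instead of a dense list.
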
Multi-Threading Optimization
====
* The code supports multi-threading, and we encourage you to use it
  - Simply add ```nthread=k```, where k is the number of threads you want to use
* If you submit with YARN
  - Use ```--vcores``` and ```-mem``` to request CPU and memory resources
  - Some YARN schedulers do not honor CPU requests; in that case you can request more memory to grab working slots
* Multi-threading usually improves speed
  - You can use fewer workers and assign more resources to each worker
  - This usually means less communication overhead and a shorter running time

Parameters
====
All parameters can be set in the form param=value

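
As a small illustration of the `param=value` convention, the sketch below renders a dictionary of settings as a list of tokens. The helper `format_params` is hypothetical, and the values are examples only. How the tokens reach the program is not shown here; see [run-linear.sh](run-linear.sh) for the actual invocation.

```python
def format_params(params):
    """Render a dict of settings as a list of param=value tokens."""
    return ["{}={}".format(name, value) for name, value in params.items()]


# Parameter names are taken from this README; the values are only examples.
settings = {"objective": "logistic", "reg_L2": 1, "nthread": 4}
tokens = format_params(settings)
print(" ".join(tokens))
```
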
#### Important Parameters
* objective [default = logistic]
  - can be linear or logistic
* base_score [default = 0.5]
  - global bias; setting it to the mean value of the label is recommended
* reg_L1 [default = 0]
  - L1 regularization coefficient
* reg_L2 [default = 1]
  - L2 regularization coefficient
* lbfgs_stop_tol [default = 1e-5]
  - relative tolerance on loss reduction, measured with respect to the initial loss
* max_lbfgs_iter [default = 500]
  - maximum number of L-BFGS iterations

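
One plausible reading of how the stopping tolerance and the iteration limits interact is sketched below. This is an assumption made for illustration, not the solver's confirmed logic: the function `should_stop` is hypothetical, it assumes a positive initial loss, and the real implementation may define the criterion differently.

```python
def should_stop(iteration, init_loss, prev_loss, new_loss,
                stop_tol=1e-5, min_iter=5, max_iter=500):
    """Return True when the optimization loop should terminate.

    Assumes init_loss > 0. The defaults mirror the documented defaults
    of the tolerance and of the minimum/maximum iteration counts.
    """
    if iteration >= max_iter:
        return True
    if iteration < min_iter:
        return False
    # Loss reduction achieved in this iteration, relative to the initial loss.
    relative_reduction = (prev_loss - new_loss) / init_loss
    return relative_reduction < stop_tol


print(should_stop(10, 100.0, 50.0, 49.0))     # large reduction
print(should_stop(10, 100.0, 50.0, 49.9999))  # tiny reduction
```

Under this reading, a looser tolerance stops earlier, while the minimum iteration count guards against stopping on a slow start.
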
#### Optimization-Related Parameters
* min_lbfgs_iter [default = 5]
  - minimum number of L-BFGS iterations
* max_linesearch_iter [default = 100]
  - maximum number of iterations in line search
* linesearch_c1 [default = 1e-4]
  - c1 coefficient in backoff line search
* linesearch_backoff [default = 0.5]
  - backoff ratio in line search
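
To show what the line search settings control, here is a minimal sketch of a textbook backtracking line search with the Armijo sufficient-decrease condition. It is not taken from this project's solver: the function and its argument names (`c1`, `backoff`, `max_iter`) are hypothetical stand-ins for the c1 coefficient, the backoff ratio, and the line search iteration limit listed above.

```python
def backtracking_line_search(f, x, direction, grad, step=1.0,
                             c1=1e-4, backoff=0.5, max_iter=100):
    """Shrink `step` until the Armijo sufficient-decrease condition holds."""
    fx = f(x)
    # Directional derivative grad . direction; negative for a descent direction.
    slope = sum(g * d for g, d in zip(grad, direction))
    for _ in range(max_iter):
        candidate = [xi + step * di for xi, di in zip(x, direction)]
        if f(candidate) <= fx + c1 * step * slope:
            return step, candidate
        step *= backoff  # step too long: back off and retry
    raise RuntimeError("line search did not converge")


# Toy problem: f(x) = x0^2 + x1^2, starting at (3, 4), steepest descent.
f = lambda x: x[0] ** 2 + x[1] ** 2
x0 = [3.0, 4.0]
g0 = [6.0, 8.0]      # gradient of f at x0
d0 = [-6.0, -8.0]    # negative gradient
step, x1 = backtracking_line_search(f, x0, d0, g0)
print(step, x1)
```

In this toy run the full step overshoots to a point with the same loss, so the step is halved once before it is accepted. A smaller backoff ratio shrinks the step faster but may accept shorter steps than necessary.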