Linear and Logistic Regression
====
* Input format: LibSVM
* Local example: [run-linear.sh](run-linear.sh)
* Running on YARN: [run-yarn.sh](run-yarn.sh)
  - You will need to have YARN
  - Modify ```../make/config.mk``` to set USE_HDFS=1 to compile with HDFS support
  - Run build.sh in [../../yarn](../../yarn) to build the YARN jar file

Multi-Threading Optimization
====
* The code supports multi-threading, and we encourage you to use it
  - Simply add ```nthread=k```, where k is the number of threads you want to use
* If you submit with YARN
  - Use ```--vcores``` and ```-mem``` to request CPU and memory resources
  - Some YARN schedulers do not honor CPU requests; in that case you can request more memory to grab working slots
* Multi-threading usually improves speed
  - You can use fewer workers and assign more resources to each worker
  - This usually means less communication overhead and a faster running time

Parameters
====
All the parameters can be set by param=value

#### Important Parameters
* objective [default = logistic]
  - can be linear or logistic
* base_score [default = 0.5]
  - global bias; setting it to the mean value of the labels is recommended
* reg_L1 [default = 0]
  - L1 regularization coefficient
* reg_L2 [default = 1]
  - L2 regularization coefficient
* lbfgs_stop_tol [default = 1e-5]
  - relative tolerance level of loss reduction with respect to the initial loss
* max_lbfgs_iter [default = 500]
  - maximum number of L-BFGS iterations

#### Optimization Related Parameters
* min_lbfgs_iter [default = 5]
  - minimum number of L-BFGS iterations
* max_linesearch_iter [default = 100]
  - maximum number of iterations in the line search
* linesearch_c1 [default = 1e-4]
  - c1 coefficient in the backoff line search
* linesearch_backoff [default = 0.5]
  - backoff ratio in the line search
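
The line search parameters above can be illustrated with a small sketch. It assumes that "backoff linesearch" refers to standard backtracking with the Armijo sufficient-decrease condition, where `c1` plays the role of `linesearch_c1`, `backoff` is the backoff ratio, and `max_iter` corresponds to `max_linesearch_iter`. The function `backtracking_linesearch` and the toy `objective` are hypothetical names made up for this example; this is not the solver's actual code.

```python
import numpy as np


def backtracking_linesearch(f, x, fx, grad, direction,
                            c1=1e-4, backoff=0.5, max_iter=100):
    """Shrink the step until the Armijo sufficient-decrease condition holds.

    Hypothetical helper for illustration only (assumed Armijo backtracking).
    """
    slope = float(np.dot(grad, direction))  # directional derivative at x
    if slope >= 0.0:
        raise ValueError("direction is not a descent direction")
    step = 1.0
    for _ in range(max_iter):
        x_new = x + step * direction
        f_new = f(x_new)
        # Armijo condition: f(x + a*d) <= f(x) + c1 * a * grad^T d
        if f_new <= fx + c1 * step * slope:
            return step, x_new, f_new
        step *= backoff  # back off and try a shorter step
    raise RuntimeError("line search did not find an acceptable step")


def objective(w):
    # Toy quadratic loss with its minimum at w = [1, 1, 1]
    return 0.5 * float(np.sum((w - 1.0) ** 2))


w0 = np.zeros(3)
g0 = w0 - 1.0  # gradient of the toy loss at w0

# Steepest-descent direction: the full step is accepted immediately.
step, w1, f1 = backtracking_linesearch(objective, w0, objective(w0), g0, -g0)

# An overly long direction forces the step to be backed off twice.
step2, w2, f2 = backtracking_linesearch(objective, w0, objective(w0), g0, -4.0 * g0)
```

A smaller backoff ratio shrinks the step more aggressively, so fewer trial evaluations are needed when the initial step is far too long, at the cost of possibly accepting a shorter step than necessary.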