Linear and Logistic Regression

  • Input format: LibSVM
  • Local example: run-linear.sh
  • Running on YARN: run-yarn.sh
    • You will need a working YARN installation
    • Modify ../make/config.mk to set USE_HDFS=1 so the code compiles with HDFS support
    • Run build.sh in ../../yarn to build the YARN jar file
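Each line of a LibSVM file holds a label followed by sparse `index:value` pairs. The sketch below is a hypothetical Python parser for one such line, included only to show the format; it is not the reader this code uses, and it keeps indices exactly as written rather than assuming a 0- or 1-based convention.

```python
from typing import List, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, List[Tuple[int, float]]]:
    """Parse one LibSVM line of the form '<label> <index>:<value> ...'."""
    # Drop a trailing comment introduced by '#', then surrounding whitespace.
    line = line.split("#", 1)[0].strip()
    if not line:
        raise ValueError("empty LibSVM line")
    tokens = line.split()
    label = float(tokens[0])
    features = []
    for tok in tokens[1:]:
        # Each feature token is '<index>:<value>'; indices are kept as written.
        index, value = tok.split(":", 1)
        features.append((int(index), float(value)))
    return label, features
```

For example, `parse_libsvm_line("1 3:0.5 10:1.2")` gives the label `1.0` and the features `[(3, 0.5), (10, 1.2)]`.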

Multi-Threading Optimization

  • The code supports multi-threading, and we encourage you to use it
    • Simply add nthread=k, where k is the number of threads you want to use
  • If you submit jobs with YARN
    • Use --vcores and -mem to request CPU and memory resources
    • Some YARN schedulers do not honor CPU requests; in that case, you can request more memory to claim more of a node's capacity
  • Multi-threading usually improves speed
    • You can use fewer workers and assign more resources to each worker
    • This usually means less communication overhead and a faster running time
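The sketch below illustrates why fewer, multi-threaded workers tend to communicate less: each thread computes a partial gradient over its own chunk of rows, and the worker sums those partial results locally, so only one gradient vector per worker would need to be exchanged. It is a hypothetical Python illustration, not this code's implementation; it assumes a logistic loss with labels in {0, 1}, and because of CPython's global interpreter lock these threads will not actually speed up this pure-Python arithmetic.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

# One training row: (label in {0, 1}, sparse list of (feature index, value)).
Row = Tuple[float, List[Tuple[int, float]]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def partial_gradient(rows: Sequence[Row], w: List[float]) -> List[float]:
    # Gradient of the logistic loss, summed over one chunk of rows.
    grad = [0.0] * len(w)
    for y, feats in rows:
        margin = sum(w[j] * v for j, v in feats)
        residual = sigmoid(margin) - y
        for j, v in feats:
            grad[j] += residual * v
    return grad


def threaded_gradient(rows: Sequence[Row], w: List[float], nthread: int) -> List[float]:
    # Split the rows into at most nthread contiguous chunks, one per thread.
    size = max(1, math.ceil(len(rows) / nthread))
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=nthread) as pool:
        parts = list(pool.map(lambda chunk: partial_gradient(chunk, w), chunks))
    # Sum the thread-local results inside this worker; only this single
    # vector would then be exchanged with other workers (e.g. by an allreduce).
    total = [0.0] * len(w)
    for part in parts:
        for j, g in enumerate(part):
            total[j] += g
    return total
```

The result does not depend on `nthread` beyond floating-point rounding, while the amount of data a worker must communicate stays at one vector of length `len(w)`, however many threads it runs.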

Parameters

All parameters can be set with param=value
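As a rough illustration of the param=value convention, the hypothetical helper below collects such arguments into a dictionary. It is not this program's argument parser; skipping arguments without `=` and letting a later setting override an earlier one are assumptions made for the sketch.

```python
from typing import Dict, Iterable


def parse_params(args: Iterable[str]) -> Dict[str, str]:
    """Collect 'param=value' arguments into a dict (hypothetical helper)."""
    params: Dict[str, str] = {}
    for arg in args:
        if "=" not in arg:
            # Not a param=value pair, e.g. a positional argument such as a data path.
            continue
        name, value = arg.split("=", 1)
        # Assumption: a later setting of the same parameter overrides an earlier one.
        params[name.strip()] = value.strip()
    return params
```

For example, `["objective=linear", "reg_L2=0.1", "nthread=4"]` becomes `{"objective": "linear", "reg_L2": "0.1", "nthread": "4"}`.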

Important Parameters

  • objective [default = logistic]
    • can be linear or logistic
  • base_score [default = 0.5]
    • global bias; recommended to be set to the mean value of the labels
  • reg_L1 [default = 0]
    • L1 regularization coefficient
  • reg_L2 [default = 1]
    • L2 regularization coefficient
  • lbfgs_stop_tol [default = 1e-5]
    • relative tolerance on loss reduction, measured with respect to the initial loss
  • max_lbfgs_iter [default = 500]
    • maximum number of L-BFGS iterations
  • min_lbfgs_iter [default = 5]
    • minimum number of L-BFGS iterations
  • max_linesearch_iter [default = 100]
    • maximum number of iterations in the line search
  • linesearch_c1 [default = 1e-4]
    • c1 coefficient in the backoff line search
  • linesearch_backoff [default = 0.5]
    • backoff ratio in the line search
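To show how objective, base_score, reg_L1 and reg_L2 might combine, here is a hypothetical Python sketch of a regularized objective over sparse rows. The exact formulation in this code may differ: summing (rather than averaging) the per-row loss, the 1/2 factors on the squared error and on the L2 term, labels in {0, 1} for logistic, and mapping a logistic base_score to a margin through the logit are all assumptions of the sketch.

```python
import math
from typing import List, Sequence, Tuple

# One training row: (label, sparse list of (feature index, value)).
Row = Tuple[float, List[Tuple[int, float]]]


def base_margin(base_score: float, objective: str) -> float:
    # Assumption: for logistic, base_score is a probability turned into a margin via the logit.
    if objective == "logistic":
        return math.log(base_score / (1.0 - base_score))
    return base_score


def objective_value(rows: Sequence[Row], w: List[float], objective: str = "logistic",
                    base_score: float = 0.5, reg_L1: float = 0.0, reg_L2: float = 1.0) -> float:
    b0 = base_margin(base_score, objective)
    loss = 0.0
    for y, feats in rows:
        margin = b0 + sum(w[j] * v for j, v in feats)
        if objective == "logistic":
            # log(1 + exp(margin)) - y * margin, computed stably; y is 0 or 1.
            loss += max(margin, 0.0) + math.log1p(math.exp(-abs(margin))) - y * margin
        elif objective == "linear":
            # Squared error with an assumed 1/2 factor.
            loss += 0.5 * (margin - y) ** 2
        else:
            raise ValueError("objective must be 'linear' or 'logistic'")
    # Regularization terms (the 1/2 on the L2 term is an assumption).
    loss += reg_L1 * sum(abs(x) for x in w)
    loss += 0.5 * reg_L2 * sum(x * x for x in w)
    return loss
```

With all weights at zero and base_score = 0.5, every logistic margin is 0, so each row contributes log 2; this is one reason the mean label is a sensible starting bias.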
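The line-search and stopping parameters can be read through the hypothetical Python sketch below: a backtracking (Armijo) line search that shrinks the step by linesearch_backoff until the sufficient-decrease test with linesearch_c1 passes, giving up after max_linesearch_iter tries, plus one possible reading of the lbfgs_stop_tol and min_lbfgs_iter stopping rule. It is not this code's implementation; the outer L-BFGS loop (capped by max_lbfgs_iter) and any special handling needed when reg_L1 > 0 are not shown.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def backtracking_line_search(f: Callable[[Vector], float], x: Vector, fx: float,
                             grad: Vector, direction: Vector,
                             linesearch_c1: float = 1e-4, linesearch_backoff: float = 0.5,
                             max_linesearch_iter: int = 100,
                             step: float = 1.0) -> Tuple[float, Vector, float]:
    # Directional derivative along the search direction; must be negative.
    slope = dot(grad, direction)
    if slope >= 0:
        raise ValueError("direction is not a descent direction")
    for _ in range(max_linesearch_iter):
        candidate = [xi + step * di for xi, di in zip(x, direction)]
        f_new = f(candidate)
        # Armijo sufficient-decrease test: f(x + t*d) <= f(x) + c1 * t * grad.d
        if f_new <= fx + linesearch_c1 * step * slope:
            return step, candidate, f_new
        step *= linesearch_backoff  # shrink the step and try again
    raise RuntimeError("line search failed to find sufficient decrease")


def should_stop(init_loss: float, prev_loss: float, new_loss: float, iteration: int,
                lbfgs_stop_tol: float = 1e-5, min_lbfgs_iter: int = 5) -> bool:
    # Assumed reading: stop once the per-iteration loss reduction, relative to
    # the initial loss, drops below lbfgs_stop_tol, but never before min_lbfgs_iter.
    if iteration < min_lbfgs_iter:
        return False
    return (prev_loss - new_loss) < lbfgs_stop_tol * init_loss
```

On f(x) = x² from x = 1 with direction -2, the full step overshoots to x = -1 without decreasing f, so one backoff halves the step and lands at x = 0.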