Vadim Khotilovich c70022e6c4 spelling, wording, and doc fixes in c++ code
I was reading through the code and fixing some things in the comments.
Only a few trivial actual code changes were made to make things more
readable.
2015-12-12 21:40:12 -06:00

Coding Guide

This file contains notes on the code structure of xgboost.

Project Logical Layout

  • Dependency order: io->learner->gbm->tree
    • All modules depend on data.h
  • tree contains implementations of tree construction algorithms.
  • gbm is the gradient boosting interface; it takes trees and other base learners and performs boosting.
    • gbm only takes the gradient as a sufficient statistic; it does not compute the gradient.
  • learner is the learning module; it computes the gradient for a specific objective and passes it to gbm.
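
The split between learner and gbm can be illustrated with a minimal sketch. All class and function names here are hypothetical and are not taken from the xgboost sources; the sketch only assumes what the list above states, namely that the learner computes the gradient and the booster consumes it without knowing the loss.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names below are hypothetical and chosen for illustration only.

// Gradient statistics for one training row (first and second order).
struct GradPair {
  float grad;
  float hess;
};

// "gbm" layer: consumes gradient statistics only. It never sees labels
// or the loss function. The base learner is a single constant, which
// stands in for a tree.
class ConstantBooster {
 public:
  ConstantBooster() : weight_(0.0f) {}

  // One boosting round: a Newton step, -sum(grad) / sum(hess).
  void DoBoost(const std::vector<GradPair>& gpair) {
    assert(!gpair.empty());
    double sum_grad = 0.0;
    double sum_hess = 0.0;
    for (std::size_t i = 0; i < gpair.size(); ++i) {
      sum_grad += gpair[i].grad;
      sum_hess += gpair[i].hess;
    }
    if (sum_hess > 0.0) {
      weight_ += static_cast<float>(-sum_grad / sum_hess);
    }
  }

  float Predict() const { return weight_; }

 private:
  float weight_;
};

// "learner" layer: owns the objective (squared error here), computes
// the gradient from predictions and labels, and passes it down.
class SquaredErrorLearner {
 public:
  void UpdateOneIter(const std::vector<float>& labels) {
    const float pred = booster_.Predict();
    std::vector<GradPair> gpair(labels.size());
    for (std::size_t i = 0; i < labels.size(); ++i) {
      gpair[i].grad = pred - labels[i];  // d/dpred of 0.5 * (pred - y)^2
      gpair[i].hess = 1.0f;
    }
    booster_.DoBoost(gpair);
  }

  float Predict() const { return booster_.Predict(); }

 private:
  ConstantBooster booster_;
};
```

With labels 1, 2, 3, one call to UpdateOneIter moves the constant prediction from 0 to the mean, 2. Swapping in a different loss would only touch the learner class, which is the point of the layering.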

File Naming Convention

  • .h files contain the data structures and interfaces needed to use the functions in that layer.
  • -inl.hpp files contain the implementations of those interfaces, like the .cpp files in most projects.
    • You only need to read the interface file to understand how to use that layer.
  • Each folder can contain a .cpp file that compiles the module for that layer.
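
As a rough illustration of this convention, the sketch below shows how one small layer might be divided. It is written as a single listing, with comments marking which file each part would live in. The file names metric.h, metric-inl.hpp and metric.cpp, and every identifier in the code, are hypothetical and do not come from the xgboost tree.

```cpp
// ---- metric.h (hypothetical): interface only, enough to use the layer ----
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct IMetric {
  virtual ~IMetric() {}
  virtual float Eval(const std::vector<float>& preds,
                     const std::vector<float>& labels) const = 0;
};

// Factory declared in the interface file; the caller owns the result.
IMetric* CreateMetric();

// ---- metric-inl.hpp (hypothetical): implementation of the interface ----
struct MeanAbsError : public IMetric {
  virtual float Eval(const std::vector<float>& preds,
                     const std::vector<float>& labels) const {
    assert(preds.size() == labels.size() && !preds.empty());
    float sum = 0.0f;
    for (std::size_t i = 0; i < preds.size(); ++i) {
      sum += std::fabs(preds[i] - labels[i]);
    }
    return sum / static_cast<float>(preds.size());
  }
};

// ---- metric.cpp (hypothetical): the translation unit for this layer ----
// In a real split this file would #include "metric-inl.hpp".
IMetric* CreateMetric() { return new MeanAbsError(); }
```

Code that uses the layer would include only the interface part and call CreateMetric, so the implementation can change without affecting callers.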

How to Hack the Code

  • Add an objective function: add it to learner/objective-inl.hpp and register it in CreateObjFunction in learner/objective.h
    • You can also do this directly in Python
  • Add a new evaluation metric: add it to learner/evaluation-inl.hpp and register it in CreateEvaluator in learner/evaluation.h
  • Add a wrapper for a new language: most likely you can do this by calling the functions in python/xgboost_wrapper.h, which is a pure C interface, from the new language
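
The first item above, adding an objective and registering it, might look roughly like the sketch below. Only the name CreateObjFunction comes from this guide; its signature, the IObjFunction interface, the two objective classes and the name strings are assumptions made for illustration, so check learner/objective.h for the real declarations. Registering an evaluation metric in CreateEvaluator would presumably follow the same pattern.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumed interface; the real one in learner/objective.h may differ.
struct IObjFunction {
  virtual ~IObjFunction() {}
  // Writes the first- and second-order gradient for each row.
  virtual void GetGradient(const std::vector<float>& preds,
                           const std::vector<float>& labels,
                           std::vector<float>* out_grad,
                           std::vector<float>* out_hess) const = 0;
};

// An objective that is already registered (hypothetical).
struct SquaredErrorObj : public IObjFunction {
  virtual void GetGradient(const std::vector<float>& preds,
                           const std::vector<float>& labels,
                           std::vector<float>* out_grad,
                           std::vector<float>* out_hess) const {
    assert(preds.size() == labels.size());
    out_grad->resize(preds.size());
    out_hess->resize(preds.size());
    for (std::size_t i = 0; i < preds.size(); ++i) {
      (*out_grad)[i] = preds[i] - labels[i];
      (*out_hess)[i] = 1.0f;
    }
  }
};

// Step 1: the new objective (logistic loss on raw margin scores).
struct LogisticObj : public IObjFunction {
  virtual void GetGradient(const std::vector<float>& preds,
                           const std::vector<float>& labels,
                           std::vector<float>* out_grad,
                           std::vector<float>* out_hess) const {
    assert(preds.size() == labels.size());
    out_grad->resize(preds.size());
    out_hess->resize(preds.size());
    for (std::size_t i = 0; i < preds.size(); ++i) {
      const float p = 1.0f / (1.0f + std::exp(-preds[i]));  // sigmoid
      (*out_grad)[i] = p - labels[i];
      (*out_hess)[i] = p * (1.0f - p);
    }
  }
};

// Step 2: register it by name in the factory. The caller owns the
// returned pointer; an unknown name yields a null pointer.
inline IObjFunction* CreateObjFunction(const char* name) {
  if (std::strcmp(name, "squared_error") == 0) return new SquaredErrorObj();
  if (std::strcmp(name, "logistic") == 0) return new LogisticObj();  // new
  return nullptr;
}
```

The learner would then look up the objective by the name given in its configuration, so adding a loss never requires changes to gbm or tree.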