Philip Hyunsu Cho 9adb812a0a
RMM integration plugin (#5873)
* [CI] Add RMM as an optional dependency

* Replace caching allocator with pool allocator from RMM

* Revert "Replace caching allocator with pool allocator from RMM"

This reverts commit e15845d4e72e890c2babe31a988b26503a7d9038.

* Use rmm::mr::get_default_resource()

* Try setting default resource (doesn't work yet)

* Allocate pool_mr in the heap

* Prevent leaking pool_mr handle

* Separate EXPECT_DEATH() tests into a test suite suffixed DeathTest

* Turn off death tests for RMM

* Address reviewer's feedback

* Prevent leaking of cuda_mr

* Fix Jenkinsfile syntax

* Remove unnecessary function in Jenkinsfile

* [CI] Install NCCL into RMM container

* Run Python tests

* Try building with RMM, CUDA 10.0

* Do not use RMM for CUDA 10.0 target

* Actually test for test_rmm flag

* Fix TestPythonGPU

* Use CNMeM allocator, since pool allocator doesn't yet support multiGPU

* Use 10.0 container to build RMM-enabled XGBoost

* Revert "Use 10.0 container to build RMM-enabled XGBoost"

This reverts commit 789021fa31112e25b683aef39fff375403060141.

* Fix Jenkinsfile

* [CI] Assign larger /dev/shm to NCCL

* Use 10.2 artifact to run multi-GPU Python tests

* Add CUDA 10.0 -> 11.0 cross-version test; remove CUDA 10.0 target

* Rename Conda env rmm_test -> gpu_test

* Use env var to opt into CNMeM pool for C++ tests

* Use identical CUDA version for RMM builds and tests

* Use Pytest fixtures to enable RMM pool in Python tests

* Move RMM to plugin/CMakeLists.txt; use PLUGIN_RMM

* Use per-device MR; use command arg in gtest

* Set CMake prefix path to use Conda env

* Use 0.15 nightly version of RMM

* Remove unnecessary header

* Fix a unit test when cudf is missing

* Add RMM demos

* Remove print()

* Use HostDeviceVector in GPU predictor

* Simplify pytest setup; use LocalCUDACluster fixture

* Address reviewers' comments

Co-authored-by: Hyunsu Cho <chohyu01@cs.wasshington.edu>
2020-08-12 01:26:02 -07:00
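One of the later commits above enables an RMM memory pool for the Python tests through a pytest fixture. A minimal sketch of what such a fixture might look like, assuming the `rmm` Python package (RAPIDS Memory Manager) is available; names and defaults vary across RMM releases, so treat this as illustrative rather than the PR's exact code:

```python
import pytest


@pytest.fixture(scope="session")
def rmm_pool():
    """Route CUDA allocations through an RMM pool for the test session."""
    import rmm  # deferred import: requires a CUDA-capable environment

    # Replace the default per-device memory resource with a pooled one so
    # repeated GPU allocations reuse a pre-grown pool instead of calling
    # cudaMalloc each time.
    rmm.reinitialize(pool_allocator=True)
    yield
```

A GPU test would then simply request `rmm_pool` as an argument to opt into the pooled allocator for its session.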

Awesome XGBoost

This page contains a curated list of examples, tutorials and blogs about XGBoost use cases. It is inspired by awesome-MXNet, awesome-php and awesome-machine-learning.

Please send a pull request if you find things that belong here.

Contents

Code Examples

Features Walkthrough

This is a list of short code examples introducing the different functionalities of the XGBoost packages.

Basic Examples by Tasks

Most of the examples in this section are based on the CLI or Python version; however, the parameter settings can be applied to all versions.

Benchmarks

Machine Learning Challenge Winning Solutions

XGBoost is extensively used by machine learning practitioners to create state-of-the-art data science solutions. This is a list of winning machine learning solutions built with XGBoost. Please send a pull request if you find ones that are missing here.

Talks

Tutorials

Usecases

If you have a particular use case of XGBoost that you would like to highlight, send a PR to add a one-sentence description. :)

  • XGBoost is used in Kaggle Script to solve data science challenges.
  • Distribute XGBoost as a REST API server from a Jupyter notebook with BentoML. Link to notebook
  • Seldon predictive service powered by XGBoost
  • XGBoost Distributed is used in ODPS Cloud Service by Alibaba (in Chinese)
  • XGBoost is incorporated as part of GraphLab Create for scalable machine learning.
  • Hanjing Su from the Tencent data platform team: "We use distributed XGBoost for click-through prediction in WeChat shopping and lookalikes. The problems involve hundreds of millions of users and thousands of features. XGBoost is cleanly designed and can be easily integrated into our production environment, reducing our development cost."
  • CNevd from the autohome.com ad platform team: "Distributed XGBoost is used for click-through rate prediction in our display advertising. XGBoost is highly efficient and flexible and can be easily used on our distributed platform. Our CTR improved greatly, with hundreds of millions of samples and millions of features, thanks to this awesome XGBoost."

Tools using XGBoost

  • BayesBoost - Bayesian optimization using XGBoost and the sklearn API
  • gp_xgboost_gridsearch - In-database parallel grid-search for XGBoost on Greenplum using PL/Python
  • tpot - A Python tool that automatically creates and optimizes machine learning pipelines using genetic programming.

Integrations with 3rd party software

Open source integrations with XGBoost:

  • Neptune.ai - Experiment management and collaboration tool for ML/DL/RL specialists. The integration takes the form of an XGBoost callback that automatically logs training and evaluation metrics, as well as the saved model (booster), a feature-importance chart and visualized trees.
  • Optuna - An open source hyperparameter optimization framework to automate hyperparameter search. Optuna integrates with XGBoost through the XGBoostPruningCallback, which lets users easily prune unpromising trials.

Awards

Windows Binaries

Unofficial Windows binaries, and instructions on how to use them, are hosted on Guido Tapia's blog.