Modernize XGBoost Python document. (#7468)

* Use sphinx gallery to integrate examples.
* Remove mock objects.
* Add dask doc inventory.
Jiaming Yuan 2021-11-23 23:24:52 +08:00 committed by GitHub
parent 96a9848c9e
commit c024c42dce
30 changed files with 130 additions and 84 deletions
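A note on how the gallery integration works: Sphinx-Gallery scans every ``*.py`` file under the configured ``examples_dirs``, renders it as an HTML page, treats the leading module docstring as reStructuredText whose first line becomes the page title, and (depending on configuration) executes the script to capture its output. That is why each demo script below gains a title-plus-underline block. A minimal sketch of the layout Sphinx-Gallery expects; the file name ``plot_minimal.py`` is hypothetical:

```python
"""
A minimal gallery example
=========================

Everything in this leading docstring is rendered as reStructuredText above
the code block on the generated page.
"""
import xgboost as xgb

# Script output (when the gallery is configured to execute examples) is
# captured and shown below the code on the generated page.
print(xgb.__version__)
```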

.gitignore

@@ -131,6 +131,3 @@ credentials.csv
.vscode
.metals
.bloop
-# Demo
-demo

@@ -1,19 +0,0 @@
-XGBoost Python Feature Walkthrough
-==================================
-* [Basic walkthrough of wrappers](basic_walkthrough.py)
-* [Re-implement RMSLE as customized metric and objective](custom_rmsle.py)
-* [Re-Implement `multi:softmax` objective as customized objective](custom_softmax.py)
-* [Boosting from existing prediction](boost_from_prediction.py)
-* [Predicting using first n trees](predict_first_ntree.py)
-* [Generalized Linear Model](generalized_linear_model.py)
-* [Cross validation](cross_validation.py)
-* [Predicting leaf indices](predict_leaf_indices.py)
-* [Sklearn Wrapper](sklearn_examples.py)
-* [Sklearn Parallel](sklearn_parallel.py)
-* [Sklearn access evals result](sklearn_evals_result.py)
-* [Access evals result](evals_result.py)
-* [External Memory](external_memory.py)
-* [Training continuation](continuation.py)
-* [Feature weights for column sampling](feature_weights.py)
-* [Basic Categorical data support](categorical.py)
-* [Compare builtin categorical data support with one-hot encoding](cat_in_the_dat.py)

@@ -0,0 +1,5 @@
+XGBoost Python Feature Walkthrough
+==================================
+This is a collection of examples for using the XGBoost Python package.

@@ -1,3 +1,7 @@
+"""
+Getting started with XGBoost
+============================
+"""
import numpy as np
import scipy.sparse
import pickle

@@ -1,3 +1,7 @@
+"""
+Demo for boosting from prediction
+=================================
+"""
import os
import xgboost as xgb

@@ -1,5 +1,6 @@
'''
-Demo for using and defining callback functions.
+Demo for using and defining callback functions
+==============================================
.. versionadded:: 1.3.0
'''

@@ -1,4 +1,8 @@
-"""A simple demo for categorical data support using dataset from Kaggle categorical data
+"""
+Train XGBoost with cat_in_the_dat dataset
+=========================================
+A simple demo for categorical data support using dataset from Kaggle categorical data
tutorial.
The excellent tutorial is at:
@@ -8,7 +12,7 @@ And the data can be found at:
https://www.kaggle.com/shahules/an-overview-of-encoding-techniques/data
Also, see the tutorial for using XGBoost with categorical data:
-https://xgboost.readthedocs.io/en/latest/tutorials/categorical.html
+:doc:`/tutorials/categorical`.
.. versionadded 1.6.0

@@ -1,12 +1,16 @@
-"""Experimental support for categorical data. After 1.5 XGBoost `gpu_hist` tree method
-has experimental support for one-hot encoding based tree split.
+"""
+Getting started with categorical data
+=====================================
+Experimental support for categorical data. After 1.5 XGBoost `gpu_hist` tree method has
+experimental support for one-hot encoding based tree split.
In before, users need to run an encoder themselves before passing the data into XGBoost,
which creates a sparse matrix and potentially increase memory usage. This demo showcases
the experimental categorical data support, more advanced features are planned.
-Also, see the tutorial for using XGBoost with categorical data:
-https://xgboost.readthedocs.io/en/latest/tutorials/categorical.html
+Also, see :doc:`the tutorial </tutorials/categorical>` for using XGBoost with categorical data
.. versionadded:: 1.5.0
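As a quick illustration of what this docstring describes, here is a minimal, hedged sketch of the 1.5-era workflow: a pandas ``category`` column is handed straight to ``DMatrix`` with ``enable_categorical=True`` and trained with ``gpu_hist`` (a CUDA device is required at this stage of the feature; the toy data is made up for brevity):

```python
import pandas as pd
import xgboost as xgb

# Hypothetical toy frame; the real demo generates a larger synthetic dataset.
X = pd.DataFrame(
    {
        "cat": pd.Series(["a", "b", "c", "a", "b", "c"], dtype="category"),
        "num": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)
y = [0, 1, 1, 0, 1, 1]

# With enable_categorical the category codes are consumed directly, so no
# manual one-hot encoding step is needed before constructing the DMatrix.
Xy = xgb.DMatrix(X, label=y, enable_categorical=True)
booster = xgb.train({"tree_method": "gpu_hist"}, Xy, num_boost_round=4)
```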

@@ -1,5 +1,6 @@
"""
-Demo for training continuation.
+Demo for training continuation
+==============================
"""
from sklearn.datasets import load_breast_cancer

@@ -1,3 +1,7 @@
+"""
+Demo for using cross validation
+===============================
+"""
import os
import numpy as np
import xgboost as xgb

@@ -1,16 +1,19 @@
-'''Demo for defining customized metric and objective. Notice that for
-simplicity reason weight is not used in following example. In this
-script, we implement the Squared Log Error (SLE) objective and RMSLE metric as customized
-functions, then compare it with native implementation in XGBoost.
-See doc/tutorials/custom_metric_obj.rst for a step by step
-walkthrough, with other details.
-The `SLE` objective reduces impact of outliers in training dataset,
-hence here we also compare its performance with standard squared
-error.
-'''
+"""
+Demo for defining a custom regression objective and metric
+==========================================================
+Demo for defining customized metric and objective. Notice that for simplicity reason
+weight is not used in following example. In this script, we implement the Squared Log
+Error (SLE) objective and RMSLE metric as customized functions, then compare it with
+native implementation in XGBoost.
+See doc/tutorials/custom_metric_obj.rst for a step by step walkthrough, with other
+details.
+The `SLE` objective reduces impact of outliers in training dataset, hence here we also
+compare its performance with standard squared error.
+"""
import numpy as np
import xgboost as xgb
from typing import Tuple, Dict, List
@@ -171,9 +174,6 @@ def plot_history(rmse_evals, rmsle_evals, py_rmsle_evals):
    ax2.plot(x, py_rmsle_evals['dtest']['PyRMSLE'], label='test-PyRMSLE')
    ax2.legend()
-    plt.show()
-    plt.close()
def main(args):
    dtrain, dtest = generate_data()
@@ -183,9 +183,10 @@ def main(args):
    if args.plot != 0:
        plot_history(rmse_evals, rmsle_evals, py_rmsle_evals)
+        plt.show()
-if __name__ == '__main__':
+if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description='Arguments for custom RMSLE objective function demo.')
    parser.add_argument(
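For readers skimming the diff, the API the rewritten docstring refers to looks roughly like this: a custom objective returns per-row gradient and hessian, a custom metric returns a name/value pair, and both are handed to ``xgb.train``. The following is a condensed, hedged sketch rather than the full demo (random data, and the older ``feval`` argument carries the metric):

```python
from typing import Tuple

import numpy as np
import xgboost as xgb


def squared_log(predt: np.ndarray, dtrain: xgb.DMatrix) -> Tuple[np.ndarray, np.ndarray]:
    """Gradient and hessian of 1/2 * (log1p(pred) - log1p(label))**2."""
    y = dtrain.get_label()
    predt[predt < -1 + 1e-6] = -1 + 1e-6  # keep log1p well defined
    grad = (np.log1p(predt) - np.log1p(y)) / (predt + 1)
    hess = (-np.log1p(predt) + np.log1p(y) + 1) / np.power(predt + 1, 2)
    return grad, hess


def rmsle(predt: np.ndarray, dtrain: xgb.DMatrix) -> Tuple[str, float]:
    """Root mean squared log error metric."""
    y = dtrain.get_label()
    predt[predt < -1 + 1e-6] = -1 + 1e-6
    return "PyRMSLE", float(np.sqrt(np.mean(np.power(np.log1p(y) - np.log1p(predt), 2))))


rng = np.random.default_rng(0)
dtrain = xgb.DMatrix(rng.random((128, 4)), label=rng.random(128))
xgb.train({"tree_method": "hist", "seed": 0}, dtrain, num_boost_round=10,
          obj=squared_log, feval=rmsle, evals=[(dtrain, "dtrain")])
```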

@@ -1,10 +1,12 @@
-'''Demo for creating customized multi-class objective function. This demo is
-only applicable after (excluding) XGBoost 1.0.0, as before this version XGBoost
-returns transformed prediction for multi-class objective function. More
-details in comments.
-See https://xgboost.readthedocs.io/en/latest/tutorials/custom_metric_obj.html for detailed
-tutorial and notes.
+'''
+Demo for creating customized multi-class objective function
+===========================================================
+This demo is only applicable after (excluding) XGBoost 1.0.0, as before this version
+XGBoost returns transformed prediction for multi-class objective function. More details
+in comments.
+See :doc:`/tutorials/custom_metric_obj` for detailed tutorial and notes.
'''
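The point the docstring makes, that XGBoost after 1.0.0 hands the raw, untransformed scores to a multi-class custom objective, is easiest to see in code. A compressed, hedged sketch with random data; the ``2 * p * (1 - p)`` hessian (floored at a small constant) mirrors the form XGBoost's built-in softprob objective uses:

```python
from typing import Tuple

import numpy as np
import xgboost as xgb


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def softprob_obj(predt: np.ndarray, data: xgb.DMatrix) -> Tuple[np.ndarray, np.ndarray]:
    """Gradient and hessian of softmax cross entropy w.r.t. the raw scores."""
    labels = data.get_label().astype(int)
    # Raw scores arrive untransformed; reshape to (n_samples, n_classes).
    p = softmax(predt.reshape(labels.size, -1))
    grad = p.copy()
    grad[np.arange(labels.size), labels] -= 1.0
    hess = np.maximum(2.0 * p * (1.0 - p), 1e-6)
    return grad.reshape(-1), hess.reshape(-1)


rng = np.random.default_rng(0)
dtrain = xgb.DMatrix(rng.random((200, 5)), label=rng.integers(0, 3, size=200))
xgb.train({"num_class": 3, "disable_default_eval_metric": True},
          dtrain, num_boost_round=10, obj=softprob_obj)
```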

@@ -1,6 +1,7 @@
-##
-# This script demonstrate how to access the eval metrics in xgboost
-##
+"""
+This script demonstrate how to access the eval metrics
+======================================================
+"""
import os
import xgboost as xgb
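The mechanism this demo covers is the ``evals_result`` dictionary that ``xgb.train`` fills with per-iteration metric history; a small sketch with made-up data:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X, y = rng.random((100, 4)), rng.integers(0, 2, size=100)
dtrain = xgb.DMatrix(X[:80], label=y[:80])
dtest = xgb.DMatrix(X[80:], label=y[80:])

evals_result = {}
xgb.train(
    {"objective": "binary:logistic", "eval_metric": ["logloss", "error"]},
    dtrain,
    num_boost_round=5,
    evals=[(dtrain, "train"), (dtest, "eval")],
    evals_result=evals_result,
)

# Per-iteration history, keyed by data name and then metric name.
print(evals_result["eval"]["logloss"])
```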

@@ -1,6 +1,9 @@
-"""Experimental support for external memory. This is similar to the one in
-`quantile_data_iterator.py`, but for external memory instead of Quantile DMatrix. The
-feature is not ready for production use yet.
+"""
+Experimental support for external memory
+========================================
+This is similar to the one in `quantile_data_iterator.py`, but for external memory
+instead of Quantile DMatrix. The feature is not ready for production use yet.
.. versionadded:: 1.5.0
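For orientation, the external-memory path this demo exercises is built around subclassing ``xgboost.DataIter``: the iterator feeds batches through a callback, and constructing a ``DMatrix`` from it lets XGBoost cache pages on disk under ``cache_prefix``. A hedged sketch with small in-memory batches standing in for data that would not fit in RAM:

```python
import os
from typing import Callable, List, Tuple

import numpy as np
import xgboost as xgb


class Iterator(xgb.DataIter):
    """Yield data in batches; XGBoost writes cache pages next to ``cache_prefix``."""

    def __init__(self, batches: List[Tuple[np.ndarray, np.ndarray]]) -> None:
        self._batches = batches
        self._it = 0
        super().__init__(cache_prefix=os.path.join(".", "cache"))

    def next(self, input_data: Callable) -> int:
        if self._it == len(self._batches):
            return 0  # signal that iteration is exhausted
        X, y = self._batches[self._it]
        input_data(data=X, label=y)
        self._it += 1
        return 1  # more batches remain

    def reset(self) -> None:
        self._it = 0


rng = np.random.default_rng(0)
batches = [(rng.random((64, 8)), rng.random(64)) for _ in range(4)]
Xy = xgb.DMatrix(Iterator(batches))  # streams batches through the external-memory path
booster = xgb.train({"tree_method": "approx"}, Xy, num_boost_round=5)
```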

@@ -1,4 +1,6 @@
-'''Using feature weight to change column sampling.
+'''
+Demo for using feature weight to change column sampling
+=======================================================
.. versionadded:: 1.3.0
'''
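Briefly, the mechanism behind the retitled demo: per-column weights are attached to the ``DMatrix`` via ``set_info`` and bias the sampling performed by the ``colsample_*`` parameters, so they only have an effect when column sampling is actually enabled. A minimal hedged sketch with random data:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X, y = rng.random((128, 5)), rng.random(128)

dtrain = xgb.DMatrix(X, label=y)
# Make the last feature five times as likely to be sampled; the weights are
# relative, not probabilities.
dtrain.set_info(feature_weights=np.array([1.0, 1.0, 1.0, 1.0, 5.0]))

# Feature weights only matter when one of the colsample_* parameters is below 1.
xgb.train({"colsample_bynode": 0.5, "tree_method": "hist"}, dtrain, num_boost_round=10)
```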

@@ -1,3 +1,7 @@
+"""
+Demo for gamma regression
+=========================
+"""
import xgboost as xgb
import numpy as np

@@ -1,3 +1,7 @@
+"""
+Demo for GLM
+============
+"""
import os
import xgboost as xgb
##

@@ -1,3 +1,7 @@
+"""
+Demo for prediction using number of trees
+=========================================
+"""
import os
import numpy as np
import xgboost as xgb

@@ -1,3 +1,7 @@
+"""
+Demo for obtaining leaf index
+=============================
+"""
import os
import xgboost as xgb

@@ -1,4 +1,6 @@
-'''A demo for defining data iterator.
+'''
+Demo for using data iterator with Quantile DMatrix
+==================================================
.. versionadded:: 1.2.0

@@ -1,6 +1,7 @@
-##
-# This script demonstrate how to access the xgboost eval metrics by using sklearn
-##
+"""
+Demo for accessing the xgboost eval metrics by using sklearn interface
+======================================================================
+"""
import xgboost as xgb
import numpy as np

@@ -1,4 +1,7 @@
'''
+Collection of examples for using sklearn interface
+==================================================
Created on 1 Apr 2015
@author: Jamie Hall

@@ -1,3 +1,7 @@
+"""
+Demo for using xgboost with sklearn
+===================================
+"""
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_boston
import xgboost as xgb

@@ -1,5 +1,9 @@
-"""Demo for using `process_type` with `prune` and `refresh`. Modifying existing trees is
-not a well established use for XGBoost, so feel free to experiment.
+"""
+Demo for using `process_type` with `prune` and `refresh`
+========================================================
+Modifying existing trees is not a well established use for XGBoost, so feel free to
+experiment.
"""

@@ -62,12 +62,6 @@ libpath = os.path.join(curr_path, '../python-package/')
sys.path.insert(0, libpath)
sys.path.insert(0, curr_path)
-# -- mock out modules
-import mock # NOQA
-MOCK_MODULES = ['scipy', 'scipy.sparse', 'sklearn', 'pandas']
-for mod_name in MOCK_MODULES:
-    sys.modules[mod_name] = mock.Mock()
# -- General configuration ------------------------------------------------
# General information about the project.
@@ -90,10 +84,17 @@ extensions = [
    'sphinx.ext.napoleon',
    'sphinx.ext.mathjax',
    'sphinx.ext.intersphinx',
+    "sphinx_gallery.gen_gallery",
    'breathe',
    'recommonmark'
]
+sphinx_gallery_conf = {
+    "examples_dirs": "../demo/guide-python",  # path to your example scripts
+    "gallery_dirs": "python/examples",  # path to where to save gallery generated output
+    "matplotlib_animations": True,
+}
autodoc_typehints = "description"
graphviz_output_format = 'png'
@@ -201,11 +202,13 @@ latex_documents = [
]
intersphinx_mapping = {
-    'python': ('https://docs.python.org/3.6', None),
-    'numpy': ('http://docs.scipy.org/doc/numpy/', None),
-    'scipy': ('http://docs.scipy.org/doc/scipy/reference/', None),
-    'pandas': ('http://pandas-docs.github.io/pandas-docs-travis/', None),
-    'sklearn': ('http://scikit-learn.org/stable', None)
+    "python": ("https://docs.python.org/3.6", None),
+    "numpy": ("http://docs.scipy.org/doc/numpy/", None),
+    "scipy": ("http://docs.scipy.org/doc/scipy/reference/", None),
+    "pandas": ("http://pandas-docs.github.io/pandas-docs-travis/", None),
+    "sklearn": ("http://scikit-learn.org/stable", None),
+    "dask": ("https://docs.dask.org/en/stable/", None),
+    "distributed": ("https://distributed.dask.org/en/stable/", None),
}

doc/python/.gitignore

@@ -0,0 +1 @@
+examples

@@ -13,4 +13,4 @@ Contents
python_api
callbacks
model
-Python examples <https://github.com/dmlc/xgboost/tree/master/demo/guide-python>
+examples/index

@@ -5,7 +5,7 @@ This document gives a basic walkthrough of the xgboost package for Python.
**List of other Helpful Links**
-* `Python walkthrough code collections <https://github.com/dmlc/xgboost/blob/master/demo/guide-python>`_
+* :doc:`/python/examples/index`
* :doc:`Python API Reference <python_api>`
Install XGBoost

@@ -8,3 +8,4 @@ graphviz
numpy
recommonmark
xgboost_ray
+sphinx-gallery

@@ -57,13 +57,10 @@ can plot the model and calculate the global feature importance:
The ``scikit-learn`` interface from dask is similar to single node version. The basic
idea is create dataframe with category feature type, and tell XGBoost to use ``gpu_hist``
-with parameter ``enable_categorical``. See `this demo
-<https://github.com/dmlc/xgboost/blob/master/demo/guide-python/categorical.py>`__ for a
-worked example of using categorical data with ``scikit-learn`` interface. A comparison
-between using one-hot encoded data and XGBoost's categorical data support can be found
-`here
-<https://github.com/dmlc/xgboost/blob/master/demo/guide-python/cat_in_the_dat.py>`__.
+with parameter ``enable_categorical``. See :ref:`sphx_glr_python_examples_categorical.py`
+for a worked example of using categorical data with ``scikit-learn`` interface. A
+comparison between using one-hot encoded data and XGBoost's categorical data support can
+be found :ref:`sphx_glr_python_examples_cat_in_the_dat.py`.
**********************
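Since the rewritten paragraph describes the dask route only in prose, a hedged sketch of what it amounts to follows. It assumes a CUDA device is visible to the workers (``gpu_hist`` is the only tree method with categorical support at this point); a real deployment would typically use ``dask_cuda.LocalCUDACluster`` and data that is already partitioned rather than a toy pandas frame:

```python
import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client, LocalCluster
from xgboost import dask as dxgb

pdf = pd.DataFrame(
    {
        "c": pd.Series(["a", "b", "a", "c"] * 25, dtype="category"),
        "x": range(100),
    }
)
labels = pd.Series([0, 1, 0, 1] * 25)

with LocalCluster(n_workers=2) as cluster, Client(cluster) as client:
    X = dd.from_pandas(pdf, npartitions=2)
    y = dd.from_pandas(labels, npartitions=2)
    # The dask estimator forwards enable_categorical to the underlying booster.
    clf = dxgb.DaskXGBClassifier(tree_method="gpu_hist", enable_categorical=True)
    clf.client = client
    clf.fit(X, y)
    # clf.predict(X) returns a lazy dask collection; call .compute() to materialize it.
```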