Modernize XGBoost Python document. (#7468)
* Use sphinx gallery to integrate examples.
* Remove mock objects.
* Add dask doc inventory.
parent 96a9848c9e
commit c024c42dce
5
.gitignore
vendored
@@ -130,7 +130,4 @@ credentials.csv
 # Visual Studio code + extensions
 .vscode
 .metals
-.bloop
-
-# Demo
-demo
+.bloop
@@ -1,19 +0,0 @@
-XGBoost Python Feature Walkthrough
-==================================
-* [Basic walkthrough of wrappers](basic_walkthrough.py)
-* [Re-implement RMSLE as customized metric and objective](custom_rmsle.py)
-* [Re-Implement `multi:softmax` objective as customized objective](custom_softmax.py)
-* [Boosting from existing prediction](boost_from_prediction.py)
-* [Predicting using first n trees](predict_first_ntree.py)
-* [Generalized Linear Model](generalized_linear_model.py)
-* [Cross validation](cross_validation.py)
-* [Predicting leaf indices](predict_leaf_indices.py)
-* [Sklearn Wrapper](sklearn_examples.py)
-* [Sklearn Parallel](sklearn_parallel.py)
-* [Sklearn access evals result](sklearn_evals_result.py)
-* [Access evals result](evals_result.py)
-* [External Memory](external_memory.py)
-* [Training continuation](continuation.py)
-* [Feature weights for column sampling](feature_weights.py)
-* [Basic Categorical data support](categorical.py)
-* [Compare builtin categorical data support with one-hot encoding](cat_in_the_dat.py)
5
demo/guide-python/README.rst
Normal file
@@ -0,0 +1,5 @@
+XGBoost Python Feature Walkthrough
+==================================
+
+
+This is a collection of examples for using the XGBoost Python package.
@@ -1,3 +1,7 @@
+"""
+Getting started with XGBoost
+============================
+"""
 import numpy as np
 import scipy.sparse
 import pickle
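Most hunks in this commit add the same thing to a demo script: a module docstring that opens with a reStructuredText title and an `=` underline, which sphinx-gallery uses as the example's title. The sketch below is a hypothetical pre-flight check for that convention. `gallery_title` is an illustrative helper, not part of XGBoost or sphinx-gallery, and the rule it enforces (underline made of `=` and at least as long as the title) is an assumption based on the docstrings added in this diff.

```python
import ast


def gallery_title(source: str):
    """Return the RST title from a script's module docstring, or None.

    Hypothetical helper: accepts a docstring whose first line is a title
    and whose second line is an '=' underline at least as long as it.
    """
    doc = ast.get_docstring(ast.parse(source))  # cleaned docstring or None
    if not doc:
        return None
    lines = doc.splitlines()
    if len(lines) < 2:
        return None
    title, underline = lines[0].strip(), lines[1].strip()
    if title and set(underline) == {"="} and len(underline) >= len(title):
        return title
    return None


# A script in the new style (as added by this commit) and one in the old style.
title_text = "Getting started with XGBoost"
new_style = f'"""\n{title_text}\n{"=" * len(title_text)}\n"""\nimport pickle\n'
old_style = "import pickle\n"

print(gallery_title(new_style))
print(gallery_title(old_style))
```

Running a check like this over `demo/guide-python/*.py` before building the docs would flag any script still missing a titled docstring.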
@@ -1,3 +1,7 @@
+"""
+Demo for boosting from prediction
+=================================
+"""
 import os
 import xgboost as xgb
 
@@ -1,5 +1,6 @@
 '''
-Demo for using and defining callback functions.
+Demo for using and defining callback functions
+==============================================
 
 .. versionadded:: 1.3.0
 '''
@@ -1,4 +1,8 @@
-"""A simple demo for categorical data support using dataset from Kaggle categorical data
+"""
+Train XGBoost with cat_in_the_dat dataset
+=========================================
+
+A simple demo for categorical data support using dataset from Kaggle categorical data
 tutorial.
 
 The excellent tutorial is at:
@@ -8,7 +12,7 @@ And the data can be found at:
 https://www.kaggle.com/shahules/an-overview-of-encoding-techniques/data
 
 Also, see the tutorial for using XGBoost with categorical data:
-https://xgboost.readthedocs.io/en/latest/tutorials/categorical.html
+:doc:`/tutorials/categorical`.
 
 .. versionadded 1.6.0
 
@@ -1,12 +1,16 @@
-"""Experimental support for categorical data. After 1.5 XGBoost `gpu_hist` tree method
-has experimental support for one-hot encoding based tree split.
+"""
+Getting started with categorical data
+=====================================
+
+Experimental support for categorical data. After 1.5 XGBoost `gpu_hist` tree method has
+experimental support for one-hot encoding based tree split.
 
 In before, users need to run an encoder themselves before passing the data into XGBoost,
 which creates a sparse matrix and potentially increase memory usage. This demo showcases
 the experimental categorical data support, more advanced features are planned.
 
-Also, see the tutorial for using XGBoost with categorical data:
-https://xgboost.readthedocs.io/en/latest/tutorials/categorical.html
+Also, see :doc:`the tutorial </tutorials/categorical>` for using XGBoost with categorical data
+
 
 .. versionadded:: 1.5.0
 
@@ -1,5 +1,6 @@
 """
-Demo for training continuation.
+Demo for training continuation
+==============================
 """
 
 from sklearn.datasets import load_breast_cancer
@@ -1,3 +1,7 @@
+"""
+Demo for using cross validation
+===============================
+"""
 import os
 import numpy as np
 import xgboost as xgb
@@ -1,16 +1,19 @@
-'''Demo for defining customized metric and objective. Notice that for
-simplicity reason weight is not used in following example. In this
-script, we implement the Squared Log Error (SLE) objective and RMSLE metric as customized
-functions, then compare it with native implementation in XGBoost.
+"""
+Demo for defining a custom regression objective and metric
+==========================================================
 
-See doc/tutorials/custom_metric_obj.rst for a step by step
-walkthrough, with other details.
+Demo for defining customized metric and objective. Notice that for simplicity reason
+weight is not used in following example. In this script, we implement the Squared Log
+Error (SLE) objective and RMSLE metric as customized functions, then compare it with
+native implementation in XGBoost.
 
-The `SLE` objective reduces impact of outliers in training dataset,
-hence here we also compare its performance with standard squared
-error.
+See doc/tutorials/custom_metric_obj.rst for a step by step walkthrough, with other
+details.
 
-'''
+The `SLE` objective reduces impact of outliers in training dataset, hence here we also
+compare its performance with standard squared error.
+
+"""
 import numpy as np
 import xgboost as xgb
 from typing import Tuple, Dict, List
@@ -171,9 +174,6 @@ def plot_history(rmse_evals, rmsle_evals, py_rmsle_evals):
     ax2.plot(x, py_rmsle_evals['dtest']['PyRMSLE'], label='test-PyRMSLE')
     ax2.legend()
 
-    plt.show()
-    plt.close()
-
 
 def main(args):
     dtrain, dtest = generate_data()
@@ -183,9 +183,10 @@ def main(args):
 
     if args.plot != 0:
         plot_history(rmse_evals, rmsle_evals, py_rmsle_evals)
+        plt.show()
 
 
-if __name__ == '__main__':
+if __name__ == "__main__":
     parser = argparse.ArgumentParser(
         description='Arguments for custom RMSLE objective function demo.')
     parser.add_argument(
@@ -1,10 +1,12 @@
-'''Demo for creating customized multi-class objective function. This demo is
-only applicable after (excluding) XGBoost 1.0.0, as before this version XGBoost
-returns transformed prediction for multi-class objective function. More
-details in comments.
+'''
+Demo for creating customized multi-class objective function
+===========================================================
 
-See https://xgboost.readthedocs.io/en/latest/tutorials/custom_metric_obj.html for detailed
-tutorial and notes.
+This demo is only applicable after (excluding) XGBoost 1.0.0, as before this version
+XGBoost returns transformed prediction for multi-class objective function. More details
+in comments.
+
+See :doc:`/tutorials/custom_metric_obj` for detailed tutorial and notes.
 
 '''
 
@@ -1,6 +1,7 @@
-##
-# This script demonstrate how to access the eval metrics in xgboost
-##
+"""
+This script demonstrate how to access the eval metrics
+======================================================
+"""
 import os
 import xgboost as xgb
 
@@ -1,6 +1,9 @@
-"""Experimental support for external memory. This is similar to the one in
-`quantile_data_iterator.py`, but for external memory instead of Quantile DMatrix. The
-feature is not ready for production use yet.
+"""
+Experimental support for external memory
+========================================
+
+This is similar to the one in `quantile_data_iterator.py`, but for external memory
+instead of Quantile DMatrix. The feature is not ready for production use yet.
 
 .. versionadded:: 1.5.0
 
@@ -1,4 +1,6 @@
-'''Using feature weight to change column sampling.
+'''
+Demo for using feature weight to change column sampling
+=======================================================
 
 .. versionadded:: 1.3.0
 '''
@@ -1,3 +1,7 @@
+"""
+Demo for gamma regression
+=========================
+"""
 import xgboost as xgb
 import numpy as np
 
@@ -1,3 +1,7 @@
+"""
+Demo for GLM
+============
+"""
 import os
 import xgboost as xgb
 ##
@@ -1,3 +1,7 @@
+"""
+Demo for prediction using number of trees
+=========================================
+"""
 import os
 import numpy as np
 import xgboost as xgb
@@ -1,3 +1,7 @@
+"""
+Demo for obtaining leaf index
+=============================
+"""
 import os
 import xgboost as xgb
 
@@ -1,4 +1,6 @@
-'''A demo for defining data iterator.
+'''
+Demo for using data iterator with Quantile DMatrix
+==================================================
 
 .. versionadded:: 1.2.0
 
@@ -1,6 +1,7 @@
-##
-# This script demonstrate how to access the xgboost eval metrics by using sklearn
-##
+"""
+Demo for accessing the xgboost eval metrics by using sklearn interface
+======================================================================
+"""
 
 import xgboost as xgb
 import numpy as np
@@ -1,4 +1,7 @@
 '''
+Collection of examples for using sklearn interface
+==================================================
+
 Created on 1 Apr 2015
 
 @author: Jamie Hall
@@ -1,3 +1,7 @@
+"""
+Demo for using xgboost with sklearn
+===================================
+"""
 from sklearn.model_selection import GridSearchCV
 from sklearn.datasets import load_boston
 import xgboost as xgb
@@ -1,5 +1,9 @@
-"""Demo for using `process_type` with `prune` and `refresh`. Modifying existing trees is
-not a well established use for XGBoost, so feel free to experiment.
+"""
+Demo for using `process_type` with `prune` and `refresh`
+========================================================
+
+Modifying existing trees is not a well established use for XGBoost, so feel free to
+experiment.
 
 """
 
25
doc/conf.py
@@ -62,12 +62,6 @@ libpath = os.path.join(curr_path, '../python-package/')
 sys.path.insert(0, libpath)
 sys.path.insert(0, curr_path)
 
-# -- mock out modules
-import mock # NOQA
-MOCK_MODULES = ['scipy', 'scipy.sparse', 'sklearn', 'pandas']
-for mod_name in MOCK_MODULES:
-    sys.modules[mod_name] = mock.Mock()
-
 # -- General configuration ------------------------------------------------
 
 # General information about the project.
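The block removed above is what the commit message calls "mock objects": placing a `Mock` in `sys.modules` makes `import scipy` (and similar) succeed during the doc build even when the package is not installed. The sketch below illustrates that mechanism in isolation. It uses the standard-library `unittest.mock` instead of the third-party `mock` package imported in the original, and `hypothetical_missing_pkg` is an invented module name chosen so the example does not depend on what is installed.

```python
import importlib
import sys
from unittest import mock

# Assumed placeholder name; stands in for scipy/sklearn/pandas in conf.py.
NAME = "hypothetical_missing_pkg"


def import_ok(name: str) -> bool:
    """Report whether `name` can be imported in the current interpreter."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


before = import_ok(NAME)          # the package is not installed
sys.modules[NAME] = mock.Mock()   # same trick as the removed conf.py lines
after = import_ok(NAME)           # import now returns the Mock object
fake_attr = sys.modules[NAME].sparse.csr_matrix  # any attribute "exists"
del sys.modules[NAME]             # restore the interpreter state

print(before, after, type(fake_attr).__name__)
```

Because every attribute of a mocked module is itself a `Mock`, example scripts that actually call into those packages cannot run against mocks. That is one plausible reason the mocks had to be removed alongside the move to sphinx-gallery, although the commit itself does not state its reasoning.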
@@ -90,10 +84,17 @@ extensions = [
     'sphinx.ext.napoleon',
     'sphinx.ext.mathjax',
     'sphinx.ext.intersphinx',
+    "sphinx_gallery.gen_gallery",
     'breathe',
     'recommonmark'
 ]
 
+sphinx_gallery_conf = {
+    "examples_dirs": "../demo/guide-python", # path to your example scripts
+    "gallery_dirs": "python/examples", # path to where to save gallery generated output
+    "matplotlib_animations": True,
+}
+
 autodoc_typehints = "description"
 
 graphviz_output_format = 'png'
@@ -201,11 +202,13 @@ latex_documents = [
 ]
 
 intersphinx_mapping = {
-    'python': ('https://docs.python.org/3.6', None),
-    'numpy': ('http://docs.scipy.org/doc/numpy/', None),
-    'scipy': ('http://docs.scipy.org/doc/scipy/reference/', None),
-    'pandas': ('http://pandas-docs.github.io/pandas-docs-travis/', None),
-    'sklearn': ('http://scikit-learn.org/stable', None)
+    "python": ("https://docs.python.org/3.6", None),
+    "numpy": ("http://docs.scipy.org/doc/numpy/", None),
+    "scipy": ("http://docs.scipy.org/doc/scipy/reference/", None),
+    "pandas": ("http://pandas-docs.github.io/pandas-docs-travis/", None),
+    "sklearn": ("http://scikit-learn.org/stable", None),
+    "dask": ("https://docs.dask.org/en/stable/", None),
+    "distributed": ("https://distributed.dask.org/en/stable/", None),
 }
 
 
1
doc/python/.gitignore
vendored
Normal file
@@ -0,0 +1 @@
+examples
@@ -13,4 +13,4 @@ Contents
 python_api
 callbacks
 model
-Python examples <https://github.com/dmlc/xgboost/tree/master/demo/guide-python>
+examples/index
@@ -5,7 +5,7 @@ This document gives a basic walkthrough of the xgboost package for Python.
 
 **List of other Helpful Links**
 
-* `Python walkthrough code collections <https://github.com/dmlc/xgboost/blob/master/demo/guide-python>`_
+* :doc:`/python/examples/index`
 * :doc:`Python API Reference <python_api>`
 
 Install XGBoost
@@ -8,3 +8,4 @@ graphviz
 numpy
 recommonmark
 xgboost_ray
+sphinx-gallery
@@ -57,13 +57,10 @@ can plot the model and calculate the global feature importance:
 
 The ``scikit-learn`` interface from dask is similar to single node version. The basic
 idea is create dataframe with category feature type, and tell XGBoost to use ``gpu_hist``
-with parameter ``enable_categorical``. See `this demo
-<https://github.com/dmlc/xgboost/blob/master/demo/guide-python/categorical.py>`__ for a
-worked example of using categorical data with ``scikit-learn`` interface. A comparison
-between using one-hot encoded data and XGBoost's categorical data support can be found
-`here
-<https://github.com/dmlc/xgboost/blob/master/demo/guide-python/cat_in_the_dat.py>`__.
-
+with parameter ``enable_categorical``. See :ref:`sphx_glr_python_examples_categorical.py`
+for a worked example of using categorical data with ``scikit-learn`` interface. A
+comparison between using one-hot encoded data and XGBoost's categorical data support can
+be found :ref:`sphx_glr_python_examples_cat_in_the_dat.py`.
 
 
 **********************
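The two `:ref:` targets in this hunk follow a visible pattern: the prefix `sphx_glr_`, then the `gallery_dirs` value from `doc/conf.py` (`python/examples`) with path separators turned into underscores, then the script file name. The helper below reproduces that pattern. `sphx_glr_label` is a hypothetical function written for illustration, and the naming rule is inferred from the two labels in this diff, not taken from the sphinx-gallery source.

```python
from pathlib import PurePosixPath


def sphx_glr_label(gallery_dir: str, script: str) -> str:
    """Build the cross-reference label for one gallery example.

    Assumed rule (inferred from this commit): 'sphx_glr_' followed by the
    gallery directory components and the script name, joined by '_'.
    """
    parts = PurePosixPath(gallery_dir).parts
    return "sphx_glr_" + "_".join(parts + (script,))


GALLERY_DIR = "python/examples"  # value of "gallery_dirs" in doc/conf.py

for script in ("categorical.py", "cat_in_the_dat.py"):
    print(f":ref:`{sphx_glr_label(GALLERY_DIR, script)}`")
```

Unlike the GitHub URLs they replace, these labels resolve inside the built documentation, so a renamed or deleted example is reported as a broken reference when Sphinx builds the docs.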