docs: fix bug in tutorial (#10143)

Fabi 2024-04-01 04:14:40 +02:00 committed by GitHub
parent bc9ea62ec0
commit e15d61b916
GPG Key ID: B5690EEEBB952194


@@ -52,7 +52,7 @@ Notice that the samples are sorted based on their query index in a non-decreasin
   X, y = make_classification(random_state=seed)
   rng = np.random.default_rng(seed)
   n_query_groups = 3
-  qid = rng.integers(0, 3, size=X.shape[0])
+  qid = rng.integers(0, n_query_groups, size=X.shape[0])

   # Sort the inputs based on query index
   sorted_idx = np.argsort(qid)
@@ -65,14 +65,14 @@ The simplest way to train a ranking model is by using the scikit-learn estimator
 .. code-block:: python

   ranker = xgb.XGBRanker(tree_method="hist", lambdarank_num_pair_per_sample=8, objective="rank:ndcg", lambdarank_pair_method="topk")
-  ranker.fit(X, y, qid=qid)
+  ranker.fit(X, y, qid=qid[sorted_idx])

 Please note that, as of writing, there's no learning-to-rank interface in scikit-learn. As a result, the :py:class:`xgboost.XGBRanker` class does not fully conform the scikit-learn estimator guideline and can not be directly used with some of its utility functions. For instances, the ``auc_score`` and ``ndcg_score`` in scikit-learn don't consider query group information nor the pairwise loss. Most of the metrics are implemented as part of XGBoost, but to use scikit-learn utilities like :py:func:`sklearn.model_selection.cross_validation`, we need to make some adjustments in order to pass the ``qid`` as an additional parameter for :py:meth:`xgboost.XGBRanker.score`. Given a data frame ``X`` (either pandas or cuDF), add the column ``qid`` as follows:

 .. code-block:: python

   df = pd.DataFrame(X, columns=[str(i) for i in range(X.shape[1])])
-  df["qid"] = qid
+  df["qid"] = qid[sorted_idx]
   ranker.fit(df, y) # No need to pass qid as a separate argument

   from sklearn.model_selection import StratifiedGroupKFold, cross_val_score
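
A minimal, dependency-free sketch of why this change indexes ``qid`` with ``sorted_idx``. It assumes, as the hunk context suggests, that the tutorial reorders ``X`` and ``y`` with ``sorted_idx`` before fitting. The toy ``samples`` list and the ``argsort`` helper are hypothetical stand-ins (for the rows of ``X`` and for ``np.argsort``), not part of the tutorial.

```python
# Illustrative only: plain-Python stand-in for np.argsort (stable ordering).
def argsort(values):
    """Return the indices that sort `values` in non-decreasing order."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Hypothetical toy data: one label per sample and its query index.
samples = ["a", "b", "c", "d", "e", "f"]
qid = [2, 0, 1, 0, 2, 1]
true_qid = dict(zip(samples, qid))

# Reorder the samples by query index, as the tutorial does for X and y.
sorted_idx = argsort(qid)
samples_sorted = [samples[i] for i in sorted_idx]

# Before the fix: reordered samples are paired with the original qid order.
before = list(zip(samples_sorted, qid))
# After the fix: qid is reordered with the same permutation.
after = list(zip(samples_sorted, [qid[i] for i in sorted_idx]))

# Samples whose paired query index differs from their true one.
mismatched_before = [s for s, q in before if true_qid[s] != q]
mismatched_after = [s for s, q in after if true_qid[s] != q]

print("mismatched before the fix:", mismatched_before)
print("mismatched after the fix:", mismatched_after)
```

In this sketch, pairing the reordered samples with the unsorted ``qid`` assigns some samples to the wrong query group, while applying the same permutation to ``qid`` keeps every sample with its own group.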