Implement new save_raw in Python. (#7572)
* Expose the new C API function to Python.
* Remove old document and helper script.
* Small optimization to the `save_raw` and Json ctors.
@@ -1,79 +0,0 @@
-'''This is a simple script that converts a pickled XGBoost
-Scikit-Learn interface object from 0.90 to a native model. Pickle
-format is not stable as it's a direct serialization of Python object.
-We advice not to use it when stability is needed.
-
-'''
-import pickle
-import json
-import os
-import argparse
-import numpy as np
-import xgboost
-import warnings
-
-
-def save_label_encoder(le):
-    '''Save the label encoder in XGBClassifier'''
-    meta = dict()
-    for k, v in le.__dict__.items():
-        if isinstance(v, np.ndarray):
-            meta[k] = v.tolist()
-        else:
-            meta[k] = v
-    return meta
-
-
-def xgboost_skl_90to100(skl_model):
-    '''Extract the model and related metadata in SKL model.'''
-    model = {}
-    with open(skl_model, 'rb') as fd:
-        old = pickle.load(fd)
-    if not isinstance(old, xgboost.XGBModel):
-        raise TypeError(
-            'The script only handes Scikit-Learn interface object')
-
-    # Save Scikit-Learn specific Python attributes into a JSON document.
-    for k, v in old.__dict__.items():
-        if k == '_le':
-            model[k] = save_label_encoder(v)
-        elif k == 'classes_':
-            model[k] = v.tolist()
-        elif k == '_Booster':
-            continue
-        else:
-            try:
-                json.dumps({k: v})
-                model[k] = v
-            except TypeError:
-                warnings.warn(str(k) + ' is not saved in Scikit-Learn meta.')
-    booster = old.get_booster()
-    # Store the JSON serialization as an attribute
-    booster.set_attr(scikit_learn=json.dumps(model))
-
-    # Save it into a native model.
-    i = 0
-    while True:
-        path = 'xgboost_native_model_from_' + skl_model + '-' + str(i) + '.bin'
-        if os.path.exists(path):
-            i += 1
-            continue
-        booster.save_model(path)
-        break
-
-
-if __name__ == '__main__':
-    assert xgboost.__version__ != '1.0.0', ('Please use the XGBoost version'
-                                            ' that generates this pickle.')
-    parser = argparse.ArgumentParser(
-        description=('A simple script to convert pickle generated by'
-                     ' XGBoost 0.90 to XGBoost 1.0.0 model (not pickle).'))
-    parser.add_argument(
-        '--old-pickle',
-        type=str,
-        help='Path to old pickle file of Scikit-Learn interface object. '
-        'Will output a native model converted from this pickle file',
-        required=True)
-    args = parser.parse_args()
-
-    xgboost_skl_90to100(args.old_pickle)
@@ -2,16 +2,18 @@
 Introduction to Model IO
 ########################
 
-In XGBoost 1.0.0, we introduced experimental support of using `JSON
+In XGBoost 1.0.0, we introduced support of using `JSON
 <https://www.json.org/json-en.html>`_ for saving/loading XGBoost models and related
 hyper-parameters for training, aiming to replace the old binary internal format with an
-open format that can be easily reused. The support for binary format will be continued in
-the future until JSON format is no-longer experimental and has satisfying performance.
-This tutorial aims to share some basic insights into the JSON serialisation method used in
-XGBoost. Without explicitly mentioned, the following sections assume you are using the
-JSON format, which can be enabled by providing the file name with ``.json`` as file
-extension when saving/loading model: ``booster.save_model('model.json')``. More details
-below.
+open format that can be easily reused. Later in XGBoost 1.6.0, additional support for
+`Universal Binary JSON <https://ubjson.org/>`__ is added as an optimization for more
+efficient model IO. They have the same document structure with different representations,
+and we will refer them collectively as the JSON format. This tutorial aims to share some
+basic insights into the JSON serialisation method used in XGBoost. Without explicitly
+mentioned, the following sections assume you are using the one of the 2 outputs formats,
+which can be enabled by providing the file name with ``.json`` (or ``.ubj`` for binary
+JSON) as file extension when saving/loading model: ``booster.save_model('model.json')``.
+More details below.
 
 Before we get started, XGBoost is a gradient boosting library with focus on tree model,
 which means inside XGBoost, there are 2 distinct parts:
@@ -53,7 +55,8 @@ Other language bindings are still working in progress.
 based serialisation methods.
 
 To enable JSON format support for model IO (saving only the trees and objective), provide
-a filename with ``.json`` as file extension:
+a filename with ``.json`` or ``.ubj`` as file extension, the latter is the extension for
+`Universal Binary JSON <https://ubjson.org/>`__
 
 .. code-block:: python
   :caption: Python
@@ -65,7 +68,7 @@ a filename with ``.json`` as file extension:
 
   xgb.save(bst, 'model_file_name.json')
 
-While for memory snapshot, JSON is the default starting with xgboost 1.3.
+While for memory snapshot, UBJSON is the default starting with xgboost 1.6.
 
 ***************************************************************
 A note on backward compatibility of models and memory snapshots
@@ -105,15 +108,10 @@ Loading pickled file from different version of XGBoost
 
 As noted, pickled model is neither portable nor stable, but in some cases the pickled
 models are valuable. One way to restore it in the future is to load it back with that
-specific version of Python and XGBoost, export the model by calling `save_model`. To help
-easing the mitigation, we created a simple script for converting pickled XGBoost 0.90
-Scikit-Learn interface object to XGBoost 1.0.0 native model. Please note that the script
-suits simple use cases, and it's advised not to use pickle when stability is needed. It's
-located in ``xgboost/doc/python`` with the name ``convert_090to100.py``. See comments in
-the script for more details.
+specific version of Python and XGBoost, export the model by calling `save_model`.
 
-A similar procedure may be used to recover the model persisted in an old RDS file. In R, you are
-able to install an older version of XGBoost using the ``remotes`` package:
+A similar procedure may be used to recover the model persisted in an old RDS file. In R,
+you are able to install an older version of XGBoost using the ``remotes`` package:
 
 .. code-block:: r
 
@@ -244,10 +242,3 @@ leaf directly, instead it saves the weights as a separated array.
 
 .. include:: ../model.schema
   :code: json
-
-************
-Future Plans
-************
-
-Right now using the JSON format incurs longer serialisation time, we have been working on
-optimizing the JSON implementation to close the gap between binary format and JSON format.
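
The updated tutorial text says JSON and Universal Binary JSON "have the same document structure with different representations". The sketch below is one way to picture that claim. It encodes a toy document both as JSON text and as UBJSON bytes, using a minimal hand-written encoder that covers only a small subset of the UBJSON type markers. The `encode_ubjson` helper and the toy document are hypothetical illustrations; this is not XGBoost's serializer and not a real model layout.

```python
import json
import struct


def _encode_int(n):
    """Pick the smallest UBJSON integer type that can hold n (big-endian)."""
    if -(2 ** 7) <= n < 2 ** 7:
        return b"i" + struct.pack(">b", n)   # int8
    if 0 <= n < 2 ** 8:
        return b"U" + struct.pack(">B", n)   # uint8
    if -(2 ** 15) <= n < 2 ** 15:
        return b"I" + struct.pack(">h", n)   # int16
    if -(2 ** 31) <= n < 2 ** 31:
        return b"l" + struct.pack(">i", n)   # int32
    if -(2 ** 63) <= n < 2 ** 63:
        return b"L" + struct.pack(">q", n)   # int64
    raise OverflowError("integer does not fit in int64")


def _encode_text(text):
    """Length-prefixed UTF-8 bytes; used for object keys and string payloads."""
    data = text.encode("utf-8")
    return _encode_int(len(data)) + data


def encode_ubjson(value):
    """Encode a JSON-compatible value as UBJSON (illustrative subset only)."""
    if value is None:
        return b"Z"
    if value is True:
        return b"T"
    if value is False:
        return b"F"
    if isinstance(value, int):
        return _encode_int(value)
    if isinstance(value, float):
        return b"D" + struct.pack(">d", value)  # float64
    if isinstance(value, str):
        return b"S" + _encode_text(value)
    if isinstance(value, (list, tuple)):
        return b"[" + b"".join(encode_ubjson(v) for v in value) + b"]"
    if isinstance(value, dict):
        # Keys are written without the leading 'S' marker.
        body = b"".join(
            _encode_text(k) + encode_ubjson(v) for k, v in value.items()
        )
        return b"{" + body + b"}"
    raise TypeError("unsupported type: " + type(value).__name__)


# Toy document (hypothetical, not the XGBoost model schema).
doc = {"version": [1, 6, 0], "name": "gbtree"}
as_json = json.dumps(doc, separators=(",", ":")).encode("utf-8")
as_ubjson = encode_ubjson(doc)
print(len(as_json), as_json)
print(len(as_ubjson), as_ubjson)
```

Both outputs describe the same nested object. Only the byte-level encoding differs, which is why the tutorial can treat the two as one "JSON format".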