Update document for model dump. (#5818)

* Clarify the relationship between dump and save.
* Mention the schema.
Jiaming Yuan 2020-06-22 14:33:54 +08:00 committed by GitHub
parent 26143ad0b1
commit 8104f10328
2 changed files with 38 additions and 30 deletions

@ -112,7 +112,7 @@ configuration directly as a JSON string. In Python package:
print(config)
or
or in R:
.. code-block:: R
@ -158,22 +158,9 @@ Will print out something similiar to (not actual output as it's too long for dem
"colsample_bynode": "1",
"colsample_bytree": "1",
"default_direction": "learn",
"enable_feature_grouping": "0",
"eta": "0.300000012",
"gamma": "0",
"grow_policy": "depthwise",
"interaction_constraints": "",
"lambda": "1",
"learning_rate": "0.300000012",
"max_bin": "256",
"max_conflict_rate": "0",
"max_delta_step": "0",
"max_depth": "6",
"max_leaves": "0",
"max_search_group": "100",
"refresh_leaf": "1",
"sketch_eps": "0.0299999993",
"sketch_ratio": "2",
...
"subsample": "1"
}
}
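The configuration values in the listing above are all serialized as strings, including the numeric ones. As a minimal sketch of how such a fragment might be post-processed, the snippet below parses a small hand-written stand-in and converts numeric-looking strings back to numbers. The ``config_fragment`` text, its single level of nesting, and the ``coerce`` helper are illustrative assumptions, not actual XGBoost output or API.

```python
import json

# Hand-written stand-in for part of the configuration (assumption: not real
# output; the actual document is longer and nests these keys more deeply).
config_fragment = """
{
  "train_param": {
    "eta": "0.300000012",
    "gamma": "0",
    "grow_policy": "depthwise",
    "interaction_constraints": "",
    "max_depth": "6"
  }
}
"""


def coerce(value):
    """Recursively turn numeric-looking strings into int or float."""
    if isinstance(value, dict):
        return {key: coerce(item) for key, item in value.items()}
    if isinstance(value, str):
        for cast in (int, float):
            try:
                return cast(value)
            except ValueError:
                continue  # not parseable with this type, try the next one
    return value  # non-numeric strings (and empty strings) are left as-is


params = coerce(json.loads(config_fragment))["train_param"]
print(params["max_depth"] + 1)  # arithmetic works once the value is an int
print(params["grow_policy"])    # plain strings are untouched
```

Trying ``int`` before ``float`` keeps integer parameters such as ``max_depth`` as integers rather than turning them into floats.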
@ -207,13 +194,16 @@ This way users can study the internal representation more closely. Please note
JSON generators make use of locale dependent floating point serialization methods, which
is not supported by XGBoost.
************
Future Plans
************
*************************************************
Difference between saving model and dumping model
*************************************************
Right now using the JSON format incurs longer serialisation time, we have been working on
optimizing the JSON implementation to close the gap between binary format and JSON format.
You can track the progress in `#5046 <https://github.com/dmlc/xgboost/pull/5046>`_.
XGBoost has a function called ``dump_model`` in the Booster object, which lets you export
the model in a readable format such as ``text``, ``json`` or ``dot`` (graphviz). Its primary
use case is model interpretation or visualization, and the output is not supposed to be
loaded back into XGBoost. The JSON version has a `schema
<https://github.com/dmlc/xgboost/blob/master/doc/dump.schema>`_. See the next section for
more info.
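
To illustrate the interpretation use case, here is a minimal sketch that walks one tree from a JSON dump and prints it in a readable form. The ``tree_dump`` string is a hand-written stand-in with made-up numbers. Its field names (``nodeid``, ``split``, ``split_condition``, ``children``, ``leaf``) are assumed to follow the dump schema linked above. The helpers ``describe`` and ``count_leaves`` are hypothetical names, not part of XGBoost.

```python
import json

# Hand-written stand-in for one tree of a JSON dump (assumption: the field
# names follow the dump schema; the values are made up for illustration).
tree_dump = """
{
  "nodeid": 0, "depth": 0, "split": "f0", "split_condition": 0.5,
  "yes": 1, "no": 2, "missing": 1,
  "children": [
    {"nodeid": 1, "leaf": -0.2},
    {"nodeid": 2, "depth": 1, "split": "f1", "split_condition": 1.5,
     "yes": 3, "no": 4, "missing": 3,
     "children": [
       {"nodeid": 3, "leaf": 0.1},
       {"nodeid": 4, "leaf": 0.4}
     ]}
  ]
}
"""


def describe(node, indent=0):
    """Return one readable line per node, indented by tree depth."""
    pad = "  " * indent
    if "leaf" in node:
        return [f"{pad}leaf {node['nodeid']}: value={node['leaf']}"]
    lines = [f"{pad}node {node['nodeid']}: {node['split']} < {node['split_condition']}"]
    for child in node["children"]:
        lines.extend(describe(child, indent + 1))
    return lines


def count_leaves(node):
    """Count the leaf nodes below (and including) this node."""
    if "leaf" in node:
        return 1
    return sum(count_leaves(child) for child in node["children"])


tree = json.loads(tree_dump)
lines = describe(tree)
print("\n".join(lines))
print("leaves:", count_leaves(tree))
```

With a trained model, each string returned by ``get_dump(dump_format="json")`` could be passed to ``json.loads`` in the same way. The result is for inspection only and cannot be loaded back as a model.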
***********
JSON Schema
@ -229,3 +219,10 @@ leaf directly, instead it saves the weights as a separated array.
.. include:: ../model.schema
:code: json
************
Future Plans
************
Right now using the JSON format incurs longer serialisation time. We have been working on
optimizing the JSON implementation to close the gap between binary format and JSON format.

@ -1444,8 +1444,11 @@ class Booster(object):
The model is saved in an XGBoost internal format which is universal
among the various XGBoost interfaces. Auxiliary attributes of the
Python Booster object (such as feature_names) will not be saved. To
preserve all attributes, pickle the Booster object.
Python Booster object (such as feature_names) will not be saved. See:
https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html
for more info.
Parameters
----------
@ -1460,7 +1463,7 @@ class Booster(object):
raise TypeError("fname must be a string or os_PathLike")
def save_raw(self):
"""Save the model to a in memory buffer representation
"""Save the model to an in-memory buffer representation instead of a file.
Returns
-------
@ -1479,8 +1482,11 @@ class Booster(object):
The model is loaded from an XGBoost format which is universal among the
various XGBoost interfaces. Auxiliary attributes of the Python Booster
object (such as feature_names) will not be loaded. To preserve all
attributes, pickle the Booster object.
object (such as feature_names) will not be loaded. See:
https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html
for more info.
Parameters
----------
@ -1503,7 +1509,9 @@ class Booster(object):
raise TypeError('Unknown file type: ', fname)
def dump_model(self, fout, fmap='', with_stats=False, dump_format="text"):
"""Dump model into a text or JSON file.
"""Dump model into a text or JSON file. Unlike `save_model`, the
output format is primarily used for visualization or interpretation,
hence it's more human readable but cannot be loaded back to XGBoost.
Parameters
----------
@ -1537,7 +1545,9 @@ class Booster(object):
fout.close()
def get_dump(self, fmap='', with_stats=False, dump_format="text"):
"""Returns the model dump as a list of strings.
"""Returns the model dump as a list of strings. Unlike `save_model`, the
output format is primarily used for visualization or interpretation,
hence it's more human readable but cannot be loaded back to XGBoost.
Parameters
----------
@ -1547,6 +1557,7 @@ class Booster(object):
Controls whether the split statistics are output.
dump_format : string, optional
Format of model dump. Can be 'text', 'json' or 'dot'.
"""
fmap = os_fspath(fmap)
length = c_bst_ulong()