diff --git a/doc/index.rst b/doc/index.rst
index b469d3ea0..80d15d1f8 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -22,6 +22,7 @@ Contents
   XGBoost User Forum
   GPU support
   parameter
+  prediction
   treemethod
   Python package
   R package
diff --git a/doc/prediction.rst b/doc/prediction.rst
new file mode 100644
index 000000000..dbfcc9cb8
--- /dev/null
+++ b/doc/prediction.rst
@@ -0,0 +1,148 @@
.. _predict_api:

##########
Prediction
##########

There are a number of prediction functions in XGBoost with various parameters. This
document attempts to clarify some of the confusion around prediction, with a focus on the
Python binding.

******************
Prediction Options
******************

There are a number of different prediction options for the
:py:meth:`xgboost.Booster.predict` method, ranging from ``pred_contribs`` to
``pred_leaf``. The output shape depends on the type of prediction. Also, for multi-class
classification problems, XGBoost builds one tree for each class, and the trees for each
class are called a "group" of trees, so the output dimension may change with the model
used. In the 1.4 release, we added a new parameter called ``strict_shape``; set it to
``True`` to indicate that a more restricted output is desired. Assuming you are using
:py:obj:`xgboost.Booster`, here is a list of possible returns, followed by a short
sketch illustrating the shapes:

- When using normal prediction with ``strict_shape`` set to ``True``:

  Output is a 2-dim array, with the first dimension as rows and the second as groups.
  For regression/survival/ranking/binary classification this is equivalent to a column
  vector with ``shape[1] == 1``. But for multi-class with ``multi:softprob`` the number
  of columns equals the number of classes. If ``strict_shape`` is set to ``False``, then
  XGBoost might output a 1- or 2-dim array.

- When using ``output_margin`` to avoid transformation, with ``strict_shape`` set to ``True``:

  Similar to the previous case, output is a 2-dim array, except that ``multi:softmax``
  has output equivalent to that of ``multi:softprob`` due to the dropped transformation.
  If strict shape is set to ``False``, then the output can have 1 or 2 dims depending on
  the model used.

- When using ``pred_contribs`` with ``strict_shape`` set to ``True``:

  Output is a 3-dim array with shape ``(rows, groups, columns + 1)``. Whether
  ``approx_contribs`` is used does not change the output shape. If the strict shape
  parameter is not set, the output can be a 2- or 3-dim array depending on whether a
  multi-class model is being used.

- When using ``pred_interactions`` with ``strict_shape`` set to ``True``:

  Output is a 4-dim array with shape ``(rows, groups, columns + 1, columns + 1)``. Like
  the contributions case, whether ``approx_contribs`` is used does not change the output
  shape. If strict shape is set to ``False``, the output can have 3 or 4 dims depending
  on the underlying model.

- When using ``pred_leaf`` with ``strict_shape`` set to ``True``:

  Output is a 4-dim array with shape ``(n_samples, n_iterations, n_classes,
  n_trees_in_forest)``. ``n_trees_in_forest`` is specified by the ``num_parallel_tree``
  parameter during training. When strict shape is set to ``False``, the output is a
  2-dim array, with the last 3 dims concatenated into 1. Also, the ``apply`` method in
  the scikit-learn interface sets this to ``False`` by default.
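The following is a minimal sketch of the shapes listed above, assuming a synthetic
3-class dataset, 4 boosting rounds, and the default ``num_parallel_tree`` of 1 (the data
and parameter values here are illustrative only, not requirements of the API):

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    X = np.random.randn(100, 10)           # 100 rows, 10 features
    y = np.random.randint(0, 3, size=100)  # 3 classes
    Xy = xgb.DMatrix(X, label=y)
    booster = xgb.train(
        {"objective": "multi:softprob", "num_class": 3}, Xy, num_boost_round=4
    )

    # Normal prediction: (rows, groups), one column of probabilities per class.
    assert booster.predict(Xy, strict_shape=True).shape == (100, 3)

    # Contributions: (rows, groups, columns + 1); the extra column is the bias.
    contribs = booster.predict(Xy, pred_contribs=True, strict_shape=True)
    assert contribs.shape == (100, 3, 11)

    # Leaf indices: (n_samples, n_iterations, n_classes, n_trees_in_forest).
    leaves = booster.predict(Xy, pred_leaf=True, strict_shape=True)
    assert leaves.shape == (100, 4, 3, 1)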
Other than these prediction types, there is also a parameter called ``iteration_range``,
which is similar to model slicing. But instead of actually splitting up the model into
multiple stacks, it simply returns the prediction formed by the trees within the range.
The number of trees created in each iteration equals :math:`trees_i = num\_class \times
num\_parallel\_tree`. So if you are training a boosted random forest with a forest size
of 4 on a 3-class classification dataset, and want to use the first 2 iterations of
trees for prediction, you need to provide ``iteration_range=(0, 2)``. Then the first
:math:`2 \times 3 \times 4` trees will be used in this prediction.


*********
Predictor
*********

There are 2 predictors in XGBoost (3 if you have the one-api plugin enabled), namely
``cpu_predictor`` and ``gpu_predictor``. The default option is ``auto``, so that XGBoost
can employ some heuristics for saving GPU memory during training. They might have
slightly different outputs due to floating point errors.


***********
Base Margin
***********

There is a training parameter in XGBoost called ``base_score``, and a metadata field for
``DMatrix`` called ``base_margin`` (which can be set in the ``fit`` method if you are
using the scikit-learn interface). They specify the global bias for the boosted model.
If the latter is supplied, then the former is ignored. ``base_margin`` can be used to
train an XGBoost model based on other models. See the demos on boosting from
predictions.

*****************
Staged Prediction
*****************

Using the native interface with ``DMatrix``, prediction can be staged (or cached). For
example, one can first predict on the first 4 trees and then run prediction on 8 trees.
After running the first prediction, the result from the first 4 trees is cached, so when
you run the prediction with 8 trees XGBoost can reuse the result from the previous
prediction. The cache expires automatically upon the next prediction, training, or
evaluation if the cached ``DMatrix`` object has expired (for example, by going out of
scope and being collected by the garbage collector in your language environment).

*******************
In-place Prediction
*******************

Traditionally XGBoost accepts only ``DMatrix`` for prediction; with wrappers like the
scikit-learn interface, the construction happens internally. We added support for
in-place prediction to bypass the construction of ``DMatrix``, which is slow and memory
consuming. The new predict function has limited features but is often sufficient for
simple inference tasks. It accepts some commonly found data types in Python like
:py:obj:`numpy.ndarray`, :py:obj:`scipy.sparse.csr_matrix` and :py:obj:`cudf.DataFrame`
instead of :py:obj:`xgboost.DMatrix`. You can call
:py:meth:`xgboost.Booster.inplace_predict` to use it. Be aware that the output of
in-place prediction depends on the input data type: when the input is on GPU, the output
is a :py:obj:`cupy.ndarray`; otherwise a :py:obj:`numpy.ndarray` is returned.
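As a minimal sketch of in-place prediction, assuming synthetic regression data (the data
and training parameters here are illustrative only):

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    X = np.random.randn(100, 10)
    y = np.random.randn(100)
    booster = xgb.train({"objective": "reg:squarederror"}, xgb.DMatrix(X, label=y))

    # The numpy array is consumed directly, bypassing DMatrix construction.
    # Since the input lives on the CPU, a numpy array is returned.
    predt = booster.inplace_predict(X)
    assert isinstance(predt, np.ndarray)
    assert predt.shape == (100,)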
****************
Categorical Data
****************

Other than users performing their own encoding, XGBoost has experimental support for
categorical data using ``gpu_hist`` and ``gpu_predictor``. No special operation needs to
be done on input test data, since the information about categories is encoded into the
model during training.

*************
Thread Safety
*************

After the 1.4 release, all prediction functions, including normal ``predict`` with
various parameters like SHAP value computation, as well as ``inplace_predict``, are
thread safe when the underlying booster is ``gbtree`` or ``dart``, which means that as
long as a tree model is used, prediction itself should be thread safe. But the safety is
only guaranteed for prediction. If one tries to train a model in one thread and run
prediction with the same model in another, the behaviour is undefined. This happens more
easily than one might expect; for instance, we might accidentally call
``clf.set_params()`` inside a predict function:

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor

    import xgboost as xgb


    def predict_fn(clf: xgb.XGBClassifier, X):
        X = preprocess(X)
        clf.set_params(predictor="gpu_predictor")  # NOT safe!
        clf.set_params(n_jobs=1)  # NOT safe!
        return clf.predict_proba(X, iteration_range=(0, 10))


    with ThreadPoolExecutor(max_workers=10) as e:
        e.submit(predict_fn, ...)
diff --git a/python-package/xgboost/core.py b/python-package/xgboost/core.py
index 56a6e10b4..6f0634109 100644
--- a/python-package/xgboost/core.py
+++ b/python-package/xgboost/core.py
@@ -1616,12 +1616,11 @@ class Booster(object):
     ) -> np.ndarray:
         """Predict with data.

-        .. note:: This function is not thread safe except for ``gbtree`` booster.
+        .. note::

-            When using booster other than ``gbtree``, predict can only be called from one
-            thread.  If you want to run prediction using multiple thread, call
-            :py:meth:`xgboost.Booster.copy` to make copies of model object and then call
-            ``predict()``.
+            See `Prediction
+            <https://xgboost.readthedocs.io/en/latest/prediction.html>`_
+            for issues like thread safety and a summary of outputs from this function.

         Parameters
         ----------
@@ -1665,8 +1664,11 @@
             feature_names are the same.

         training :
-            Whether the prediction value is used for training.  This can effect
-            `dart` booster, which performs dropouts during training iterations.
+            Whether the prediction value is used for training. This can affect the
+            `dart` booster, which performs dropouts during training iterations but uses
+            all trees for inference. If you want to obtain results with dropouts, set
+            this parameter to `True`. Also, the parameter is set to `True` when
+            obtaining predictions for a custom objective function.

             .. versionadded:: 1.0.0

@@ -1686,12 +1688,6 @@

             .. versionadded:: 1.4.0

-        .. note:: Using ``predict()`` with DART booster
-
-            If the booster object is DART type, ``predict()`` will not perform
-            dropouts, i.e. all the trees will be evaluated.  If you want to
-            obtain result with dropouts, provide `training=True`.
-
         Returns
         -------
         prediction : numpy array
@@ -1916,11 +1912,9 @@
         The model is saved in an XGBoost internal format which is universal among the
         various XGBoost interfaces. Auxiliary attributes of the Python Booster object
         (such as feature_names) will not be saved when using binary format. To save those
-        attributes, use JSON instead. See:
-
-        https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html
-
-        for more info.
+        attributes, use JSON instead. See: `Model IO
+        <https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html>`_ for
+        more info.

         Parameters
         ----------
@@ -1956,11 +1950,9 @@
         The model is loaded from XGBoost format which is universal among the various
         XGBoost interfaces. Auxiliary attributes of the Python Booster object (such as
         feature_names) will not be loaded when using binary format. To save those
-        attributes, use JSON instead. See:
-
-        https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html
-
-        for more info.
+        attributes, use JSON instead. See: `Model IO
+        <https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html>`_ for
+        more info.

         Parameters
         ----------