diff --git a/doc/gpu/index.rst b/doc/gpu/index.rst
index 43a6a7601..0f5e6317a 100644
--- a/doc/gpu/index.rst
+++ b/doc/gpu/index.rst
@@ -50,7 +50,7 @@ Supported parameters
 +--------------------------------+----------------------------+--------------+
 | ``gpu_id``                     | |tick|                     | |tick|       |
 +--------------------------------+----------------------------+--------------+
-| ``n_gpus``                     | |cross|                    | |tick|       |
+| ``n_gpus`` (deprecated)        | |cross|                    | |tick|       |
 +--------------------------------+----------------------------+--------------+
 | ``predictor``                  | |tick|                     | |tick|       |
 +--------------------------------+----------------------------+--------------+
@@ -58,6 +58,8 @@ Supported parameters
 +--------------------------------+----------------------------+--------------+
 | ``monotone_constraints``       | |cross|                    | |tick|       |
 +--------------------------------+----------------------------+--------------+
+| ``interaction_constraints``    | |cross|                    | |tick|       |
++--------------------------------+----------------------------+--------------+
 | ``single_precision_histogram`` | |cross|                    | |tick|       |
 +--------------------------------+----------------------------+--------------+
@@ -65,7 +67,8 @@ GPU accelerated prediction is enabled by default for the above mentioned ``tree_
 The experimental parameter ``single_precision_histogram`` can be set to True to enable building
 histograms using single precision.  This may improve speed, in particular on older architectures.
 
-The device ordinal can be selected using the ``gpu_id`` parameter, which defaults to 0.
+The device ordinal (which GPU to use if you have multiple devices) can be selected using the
+``gpu_id`` parameter, which defaults to 0 (the first device reported by the CUDA runtime).
 
 The GPU algorithms currently work with CLI, Python and R packages. See :doc:`/build` for details.
@@ -80,15 +83,7 @@ The GPU algorithms currently work with CLI, Python and R packages. See :doc:`/bu
 Single Node Multi-GPU
 =====================
-.. note:: Single node multi-GPU training is deprecated. Please use distributed GPU training with one process per GPU.
-
-Multiple GPUs can be used with the ``gpu_hist`` tree method using the ``n_gpus`` parameter. which defaults to 1. If this is set to -1 all available GPUs will be used. If ``gpu_id`` is specified as non-zero, the selected gpu devices will be from ``gpu_id`` to ``gpu_id+n_gpus``, please note that ``gpu_id+n_gpus`` must be less than or equal to the number of available GPUs on your system. As with GPU vs. CPU, multi-GPU will not always be faster than a single GPU due to PCI bus bandwidth that can limit performance.
-
-.. note:: Enabling multi-GPU training
-
-  Default installation may not enable multi-GPU training. To use multiple GPUs, make sure to read :ref:`build_gpu_support`.
-XGBoost supports multi-GPU training on a single machine via specifying the `n_gpus' parameter.
-
+.. note:: Single node multi-GPU training with the ``n_gpus`` parameter is deprecated after 0.90. Please use distributed GPU training with one process per GPU.
 
 Multi-node Multi-GPU Training
 =============================
@@ -101,66 +96,64 @@ Objective functions
 ===================
 Most of the objective functions implemented in XGBoost can be run on GPU.  Following table shows current support status.
 
-.. |tick| unicode:: U+2714
-.. 
|cross| unicode:: U+2718
+
++--------------------+-------------+
+| Objectives         | GPU support |
++--------------------+-------------+
+| reg:squarederror   | |tick|      |
++--------------------+-------------+
+| reg:squaredlogerror| |tick|      |
++--------------------+-------------+
+| reg:logistic       | |tick|      |
++--------------------+-------------+
+| binary:logistic    | |tick|      |
++--------------------+-------------+
+| binary:logitraw    | |tick|      |
++--------------------+-------------+
+| binary:hinge       | |tick|      |
++--------------------+-------------+
+| count:poisson      | |tick|      |
++--------------------+-------------+
+| reg:gamma          | |tick|      |
++--------------------+-------------+
+| reg:tweedie        | |tick|      |
++--------------------+-------------+
+| multi:softmax      | |tick|      |
++--------------------+-------------+
+| multi:softprob     | |tick|      |
++--------------------+-------------+
+| survival:cox       | |cross|     |
++--------------------+-------------+
+| rank:pairwise      | |cross|     |
++--------------------+-------------+
+| rank:ndcg          | |cross|     |
++--------------------+-------------+
+| rank:map           | |cross|     |
++--------------------+-------------+
 
-+-----------------+-------------+
-| Objectives      | GPU support |
-+-----------------+-------------+
-| reg:squarederror| |tick|      |
-+-----------------+-------------+
-| reg:logistic    | |tick|      |
-+-----------------+-------------+
-| binary:logistic | |tick|      |
-+-----------------+-------------+
-| binary:logitraw | |tick|      |
-+-----------------+-------------+
-| binary:hinge    | |tick|      |
-+-----------------+-------------+
-| count:poisson   | |tick|      |
-+-----------------+-------------+
-| reg:gamma       | |tick|      |
-+-----------------+-------------+
-| reg:tweedie     | |tick|      |
-+-----------------+-------------+
-| multi:softmax   | |tick|      |
-+-----------------+-------------+
-| multi:softprob  | |tick|      |
-+-----------------+-------------+
-| survival:cox    | |cross|     |
-+-----------------+-------------+
-| rank:pairwise   | |cross|     |
-+-----------------+-------------+
-| rank:ndcg       | |cross|     |
-+-----------------+-------------+
-| rank:map        | |cross|     |
-+-----------------+-------------+
-
-For multi-gpu support, objective functions also honor the ``n_gpus`` parameter,
-which, by default is set to 1. To disable running objectives on GPU, just set
-``n_gpus`` to 0.
+Objectives will run on the GPU if the GPU updater (``gpu_hist``) is used; otherwise they
+will run on the CPU by default. For unsupported objectives XGBoost falls back to the CPU
+implementation.
 
 Metric functions
 ===================
 Following table shows current support status for evaluation metrics on the GPU.
 
-.. |tick| unicode:: U+2714
-.. |cross| unicode:: U+2718
-
 +-----------------+-------------+
 | Metric          | GPU Support |
 +=================+=============+
 | rmse            | |tick|      |
 +-----------------+-------------+
+| rmsle           | |tick|      |
++-----------------+-------------+
 | mae             | |tick|      |
 +-----------------+-------------+
 | logloss         | |tick|      |
 +-----------------+-------------+
 | error           | |tick|      |
 +-----------------+-------------+
-| merror          | |cross|     |
+| merror          | |tick|      |
 +-----------------+-------------+
-| mlogloss        | |cross|     |
+| mlogloss        | |tick|      |
 +-----------------+-------------+
 | auc             | |cross|     |
 +-----------------+-------------+
@@ -181,10 +174,8 @@ Following table shows current support status for evaluation metrics on the GPU.
 | tweedie-nloglik | |tick|      |
 +-----------------+-------------+
 
-As for objective functions, metrics honor the ``n_gpus`` parameter,
-which, by default is set to 1. To disable running metrics on GPU, just set
-``n_gpus`` to 0.
-
+Similar to objective functions, the default device for metrics is selected based on the
+tree updater and predictor (which is in turn selected based on the tree updater).
Benchmarks
==========
diff --git a/doc/tutorials/feature_interaction_constraint.rst b/doc/tutorials/feature_interaction_constraint.rst
index 947778427..ea4d252ca 100644
--- a/doc/tutorials/feature_interaction_constraint.rst
+++ b/doc/tutorials/feature_interaction_constraint.rst
@@ -171,7 +171,107 @@ parameter:
                    num_boost_round = 1000, evals = evallist,
                    early_stopping_rounds = 10)
 
-**Choice of tree construction algorithm**. To use feature interaction
-constraints, be sure to set the ``tree_method`` parameter to either ``exact``
-or ``hist``. Currently, GPU algorithms (``gpu_hist``, ``gpu_exact``) do not
-support feature interaction constraints.
+**Choice of tree construction algorithm**. To use feature interaction constraints, be sure
+to set the ``tree_method`` parameter to one of the following: ``exact``, ``hist`` or
+``gpu_hist``. Support for ``gpu_hist`` was added after version 0.90.
+
+
+**************
+Advanced topic
+**************
+
+The intuition behind interaction constraints is simple. Users have prior knowledge about
+relations between different features, and encode it as constraints during model
+construction. But there are also some subtleties around specifying constraints. Take the
+constraint ``[[1, 2], [2, 3, 4]]`` as an example: feature ``2`` appears in two different
+interaction sets, ``[1, 2]`` and ``[2, 3, 4]``, so the union set of features allowed to
+interact with ``2`` is ``{1, 3, 4}``. In the following diagram, the root splits at
+feature ``2``. Because all its descendants should be able to interact with it, at the
+second layer all 4 features are legitimate split candidates for further splitting,
+disregarding the specified constraint sets.
+
+.. plot::
+   :nofigs:
+
+   from graphviz import Source
+   source = r"""
+     digraph feature_interaction_illustration4 {
+       graph [fontname = "helvetica"];
+       node [fontname = "helvetica"];
+       edge [fontname = "helvetica"];
+       0 [label=<2>, shape=box, color=black, fontcolor=black];
+       1 [label=<{1, 2, 3, 4}>, shape=box];
+       2 [label=<{1, 2, 3, 4}>, shape=box, color=black, fontcolor=black];
+       3 [label="...", shape=none];
+       4 [label="...", shape=none];
+       5 [label="...", shape=none];
+       6 [label="...", shape=none];
+       0 -> 1;
+       0 -> 2;
+       1 -> 3;
+       1 -> 4;
+       2 -> 5;
+       2 -> 6;
+     }
+   """
+   Source(source, format='png').render('../_static/feature_interaction_illustration4', view=False)
+   Source(source, format='svg').render('../_static/feature_interaction_illustration5', view=False)
+
+.. figure:: ../_static/feature_interaction_illustration4.png
+   :align: center
+   :figwidth: 80 %
+
+   ``{1, 2, 3, 4}`` represents the sets of legitimate split features.
+
+This leads to some interesting implications of feature interaction constraints. Take
+``[[0, 1], [0, 1, 2], [1, 2]]`` as another example. Assuming we have only 3 available
+features in our training datasets for presentation purposes, careful readers might have
+noticed that the above constraint is the same as ``[0, 1, 2]``: no matter which feature
+is chosen for the split at the root node, all its descendants have to include every
+feature as legitimate split candidates to avoid violating the interaction constraints.
+
+For one last example, we use ``[[0, 1], [1, 3, 4]]`` and choose feature ``0`` as the
+split for the root node. At the second layer of the built tree, ``1`` is the only
+legitimate split candidate except for ``0`` itself, since they belong to the same
+constraint set. Following the grow path of our example tree below, the node at the second
+layer splits at feature ``1``. But because ``1`` also belongs to the second constraint
+set ``[1, 3, 4]``, at the third layer we need to include all features as candidates to
+comply with its ancestors.
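The constraint-merging rule walked through above can be sketched in a few lines of Python. The function below is purely illustrative (it is not part of the XGBoost API); it reproduces the three worked examples from this section:

```python
def allowed_split_features(constraints, path):
    """Features allowed as split candidates at a node, given the features
    already used on the path from the root: the union of every constraint
    set that shares a feature with that path."""
    used = set(path)
    allowed = set(path)
    for cset in constraints:
        if used & set(cset):
            allowed |= set(cset)
    return allowed

allowed_split_features([[1, 2], [2, 3, 4]], [2])     # {1, 2, 3, 4}: root split at 2
allowed_split_features([[0, 1], [1, 3, 4]], [0])     # {0, 1}: second layer after splitting on 0
allowed_split_features([[0, 1], [1, 3, 4]], [0, 1])  # {0, 1, 3, 4}: third layer after 0 then 1
```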
+
+.. plot::
+   :nofigs:
+
+   from graphviz import Source
+   source = r"""
+     digraph feature_interaction_illustration5 {
+       graph [fontname = "helvetica"];
+       node [fontname = "helvetica"];
+       edge [fontname = "helvetica"];
+       0 [label=<0>, shape=box, color=black, fontcolor=black];
+       1 [label="...", shape=none];
+       2 [label=<1>, shape=box, color=black, fontcolor=black];
+       3 [label=<{0, 1, 3, 4}>, shape=box, color=black, fontcolor=black];
+       4 [label=<{0, 1, 3, 4}>, shape=box, color=black, fontcolor=black];
+       5 [label="...", shape=none];
+       6 [label="...", shape=none];
+       7 [label="...", shape=none];
+       8 [label="...", shape=none];
+       0 -> 1;
+       0 -> 2;
+       2 -> 3;
+       2 -> 4;
+       3 -> 5;
+       3 -> 6;
+       4 -> 7;
+       4 -> 8;
+     }
+   """
+   Source(source, format='png').render('../_static/feature_interaction_illustration6', view=False)
+   Source(source, format='svg').render('../_static/feature_interaction_illustration7', view=False)
+
+
+.. figure:: ../_static/feature_interaction_illustration6.png
+   :align: center
+   :figwidth: 80 %
+
+   ``{0, 1, 3, 4}`` represents the sets of legitimate split features.
diff --git a/python-package/xgboost/dask.py b/python-package/xgboost/dask.py
index 18e496ffd..b0be2dbcc 100644
--- a/python-package/xgboost/dask.py
+++ b/python-package/xgboost/dask.py
@@ -101,24 +101,25 @@ def _run_with_rabit(rabit_args, func, *args):
 
 
 def run(client, func, *args):
-    """
-    Launch arbitrary function on dask workers. Workers are connected by rabit, allowing
-    distributed training. The environment variable OMP_NUM_THREADS is defined on each worker
-    according to dask - this means that calls to xgb.train() will use the threads allocated by
-    dask by default, unless the user overrides the nthread parameter.
+    """Launch arbitrary function on dask workers. Workers are connected by rabit,
+    allowing distributed training. The environment variable OMP_NUM_THREADS is
+    defined on each worker according to dask - this means that calls to
+    xgb.train() will use the threads allocated by dask by default, unless the
+    user overrides the nthread parameter.
 
-    Note: Windows platforms are not officially supported. Contributions are welcome here.
+    Note: Windows platforms are not officially
+    supported. Contributions are welcome here.
 
     :param client: Dask client representing the cluster
-    :param func: Python function to be executed by each worker. Typically contains xgboost
-    training code.
+    :param func: Python function to be executed by each worker. Typically
+        contains xgboost training code.
     :param args: Arguments to be forwarded to func
     :return: Dict containing the function return value for each worker
+
     """
     if platform.system() == 'Windows':
-        logging.warning(
-            'Windows is not officially supported for dask/xgboost integration. Contributions '
-            'welcome.')
+        logging.warning('Windows is not officially supported for dask/xgboost '
+                        'integration. Contributions welcome.')
     workers = list(client.scheduler_info()['workers'].keys())
     env = client.run(_start_tracker, len(workers), workers=[workers[0]])
     rabit_args = [('%s=%s' % item).encode() for item in env[workers[0]].items()]
diff --git a/python-package/xgboost/plotting.py b/python-package/xgboost/plotting.py
index d23f4860d..094bf8a51 100644
--- a/python-package/xgboost/plotting.py
+++ b/python-package/xgboost/plotting.py
@@ -184,18 +184,16 @@ def to_graphviz(booster, fmap='', num_trees=0, rankdir='UT',
     no_color : str, default '#FF0000'
         Edge color when doesn't meet the node condition.
    condition_node_params : dict (optional)
-        condition node configuration,
-        {'shape':'box',
-        'style':'filled,rounded',
-        'fillcolor':'#78bceb'
-        }
+        condition node configuration,
+        {'shape':'box',
+         'style':'filled,rounded',
+         'fillcolor':'#78bceb'}
     leaf_node_params : dict (optional)
         leaf node configuration
         {'shape':'box',
-        'style':'filled',
-        'fillcolor':'#e48038'
-        }
+         'style':'filled',
+         'fillcolor':'#e48038'}
     kwargs :
         Other keywords passed to graphviz graph_attr
diff --git a/python-package/xgboost/sklearn.py b/python-package/xgboost/sklearn.py
index 858129a4c..8d4f7d03f 100644
--- a/python-package/xgboost/sklearn.py
+++ b/python-package/xgboost/sklearn.py
@@ -105,8 +105,8 @@ class XGBModel(XGBModelBase):
         Value in the data which needs to be present as a missing value. If
         None, defaults to np.nan.
     importance_type: string, default "gain"
-        The feature importance type for the feature_importances_ property: either "gain",
-        "weight", "cover", "total_gain" or "total_cover".
+        The feature importance type for the feature_importances\_ property:
+        either "gain", "weight", "cover", "total_gain" or "total_cover".
     \*\*kwargs : dict, optional
         Keyword arguments for XGBoost Booster object.  Full documentation of parameters can
         be found here: https://github.com/dmlc/xgboost/blob/master/doc/parameter.rst.
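As a small usage sketch for the ``to_graphviz`` styling parameters documented above: the dict values are copied from the docstring defaults, while the override at the end is hypothetical.

```python
# Node-styling dicts for xgboost.to_graphviz, copied from the docstring
# defaults above.  Building and merging them is plain Python, so no
# plotting backend is needed for this sketch.
condition_node_params = {
    'shape': 'box',
    'style': 'filled,rounded',
    'fillcolor': '#78bceb',   # split (condition) nodes
}
leaf_node_params = {
    'shape': 'box',
    'style': 'filled',
    'fillcolor': '#e48038',   # leaf nodes
}

# A hypothetical user override: keep the documented defaults but change
# the condition-node fill colour.
custom_condition = {**condition_node_params, 'fillcolor': '#cccccc'}

# With a trained booster one would then call (not executed here):
# xgb.to_graphviz(booster, condition_node_params=custom_condition,
#                 leaf_node_params=leaf_node_params)
```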