diff --git a/doc/parameter.rst b/doc/parameter.rst
index dd9784bab..1c1afd1c0 100644
--- a/doc/parameter.rst
+++ b/doc/parameter.rst
@@ -88,6 +88,17 @@ Parameters for Tree Booster
   - Subsample ratio of the training instances. Setting it to 0.5 means that XGBoost would randomly sample half of the training data prior to growing trees. and this will prevent overfitting. Subsampling will occur once in every boosting iteration.
   - range: (0,1]
 
+* ``sampling_method`` [default= ``uniform``]
+
+  - The method to use to sample the training instances.
+  - ``uniform``: each training instance has an equal probability of being selected. Typically set
+    ``subsample`` >= 0.5 for good results.
+  - ``gradient_based``: the selection probability for each training instance is proportional to the
+    *regularized absolute value* of gradients (more specifically, :math:`\sqrt{g^2+\lambda h^2}`,
+    where :math:`g` is the gradient and :math:`h` is the Hessian). ``subsample`` may be set as low as
+    0.1 without loss of model accuracy. Note that this sampling method is only supported when
+    ``tree_method`` is set to ``gpu_hist``; other tree methods only support ``uniform`` sampling.
+
 * ``colsample_bytree``, ``colsample_bylevel``, ``colsample_bynode`` [default=1]
 
   - This is a family of parameters for subsampling of columns.
diff --git a/doc/tutorials/external_memory.rst b/doc/tutorials/external_memory.rst
index b5427d127..03a7c8c78 100644
--- a/doc/tutorials/external_memory.rst
+++ b/doc/tutorials/external_memory.rst
@@ -1,6 +1,6 @@
-############################################
-Using XGBoost External Memory Version (beta)
-############################################
+#####################################
+Using XGBoost External Memory Version
+#####################################
 There is no big difference between using external memory version and in-memory version.
 The only difference is the filename format.
 
@@ -14,7 +14,13 @@ The ``filename`` is the normal path to libsvm format file you want to load in, a
 ``cacheprefix`` is a path to a cache file that XGBoost will use for caching preprocessed
 data in binary form.
 
-.. note:: External memory is also available with GPU algorithms (i.e. when ``tree_method`` is set to ``gpu_hist``)
+To load from CSV files, use the following syntax:
+
+.. code-block:: none
+
+  filename.csv?format=csv&label_column=0#cacheprefix
+
+where ``label_column`` should point to the CSV column acting as the label.
 
 To provide a simple example for illustration, extracting the code from
 `demo/guide-python/external_memory.py <https://github.com/dmlc/xgboost/blob/master/demo/guide-python/external_memory.py>`_. If
@@ -25,22 +31,26 @@ you have a dataset stored in a file similar to ``agaricus.txt.train`` with libSV
   dtrain = DMatrix('../data/agaricus.txt.train#dtrain.cache')
 
 XGBoost will first load ``agaricus.txt.train`` in, preprocess it, then write to a new file named
-``dtrain.cache`` as an on disk cache for storing preprocessed data in a internal binary format.  For
+``dtrain.cache`` as an on-disk cache for storing preprocessed data in an internal binary format.  For
 more notes about text input formats, see :doc:`/tutorials/input_format`.
 
-.. code-block:: python
-
-  dtrain = xgb.DMatrix('../data/agaricus.txt.train#dtrain.cache')
-
 For CLI version, simply add the cache suffix, e.g. ``"../data/agaricus.txt.train#dtrain.cache"``.
 
-****************
-Performance Note
-****************
-* the parameter ``nthread`` should be set to number of **physical** cores
+***********
+GPU Version
+***********
+External memory is fully supported in GPU algorithms (i.e. when ``tree_method`` is set to ``gpu_hist``).
 
-  - Most modern CPUs use hyperthreading, which means a 4 core CPU may carry 8 threads
-  - Set ``nthread`` to be 4 for maximum performance in such case
+If you are still getting out-of-memory errors after enabling external memory, try subsampling the
+data to further reduce GPU memory usage:
+
+.. code-block:: python
+
+  param = {
+    # ... other parameters (gradient_based sampling requires 'tree_method': 'gpu_hist')
+    'subsample': 0.1,
+    'sampling_method': 'gradient_based',
+  }
 
 *******************
 Distributed Version
@@ -51,14 +61,12 @@ The external memory mode naturally works on distributed version, you can simply
 
   data = "hdfs://path-to-data/#dtrain.cache"
 
-XGBoost will cache the data to the local position. When you run on YARN, the current folder is temporal
+XGBoost will cache the data locally. When you run on YARN, the current folder is temporary
 so that you can directly use ``dtrain.cache`` to cache to current folder.
 
-**********
-Usage Note
-**********
-* This is an experimental version
-* Currently only importing from libsvm format is supported
+***********
+Limitations
+***********
+* The ``hist`` tree method hasn't been tested thoroughly with external memory support (see
+  `this issue <https://github.com/dmlc/xgboost/issues/4093>`_).
 * OSX is not tested.
-
-  - Contribution of ingestion from other common external memory data source is welcomed