GPU Plugin: Add subsample, colsample_bytree, colsample_bylevel (#1895)

2016-12-23 04:30:36 +13:00
parent cee4aafb93
commit b49b339183
10 changed files with 331 additions and 324 deletions
--- a/demo/gpu_acceleration/README.md
+++ b/demo/gpu_acceleration/README.md
@@ -9,10 +9,10 @@ https://www.kaggle.com/c/bosch-production-line-performance/data

 Copy train_numeric.csv into xgboost/demo/data.

-The subsample parameter can be changed so you can run the script first on a small portion of the data. Processing the entire dataset can take a long time and requires about 8GB of device memory. It is initially set to 0.4, using about 2650/3380MB on a GTX 970. 
+The subset parameter sets the proportion of rows loaded from the CSV file. It is initially set to 0.4, which uses about 2650/3380MB on a GTX 970. Processing the entire dataset can take a long time and requires about 8GB of device memory. Lower the parameter if your device runs out of memory.

 ```python
-subsample = 0.4
+subset = 0.4
 ```

 Parameters are set as usual except that we set silent to 0 to see how much memory is being allocated on the GPU and we change 'updater' to 'grow_gpu' to activate the GPU plugin.
--- a/demo/gpu_acceleration/bosch.py
+++ b/demo/gpu_acceleration/bosch.py
@@ -5,12 +5,12 @@ import time
 import random
 from sklearn.cross_validation import StratifiedKFold

-#For sub sampling rows from input file
+#For sampling rows from input file
 random_seed = 9
-subsample = 0.4
+subset = 0.4

 n_rows = 1183747;
-train_rows = int(n_rows * subsample)
+train_rows = int(n_rows * subset)
 random.seed(random_seed)
 skip = sorted(random.sample(xrange(1,n_rows + 1),n_rows-train_rows))
 data = pd.read_csv("../data/train_numeric.csv", index_col=0, dtype=np.float32, skiprows=skip)
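
Note on the rename: `subset` controls how many CSV rows the demo loads, while `subsample`, `colsample_bytree` and `colsample_bylevel` are XGBoost training parameters that the GPU plugin supports as of this commit. The following is a minimal Python 3 sketch of the row-selection step, not the script itself. The helper name `rows_to_skip` and the sampling values in `params` are assumptions made for illustration; only `updater`, `silent`, `random_seed = 9` and `subset = 0.4` come from the demo.

```python
import random


def rows_to_skip(n_rows, subset, seed=9):
    """Return sorted 1-based CSV row indices to skip (hypothetical helper).

    Row 0 is the header and is never skipped, so the result can be passed
    to pandas.read_csv(..., skiprows=...) as the demo script does.
    """
    train_rows = int(n_rows * subset)      # number of rows to keep
    rng = random.Random(seed)              # local RNG, reproducible per seed
    # Python 3: range replaces the xrange call used in the original script.
    return sorted(rng.sample(range(1, n_rows + 1), n_rows - train_rows))


# Small illustrative size; the demo file has 1183747 data rows.
skip = rows_to_skip(n_rows=1000, subset=0.4)

# Training parameters are separate from the loading step above.
# The three sampling values below are placeholders, not tuned settings.
params = {
    "updater": "grow_gpu",      # activates the GPU plugin (from the README)
    "silent": 0,                # show GPU memory allocation (from the README)
    "subsample": 0.8,           # assumed value: row sampling per tree
    "colsample_bytree": 0.8,    # assumed value: column sampling per tree
    "colsample_bylevel": 0.8,   # assumed value: column sampling per level
}
```

Lowering `subset` reduces the data that reaches the device at all, whereas the sampling entries in `params` only change which rows and columns each tree is built from.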