still need to test row merge

2014-11-19 11:44:24 -08:00
parent da54f5e5d8
commit 55e62a7120
4 changed files with 6 additions and 23 deletions
--- a/multi-node/col-split/README.md
+++ b/multi-node/col-split/README.md
@@ -1,14 +1,14 @@
-Column Split Version of XGBoost
+Distributed XGBoost: Column Split Version
 ====
 * run ```bash run-mushroom.sh```

-Steps to use column split version
+How to Use
 ====
 * First, split the data by column.
 * In the config, specify the data file name with a wildcard %d, where %d is the rank of the node; each node then loads its own part of the data.
 * Enable column split mode by ```dsplit=col```

-Note on the Column Split Version
+Notes
 ====
 * The code is multi-threaded, so run only one xgboost-mpi process per node
 * The code works correctly as long as the union of the column subsets covers all the columns we are interested in.
--- a/multi-node/col-split/mushroom-col.sh
+++ b/multi-node/col-split/mushroom-col.sh
--- a/multi-node/col-split/run-mushroom.sh
+++ b/multi-node/col-split/run-mushroom.sh
@@ -1,19 +0,0 @@
-#!/bin/bash
-if [[ $# -ne 1 ]]
-then
-    echo "Usage: nprocess"
-    exit -1
-fi
-
-rm -rf train.col*
-k=$1
-
-# split the lib svm file into k subfiles
-python splitsvm.py ../../demo/data/agaricus.txt.train train $k
-
-# run xgboost mpi
-mpirun -n $k ../../xgboost-mpi  mushroom-col.conf updater=distcol silent=0
-
-# the model can be directly loaded by single machine xgboost solver, as usuall
-../../xgboost mushroom-col.conf task=dump model_in=0002.model fmap=../../demo/data/featmap.txt name_dump=dump.nice.$k.txt
-cat dump.nice.$k.txt
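
The column split step that the README and the removed script refer to ("split the lib svm file into k subfiles") could look roughly like the sketch below. This is not the repository's `splitsvm.py`; the function name `split_libsvm_by_column` and the round-robin rule (feature index modulo k) are assumptions made for illustration. The output naming `<prefix>.col<rank>` is inferred from the `train.col*` pattern in the script. Every output file keeps all rows and their labels, and the union of the column subsets is the full feature set, which is the condition the README notes require.

```python
# Hypothetical sketch of a column splitter for LIBSVM-format text files.
# Assumption: feature j is assigned to rank j % k (round-robin by index).

def split_libsvm_by_column(src_path, out_prefix, k):
    """Write k files named '<out_prefix>.col<rank>', one per node rank.

    Each output file has the same number of rows as the input and repeats
    the label, but only contains the features assigned to that rank.
    """
    outs = [open("%s.col%d" % (out_prefix, rank), "w") for rank in range(k)]
    try:
        with open(src_path) as src:
            for line in src:
                fields = line.split()
                if not fields:
                    continue  # skip blank lines
                label, feats = fields[0], fields[1:]
                parts = [[] for _ in range(k)]
                for feat in feats:
                    # a feature entry looks like "index:value"
                    index = int(feat.split(":", 1)[0])
                    parts[index % k].append(feat)
                for rank in range(k):
                    outs[rank].write(" ".join([label] + parts[rank]) + "\n")
    finally:
        for f in outs:
            f.close()
```

With k nodes, a call such as `split_libsvm_by_column("agaricus.txt.train", "train", k)` would produce `train.col0` through `train.col<k-1>`, which matches a data path containing the `%d` rank wildcard.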