tqchen
87c7817124
add lazy check, need test, find a race condition
2015-01-14 11:58:43 -08:00
tqchen
348a1e7619
change default behavior to behave normally
2015-01-13 22:21:15 -08:00
tqchen
532575b752
ok
2015-01-13 14:41:37 -08:00
tqchen
3419cf9aa7
add auto caching of python in hadoop script, mock test module to python, with checkpt
2015-01-13 14:29:10 -08:00
tqchen
1b4921977f
update doc
2015-01-03 05:20:18 -08:00
tqchen
bfb9aa3d77
add native script
2014-12-30 04:37:50 -08:00
tqchen
d64d0ef1dc
cleanup submission script
2014-12-29 06:11:58 -08:00
tqchen
12399a1d42
add more mocktest
2014-12-21 17:59:12 -08:00
tqchen
e40047f9c2
new mock test
2014-12-20 18:38:54 -08:00
tqchen
925d014271
change file structure
2014-12-20 16:19:54 -08:00
tqchen
6151899ce2
add tracker print
2014-12-19 18:40:06 -08:00
tqchen
6bf282c6c2
isolate iserializable
2014-12-19 17:36:42 -08:00
tqchen
8c35cff02c
improve script
2014-12-19 04:21:16 -08:00
tqchen
9f42b78a18
improve tracker script
2014-12-19 04:20:45 -08:00
tqchen
1754fdbf4e
enable support for lambda preprocessing function, and c++11
2014-12-19 02:00:43 -08:00
tqchen
58331067f8
cleanup testcases
2014-12-18 23:50:59 -08:00
tqchen
c8faed0b54
pass local model recover test
2014-12-18 18:53:58 -08:00
tqchen
dbd05a65b5
nice fix, start check local check
2014-12-18 18:39:24 -08:00
tqchen
3f22596e3c
check in license
2014-12-09 20:57:54 -08:00
tqchen
2750679270
normal state running ok
2014-12-07 20:57:29 -08:00
nachocano
20b03e781c
to run all executables
2014-12-06 15:37:09 -08:00
nachocano
fcf2f0a03d
to stderr
2014-12-06 15:22:29 -08:00
nachocano
659b9cd517
changing number of repetitions
2014-12-06 15:14:14 -08:00
nachocano
9ed59e71f6
speed runner
2014-12-06 12:09:40 -08:00
nachocano
e0053c62e1
adding executable
2014-12-06 12:05:08 -08:00
nachocano
8f0d7d1d3e
changing to -ho not to conflict with help
2014-12-06 12:01:05 -08:00
nachocano
771891491c
Merge branch 'master' of https://github.com/tqchen/allreduce
2014-12-06 11:59:22 -08:00
nachocano
f203d13efc
speed runner
2014-12-06 11:59:16 -08:00
tqchen
4a7d84e861
chg string bcast
2014-12-06 11:25:08 -08:00
tqchen
1519f74f3c
ok
2014-12-06 11:20:52 -08:00
tqchen
0e012cb05e
add speed test
2014-12-06 11:05:24 -08:00
tqchen
19631ecef6
more tracker renaming
2014-12-06 09:24:12 -08:00
nachocano
bb7d6814a7
creating initial version of hadoop submit script. Not working.
Not sure how to get the master uri and port. I believe I cannot do it before I launch the job.
Updating the name from submit_job to submit_job_mpi
2014-12-05 03:27:02 -08:00
tqchen
90b9f1a98a
add keepalive script
2014-12-03 15:04:30 -08:00
tqchen
7a983a4079
add keepalive
2014-12-03 13:21:30 -08:00
tqchen
8a6768763d
bug fixed ver
2014-12-03 11:51:39 -08:00
tqchen
ed1de6df80
change AllReduce to Allreduce
2014-12-02 21:11:48 -08:00
tqchen
0a3300d773
rabit run on MPI
2014-12-02 11:20:19 -08:00
tqchen
dcea64c838
check in model recover
2014-12-01 21:41:37 -08:00
tqchen
255218a2f3
change in interface, seems resetlink is still bad
2014-12-01 21:39:51 -08:00
tqchen
b76cd5858c
seems ok version
2014-12-01 20:18:25 -08:00
tqchen
46b5d46111
fix one bug, another comes
2014-12-01 19:53:41 -08:00
tqchen
993ff8bb91
find one bug, continue to next one
2014-12-01 19:34:27 -08:00
tqchen
337840d29b
recover not yet working
2014-12-01 16:57:26 -08:00
tqchen
eb2ca06d67
fresh name fresh start
2014-12-01 09:17:05 -08:00
tqchen
8cef2086f5
smarter select for allreduce and bcast
2014-11-30 21:31:45 -08:00
tqchen
5b0bb53184
refactor code style, reset link still needs thought
2014-11-29 20:15:27 -08:00
tqchen
42505f473d
finish reset link log
2014-11-29 15:14:43 -08:00
tqchen
a30075794b
initial version of robust engine, add discard link, need more random mock test, next milestone will be recovery
2014-11-28 15:56:12 -08:00
nachocano
a8128493c2
execute it like this: ./test.sh 4 4000 testcase0.conf ./
Now we are passing the folder where the round instances are saved.
The problem is that calling utils::Check or utils::Assert on 1 or 2 nodes shuts down all of them. Only those nodes should be shut down for this to work. There may be some other mechanism to shut down a particular node. Tianqi?
2014-11-28 01:48:26 -08:00