Commit Graph

  • cc410b8c90 add local model in checkpoint interface, a new goal tqchen 2014-12-04 11:09:15 -08:00
  • 79e7862583 change note tqchen 2014-12-04 09:09:56 -08:00
  • f9d634ce06 change notes tqchen 2014-12-04 09:09:29 -08:00
  • 65a1cdf8e5 remove doc from main repo tqchen 2014-12-04 09:07:36 -08:00
  • 67229fd7a9 change model tqchen 2014-12-04 09:05:48 -08:00
  • 3033177e9e ok tqchen 2014-12-03 22:36:16 -08:00
  • 656a8fa3a2 ok tqchen 2014-12-03 22:32:30 -08:00
  • 0e9b64649a ok tqchen 2014-12-03 22:30:23 -08:00
  • 9da3c6c573 Merge branch 'master' of ssh://github.com/tqchen/rabit tqchen 2014-12-03 22:28:59 -08:00
  • 09a1305628 chg readme tqchen 2014-12-03 22:27:52 -08:00
  • 7d314fef78 open for writing nachocano 2014-12-03 21:58:58 -08:00
  • dece767084 Revert "open for writing" nachocano 2014-12-03 21:58:33 -08:00
  • 63bf9c7995 open for writing nachocano 2014-12-03 21:58:17 -08:00
  • 1c76483b4b ok tqchen 2014-12-03 21:53:34 -08:00
  • 9abe6ad4d8 checkin makefile tqchen 2014-12-03 21:30:11 -08:00
  • 8175df1002 bug fix in kmeans tqchen 2014-12-03 20:05:16 -08:00
  • a1a1a8895e add kmeans tqchen 2014-12-03 18:23:58 -08:00
  • 69af79d45d sparse kmeans tqchen 2014-12-03 18:15:28 -08:00
  • e3a95b2d1a Merge branch 'master' of https://github.com/tqchen/allreduce nachocano 2014-12-03 15:39:05 -08:00
  • 5c23b94069 updating kmeans based on Tianqi feedback. More efficient now nachocano 2014-12-03 15:38:58 -08:00
  • 85bb6cd027 Merge branch 'master' of ssh://github.com/tqchen/rabit tqchen 2014-12-03 15:13:09 -08:00
  • 90b9f1a98a add keepalive script tqchen 2014-12-03 15:04:30 -08:00
  • 55c2a5dc83 Merge branch 'master' of https://github.com/tqchen/allreduce nachocano 2014-12-03 14:21:42 -08:00
  • 1d0d5bb141 kmeans seems to be working.. not restarting anything though nachocano 2014-12-03 14:21:10 -08:00
  • 7a983a4079 add keepalive tqchen 2014-12-03 13:21:30 -08:00
  • 2523288509 basic recovery works tqchen 2014-12-03 12:19:08 -08:00
  • 8a6768763d bug fixed ver tqchen 2014-12-03 11:51:39 -08:00
  • a186f8c3aa ok tqchen 2014-12-03 11:19:43 -08:00
  • ceeb6f0690 bug version, check in and rollback tqchen 2014-12-03 11:17:39 -08:00
  • f3e5b6e13c ok tqchen 2014-12-03 10:00:47 -08:00
  • 34f2f887b1 add more broadcast and basic broadcast tqchen 2014-12-03 09:59:13 -08:00
  • 20b51cc9ce cleaner nachocano 2014-12-03 01:44:34 -08:00
  • 56aad86231 adding incomplete kmeans. I'm having a problem with the broadcast, and still need to implement the logic nachocano 2014-12-03 01:16:13 -08:00
  • ed1de6df80 change AllReduce to Allreduce tqchen 2014-12-02 21:11:48 -08:00
  • 8cb5b68cb6 Merge branch 'master' of https://github.com/tqchen/allreduce nachocano 2014-12-02 11:28:27 -08:00
  • e4abca9494 changing report folder to doc nachocano 2014-12-02 11:28:20 -08:00
  • 0a3300d773 rabit run on MPI tqchen 2014-12-02 11:20:19 -08:00
  • 2fab05c83e adding some design goals. nachocano 2014-12-02 11:07:07 -08:00
  • 40f7ee1cab adding simple image nachocano 2014-12-02 01:49:54 -08:00
  • 2c166d7a3a adding some initial skeleton of the report. nachocano 2014-12-02 01:19:36 -08:00
  • dcea64c838 check in model recover tqchen 2014-12-01 21:41:37 -08:00
  • 255218a2f3 change in interface, seems resetlink is still bad tqchen 2014-12-01 21:39:51 -08:00
  • b76cd5858c seems ok version tqchen 2014-12-01 20:18:25 -08:00
  • 46b5d46111 fix one bug, another comes tqchen 2014-12-01 19:53:41 -08:00
  • 993ff8bb91 find one bug, continue to next one tqchen 2014-12-01 19:34:27 -08:00
  • 2cde04867f Merge branch 'master' of ssh://github.com/tqchen/rabit tqchen 2014-12-01 16:57:33 -08:00
  • 337840d29b recover not yet working tqchen 2014-12-01 16:57:26 -08:00
  • fd2c57b8a4 Update engine_robust.cc Tianqi Chen 2014-12-01 15:32:57 -08:00
  • 1c5167d96e rabit seems ready to run tqchen 2014-12-01 10:32:30 -08:00
  • 0d63646015 Update README.md Tianqi Chen 2014-12-01 10:04:10 -08:00
  • b5367f48f6 Update README.md Tianqi Chen 2014-12-01 10:03:45 -08:00
  • 62c8ce9657 Update README.md Tianqi Chen 2014-12-01 10:03:31 -08:00
  • eb2ca06d67 fresh name fresh start tqchen 2014-12-01 09:17:05 -08:00
  • 16f729115e checkin allreduce recover tqchen 2014-11-30 22:41:04 -08:00
  • 9355f5faf2 more conservative exception watching tqchen 2014-11-30 21:39:22 -08:00
  • 8cef2086f5 smarter select for allreduce and bcast tqchen 2014-11-30 21:31:45 -08:00
  • f7928c68a3 next round try more careful select design tqchen 2014-11-30 21:07:34 -08:00
  • ecb09a23bc add recover data, do a round of review tqchen 2014-11-30 20:59:55 -08:00
  • b9b58a1275 bugfix in decide tqchen 2014-11-30 17:48:30 -08:00
  • 4a6c01c83c minor change in decide tqchen 2014-11-30 17:48:02 -08:00
  • 27f6f8ea9e bugfix in msg passing tqchen 2014-11-30 17:42:18 -08:00
  • d8d648549f finish message passing, do a review on msg passing and decide tqchen 2014-11-30 17:40:30 -08:00
  • 38cd595235 check in message passing tqchen 2014-11-30 16:38:47 -08:00
  • 7a60cb7f3e checkin decide request, todo message passing tqchen 2014-11-30 16:37:26 -08:00
  • 68f13cd739 tight tqchen 2014-11-30 11:46:21 -08:00
  • d1ce3c697c inline tqchen 2014-11-30 11:45:50 -08:00
  • 2e536eda29 check in the recover strategy tqchen 2014-11-30 11:42:59 -08:00
  • 155ed3a814 seems an OK version of reset, start to work on decide exec tqchen 2014-11-29 22:22:51 -08:00
  • 5b0bb53184 refactor code style, reset link still need thoughts tqchen 2014-11-29 20:15:27 -08:00
  • 42505f473d finish reset link log tqchen 2014-11-29 15:14:43 -08:00
  • 98756c068a livelock in oob send recv tqchen 2014-11-28 21:58:15 -08:00
  • aa54a038f2 livelock in oob send recv tqchen 2014-11-28 21:56:58 -08:00
  • a30075794b initial version of robust engine, add discard link, need more random mock test, next milestone will be recovery tqchen 2014-11-28 15:56:12 -08:00
  • a8128493c2 execute it like this: ./test.sh 4 4000 testcase0.conf ./ nachocano 2014-11-28 01:48:26 -08:00
  • faed8285cd execute it like ./test.sh 4 4000 testcase0.conf to obtain a successful execution nachocano 2014-11-28 00:16:35 -08:00
  • 21f3f3eec4 adding const to variable to comply with google code convention... may need to change more stuff though. Taint what else do you mean? Spaces, tabs, names? nachocano 2014-11-27 17:03:31 -08:00
  • 2f1ba40786 change in socket, to pass out error code tqchen 2014-11-27 16:17:07 -08:00
  • c565104491 adding some references to mock inside TEST preprocessor directive. It shouldn't be an assert because it shutdowns the process. Instead should check on the value and return some sort of error, so that we can recover. The mock contains queues, indexed by the rank of the process. For each node, you can configure the behavior you expect (success or failure for now) when you call any of the methods (AllReduce, Broadcast, LoadCheckPoint and CheckPoint)... If you call several times AllReduce, the outputs will pop from the queue, i.e., first you can retrieve a success, then a failure and so on. Pretty basic for now, need to tune it better nachocano 2014-11-26 17:24:29 -08:00
  • 54fcff189f dummy mock for now nachocano 2014-11-26 16:37:23 -08:00
  • 5ae99372d6 Update simple_dmatrix-inl.hpp Tianqi Chen 2014-11-26 09:13:49 -08:00
  • be5fb800d5 Merge pull request #112 from tfgit/master Tianqi Chen 2014-11-25 19:29:41 -08:00
  • baf41d589d Fixed README Ted Fujimoto 2014-11-25 22:17:36 -05:00
  • 8d7dbc65b3 Merge pull request #111 from tfgit/master Tianqi Chen 2014-11-25 19:12:42 -08:00
  • 198489438f Added OS X OpenMP instructions Ted Fujimoto 2014-11-25 21:42:13 -05:00
  • c356a0acc2 Remove tools folder Ted Fujimoto 2014-11-25 21:27:50 -05:00
  • d37f38c455 initial version of allreduce tqchen 2014-11-25 16:15:56 -08:00
  • 5e5bdda491 Initial commit Tianqi Chen 2014-11-25 14:37:18 -08:00
  • cdcfa5687a Update socket.h Tianqi Chen 2014-11-23 22:46:57 -08:00
  • f53be2884a ok tqchen 2014-11-23 22:42:44 -08:00
  • f805ecb5f3 fix a bug in node sindex set Tianqi Chen 2014-11-23 22:35:30 -08:00
  • 3e162ceda6 windows strange tqchen 2014-11-23 22:21:15 -08:00
  • 35bf2101fe seems a prob in win tqchen 2014-11-23 22:18:28 -08:00
  • fde580b08e fix windows run Tianqi Chen 2014-11-23 22:12:55 -08:00
  • 77ffd0465b ok tqchen 2014-11-23 21:36:22 -08:00
  • 78ca72b9c7 start work on win tqchen 2014-11-23 21:34:15 -08:00
  • d2f151ef5a bring it back alive again tqchen 2014-11-23 21:27:16 -08:00
  • 7f3dc967cf changes in socket, a bit work in linux side first Tianqi Chen 2014-11-23 21:21:52 -08:00
  • db2adb6885 start check windows compatibility tqchen 2014-11-23 20:59:10 -08:00
  • 2e444f8338 remove warning from MSVC need another round of check Tianqi Chen 2014-11-23 20:52:13 -08:00
  • b55fe80350 add row map example tqchen 2014-11-23 18:15:42 -08:00