| author | Guillaume Horel <guillaume.horel@serenitascapital.com> | 2015-11-04 12:30:44 -0500 |
|---|---|---|
| committer | Guillaume Horel <guillaume.horel@serenitascapital.com> | 2015-11-04 12:30:44 -0500 |
| commit | a5309fed914fdaa7697f2d369e7dcd02309063ab (patch) | |
| tree | 975bb588c4d9072ae1158ab670bf9fa851abd6f4 /README.more | |
initial import
Diffstat (limited to 'README.more')
| -rw-r--r-- | README.more | 353 |
1 files changed, 353 insertions, 0 deletions
  Copyright (C) 2008 Renaissance Technologies Corp.
                     main developer: HP Wei <hp@rentec.com>
  Copyright (C) 2006 Renaissance Technologies Corp.
                     main developer: HP Wei <hp@rentec.com>
  Copyright (C) 2005 Renaissance Technologies Corp.
  Copyright (C) 2001 Renaissance Technologies Corp.
                     main developer: HP Wei <hp@rentec.com>

  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2, or (at your option)
  any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; see the file COPYING.
  If not, write to the Free Software Foundation,
  59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

+------------+
| CHANGE LOG |
+------------+
Jul - Oct 2008 -- version 4.0
     -- Previously, the missing-page report was sent back one page
        at a time. I changed it to report ~60K pages at a time. This
        drastically reduces the network traffic.
     -- Cleaned up the scheme for choosing a monitor machine.
Feb - May 2006 -- version 3.0
   version 3.0.0: major update
     -- large file support
     -- platform independence (between Linux and UNIX)
     -- backup feature (as in rsync)
     -- removing meta-file-info
        We transmit the file_stat info right before
        the file is synced, and the file_stat info
        is transferred in ASCII format to avoid byte-order
        translation.
        The original meta_data is not necessary under this scheme.
     -- catching a slow machine as the feedback monitor
        [ the signal_handler scheme in version 2.0 is
          removed.
        ]
     -- mcast options
   version 3.0.[1-9]: bug fixes
     -- a logic flaw which, under certain conditions,
        caused premature dropout due to
        unsuccessful EOF or CLOSE_FILE
        and caused unwarranted SIT-OUT cases.
     -- tested on Debian 64-bit arch by Nicolas Marot in France.
        Nicolas also translated mrsync.py to mrsync.sh
        (nicolas.marot@gmail.com)
   version 3.1.0
     -- code for IPv6 is ready (but not tested).
        IPv4 is tested OK.

June 9, 2005 -- version 2.1
     Clint Byrum (clint@careercast.com) asked about the exit status
     of multicaster, which has been changed so that it exits whenever
     there are bad_machines or files_sat_out.
May 2005 -- second version
     The major changes are:
     (1) implementing congestion control so that mrsync is more
         network-traffic-friendly.
     (2) using a Python script as the glue to put everything together,
         i.e. the previous mrsync.c is replaced by mrsync.py.
     Other changes collected over the years through interactions with
     users include:
     (1) adding options to change the default IP address for multicast
         and the default port number for flow control.
         This was suggested by Robert Dack <robd@kelman.com>
     (2) replacing memory-mapped file I/O with the usual
         seek() and write() sequence.
         This change was seconded by Clint Byrum <clint@careercast.com>
     (3) adding verbosity control so that by default mrsync prints
         only essential info instead of a detailed status report.
         This was suggested by Clint Byrum <clint@careercast.com>

Jan 2002 -- First version uploaded to http://freshmeat.net/projects/mrsync/
     The tar file is in ftp://felder.org/pub/mrsync.tar

+-----------------+
| MRSYNC vs RSYNC |
+-----------------+
mrsync is a package of utilities for transferring a set of files
from a master machine to multiple target machines simultaneously,
using the multicast capability of UNIX sockets.
The name 'mrsync' is inspired by the popular utility 'rsync'
for synchronizing files between two machines.
However, beyond this similarity in function, mrsync is
fundamentally different from rsync in three areas.
(1) rsync uses TCP, while mrsync needs UDP in order to use
    the multicast part of UNIX socket communication.
    The former limits data communication to one-to-one,
    whereas the latter allows one-to-many.
    UDP, however, has no built-in flow control. As a result,
    the major part of mrsync
    (more precisely, the multicaster and multicatcher)
    is devoted to synchronizing the data flow.
(2) For a given file,
    rsync can transfer only those parts of the file
    that differ between the versions on the master
    and the target machine.
    This saves time, especially on a slow network.
    mrsync, in contrast, transfers the whole content of a file
    to all targets at once. The time gain of mrsync comes from
    its concurrency, as described in (3).
(3) Replicated data servers have become popular for serving
    thousands of clients. Using rsync to replicate the data
    to hundreds of disks is very time consuming: the total
    time it takes is proportional to the number of target
    disks. In contrast, mrsync scales much better because
    it puts the contents of each file on the network only
    once. For example, we have used mrsync to transfer 140GB
    of data to 100 machines on a 1Gbit LAN in about 4-6 hours.

This project started in 2001, and we have been using the tool
every day.
Recently, I upgraded this tool to handle large files,
to make it platform independent, and to make it 64-bit ready.
The current version, 4.0.0, is stable.
It has been used by many people in the UNIX community.
Lately, people in France have helped me test the code on 64-bit
architectures, and they are proposing to make it a standard
package in the Debian distribution.

mrsync was originally posted on Freshmeat.net, but the link there
points to one of our company's disks. We would like to find a place
on SourceForge.net for this package.
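To make point (1) concrete, here is a minimal Python sketch of the socket setup behind one-to-many UDP: a sender that sets a multicast TTL, and a receiver that joins a multicast group. It uses only the standard socket module. The group 239.1.2.3, the port 50000 and all function names are hypothetical choices for illustration; they are not mrsync's defaults or its code.

```python
import socket
import struct

# Hypothetical values for illustration only (not mrsync's defaults;
# mrsync lets the user choose the multicast address and port).
GROUP = "239.1.2.3"
PORT = 50000


def make_sender(ttl: int = 1) -> socket.socket:
    """UDP socket for one-to-many sends; TTL 1 keeps packets on the local subnet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def membership_request(group: str, iface: str = "0.0.0.0") -> bytes:
    """Packed struct ip_mreq: multicast group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def make_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """UDP socket that joins `group`, so every target sees the same datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```

A sender then calls make_sender().sendto(data, (GROUP, PORT)) once, and every receiver that has joined GROUP can read the same datagram with recvfrom(). Nothing in UDP acknowledges or orders those datagrams, which is why mrsync layers its own flow control on top.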

+-------------------+
| HISTORY OF MRSYNC |
+-------------------+
The mrsync project stemmed from the need to transfer
many files to hundreds of machines running Linux at Renaissance
Technologies Corp. Looking into the Open Source community, we found
preliminary multicast utility code written by Aaron Hillegass.
Many unsuccessful test runs on a huge amount of data files, however,
led us to embark on an overhaul of the code.
Most of the following items were inherited from the original code
and bug-fixed.
* The low-level functions that
  interact with UNIX multicast sockets.
* Meta_data -- the essential info about a file, which the master
  machine first transmits to the target machines.
  [ removed in 2006 (see the change log above) ]
* Division of a file into many 'pages'.
* The idea of maintaining a missing-page flag.
* The idea of a multicaster and multicatcher loop.

In mrsync, we developed two new critical elements:
flow-control message communication conducted by the multicaster,
and a four-state page reader (processor) in the multicatcher.
The former synchronizes the task each machine is performing.
For example, the master will not start sending
the pages of a file until all machines have acknowledged
that they have opened the file for disk I/O.
To accommodate these elements, the code has been
changed significantly from the original version.
For example, the multicatcher now never asks the master to slow down,
and the multicaster sends data on a file-by-file basis.
File integrity is achieved by orchestrating the
data flow, which is closely monitored and conducted
by the master machine.

[200505] We added congestion control to the code.
After the master sends one page of a file, it will not send
the next page until it receives the acknowledgement (ack) message
sent by a monitor target (a feedbacker, if you will).
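The scheme just described is a stop-and-wait protocol: at most one page is outstanding at any time. Below is a minimal Python sketch of that pacing, under stated assumptions: send and wait_ack are hypothetical callbacks standing in for the multicast send and for waiting on the feedbacker's ack, and PageStats only mimics the 'Pages w/o ack' count and the 'rtt histogram' of the test reports in this README. Switching to another feedbacker after a timeout is not modelled.

```python
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class PageStats:
    """Counters in the spirit of mrsync's end-of-run report (illustrative only)."""
    sent: int = 0
    without_ack: int = 0
    rtt_ms: List[float] = field(default_factory=list)

    def rtt_histogram(self) -> Dict[int, int]:
        """Bucket round-trip times by whole milliseconds, like the 'rtt histogram'."""
        return dict(sorted(Counter(int(ms) for ms in self.rtt_ms).items()))


def send_pages(pages: Sequence[bytes],
               send: Callable[[int, bytes], None],
               wait_ack: Callable[[int, float], bool],
               ack_timeout: float,
               clock: Callable[[], float] = time.monotonic) -> PageStats:
    """Stop-and-wait pacing: send one page, then block until the feedbacker
    acknowledges it (or the timeout expires) before sending the next page."""
    stats = PageStats()
    for page_no, data in enumerate(pages):
        start = clock()
        send(page_no, data)
        stats.sent += 1
        if wait_ack(page_no, ack_timeout):
            stats.rtt_ms.append((clock() - start) * 1000.0)
        else:
            # No ack in time. This sketch just counts it and moves on; pages a
            # target really lost are recovered later via the missing-page report.
            stats.without_ack += 1
    return stats
```

Because each page waits for the previous ack, the sending rate drops automatically when the network or the feedbacker is slow; the price is roughly one round trip of idle time per page.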

This simple-minded congestion control keeps the master from
sending pages faster than a busy network can absorb.
The feedbacker is chosen at the initialization stage of the multicaster.
The multicaster may replace it whenever it fails to send back
an ack message within a certain duration.

[2006] The monitor machine sends back the ack message
only after it has written the page to disk. This way the 'rtt'
(round-trip time) calculated by the master machine includes the
disk I/O time, instead of just the network time. The master also
selects the slowest target machine as the monitor machine.
These two ingredients give a more precise picture of
how busy the whole system is, and thus tend to leave no target
machine behind.

As of today, mrsync is in full, regular use at Renaissance.

+-------------------+
| TEST RUNNING TIME |
+-------------------+
[200810] Using version 4.0.0, performance improves (if the target
         machines are not busy).
         Number of targets = 32. All Linux machines on a 1Gbit network.

Total number of files =        283    Pages w/o ack =         0 ( 0.00%)
Total number of pages =     100129    Pages re-sent =      6852 ( 6.84%)
Total number of bytes = 6449407171    Bytes re-sent = 442030792 ( 6.85%)
Total time spent      =       2.55 (min) ~ 0.37 (min/GB)

Send pages time       =       1.78 (min) ~ 0.26 (min/GB)

rtt histogram
msec     counts
----   --------
   0     103415
   1       1985
   2        162
   3         73
   4        145
   5         68
   6        253
   7        203
   8        305
   9        184
  10        156

[200604] Using version 3.x.x, here is an example of the final statistics
         from one of our routine syncing jobs.
         Number of targets = 32. All Linux machines on a 1Gbit network.

total number of files = 4076
Total number of pages =      480719    Pages re-sent =      39994 ( 8.32%)
Total number of bytes = 30875638130    Bytes re-sent = 2573924662 ( 8.34%)
Total time spent      =       39.50 (min) ~ 1.18 (min/GB)

rtt histogram
msec     counts
----   --------
   0     371023
   1     118874
   2      21977
   3       3779
   4       2226
   5        760
   6        584
   7        580
   8        288
   9        162
  10         92
 ...

[200201]
25 minutes for a group of files whose total size amounts to 4.6GB.
(These data were obtained on 5 Sun machines running
 Solaris 8 on an Ethernet LAN with a bandwidth of 100Mbit/s.)

+-------------------------------------+
| MAIN ALGORITHM FOR SENDING ONE FILE |
+-------------------------------------+
In the multicaster, which runs on the master machine:
at the initialization stage, the code selects one target machine
as the monitor machine (feedbacker), which sends back an
acknowledgement message whenever it receives a page.
The main loop of the multicaster is as follows.
(1) Send the OPEN_FILE command (message).
(2) Wait for all machines to send back an acknowledgement (ack).
(3) If any machine does not ack within a certain time period,
    that machine is considered bad and is dropped
    from the monitoring list.
    During that waiting period, the master regularly re-sends the
    OPEN_FILE message to the potentially bad machines.
(4) Start sending pages
    (all pages in the first round;
    the missing pages reported back by target machines in later rounds).
    During this period, all target machines other than the feedbacker
    (if activated) simply receive and process whatever pages
    arrive in the receive buffer.

    There are two modes of operation for sending these pages.
    (a) When congestion control is turned off (-x option in mrsync.py),
        the master sends out one page and waits for a fixed duration
        (the interpage interval), specified as DT_PERPAGE in main.h,
        before it sends the next one.
    (b) If congestion control is turned on (the default behavior of mrsync.py),
        the master will not send the next page until it receives the ack.
        On the feedbacker, once a SIGIO signals the arrival of a page,
        it sends an ack message back to the master.
(5) Send the EOF message to signal the end of transmission
    for this file.
(6) Wait for all machines to send back an ack.
    Each either reports OK
    or reports a list of the page numbers that are missing.
(7) If any machine does not ack within a certain time period,
    that machine is considered bad and is dropped
    from the monitoring list.
    During that waiting period, the master regularly re-sends the
    EOF message to the potentially bad machines.
(8) If there are missing pages, go back to (4).
(9) Send the CLOSE_FILE message.
(10) Wait for all machines to send back an ack.
(11) If any machine does not ack within a certain time period,
     that machine is considered bad and is dropped
     from the monitoring list.
     During that waiting period, the master regularly re-sends the
     CLOSE_FILE message to the potentially bad machines.
(12) If there are more files to transfer, go back to (1).


In the multicatcher, which runs on each target machine,
the main loop performs the following steps.
(1) Start in IDLE_STATE.
(2) Upon receiving OPEN_FILE while in IDLE_STATE,
    set up a memory-mapped temporary file whose size
    equals that specified in the meta_data
    received earlier.
    if this is not successful,
        don't send back an ack.
    else
        send back an ack and enter GET_DATA_STATE.
(3) Upon receiving a UDP page while in GET_DATA_STATE,
    store it in the right place in the temporary file.
    Expect to receive more UDP data and go back to (3).
(4) Upon receiving EOF while in GET_DATA_STATE,
    if there are missing pages,
        report them and expect to go back to (3).
    else if there are no missing pages,
        send back an ack and enter DATA_READY_STATE.
    if there is a sick condition,
        send back an ack and enter SICK_STATE.
(5) Upon receiving CLOSE_FILE:
    if in DATA_READY_STATE,
        rename(temporary_file, the_real_filename),
        clean up the memory-mapped area.
        if things go well,
            send back the ack, enter IDLE_STATE
            and expect to go back to (1),
        else don't send back an ack.
    else if in SICK_STATE,
        send back a sit_out ack, enter IDLE_STATE
        and expect to go back to (1).

In addition to the above main loop, which runs on all target machines,
the selected monitor machine (feedbacker) needs to send back
a received_ack message for every page received and processed.

--------------------------------------------------------------
Note: The original version dealt with three types of 'files':
      directories, softlinks and regular files.
      mrsync handles one more: hardlinks.

HP Wei
hp@rentec.com
Renaissance Technologies Corp.
Nov 15, 2001 (first version)
May 20, 2005 (second version)
Apr 16, 2006 (third version)
Oct 28, 2008 (fourth version)
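The multicatcher main loop described under MAIN ALGORITHM FOR SENDING ONE FILE amounts to a four-state machine (IDLE, GET_DATA, DATA_READY, SICK). The Python sketch below models those transitions in memory, under stated assumptions: the class and method names, the sick flag and the reply values are hypothetical, pages are held in a dict instead of a temporary file, the sick condition is checked before missing pages, and duplicate control messages re-sent by the master are not handled.

```python
from enum import Enum, auto
from typing import Dict, List, Optional, Union


class State(Enum):
    IDLE = auto()
    GET_DATA = auto()
    DATA_READY = auto()
    SICK = auto()


# Reply to the master: "ack", "sit_out", a list of missing pages, or None (no reply).
Reply = Union[str, List[int], None]


class Catcher:
    """Illustrative four-state page processor; not mrsync's actual code.

    Pages are kept in memory; a real target would write them into a temporary
    file and rename it to the final file name on CLOSE_FILE.
    """

    def __init__(self) -> None:
        self.state = State.IDLE
        self.n_pages = 0
        self.pages: Dict[int, bytes] = {}
        self.sick = False                    # hypothetical: e.g. set on a disk error
        self.committed: Optional[bytes] = None

    def on_open_file(self, n_pages: int) -> Reply:
        if self.state is not State.IDLE:
            return None                      # not IDLE: send no ack
        self.n_pages, self.pages, self.sick = n_pages, {}, False
        self.committed = None
        self.state = State.GET_DATA
        return "ack"

    def on_page(self, page_no: int, data: bytes) -> Reply:
        if self.state is State.GET_DATA and 0 <= page_no < self.n_pages:
            self.pages[page_no] = data       # a duplicate page just overwrites
        return None                          # ordinary targets do not ack pages

    def on_eof(self) -> Reply:
        if self.state is not State.GET_DATA:
            return None
        if self.sick:
            self.state = State.SICK
            return "ack"
        missing = [p for p in range(self.n_pages) if p not in self.pages]
        if missing:
            return missing                   # stay in GET_DATA for the re-sent pages
        self.state = State.DATA_READY
        return "ack"

    def on_close_file(self) -> Reply:
        if self.state is State.DATA_READY:
            # Stand-in for rename(temporary_file, the_real_filename).
            self.committed = b"".join(self.pages[p] for p in range(self.n_pages))
            self.state = State.IDLE
            return "ack"
        if self.state is State.SICK:
            self.state = State.IDLE
            return "sit_out"
        return None
```

Keeping the file in GET_DATA until every page is present is what lets the master loop back to step (4) and re-send only the reported pages.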

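Steps (1)-(3), (5)-(7) and (9)-(11) of the multicaster main loop share one pattern: send a control message, wait for every target to ack, periodically re-send to the silent ones, and drop whoever is still silent at the deadline. A minimal Python sketch of that pattern follows; send and poll_acks are hypothetical callbacks, the timing parameters are assumptions, and the missing-page lists that targets return with their EOF acks are not modelled.

```python
import time
from typing import Callable, Iterable, Set


def collect_acks(command: str,
                 targets: Set[str],
                 send: Callable[[str, Set[str]], None],
                 poll_acks: Callable[[float], Iterable[str]],
                 timeout: float,
                 resend_every: float,
                 clock: Callable[[], float] = time.monotonic) -> Set[str]:
    """Send `command` to `targets` and wait for their acks.

    Targets that have not acked are re-sent the command every `resend_every`
    seconds. Returns the targets still silent after `timeout` seconds; the
    caller would drop them from the monitoring list as bad machines.
    """
    pending = set(targets)
    send(command, set(pending))
    start = clock()
    deadline = start + timeout
    next_resend = start + resend_every
    while pending and clock() < deadline:
        # Block for acks until the next re-send is due (or the deadline hits).
        wait = min(next_resend, deadline) - clock()
        for target in poll_acks(max(wait, 0.0)):
            pending.discard(target)
        now = clock()
        if pending and next_resend <= now < deadline:
            send(command, set(pending))
            next_resend = now + resend_every
    return pending
```

Calling the same helper for OPEN_FILE, EOF and CLOSE_FILE, and removing the returned machines from the target set before the next phase, mirrors how the monitoring list shrinks in the steps listed for the multicaster.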