From a5309fed914fdaa7697f2d369e7dcd02309063ab Mon Sep 17 00:00:00 2001 From: Guillaume Horel Date: Wed, 4 Nov 2015 12:30:44 -0500 Subject: initial import --- README | 302 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 302 insertions(+) create mode 100644 README (limited to 'README') diff --git a/README b/README new file mode 100644 index 0000000..d665370 --- /dev/null +++ b/README @@ -0,0 +1,302 @@ + Copyright (C) 2008 Renaissance Technologies Corp. + main developer: HP Wei + Copyright (C) 2006 Renaissance Technologies Corp. + main developer: HP Wei + Copyright (C) 2005 Renaissance Technologies Corp. + Copyright (C) 2001 Renaissance Technologies Corp. + main developer: HP Wei + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2, or (at your option) + any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; see the file COPYING. + If not, write to the Free Software Foundation, + 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + +20060416 version 3.0 + This is a major update from previous version 2.1. New features include: + -- large file support + multicaster can now transfer file whose size is larger than 2**31-1. + -- platform independence (between linux, unix) + We have tested this independence from a Sun to a Linux machine. + -- catching slow machine as the feedback monitor dynamically + This significantly reduces the number of occurence of SIT-OUT-RECEIVING-FILE cases. + -- removing meta-file-info + Since our basic algorithm is different from that of Aaron's, the meta-file-info + is redundant for in our case. + -- backup feature (as in rsync) + If the target machines are NFS data servers, backup feature is absolutely + necessary. The feature is implemented in the low level multica* codes, + I will incorporate it into mrsync.py in the next month or so. + -- mcast IP address and port options + They are now tested and are working. + This allows multiple multicasting sessions running simultaneously. + -- code cleanup in both mrsync.py and *.c + -- code fixing for 64bit arch; + tested on Debian 64 bit arch by Nicolas Marot in France + -- logic flaw which under certain condition + caused premature dropout due to + unsuccessful EOF, CLOSE_FILE + and caused unwanranted SIT-OUT case + -- code provision for IPv6 (but not tested yet) +version 3.1.1 + -- minor bug fixes. + -- add checking in multicatcher to check if the + system is ready for write. + -- codes for IPv6 are ready (but not tested) + IPv4 is tested ok. +version 3.2.0 + -- monitor change improvement + -- handshake improvement (e.g. seq #) + -- if one machine skips a file, all will NOT close() +version 4.0.0 + -- previously, missing page report is sent back one page + at a time. I change it to ~60K pages at a time. This + drastically reduces the network traffic. + Performance (syncing time) is in general improved. + -- clean up scheme in choosing a monitor machine. + -- misc code cleaning up and minor adjustments. + +version 4.0.1 + -- Dave Close at ThalesGroup in CA points out + the type of c in "c = getopt()" should have been "int c" + [ It had been "char c". ] + +The mrsync package includes the following files: +-------------------- +README -- this file (usage example in the end.) +README.more -- another readme. +GPL.License -- legal stuff +mrsync.py -- the driver script that glues everything together +mrsync_config.py -- system dependent configuration files +cmdToTarget.py -- module to send command to remote machines +Makefile.Sun +Makefile.linux +backup.c +complaint_sender.c +complaints.c +file_operations.c +global.c +id_map.c +multicaster.c +multicatcher.c +page_reader.c +parse_synclist.c +rtt.c +rttcatcher.c +rttcomplaint_sender.c +rttcomplaints.c +rttmissings.c +rttpage_reader.c +rttsends.c +sends.c +set_catcher_mcast.c +set_mcast.c +setup_socket.c +signal.c +timing.c +trFilelist.c +main.h +rttproto.h +proto.h +signal.h +rttmain.h + +------------ +To install, +------------ +(1) copy these files to a directory on the target and + master machines. + (All target machines need at least a copy of the + multicatcher.) + Or the best (efficient) solution is to install the package + on a common file server to which all machines can access. +(2) Depending on which platform you are in, + mv Makefile.Sun (or MakeFile.linux) into Makefile. +(3) edit the system dependent section in Makefile. + Especially, the bindir variable. + It is used in the 'install' to copy the executable + binaries into the destination directory. +(4) make + to compile and generate the executables. (the warnings for format are harmless). +(5) make install + to copy the executables into $bindir. +(6) edit mrsync_config.py + change the paths for your system. + mrsync_config.py is read by mrsync.py +(7) check if your system has rsync installed + in the directory specified in rsyncPath and remote_rsyncPath + in the file mrsync_config.py +(8) check if your system has python installed. + Python, like perl, is included in most linux distribution. + As distributed, mrsync.py invokes python through the 1st line + !/usr/bin/env python + If this does not work for you system, you replace + the 1st line of mrsync.py with correct one. + +------------------------------ +Usage of mrsync.py +------------------------------ + +(1) the function of ping -- + This is removed from multicaster. + Instead, please use rtt which gives more useful info. + See 'Usage of rtt' below. + +(2) mrsync.py -m /dirX/target.list + -s /dirY/data + -t /dirZ/data + + This is a routine job carried out at Renaissance Technologies Corp. + The file target.list lists the names of all target machines (one name per line). + + Before carrying out multicasting, mrsync.py invokes rsync, + to find those files in the source directory (-s), + compared with the target directory (-t) in the first target machine, + that need to be transfered to all target machines. + If -t is not specified, then mrysnc assumes that the dest_path + on target machines is same as that in -s. + + mrsync.py uses rsync with the following minimum options: + --rsh=rsh -avW --dry-run --delete + This should work in most of routine cases. + This minumum option is specified in mrsync_config.py. + I did not test if other combination of options will work + so I would advise that you do not change the minimum option. + + However, you can add additional options on top of that minimum + options. Use '-o' in mrsync.py + e.g. + mrsync2.py -m list -s /path/sync/test -t /dest/sync/test -v 1 -o '-H' + This turns on --hard-link option in rsync. But hardlink is very tricky + to deal with. The minumum options supplied in mrsync_config.py is the one + that we find works pretty good in terms of dealing with hardlinks. + I do find that some rare sequence of syncing sessions requires that I repeat + another round of mrsync.py with additional + -o '-H' to set the hardlinks on the targets. + + The output of rsync is stored in a tmp file, which in turn gets + parsed by trFilelist. trFilelist translates the list of files into + a specific format for multicaster. + The reason for this two step process is to remove direct dependence of + multicaster on rsync. + The gain is that we can now do the case as shown in (3). + +(3) mrsync.py -m machine1,machine2 + -s /dirB/data/ + -l *.c,subs/foo*,sub2/path/[0-9][0-9].ext + + Here have two new things as compared with the case in (2). + First, the two target machines are listed directly + in -m option in the form of csv list + (instead of listing them in a machine_list file). + Second, the -l option is specified. + -l option lists the files that we want to be synced. + i.e. with this option, we bypass the rsync. + In this example, we want to sync + all /dirB/data/*.c files, + all /dirB/data/subs/foo* files, + and all /dirB/data/sub2/path/[0-9][0-9].ext + The pattern specified should be as those recognizable by 'ls'. + + Obviously, this usage is convenient for syncing small number of + specific files to a couple of machines. + +(4) mrsync.py -m /dirA/target_list + -s /dirB/data + -A 239.255.70.88 + -P 8010 + + Same functionality as in (2). + But here we specify the multicast IP address and port number + for use by both multicatcher and multicaster. + + This allows multiple mrsync sessions to run simultaneously + as long as each session uses its own mcast_address and port_number. + [ No overlap between sessions. ] + +(5) mrsync.py -m /dirA/target_list + -s /path + -X + Same functionality as in (2). + But (2) is a normal sync. This one invokes the -X. + In a normal sync, a given file (let's say /path/foo) is synced + with two steps, + (a) sync the file to /path/foo.tmp + (b) mv /path/foo.tmp /path/foo + With -X option, the same file is synced with three steps, + (a) rm /path/foo + (b) sync the file to /path/foo.tmp + (c) mv /path/foo.tmp /path/foo + This is useful when the target disk is almost full, and + could not afford to have foo.tmp and foo co-existing. + +(6) By default, mrsync.py prints out only very essential messages. + If you want to see more messages (related to some detailed status including + a rtt_histogram at the end of the session), + you can spcify '-v 1' in the mrsync.py command line argument list. + '-v 2' will print out even more detailed messages --- for debugging purpose! + +(7) other options of mrsync.py + type 'mrsync.py' and see them for yourself. + +-------------------------------------------------------------------------------- +Usage of rtt +---------------- +rtt is a utility for testing between master and one machine. +Its main function is to report the (histogram) distribution of the +round-trip-time interval for pages sent. +I use it to ping a machine +and to measure the typical rtt time and to see if the network is +connected properly. System administrators can use the rtt time info +to turn up the network. + +Example usage: + +(1) on master, I invoke rtt and it gives the report as follows. + +master:~ 1>rtt -r ssh -m target -n 1000 -s 64000 + +using ssh to invoke rttcatcher on target +Sun Apr 16 17:36:58 EDT 2006 +ssh target '/path/rttcatcher -a 239.255.67.200 -p 7900 1>/dev/null 2>/dev/null & echo $!' +remote pid = 27283 +Sending data... +Missing pages = 0 ( 0.00%) out of total pages = 1000 +rtt histogram +msec counts +---- -------- + 1 989 + 2 3 + 3 7 + 4 1 + + +The option -m specifies the name of the target machine. +-n specifies the number of pages that we want to send. +-s specifies the size of each page. (maximum = 64512) + +In this example, the total number of pages sent is 1000 +which is the sum of the counts in the rtt historgram table. +The histogram shows that the majority of rtt is around 1-2 msec. +If the network is really busy, the distribution of the histogram +will be broadened (instead of concentrated around one particular value). + +(2) more options for rtt. + type 'rtt' and see them on the screen. + +-------------------------------------------------------------------------------- + +HP Wei +hp@rentec.com +Renaissance Technologies Corp. + + -- cgit v1.2.3-70-g09d2