mpiTools
--------

This repository contains tools which can be used in a batch system
environment, on an SMP computer, or on any self-defined network of
computers, and which ease the administration of running many jobs in
parallel. The code is based on MPI and exploits the boss/worker (also known
as master/slave) scheme. Because it relies only on MPI, the code can run on
various batch systems and on SMP computers with shared memory, and it scales
well on all of them.

This toolbox is under continuous development, and more tools may be added
over time. Feel free to modify the code according to your needs.

The code has been tested with Open MPI, MPICH, and MPICH2. Since the code
follows the MPI standard throughout, we expect it to run with other MPI
implementations as well.

The documentation of these tools is also available at

  http://panda-wiki.gsi.de/cgi-bin/viewauth/Computing/PandaRootTools

Installation
------------

Prior to the installation of the mpiTools, make sure that MPI is installed
on your system. You can download an MPI implementation from

  http://www.open-mpi.org/

for Open MPI, or from

  http://www-unix.mcs.anl.gov/mpi/mpich1/
  http://www.mcs.anl.gov/research/projects/mpich2/

for MPICH and MPICH2.

To install the mpiTools, use the Makefile in the root directory. Set the
environment variable "MPI" to the path of your MPI installation, which
contains the MPI compilers. For example,

  export MPI=/opt/mpi/mpich-gcc/
  make boss_worker_mpi

We advise setting the MPI variable in your profile, .bashrc, or .bash_login.
Furthermore, we recommend adding the corresponding MPI binary and library
directories to PATH and LD_LIBRARY_PATH, respectively, in case your system
does not already set them.
For example:

  export PATH=$MPI/bin:$PATH
  export LD_LIBRARY_PATH=$MPI/lib:$LD_LIBRARY_PATH

A "make install_xxx" will try to copy the executables into the directory
$INSTALLDIR/bin (xxx=bin), the scripts into $INSTALLDIR/scripts
(xxx=scripts), the example macros into $INSTALLDIR/macros (xxx=macros), and
example job-related files into $INSTALLDIR/jobs. The variable $INSTALLDIR
can be set in the Makefile. A "make" without arguments compiles and links
all programs and installs the binaries in $INSTALLDIR/bin.

Source tree
-----------

  src     - source files
  bin     - executables
  jobs    - job description files for PBS and for mpiTools
  macros  - example (Panda)ROOT macros
  scripts - script files which are called by some of the mpiTools programs

boss_worker_mpi
---------------

                                    request job/result               run
  ----------------- read   --------  <===============  ----------  =======>  ----------  thread   -------------
  | JOB DES. FILE |<=====  | BOSS |      send job      | WORKER |   result   | SCRIPT |  ======>  | MOVE DATA |
  -----------------        --------  ===============>  ----------  <=======  ----------           -------------
                                     <===============  ----------
                                                       | WORKER |
                                     ===============>  ----------
                                                         .....

This is a generic boss/worker tool. The mechanism is based on a boss/worker
(master/slave) model, as illustrated in the diagram above. Jobs, defined in
an ASCII database file (JOB DES. FILE), are distributed by a BOSS process
which listens for job requests from the WORKERS. A worker carries out the
job (SCRIPT) and sends the data (MOVE DATA) to the specified location. The
results are reported back to the boss process.

The MOVE process is a detached (pthread) process and can therefore run in
parallel with the SCRIPT process. The MOVE process reports back to the boss
process whether the data transfer was successful or not (not shown in the
diagram above). A job is considered successful only if both the SCRIPT and
the MOVE process return a positive response to the boss process.
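The worker-side sequence described above (run the SCRIPT, hand the data
transfer to a thread of its own, and count the job as successful only if
both steps succeed) can be sketched in C as follows. This is a minimal
illustration and not the code in src/boss_worker_mpi.c: the names
run_command, move_task, move_thread, and run_job are hypothetical, the MPI
messaging with the boss is left out, and the thread is joined rather than
detached so that the function can return the combined result.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run a shell command, return 1 if it exited with status 0. */
static int run_command(const char *cmd)
{
    int status = system(cmd);
    return status != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* State shared between the worker and its MOVE thread (illustrative only). */
struct move_task {
    const char *cmd;  /* command that transfers the output data */
    int ok;           /* set to 1 by the thread if the transfer succeeded */
};

/* Thread body: perform the data transfer and record its outcome. */
static void *move_thread(void *arg)
{
    struct move_task *task = arg;
    task->ok = run_command(task->cmd);
    return NULL;
}

/* Run one job; return 1 only if the script AND the move both succeed. */
int run_job(const char *script_cmd, const char *move_cmd)
{
    struct move_task task = { move_cmd, 0 };
    pthread_t tid;

    /* Step 1: the SCRIPT. If it fails there is nothing to move. */
    if (!run_command(script_cmd))
        return 0;

    /* Step 2: the MOVE runs in a thread of its own. */
    if (pthread_create(&tid, NULL, move_thread, &task) != 0)
        return 0;

    /* A real worker could start the next script here while the move is
     * still in progress; this sketch simply waits for the transfer. */
    pthread_join(tid, NULL);

    return task.ok;
}
```

On a POSIX system with the standard "true" and "false" utilities,
run_job("true", "true") returns 1, while run_job("true", "false") returns 0
because the move step failed.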
The core engine, boss_worker_mpi, is set up in a very generic way and can
therefore be used for any type of job, with good load balancing, and it
scales to multi-computer and multi-CPU systems. In essence, it is a
parallelized wrapper for your programs (which are set up in scripts).

This tool is composed of

  bin/boss_worker_mpi          - executable which contains the MPI protocols
                                 for the boss/worker model
  src/boss_worker_mpi.c        - the corresponding source code
  src/killprocesstree.c        - tool derived from the original pstree
                                 program and used to kill timed-out processes
  jobs/run_boss_worker_mpi.job - example job description input to PBS
  jobs/do_boss_worker_mpi.sh   - script which performs a qsub with
                                 run_boss_worker_mpi.job as input
  jobs/jobs.in                 - job description file for the boss_worker_mpi
                                 code (see below)
  scripts/...                  - various example bash scripts which call your
                                 programs or macros (see below)
  macros/...                   - example ROOT macros used for PandaRoot
                                 simulations
  bin/movefiles                - script which defines the copying method used
                                 to transfer locally stored data of the
                                 workers to a storage place as defined by the
                                 user in "jobs.in"
  bin/runmpi                   - script which helps you to start an MPI
                                 boss-worker session

To run the program "bin/boss_worker_mpi" one can specify the following
options:

  -? --- Overview of available options
  -j --- Job description path+filename (default=./jobs/jobs.in)
  -s --- Path to the local storage place used by workers (default=/tmp/)
  -b --- Minimum required buffer storage space (default=512 Mb)
  -p --- Maximum allowed pending jobs for a worker (default=4)
  -l --- Maximum allowed load of a worker (default=100.00)
  -m --- Path and name of the copying/moving tool (default=~/bin/movefiles)
  -t --- Time out period for script and move process to finish (default=3600 secs)
  -r --- Starting run identification number (default=