The change from the "old" Fermilab batch system to Condor was pretty straightforward, so I have about 4 years of experience now!

The basic idea is that almost everything has to be done remotely, but your beautiful code is set up to do this nicely.

You package up everything you need for each run, which is just a few files. Here is a typical Condor job description file:

universe = vanilla
Notification = Error
executable = /farm/run2mc_stage01/condor/W2/SubProcesses/Pug_e+vedg/ajob8.sh
environment = F77=g77
transfer_output = true
transfer_error = true
transfer_executable = true
should_transfer_files = YES
transfer_input_files = ajob8,input_app.txt,madevent,input-card.dat,cteq6l1.tbl,randinit
WhenToTransferOutput = ON_EXIT_OR_EVICT
input = /dev/null
output = $(Cluster).$(Process).out
error = $(Cluster).$(Process).err
log = $(Cluster).$(Process).log
queue

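Since the submit files for different ajobs differ only in the channel directory and the job name, they can be generated instead of written by hand. The sketch below is an illustration, not the setup used here: it assumes each channel directory under SubProcesses holds the `ajobN` scripts plus an `ajobN.sh` wrapper, as in the example above, and the helper names `write_submit` and `submit_all` are hypothetical. Only `condor_submit` is a real HTCondor command.

```shell
# Hypothetical helper (not from the original setup): print a Condor submit
# description for one ajob, mirroring the hand-written example.
#   $1  absolute path of the channel directory (assumed to hold ajobN.sh)
#   $2  job name, e.g. ajob8
write_submit() {
  local dir="$1" job="$2"
  # Unquoted heredoc: ${dir}/${job} expand, \$(Cluster) stays literal for Condor.
  cat <<EOF
universe = vanilla
Notification = Error
executable = ${dir}/${job}.sh
environment = F77=g77
transfer_output = true
transfer_error = true
transfer_executable = true
should_transfer_files = YES
transfer_input_files = ${job},input_app.txt,madevent,input-card.dat,cteq6l1.tbl,randinit
WhenToTransferOutput = ON_EXIT_OR_EVICT
input = /dev/null
output = \$(Cluster).\$(Process).out
error = \$(Cluster).\$(Process).err
log = \$(Cluster).\$(Process).log
queue
EOF
}

# Hypothetical driver: write and submit one description per ajob found in
# SubProcesses/P*/ (directory layout assumed, not taken from the post).
submit_all() {
  local dir job
  for dir in "$PWD"/SubProcesses/P*/; do
    dir="${dir%/}"
    for job in "$dir"/ajob*; do
      job="$(basename "$job")"
      case "$job" in *.*) continue ;; esac   # skip ajobN.sh, ajobN.condor, ...
      [ -e "$dir/$job" ] || continue         # glob matched nothing
      write_submit "$dir" "$job" > "$dir/$job.condor"
      # Submit from the channel directory so the relative names in
      # transfer_input_files resolve there.
      (cd "$dir" && condor_submit "$job.condor")
    done
  done
}
```

Run `submit_all` from the top of the process directory once the ajob scripts for a step exist. Submitting from inside each channel directory matters because relative paths in `transfer_input_files` are resolved against the job's initial directory, which defaults to the directory `condor_submit` is run from.
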
After each step (survey, refine, refine), a bunch of files come back into the appropriate directories. THEN I run a shell script locally:

#!/usr/local/bin/bash

# Local processing, run after each step's output has come back from Condor.
./bin/march                  # unpack the returned tarballs (see below)
cd SubProcesses || exit 1
../bin/sumall                # sum up the per-channel results
../bin/combine_events        # merge the channel event files
cd ..

"march" is a script I wrote that "marches" through all of the directories, untars a lot of tarballs, and basically prepares the output for combine_events.

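The post doesn't include march itself, so the following is only a guess at its core step: a hypothetical `march` function that walks every channel directory and unpacks the tarballs the jobs sent back. The `.tar.gz` naming and the `SubProcesses/P*` layout are assumptions, and whatever else the real script does to prepare the output for combine_events is not reproduced.

```shell
# Hypothetical stand-in for "march" (not the author's script): unpack every
# tarball returned by the Condor jobs into the channel directory it sits in.
# Assumes results arrive as *.tar.gz files inside <top>/P*/.
march() {
  local top="${1:-SubProcesses}" dir tarball n=0
  for dir in "$top"/P*/; do
    for tarball in "$dir"*.tar.gz; do
      [ -e "$tarball" ] || continue        # no tarballs in this directory
      tar -xzf "$tarball" -C "$dir" || return 1
      n=$((n + 1))
    done
  done
  echo "march: unpacked $n tarball(s)"
}
```

Called with no argument it works on `SubProcesses` relative to the current directory, which matches running `./bin/march` from the top directory as in the script above. The tarballs are left in place, so a rerun simply unpacks them again.
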
Also, I found that it was more efficient to deal with "gzipped" events.dat files. There is actually a library out there that lets you READ gzipped files in Fortran.

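The Fortran library isn't named, so it isn't shown here. On the shell side, keeping events.dat compressed costs little because tools can decompress it on the fly. A minimal sketch with a hypothetical `read_events` helper, assuming compressed files simply carry a `.gz` suffix:

```shell
# Hypothetical helper: print an event file to stdout, decompressing it on
# the fly when needed. Assumed naming: events.dat or events.dat.gz.
read_events() {
  local f="$1"
  # Fall back to the compressed copy if only <name>.gz exists.
  if [ ! -e "$f" ] && [ -e "$f.gz" ]; then
    f="$f.gz"
  fi
  case "$f" in
    *.gz) gzip -dc -- "$f" ;;   # stream-decompress, no copy left on disk
    *)    cat -- "$f" ;;        # plain file: pass through unchanged
  esac
}
```

`gzip events.dat` replaces the file with events.dat.gz; `read_events events.dat | head` then still shows the start of the file without writing an uncompressed copy to disk.
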
As far as I can tell, there is no GOOD way to avoid the local processing step in between the survey, refine and refine steps.

People won't like that, but that is because they haven't really thought about the problem.

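The ordering constraint can be made explicit in a small driver, assuming, as the post implies, that each step starts from the locally processed output of the step before it. This is a sketch with hypothetical names: `run_cycle` takes the submit, wait and local-processing commands as arguments, because how each step's ajobs are produced and tracked depends on the setup. For the wait, HTCondor's `condor_wait` on the jobs' log files is one option.

```shell
# Hypothetical driver for the survey -> refine -> refine cycle.
#   $1  command that submits all jobs for a step (receives the step name)
#   $2  command that blocks until that step's jobs have finished
#   $3  the local post-processing (e.g. the march/sumall/combine_events script)
run_cycle() {
  local submit="$1" wait_jobs="$2" local_step="$3" step
  for step in survey refine refine; do
    echo "== $step: submitting"
    "$submit" "$step" || return 1
    "$wait_jobs" "$step" || return 1
    echo "== $step: local processing"
    # Not skippable: the next step is assumed to need this step's combined output.
    "$local_step" || return 1
  done
}
```

Passing the three commands in keeps the loop itself free of site details; only the strict submit, wait, process order per step is fixed.
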