PASS1 Instructions
The duties of the Pass1 shift are the following:
Before taking your Pass1 shift, please do the following:
This section is intended to give a very basic introduction to the Fermilab Farms. It is by no means complete, but it is all the shift should need to know. More information is available, of course.
| FARM SYSTEM | NAME | HOST | APPROX. PROC. TIME |
|---|---|---|---|
| a | sgie831a | fnsfu.fnal.gov | 7-8 hours |
| b | sgie831b | fnsfu.fnal.gov | 7-8 hours |
| c | ibme831c | fnckm.fnal.gov | 6-7 hours |
| d | ibme831d | fnckm.fnal.gov | 6-7 hours |
| e | sgie831e | fnsfu.fnal.gov | 7-8 hours |
| f | sgie831f | fnio.fnal.gov | 4-8 hours |
| g | ibme831g | fnckm.fnal.gov | 7-8 hours |
| h | ibme831h | fnckm.fnal.gov | 7-8 hours |
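
For quick reference, the table above can be written as a small lookup structure. This is only an illustrative sketch: the dictionary layout and the helper names `host_for` and `systems_on` are assumptions made for this example and are not part of the Pass1 software.

```python
# Farm systems transcribed from the table above.
# system letter -> (name, host, (min_hours, max_hours) approximate processing time)
FARMS = {
    "a": ("sgie831a", "fnsfu.fnal.gov", (7, 8)),
    "b": ("sgie831b", "fnsfu.fnal.gov", (7, 8)),
    "c": ("ibme831c", "fnckm.fnal.gov", (6, 7)),
    "d": ("ibme831d", "fnckm.fnal.gov", (6, 7)),
    "e": ("sgie831e", "fnsfu.fnal.gov", (7, 8)),
    "f": ("sgie831f", "fnio.fnal.gov", (4, 8)),
    "g": ("ibme831g", "fnckm.fnal.gov", (7, 8)),
    "h": ("ibme831h", "fnckm.fnal.gov", (7, 8)),
}


def host_for(system):
    """Return the host for a farm system letter (hypothetical helper)."""
    return FARMS[system.lower()][1]


def systems_on(host):
    """Return the sorted farm system letters served by one host (hypothetical helper)."""
    return sorted(s for s, (_, h, _) in FARMS.items() if h == host)
```

For example, `host_for("c")` gives the host to log onto for system c, and `systems_on("fnsfu.fnal.gov")` lists every system that can be checked from a single login.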
Check whether you need to submit a job on any of the farm systems, either by looking at the web page or by logging onto the farm host. In general, keep five jobs running or queued on each system, and always make sure each queue has at least one job that has not yet started. If you will not be checking the systems for an extended period of 8-10 hours (e.g. overnight), make sure there are at least five jobs on each system. To check the number of jobs you can do the following:
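
However the job counts are obtained, the queue-depth rule in this paragraph can be expressed as a small check. The sketch below is not the shift procedure itself. It assumes the counts have already been collected by hand into a dictionary, and the state labels `"running"` and `"queued"` as well as the function name `systems_needing_jobs` are hypothetical.

```python
MIN_JOBS_PER_SYSTEM = 5  # "keep five jobs running/queued on each system"


def systems_needing_jobs(jobs_by_system):
    """Return {system: number of jobs to submit} for systems below the rule.

    jobs_by_system maps a farm system letter to a list of job states.
    The state strings "running" and "queued" are assumed labels.
    """
    shortfall = {}
    for system, states in sorted(jobs_by_system.items()):
        missing = max(0, MIN_JOBS_PER_SYSTEM - len(states))
        # Even with five jobs present, at least one must not have started yet.
        if missing == 0 and "queued" not in states:
            missing = 1
        if missing:
            shortfall[system] = missing
    return shortfall
```

A system that already holds five jobs, one of them still queued, is left out of the result. A system with five running jobs and nothing waiting is reported as needing one more.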
To submit a job on the farm you should
The table also shows the expected number of events. If there are runs missing between tapes, you can click on the tape label to see whether multiple runs were stored on the tape. This link will give you a better estimate of the number of events on the tape. Do not submit tapes with fewer than 800,000 events: submitting short runs disrupts the timing of the farms and wastes CPU time. E-mail a Pass1 czar (when you e-mail your usual report) and ask for special instructions regarding these tapes.
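
The 800,000-event rule above amounts to a simple filter. The following sketch assumes the expected event counts have been copied by hand from the table into a dictionary. The tape labels in the usage note and the function name `split_tapes` are made up for illustration.

```python
MIN_EVENTS = 800_000  # tapes with fewer events should not be submitted


def split_tapes(expected_events):
    """Split tapes into (ok_to_submit, ask_czar) lists.

    expected_events maps a tape label to its expected number of events.
    """
    ok_to_submit, ask_czar = [], []
    for tape, n_events in expected_events.items():
        if n_events >= MIN_EVENTS:
            ok_to_submit.append(tape)
        else:
            # Short tape: mention it in the usual report and wait for instructions.
            ask_czar.append(tape)
    return ok_to_submit, ask_czar
```

Calling `split_tapes({"TAPE1": 1_200_000, "TAPE2": 799_999})` puts the hypothetical `TAPE1` in the first list and `TAPE2` in the second, which is the list to include in the e-mail to the czar.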
Enter the username (e831p1) and password when prompted. If you make a mistake typing the system or tape number, enter a wrong password at the prompt rather than hitting CTRL-C. If you type the username or password incorrectly, just try again.
See the section on staging below for one minor modification to the p1submit command usage.
If you find that you need to cancel a job, e.g. one submitted with the wrong tape or system number, issue the command:
If you must cancel a running job or one that is on-deck, let a Pass1 czar know, since special cleanup and accounting modifications may be needed. In general, the shift should never have to do this.
Jobs occasionally fail, or at least seem to. These are the jobs that appear with an E in the table of jobs over the last 36 hours. At least once per day, but not more than twice per day, you should send e-mail to the czars with a list of these runs. The czars will investigate and tell you whether to resubmit these jobs or whether they are OK. If the jobs are OK, the czars will modify the database so that the E no longer appears. This may take some time, as the czars are often quite busy. Single failures are not considered critical to progress, since there are thousands of tapes left to analyze.
Typically Pass1 czars take one-week shifts in which they deal with these non-critical problems.
Less frequently, something will happen that either causes a queue to freeze or causes all the jobs in the queue to die in rapid succession. This situation requires the immediate attention of the czars. You may try e-mail first, but if you don't hear anything back in a few minutes, you should call or page a czar. If there is any doubt as to whether a problem is serious or not, make sure you get in contact with a czar (not just the primary czar). See below for phone and pager numbers.
One exception to this division is Output Staging errors. If you notice one of these, e-mail the czars immediately, but you needn't call or page anyone.
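
The escalation rules described above can be summarised as a lookup from problem type to action. This is a sketch for orientation only: the problem labels, the `Action` names, and the `triage` function are hypothetical, and only the mapping itself comes from the text.

```python
from enum import Enum


class Action(Enum):
    DAILY_REPORT = "list the run in the once- or twice-daily e-mail to the czars"
    EMAIL_NOW = "e-mail the czars immediately; no call or page needed"
    EMAIL_THEN_CALL = "e-mail, then call or page a czar if no reply in a few minutes"


# Problem labels are illustrative names, not output of any Pass1 tool.
RULES = {
    "single_job_failure": Action.DAILY_REPORT,
    "output_staging_error": Action.EMAIL_NOW,
    "queue_frozen": Action.EMAIL_THEN_CALL,
    "jobs_dying_rapidly": Action.EMAIL_THEN_CALL,
}


def triage(problem):
    """Return the action for a problem; when in doubt, contact a czar."""
    return RULES.get(problem, Action.EMAIL_THEN_CALL)
```

Anything not recognised falls through to the most cautious action, reflecting the instruction to get in contact with a czar whenever there is doubt about how serious a problem is.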
| Name | E-mail | Office (Work) | Home | Pager |
|---|---|---|---|---|
| Irwin Gaines | gaines@fnal.gov | (630) 840-4022 | (630) 420-1452* | (888) 390-9193 |
| Jon Link | link@fnal.gov | (630) 840-2183 | (630) 584-9613 | (800) 241-9016* |
| Alberto Sánchez | asanchez@fnal.gov | +55 (21) 541 0337 x197 | +55 (21) 542 2602 |   |
| Eric Vaandering | ewv@fnal.gov | (303) 492-4821 | (303) 543-8924 | (800) 241-9165* |
| *Preferred after-hours contact method | | | | |
On the last day of your shift (Monday) you should do the following:
Summary of FOCUS Pass1 commands
Description of staging software