Skim1 Recovery Procedures

This page was last updated June-07-99.



I. The Recovery Procedure

If the tape copy operator reported any problem copying a tape, then that tape was sent to the recovery czar, Jungwon Kwak. In many cases, the tape proved to be readable on the system used for recovery and the requisite number of copies was made. There are four classes of runs which required further work:
  1. If a tape was totally unreadable then it was regenerated at the FNAL skim1 facility using skim1_6. The output was written to a tape with the same label as the unreadable tape; the original tape was destroyed. See below for some details.
  2. On some tapes, most files were readable but a few files were not. In this case the readable files were copied to a new tape, with the same tape label as the input tape. The fwlist.db database was changed to describe the new contents of the tape.
  3. All files which could not be read in case 2) were reprocessed in the FNAL skim1 facility, with skim1_6. These runs were collected and written to new FW tapes, starting at FWx325. See below for some details.
  4. There were a handful of runs for which one or more super-streams never made it to one of the original FW tapes. These missing super-streams were treated as just another run for case 3).

    Details

    Consider run 11312, for which ss1 was not readable. When the run was originally processed at Vanderbilt, it was done using skim1_5 on a Linux box. When it was reprocessed at Fermilab, it was done using skim1_6 on an AIX box. As a result, the number of events in ss1 changed. Also, after the run was re-skim1'ed, only ss1 was kept; all other super-streams were discarded. This full history is recorded in skim1.db: the number of events for ss1 was changed to the new value, but the numbers of events for all other super-streams were not changed. The file /edata3/skim1/srcs/skim1_db.txt describes additional fields which were added to document the different processing history of the different super-streams.

    Two reasons come to mind why it may be necessary to keep track of which super-stream was processed with which version. The first is the (hopefully rare) case in which one must mix two super-streams in one analysis. The second is that one must take care when generating MC samples with a representative distribution of run numbers.



II. Changes to the skim1.db and fwlist.db files

    1. I renamed the existing files with a date stamp:
      mv skim1.db skim1_990531.db
      mv fwlist.db fwlist_990531.db
    2. For all runs which went through cases 1, 3 or 4 above, I changed the number of events in skim1.db, only for the reprocessed super-streams, to the number written by the reprocessing. I also added some new fields; see 5).
    3. For all tapes which were produced in cases 1, 3 or 4, I used Marco's existing scripts to generate a new record in fwlist.db. If the tape already existed, this overwrote the record for that tape.
    4. For all tapes which were produced in case 2, I edited fwlist.db to:
      • remove the runs which had been deleted from the tape
      • give the correct event and byte counts.
    5. Five new fields (19 to 23) were appended to skim1.db. These are described in /edata3/skim1/srcs/skim1_db.txt and that information is reproduced here. Field 19 is a flag to say that one or more of the super-streams in this run were reprocessed; the field is a bitmap for which the values (1,2,4,8,16,32) correspond to (ss1, ss2, ss3, ss4, ss5, ss6). Any or all bits may be set. For runs with a non-zero field 19, fields 2, 4, 15 and 18 may be different for the reprocessed and non-reprocessed super-streams. The information in these fields is repeated, but on a per super-stream basis, in the new fields 20 to 23. For example, if field 20 contains 5-5-5-6-5-5, it indicates that the original skim1 used skim1_5 but that ss4 was reprocessed using skim1_6. Similarly, fields 21 to 23 describe the contents of fields 4, 15 and 18 on a per super-stream basis. For runs which have no reprocessed super-streams, the additional fields are present and they simply duplicate the existing information.
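
The bitmap in field 19 and the per super-stream fields can be decoded with a few lines of code. The following is only a sketch: it assumes the field values have already been extracted from a skim1.db record (the record layout itself is documented in skim1_db.txt and is not reproduced here), and it assumes that fields 20 to 23 are always six values joined by dashes in the order ss1 to ss6, as in the 5-5-5-6-5-5 example. The function names are hypothetical and are not part of the skim1 tools.

```python
# Bit values for field 19 as listed above: 1, 2, 4, 8, 16, 32 -> ss1 .. ss6.
SUPER_STREAMS = ("ss1", "ss2", "ss3", "ss4", "ss5", "ss6")


def reprocessed_streams(field19):
    """Return the super-streams whose bit is set in field 19."""
    return [name for i, name in enumerate(SUPER_STREAMS) if field19 & (1 << i)]


def per_stream_values(field):
    """Split a per super-stream field such as '5-5-5-6-5-5' into a dict.

    Assumption: six dash-separated values, ordered ss1 .. ss6.
    """
    parts = field.split("-")
    if len(parts) != len(SUPER_STREAMS):
        raise ValueError("expected %d values, got %r" % (len(SUPER_STREAMS), field))
    return dict(zip(SUPER_STREAMS, parts))
```

For the example in the text, a run in which only ss4 was reprocessed would have field 19 equal to 8, so reprocessed_streams(8) gives ['ss4'] and per_stream_values('5-5-5-6-5-5')['ss4'] gives '6'.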


III. Access to Log and Oddpack files

    1. The logfiles and oddpack files for the main skim1 processing are accessible on the web at the Skim1 Access Page.
    2. The logfiles for the FW tapes written during the recovery are also viewable via this infrastructure, as if they were an FW logfile from the main skim1 processing.
    3. I have added a column to the pages which describe the status of the skim1 processing of each run. This column is headed "RP", for ReProcessing. If the field is non-zero then it is a link to a new page containing summary information about which super-stream was processed where. That page also allows access to the skim1 reprocessing logfiles and oddpacks.
    4. This page seems like a handy place to document where the log files, etc., actually live.
      1. The files are named following the pattern:
        /edata3/skim1/logfile/06/6954.log
        /edata3/skim1/logfile/10/10040.log
        /edata3/skim1/oddpack/10/10040.odd.gz
        /edata3/skim1/logfile/ss1/FW0301.Log
      2. There are various uppercase/lowercase conventions which denote the processing site.
        Run logfiles: Vanderbilt = nnnnn.log, Colorado = nnnnn.LOG, Fermilab = nnnnn.Log.
        FW tape logfiles: Vanderbilt = fwxxxx.log, Colorado = FWxxxx.LOG, Fermilab = FWxxxx.Log.
      3. If a file was processed several times during the main pass of skim1, then the older versions of the logfiles and oddpacks are named by adding a letter to the file name:
        /edata3/skim1/logfile/10/10040a.log
        /edata3/skim1/oddpack/10/10040a.odd.gz
        /edata3/skim1/logfile/ss1/FW0301a.Log
        If there are further reprocessings the added letter moves up the alphabet.
      4. The FW log files created by reprocessing follow the same naming conventions as above.
      5. The logfiles and oddpacks for the reprocessed runs are named by appending an uppercase R to the file type:
        /edata3/skim1/logfile/10/10837.LogR
        This odd convention allows reuse of the web browsing tools.
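
The naming conventions above can be summarised in code. This is a sketch under stated assumptions, not a description of any existing tool: the two-digit subdirectory is assumed to be the run number divided by 1000 (this matches the examples 06/6954, 10/10040 and 10/10837 but is not stated explicitly), and a trailing R on the file type is assumed to mark a reprocessed file whatever the site. The function names are hypothetical.

```python
import posixpath

# File-type capitalisation -> processing site, as listed in item 2.
SITE_BY_EXTENSION = {"log": "Vanderbilt", "LOG": "Colorado", "Log": "Fermilab"}


def run_logfile_path(run, extension="log", root="/edata3/skim1"):
    """Build the logfile path for a run number.

    Assumption: the subdirectory is run // 1000, zero-padded to two digits.
    """
    return "%s/logfile/%02d/%d.%s" % (root, run // 1000, run, extension)


def describe_logfile(path):
    """Return (stem, site, reprocessed) for a logfile path.

    site is None if the file type is not one of the known capitalisations.
    """
    stem, _, ext = posixpath.basename(path).rpartition(".")
    # A trailing uppercase R on a known file type marks a reprocessed run.
    reprocessed = ext.endswith("R") and ext[:-1] in SITE_BY_EXTENSION
    if reprocessed:
        ext = ext[:-1]
    return stem, SITE_BY_EXTENSION.get(ext), reprocessed
```

For example, run_logfile_path(10837, "LogR") reproduces the path shown in item 5, and describe_logfile applied to that path reports Fermilab as the site with the reprocessed flag set. A stem that ends in a letter, such as 10040a, denotes an older version from the main pass, as described in item 3.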

      Send comments to: Rob Kutschke