EventStore Administration
This page describes procedures used in administering an EventStore installation.
Each admin user should have a configuration file in $HOME/.esdb.conf, which contains user:password entries and several other settings, as illustrated below:
<syntaxhighlight>
# sample .esdb.conf file (entries are of the form user:password)
username:password
...
</syntaxhighlight>
The users and passwords are used to control access to the master MySQL databases - management of SQLite databases is not access controlled. Note that the authentication is performed by the EventStore scripts themselves.
NOTE: authentication will change in upcoming versions
One of the inputs to EventStore is the set of event lists contained in the various skims. Each file stores the event list for one skim. The file format used is called the IDXA file format, which simply consists of a header line "IDXA" followed by triples of (run,event,uid), as in the following (values are illustrative):
<syntaxhighlight>
IDXA
2439 1001 1
2439 1002 1
2440 57 1
</syntaxhighlight>
The mappings of event lists to skims are kept in files with entries of the form "skimname::eventlist", for example (names are illustrative):
<syntaxhighlight>
2track::2track_events.idxa
pi0::pi0_events.idxa
</syntaxhighlight>
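The two formats above are simple enough that a short sketch can make them concrete. The following Python snippet parses both; the function names and sample values are illustrative, not part of EventStore itself:

```python
# Hypothetical sketch: parsing an IDXA event list and a skim-mapping file.
# The formats follow the descriptions above; names here are illustrative.

def parse_idxa(lines):
    """Parse IDXA-format lines into a list of (run, event, uid) triples."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    if not lines or lines[0] != "IDXA":
        raise ValueError("missing IDXA header")
    return [tuple(int(x) for x in ln.split()) for ln in lines[1:]]

def parse_skim_mapping(lines):
    """Parse 'skimname::eventlist' entries into a dict."""
    mapping = {}
    for ln in lines:
        ln = ln.strip()
        if not ln:
            continue
        skim, _, eventlist = ln.partition("::")
        mapping[skim] = eventlist
    return mapping

# Example usage with illustrative data:
triples = parse_idxa(["IDXA", "2439 1001 1", "2440 57 1"])
mapping = parse_skim_mapping(["2track::2track_events.idxa"])
```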
The scripts for EventStore data management in GlueX are located in (assuming the root EventStore directory is given by ESBASEDIR):
<syntaxhighlight>
$ESBASEDIR/src/AdminScripts
</syntaxhighlight>
The batch scripts that drive the indexing and cataloging of data use several text files as inputs:
- data_location - this specifies the full path to the data files. The glob wildcard "*" can be used to match multiple files.
- eventstore_location - this directory is where the EventStore files (indices, sqlite DB's, log files) are stored. Nothing else should be stored in these directories - they are deleted whenever the injection script is run.
- idxa_location - this specifies the mapping between skim name and event list for the given run.
The main script for generating these files at JLab is misc/build_eventstore_inputs.py. Before you run the script, make sure that the run period and revision are properly set in the script itself, e.g.
<syntaxhighlight>
RUNPERIOD = "RunPeriod-2014-10"
DATAREVISION = "ver10"
</syntaxhighlight>
By default, the script generates these files for all available runs and overwrites any existing files. The script also supports running over a user-defined set of runs. For instance, to process new runs 3500-3510, the following command line could be used:
<syntaxhighlight>
./build_eventstore_inputs.py -b 3500 -e 3510
</syntaxhighlight>
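To illustrate what the three input files look like on disk, here is a minimal sketch that writes them for a single run. The directory layout, file-name pattern, and skim name are assumptions for illustration only, not the actual output of the JLab script:

```python
# Illustrative sketch: writing the three per-run input files described above
# (data_location, eventstore_location, idxa_location). All paths and the
# skim name are assumptions, not the real output of build_eventstore_inputs.py.
import os

RUNPERIOD = "RunPeriod-2014-10"
DATAREVISION = "ver10"

def write_inputs_for_run(run, outdir):
    os.makedirs(outdir, exist_ok=True)
    # data_location: glob pattern matching this run's data files (assumed layout)
    with open(os.path.join(outdir, "data_location"), "w") as f:
        f.write("/mss/halld/%s/REST/%s/dana_rest_%06d_*.hddm\n"
                % (RUNPERIOD, DATAREVISION, run))
    # eventstore_location: where indices, sqlite DBs, and log files will go
    with open(os.path.join(outdir, "eventstore_location"), "w") as f:
        f.write("/work/eventstore/%s/%s/%06d\n" % (RUNPERIOD, DATAREVISION, run))
    # idxa_location: skimname::eventlist entries for this run (assumed names)
    with open(os.path.join(outdir, "idxa_location"), "w") as f:
        f.write("2track::%06d_2track.idxa\n" % run)

write_inputs_for_run(3500, "inputs/003500")
```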
The next step is to build the indexes for each skim and the metadata used by EventStore. The script that performs this is inject.csh. It takes one argument, the run number to be processed. Several variables need to be set for proper injection (here and in the following, REST file processing will be used as an example):
<syntaxhighlight>
# example inject.csh settings (values elided)
setenv EVENTSTORE_OUTPUT_GRADE ...
setenv EVENTSTORE_WRITE_TIMESTAMP ...
setenv DATA_VERSION_NAME ...
setenv EVENTSTORE_BASE_DIR ...
</syntaxhighlight>
Notes:
- EVENTSTORE_OUTPUT_GRADE gives the grade that this run's data is being injected into. More discussion of the grade used in GlueX is given here. A writable grade must be specified.
- EVENTSTORE_WRITE_TIMESTAMP is an arbitrary timestamp associated with the data, of the form "YYYYMMDD". Conventionally it is the date when injection of a data set started, but this can easily change depending on the particulars of what you are doing.
- DATA_VERSION_NAME is the specific version name of the data set, as described here.
- EVENTSTORE_BASE_DIR is the location of the directory you prepared in the previous step.
A tool for building run lists is the script misc/build_runlist.py.
Note that the scripts for processing EVIO files were written with the JLab batch farm in mind, where the EVIO files are necessarily processed one at a time. The procedure for the administrator is the same as described above, but changes might be needed when running at a different institution.
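Since inject.csh handles one run per invocation, injection over a run list is typically driven by a small loop. The sketch below shows one way to do that; the run-list format (one run number per line) and the driver logic are assumptions, with only the inject.csh call taken from the text:

```python
# Hypothetical driver sketch: inject a list of runs one at a time.
# Only the "./inject.csh <run>" invocation comes from the documentation;
# the run-list format and the rest of this loop are assumptions.
import subprocess

def read_runlist(path):
    """Read a run list, one run number per line (assumed format)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def inject_runs(runlist_path, dry_run=True):
    for run in read_runlist(runlist_path):
        cmd = ["./inject.csh", run]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # a nonzero exit code flags the run for later investigation
            rc = subprocess.call(cmd)
            if rc != 0:
                print("injection failed for run", run)
```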
Once the EventStore information for individual runs has been created, we can merge the information for the processed runs into the master DB. The script that performs this is merge.sh.
<syntaxhighlight>
# example merge.sh settings (values elided)
MyESDir=...
MyWorkDir=...
MasterDB=...
</syntaxhighlight>
Notes:
- MyESDir points to the directory where the sqlite files are kept, conventionally the same directory as the index files. The script uses find to build the list of sqlite files. A gzipped tar archive of the sqlite files is made in the same directory, so they can be restored in case merging fails.
- MyWorkDir is where several files related to the merging are kept. The numbers of any failed runs are written to a text file in this directory named failed.lst.
- MasterDB points to the master database. A MySQL DB can be specified, as in the example above, or a SQLite master DB can be used by specifying a file name.
- By default, the script searches MyESDir for sqlite files and merges in all the files it finds. If you only want to merge a specific list of runs, put the list into a text file, one run per line, and pass that file as an argument to the merge script, e.g.: "merge.sh goodruns.txt"
The overall procedure is then:
- Run merge.sh
- Check $MyWorkDir/failed.lst
- Fix and iterate
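The run/check/iterate loop above can be sketched in a few lines. Only the merge.sh invocation and the $MyWorkDir/failed.lst location come from the text; the retry logic and the max_attempts parameter are illustrative assumptions:

```python
# Sketch of the merge / check failed.lst / iterate loop described above.
# The merge.sh call and failed.lst location come from the documentation;
# the retry loop and max_attempts are assumptions for illustration.
import os
import subprocess

def read_failed(path):
    """Return the run numbers listed in failed.lst, or [] if absent/empty."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [ln.strip() for ln in f if ln.strip()]

def merge_with_retries(workdir, runlist="goodruns.txt", max_attempts=3):
    """Run merge.sh, then re-try only the runs listed in failed.lst."""
    for attempt in range(max_attempts):
        subprocess.call(["./merge.sh", runlist])
        failed = read_failed(os.path.join(workdir, "failed.lst"))
        if not failed:
            return True          # nothing failed; merging is complete
        print("attempt %d: %d runs failed" % (attempt + 1, len(failed)))
        with open(runlist, "w") as f:   # next pass merges only the failed runs
            f.write("\n".join(failed) + "\n")
    return False
```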
Once all the data has been checked and the EventStore metadata created, injected, and merged into the main DB, the data version can then be moved to a readable grade for general use. The script that performs this action is moveGrade.sh. These variables must be properly set:
<syntaxhighlight>
# example moveGrade.sh settings (values elided)
MyDB=...
OldGrade=...
NewGrade=...
OldTime=...
NewTime=...
</syntaxhighlight>
Notes:
- MyDB should point to the master database that you merged into in the previous step.
- OldGrade is the grade you injected the data with, NewGrade is the final grade. For a more detailed discussion of the grades used by GlueX, see here.
- OldTime is the timestamp you injected the data with; NewTime is the timestamp that users will access the data with. Note that there does not have to be any particular relation between these times - NewTime can even be earlier than OldTime, if you want. A common trick when processing a dataset incrementally (say, during data taking) is to inject each group of runs into an -unchecked grade with its own timestamp, and then move them all to the same timestamp as the rest of the runs.