...
The Domain “Dataset Spec”: This global specification contains the static configuration that “anchors” the process to a specific domain (e.g. “E3SM datasets”). This document (/p/user_pub/e3sm/staging/resource/dataset_spec.yaml) details the metadata that defines each E3SM dataset: experiment, model version(s), resolutions, realms, grids, frequencies, etc. By “walking” the branches of this document, the complete list of E3SM dataset_ids (as reflected in the ESGF “master_id”) can be generated. Subsets of these dataset_ids are passed as tokens to the processes intended to operate on the corresponding datasets.
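As a rough illustration of “walking” the spec, the sketch below generates dataset_ids from a small nested dictionary. The layout of SPEC, the facet names, the facet ordering, and the function name walk_dataset_ids are all assumptions made for this example; the real dataset_spec.yaml may be organized differently, and in practice the dictionary would be loaded from that YAML file rather than written inline.

```python
from typing import Iterator

# Hypothetical spec fragment. The real dataset_spec.yaml layout may differ;
# every key and value here is illustrative only.
SPEC = {
    "E3SM": {
        "1_0": {
            "piControl": {
                "resolution": "1deg_atm_60-30km_ocean",
                "realms": {
                    "atmos": {"grids": ["native", "180x360"], "freqs": ["mon", "day"]},
                    "ocean": {"grids": ["native"], "freqs": ["mon"]},
                },
                "ensembles": ["ens1"],
            }
        }
    }
}


def walk_dataset_ids(spec: dict) -> Iterator[str]:
    """Yield one dotted dataset_id per leaf combination of the spec tree."""
    for project, versions in spec.items():
        for version, experiments in versions.items():
            for experiment, info in experiments.items():
                for realm, realm_info in info["realms"].items():
                    for grid in realm_info["grids"]:
                        for freq in realm_info["freqs"]:
                            for ens in info["ensembles"]:
                                # Facet order is an assumption for this sketch.
                                yield ".".join([
                                    project, version, experiment,
                                    info["resolution"], realm, grid,
                                    "model-output", freq, ens,
                                ])


if __name__ == "__main__":
    for dataset_id in walk_dataset_ids(SPEC):
        print(dataset_id)
```

Any subset of the generated identifiers could then be handed to a downstream process as its list of datasets to operate on.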
The Process “Transition Graph”: This global specification contains the transition rules that define the path(s) of conditional processing.
This file is read once by the state machine, and its entries are matched against a dataset’s “current status” to determine the next appropriate processing step. It generally consists of entries of the form
currentProcess:currentState: (leads to) nextProcess:nextState
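A minimal sketch of how such entries might be consulted: the graph is held as a mapping from a “process:state” key to the list of steps it leads to, and a dataset’s current status is looked up in it. The process and state names, the TRANSITIONS table, and the next_steps helper are hypothetical and not taken from the actual transition graph file.

```python
from typing import Dict, List

# Hypothetical transition table; process and state names are illustrative only.
TRANSITIONS: Dict[str, List[str]] = {
    "VALIDATION:Ready": ["VALIDATION:Engaged"],
    "VALIDATION:Pass": ["POSTPROCESS:Ready"],
    "VALIDATION:Fail": [],            # terminal: needs manual attention
    "POSTPROCESS:Pass": ["PUBLICATION:Ready"],
    "PUBLICATION:Pass": [],           # terminal: processing complete
}


def next_steps(current_status: str,
               transitions: Dict[str, List[str]] = TRANSITIONS) -> List[str]:
    """Return the steps that follow current_status, or [] if none are defined."""
    return transitions.get(current_status, [])


if __name__ == "__main__":
    print(next_steps("VALIDATION:Pass"))
```

Because the table is plain data, it can be loaded once at start-up and reused for every dataset, which matches the “read once” behavior described above.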
The Per-Dataset “Status File”: These files record the detailed status of each dataset and, coupled with the transition graph, determine which processing state to engage next.
...
Beyond serving to checkpoint and condition future processing, these files can be surveyed in bulk to determine and report the status of the entire dataset warehouse (for example, which datasets are at a particular stage of processing), and to answer questions such as “How often was process X engaged?”, “How much time was spent in a particular processing stage?”, or “What fraction of time is spent per stage?”.
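The kind of survey described above can be sketched as follows, assuming each status file can be reduced to a time-ordered list of (timestamp, “process:state”) records. That record shape, the sample values, and the helper names time_per_process and fraction_per_process are assumptions for illustration; the actual status file format is not specified here.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical status history for one dataset, oldest record first.
RECORDS: List[Tuple[str, str]] = [
    ("2021-03-01T10:00:00", "VALIDATION:Engaged"),
    ("2021-03-01T10:30:00", "VALIDATION:Pass"),
    ("2021-03-01T10:30:00", "POSTPROCESS:Engaged"),
    ("2021-03-01T12:30:00", "POSTPROCESS:Pass"),
]


def time_per_process(records: List[Tuple[str, str]]) -> Dict[str, float]:
    """Sum the seconds between consecutive records, credited to the earlier record's process."""
    parsed = [(datetime.fromisoformat(ts), status.split(":")[0])
              for ts, status in records]
    totals: Dict[str, float] = defaultdict(float)
    for (t0, process), (t1, _) in zip(parsed, parsed[1:]):
        totals[process] += (t1 - t0).total_seconds()
    return dict(totals)


def fraction_per_process(records: List[Tuple[str, str]]) -> Dict[str, float]:
    """Express each process's time as a fraction of the total elapsed time."""
    totals = time_per_process(records)
    elapsed = sum(totals.values())
    if elapsed == 0:
        return {process: 0.0 for process in totals}
    return {process: seconds / elapsed for process, seconds in totals.items()}


if __name__ == "__main__":
    print(time_per_process(RECORDS))
    print(fraction_per_process(RECORDS))
```

Running the same functions over every status file in the warehouse and aggregating the results would give warehouse-wide figures per processing stage.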
For a detailed exposition, see: /wiki/spaces/EIDMG/pages/2907766794 (work in progress)
Operational State Machine
To install and operate the existing warehouse state machine (Validate, PostProcess, Publish), see: https://github.com/E3SM-Project/esgfpub/blob/master/docs/3_warehouse.md