...

The following elements enable the envisioned General Process State Machine:

  • The Domain “Dataset Spec”: This global specification contains the static configuration information that “anchors” the process to a specific domain (e.g. “E3SM datasets”). This document (/p/user_pub/e3sm/staging/resource/dataset_spec.yaml) details the metadata that defines each (E3SM) dataset, experiment, model version(s), resolutions, realms, grids, frequencies, etc. By “walking” the branches of this document, the complete list of E3SM dataset_ids (as reflected in the ESGF “master_id”) may be generated. Subsets of these dataset_ids are passed as tokens to those processes intended to operate upon the corresponding datasets.

  • The Process “Transition Graph”: This global specification contains the transition rules that define the path(s) of conditional processing.

This file is read once by the state machine, and its rules are matched against a dataset’s “current status” to determine the next appropriate processing step. It generally consists of entries of the form

currentProcess:currentState: (leads to) nextProcess:nextState
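As a minimal sketch of how such entries might be consulted, the transition rules can be held as a lookup table keyed on (process, state). The table contents, the state names ("Pass", "Fail", "Ready", "Blocked") and the helper name next_step are assumptions made for illustration; only the process names Validate, PostProcess and Publish come from this page, and the real file's syntax is not shown here.

```python
# Hypothetical in-memory form of the transition graph. The state names and
# the specific rules below are illustrative assumptions, not the real file.
TRANSITIONS = {
    ("Validate", "Pass"): ("PostProcess", "Ready"),
    ("Validate", "Fail"): ("Validate", "Blocked"),
    ("PostProcess", "Pass"): ("Publish", "Ready"),
}


def next_step(current_process, current_state, graph=TRANSITIONS):
    """Return (nextProcess, nextState) for the given status, or None.

    None means the graph has no rule for this status, so the state
    machine would have nothing further to schedule for the dataset.
    """
    return graph.get((current_process, current_state))


if __name__ == "__main__":
    print(next_step("Validate", "Pass"))
    print(next_step("Publish", "Pass"))
```

Keeping the rules in a plain mapping mirrors the "read once" behavior described above: the graph is loaded a single time and each lookup is a constant-time dictionary access.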

  • The Per-Dataset “Status File”: These files record the detailed status of each dataset, and (coupled with the transition graph) control the state of processing to engage.

This file is maintained “with the dataset” (in the faceted dataset directory). It is an “append_only” object: entries are added but never modified, so the file records the timestamped history of processing applied to the dataset. The file consists generally of entries of the form

STAT:<timestamp>:<ProcessName>:<ProcessStatus>:<parameter-details>
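The entry form above could be parsed along the following lines. This is a sketch under stated assumptions: the timestamp is assumed to contain no colon characters (its actual format is not given here), the names StatusEntry, parse_status_line and current_status are hypothetical, and the sample values in the usage lines are invented.

```python
from typing import Iterable, NamedTuple, Optional


class StatusEntry(NamedTuple):
    timestamp: str
    process: str
    status: str
    details: str


def parse_status_line(line: str) -> Optional[StatusEntry]:
    """Parse one 'STAT:<timestamp>:<ProcessName>:<ProcessStatus>:<details>' line.

    Assumes the timestamp itself contains no ':' characters. Splitting at
    most four times keeps any colons inside the parameter details intact.
    Returns None for lines that are not STAT entries.
    """
    parts = line.rstrip("\n").split(":", 4)
    if len(parts) < 4 or parts[0] != "STAT":
        return None
    details = parts[4] if len(parts) == 5 else ""
    return StatusEntry(parts[1], parts[2], parts[3], details)


def current_status(lines: Iterable[str]) -> Optional[StatusEntry]:
    """Return the last STAT entry; in an append-only file this is the newest."""
    latest = None
    for line in lines:
        entry = parse_status_line(line)
        if entry is not None:
            latest = entry
    return latest


if __name__ == "__main__":
    sample = [
        "STAT:20210401_120000:Validate:Ready:",
        "STAT:20210401_130000:Validate:Pass:files=12",
    ]
    print(current_status(sample))
```

Because the file is append-only, the current status is simply the last parseable entry, and the process and status fields of that entry are what would be looked up in the transition graph.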

...

Beyond just serving to check-point and condition the state of future processing, these files can be broadly surveyed to determine and report upon the status of the entire dataset warehouse (which datasets are at a particular stage of processing), and to study things like “How often was process X engaged” or “How much time was spent in a particular processing stage”, or “What fraction of time is spent per stage”, etc.
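A survey of this kind might be sketched as below. The function name survey, the input shape (a mapping from dataset_id to that dataset's status lines) and the sample identifiers are assumptions for illustration; time-per-stage reporting is omitted because it depends on the timestamp format, which is not specified here.

```python
from collections import Counter


def survey(status_by_dataset):
    """Summarize status lines across many datasets.

    status_by_dataset maps a dataset_id to the list of lines read from
    that dataset's status file. Returns two Counters:
      latest  - how many datasets currently sit at each (process, status)
      engaged - how many times each process appears across all histories
    """
    latest = Counter()
    engaged = Counter()
    for lines in status_by_dataset.values():
        history = []
        for line in lines:
            parts = line.strip().split(":", 4)
            if len(parts) >= 4 and parts[0] == "STAT":
                history.append((parts[2], parts[3]))
        for process, _status in history:
            engaged[process] += 1
        if history:
            # Append-only file: the last entry is the current status.
            latest[history[-1]] += 1
    return latest, engaged


if __name__ == "__main__":
    # Hypothetical dataset ids and status values.
    data = {
        "dataset_a": ["STAT:t1:Validate:Pass:", "STAT:t2:Publish:Pass:"],
        "dataset_b": ["STAT:t1:Validate:Fail:missing files"],
    }
    latest, engaged = survey(data)
    print(latest)
    print(engaged)
```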

For a detailed exposition, see: /wiki/spaces/EIDMG/pages/2907766794

Operational State Machine

To install and operate the existing warehouse state machine (Validate, PostProcess, Publish), see: https://github.com/E3SM-Project/esgfpub/blob/master/docs/3_warehouse.md