The following elements enable the envisioned General Process State Machine:

  • The Domain “Dataset Spec”: This global specification contains the static configuration information that “anchors” the process to a specific domain (e.g. “E3SM datasets”).

  • The Process “Transition Graph”: This global specification contains the transition rules that define the path(s) of conditional processing.

This file is read once by the state machine, and its rules are matched against a dataset’s “current status” to determine the next appropriate processing step. It generally consists of entries of the form

currProcess:currState: (leads to) nextProcess:nextState
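
As a minimal sketch of how such rules might be loaded and consulted, the Python below assumes a concrete one-rule-per-line syntax of `currProcess:currState:nextProcess:nextState`. That syntax, the process and state names, and the function names `parse_transition_graph` and `next_steps` are all hypothetical; the actual file format is not specified here.

```python
from collections import defaultdict


def parse_transition_graph(text):
    """Parse rules of the ASSUMED form 'currProcess:currState:nextProcess:nextState'.

    Returns a dict mapping (currProcess, currState) to a list of
    (nextProcess, nextState) pairs, so one status may lead to several paths.
    """
    graph = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = [p.strip() for p in line.split(":")]
        if len(parts) != 4:
            raise ValueError(f"malformed transition rule: {raw!r}")
        curr_proc, curr_state, next_proc, next_state = parts
        graph[(curr_proc, curr_state)].append((next_proc, next_state))
    return dict(graph)


def next_steps(graph, curr_proc, curr_state):
    """Return the processing steps that follow the given current status."""
    return graph.get((curr_proc, curr_state), [])


# Hypothetical process and state names, for illustration only.
EXAMPLE_RULES = """
EXTRACTION:Pass:VALIDATION:Ready
VALIDATION:Pass:PUBLICATION:Ready
VALIDATION:Fail:EXTRACTION:Ready
"""

graph = parse_transition_graph(EXAMPLE_RULES)
print(next_steps(graph, "VALIDATION", "Pass"))
```

Keeping the graph as a plain lookup table means the state machine itself needs no domain knowledge: a status with no matching rule simply yields no next step.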

  • The Per-Dataset “Status File”: These files record the detailed status of each dataset and (coupled with the transition graph) determine which processing step to engage next.

This file is maintained “with the dataset” (in the faceted dataset directory). It is append-only: entries are added but never modified or removed, so the file records the timestamped history of processing applied to the dataset. It generally consists of entries of the form

STAT:<timestamp>:<ProcessName>:<ProcessStatus>:<parameter-details>
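
The entry format above can be sketched in Python as follows. The timestamp layout (`YYYYMMDD_HHMMSS`, UTC, chosen so that it contains no colons), the file name `dataset.status`, the process and status names, and the helper names are assumptions made for illustration; only the `STAT:` field order comes from the description above.

```python
import os
import tempfile
from datetime import datetime, timezone
from typing import NamedTuple

TS_FORMAT = "%Y%m%d_%H%M%S"  # ASSUMED timestamp layout (colon-free, UTC)


class StatEntry(NamedTuple):
    timestamp: datetime
    process: str
    status: str
    details: str


def format_entry(process, status, details="", when=None):
    """Build one 'STAT:<timestamp>:<ProcessName>:<ProcessStatus>:<details>' line."""
    when = when or datetime.now(timezone.utc)
    return f"STAT:{when.strftime(TS_FORMAT)}:{process}:{status}:{details}"


def parse_entry(line):
    """Split a STAT line; the details field may itself contain colons."""
    tag, ts, process, status, details = line.rstrip("\n").split(":", 4)
    if tag != "STAT":
        raise ValueError(f"not a STAT entry: {line!r}")
    stamp = datetime.strptime(ts, TS_FORMAT).replace(tzinfo=timezone.utc)
    return StatEntry(stamp, process, status, details)


def append_status(path, process, status, details="", when=None):
    """Append one entry; mode 'a' never rewrites earlier history."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(format_entry(process, status, details, when) + "\n")


def current_status(path):
    """Return the most recent STAT entry, or None if there is none."""
    last = None
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.startswith("STAT:"):
                last = parse_entry(line)
    return last


# Demonstration against a throwaway directory (hypothetical names throughout).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.status")
    append_status(path, "EXTRACTION", "Pass", "files=12",
                  datetime(2021, 1, 1, 12, 0, 0, tzinfo=timezone.utc))
    append_status(path, "VALIDATION", "Fail", "reason=missing:time_axis",
                  datetime(2021, 1, 1, 13, 0, 0, tzinfo=timezone.utc))
    latest = current_status(path)

print(latest.process, latest.status)
```

Because the last entry is the dataset's current status, the pair `(latest.process, latest.status)` is what would be looked up in the transition graph to choose the next step.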

Beyond checkpointing and conditioning future processing, these files can be surveyed across the warehouse to report the status of the entire dataset collection (which datasets are at a particular stage of processing) and to answer questions such as “How often was process X engaged?”, “How much time was spent in a particular processing stage?”, or “What fraction of time is spent per stage?”
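
A rough sketch of such a survey is shown below. It rests on several assumptions that are not part of the specification: the colon-free `YYYYMMDD_HHMMSS` timestamp layout, the dataset and process names, the rule that the time between two consecutive entries is attributed to the process named in the earlier entry, and the rule that a process counts as “engaged” once per consecutive run of its entries.

```python
from collections import Counter, defaultdict
from datetime import datetime

TS_FORMAT = "%Y%m%d_%H%M%S"  # ASSUMED timestamp layout


def parse(line):
    """Return (timestamp, process, status) from one STAT line."""
    _tag, ts, process, status, _details = line.split(":", 4)
    return datetime.strptime(ts, TS_FORMAT), process, status


def survey(status_lines_by_dataset):
    """Summarize status histories keyed by dataset name.

    Returns (current, engagements, seconds, fractions):
      current     - dataset -> (process, status) of its latest entry
      engagements - process -> number of times it was entered
      seconds     - process -> total elapsed seconds attributed to it
      fractions   - process -> share of all attributed time
    """
    current = {}
    engagements = Counter()
    seconds = defaultdict(float)
    for dataset, lines in status_lines_by_dataset.items():
        # Files are append-only, so entries are already in time order.
        entries = [parse(l) for l in lines if l.startswith("STAT:")]
        if not entries:
            continue
        current[dataset] = entries[-1][1:]
        previous = None
        for _stamp, process, _status in entries:
            if process != previous:  # start of a new run of this process
                engagements[process] += 1
            previous = process
        for (t0, process, _), (t1, _, _) in zip(entries, entries[1:]):
            seconds[process] += (t1 - t0).total_seconds()
    total = sum(seconds.values())
    fractions = {p: s / total for p, s in seconds.items()} if total else {}
    return current, engagements, dict(seconds), fractions


# Hypothetical histories for two datasets.
histories = {
    "dataset_a": [
        "STAT:20210101_000000:EXTRACTION:Engaged:",
        "STAT:20210101_010000:VALIDATION:Engaged:",
        "STAT:20210101_013000:VALIDATION:Pass:",
    ],
    "dataset_b": [
        "STAT:20210101_000000:EXTRACTION:Engaged:",
        "STAT:20210101_003000:EXTRACTION:Fail:",
    ],
}

current, engagements, seconds, fractions = survey(histories)
print(current)
print(dict(engagements))
print(fractions)
```

In a real warehouse the `histories` mapping would be built by walking the faceted dataset directories and reading each status file; that traversal is omitted here because the directory layout is not described.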

(work in progress)