...

At present, these questions can be answered only by hoping that someone involved remembers. This is unworkable. We MUST have a uniform and consistent way to determine the precise state of any dataset, and to locate datasets that have qualified to a specific level of processing. To address this shortcoming, we are introducing “per-dataset status files”, in which compliant processing utilities will automatically record the process-point of each dataset.

[*] “specified datasets”: We lack even a consistently rational way to convey a collection of datasets. I challenge anyone to locate, in the archives or the warehouse, the set of “BGC CTC ssp585” datasets. NONE of the dataset IDs or facet paths include the terms “BGC” or “CTC”, and not all of the relevant experiments have “ssp585” in the name. Magic is required.

(2) Process Orchestration: We intend to process each dataset by assigning it a “process orchestrator” that can conduct each of the many and varied processing steps outlined above. This is possible only because of machine-readable dataset status files and a state transition graph detailing how to proceed for each dataset, given its status. We are engineering an orchestrator that reads and updates a dataset status file and consults the appropriate transition graph, so that it can negotiate a path across any conditional sequence of processing operations and conduct dataset-specific processing.

This “orchestration scheme”, which we alternatively refer to as the “Warehouse State Machine” (or General Processing State Machine), is outlined here: Automated Management of Dataset Publication - A General Process State-Machine Approach
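As a rough illustration of the idea, the following Python sketch shows an orchestrator loop driven by a per-dataset status file and a transition graph. Everything specific in it is an assumption made for the example: the status line format (`<timestamp>|<STEP>:<Pass|Fail>`), the step names, the `TRANSITIONS` table, and the function names are hypothetical and are not taken from the actual Warehouse State Machine design.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Dict

# Hypothetical transition graph: last recorded state -> next step to run.
# A state with no entry (any ":Fail", or the final ":Pass") is terminal.
TRANSITIONS: Dict[str, str] = {
    "EXTRACTION:Pass": "VALIDATION",
    "VALIDATION:Pass": "POSTPROCESS",
    "POSTPROCESS:Pass": "PUBLICATION",
}


def read_status(status_file: Path) -> str:
    """Return the most recently recorded state (the last non-empty line)."""
    lines = [ln.strip() for ln in status_file.read_text().splitlines() if ln.strip()]
    # Assumed line format: "<UTC timestamp>|<STEP>:<Pass|Fail>"
    return lines[-1].split("|", 1)[1]


def record_status(status_file: Path, state: str) -> None:
    """Append a timestamped state line; earlier history is never rewritten."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    with status_file.open("a") as fh:
        fh.write(f"{stamp}|{state}\n")


def orchestrate(status_file: Path, handlers: Dict[str, Callable[[Path], bool]]) -> str:
    """Advance one dataset until it reaches a state with no outgoing transition."""
    state = read_status(status_file)
    while state in TRANSITIONS:
        step = TRANSITIONS[state]
        succeeded = handlers[step](status_file.parent)  # run the step on the dataset directory
        state = f"{step}:{'Pass' if succeeded else 'Fail'}"
        record_status(status_file, state)
    return state
```

Because each step appends to the status file rather than overwriting it, the same file answers both "what state is this dataset in?" (the last line) and "how did it get there?" (the full history). A failed step leaves the dataset in a terminal state until a human or another tool records a state that has an outgoing transition.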

LLNL E3SM Archives

  • Data Location: /p/user_pub/e3sm/archives/<model>/<campaign>/<archive_directory>/

  • Config Location: /p/user_pub/e3sm/archives/.cfg/ (contains Archive_Locator, Archive_Map, and Standard_Dataset_Extraction_Patterns)

  • Operational details (Guides and Tools): See E3SM Long-Term Archive at LLNL
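The archive layout above can be expressed as a small path helper. This is only a sketch: the root and `.cfg` locations are taken from the bullets above, while the function name `archive_path` and its validation rule are hypothetical conveniences, not part of any existing tool.

```python
from pathlib import PurePosixPath

# Locations as listed above.
ARCHIVE_ROOT = PurePosixPath("/p/user_pub/e3sm/archives")
ARCHIVE_CONFIG = ARCHIVE_ROOT / ".cfg"  # Archive_Locator, Archive_Map, extraction patterns


def archive_path(model: str, campaign: str, archive_directory: str) -> PurePosixPath:
    """Compose <root>/<model>/<campaign>/<archive_directory> (hypothetical helper)."""
    parts = {"model": model, "campaign": campaign, "archive_directory": archive_directory}
    for name, value in parts.items():
        # Each component must be a single, non-empty path element.
        if not value or "/" in value:
            raise ValueError(f"{name} must be a single non-empty path component: {value!r}")
    return ARCHIVE_ROOT / model / campaign / archive_directory
```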

...

  • Location: /p/user_pub/e3sm/warehouse/E3SM/<faceted_dataset_directory>/

  • Guides and Tools: . . .

...