This is a guide to ESGF publication for E3SM data. This guide is not meant to work for publication to CMIP6.
This guide assumes you have your model data ready.
Step 1: Data Selection
The first step is to decide what data you want to publish. For the piControl run (the first large E3SM publication) we decided to publish the following files:
Atmosphere
The following were published for all the DECK experiments:
ATM: Native cam.h0
...
Native cam.
...
h1
...
Regridded climos at frequencies 5 year, 50 year, 100 year, total run
Regridded time series (
...
for selected variables*)
- Land
LND: Native clm2.h0
...
Native mosart.h0
- Ocean
OCN: Native mpaso.
...
- Sea-Ice
- Native mpascice.hist.am.timeSeriesStatsMonthly
ICE: Native mpassi.timeSeriesStatsMonthly
*Time series variables are: FLNS, FLNT, FLUT, FSNS, FSNTOA, FSNT, PRECC, PRECL, PRECSC, PRECSL, QFLX, SHFLX, TREFHT, TS
Once you've decided what data to publish, generate the post-processed data to meet your requirements (time series, regridded data, climos, etc.).
ESGF requires that the data facets be stored in an "ini" file, which is used by the publisher to discover and validate the files. The E3SM ini file can be found here: /wiki/spaces/WORKFLOW/pages/650707592 and on GitHub: https://github.com/ESGF/config/blob/devel/publisher-configs/ini/esg.e3sm.ini
The format of this file is fairly straightforward. The first section defines what the data facet options are, the second section implements the facet options, and the final section uses those facets to lay out the directory format and dataset ID format.
...
Code Block |
---|
directory_format = %(root)s/%(source)s/%(model_version)s/%(experiment)s/%(atmos_grid_resolution)s_atm_%(ocean_grid_resolution)s_ocean/%(realm)s/%(regridding)s/%(data_type)s/%(time_frequency)s/%(ensemble_member)s
dataset_id = %(source)s.%(model_version)s.%(experiment)s.%(atmos_grid_resolution)s_atm_%(ocean_grid_resolution)s_ocean.%(realm)s.%(regridding)s.%(data_type)s.%(time_frequency)s.%(ensemble_member)s |
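To see what these two templates produce, the sketch below expands them with Python's `%` string formatting, which happens to use the same `%(name)s` placeholder syntax. This is only an illustration and is not how the publisher itself evaluates the ini file. The facet values are the ones from the first piControl publication; the `root` value is a placeholder.

```python
# Templates copied from the directory_format and dataset_id options above.
DIRECTORY_FORMAT = (
    "%(root)s/%(source)s/%(model_version)s/%(experiment)s/"
    "%(atmos_grid_resolution)s_atm_%(ocean_grid_resolution)s_ocean/"
    "%(realm)s/%(regridding)s/%(data_type)s/%(time_frequency)s/"
    "%(ensemble_member)s"
)
DATASET_ID = (
    "%(source)s.%(model_version)s.%(experiment)s."
    "%(atmos_grid_resolution)s_atm_%(ocean_grid_resolution)s_ocean."
    "%(realm)s.%(regridding)s.%(data_type)s.%(time_frequency)s."
    "%(ensemble_member)s"
)

facets = {
    "root": "/path/to/root",  # placeholder, not a real location
    "source": "E3SM",
    "model_version": "1_0",
    "experiment": "piControl",
    "atmos_grid_resolution": "1deg",
    "ocean_grid_resolution": "60-30km",
    "realm": "atmos",
    "regridding": "129x256",
    "data_type": "climo",
    "time_frequency": "monClim",
    "ensemble_member": "ens1",
}

# Substitute every %(facet)s placeholder with the matching dictionary value.
directory = DIRECTORY_FORMAT % facets
dataset_id = DATASET_ID % facets

print(directory)
print(dataset_id)
```

Expanding the templates by hand like this is a quick way to confirm that the directory you are about to create and the dataset ID you expect to see on the node agree with each other.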
========= BUG WARNING ===========
DO NOT use any option value containing a substring that matches a 'v' followed by digits. For example, avoid 'fv129x256', or 'v1_0' for the model version.
This will break the version parser in esgprep and cause it to not match any of your data.
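A small pre-flight check can catch such values before any directories are created. The sketch below assumes the problematic pattern is a lowercase 'v' immediately followed by one or more digits; the exact expression used inside esgprep is not shown here, so treat the regular expression and the helper name `find_risky_values` as assumptions.

```python
import re

# Assumed pattern: a literal 'v' followed by at least one digit.
VERSION_LIKE = re.compile(r"v\d+")


def find_risky_values(facets):
    """Return the sorted facet names whose values contain a version-like substring."""
    return sorted(
        name for name, value in facets.items() if VERSION_LIKE.search(value)
    )


if __name__ == "__main__":
    candidate = {
        "model_version": "v1_0",    # would trip the parser
        "regridding": "fv129x256",  # would trip the parser
        "experiment": "piControl",  # fine
    }
    print(find_risky_values(candidate))
```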
...
Step 2: Data formatting
Once you've defined the structure of the dataset, the next step is fairly straightforward. Create the directories in the structure you defined, and place each file into the directory where it belongs.
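For large runs it may be easier to script this step. The following is a minimal sketch, assuming you already know the ordered facet values for each file; the helper name `stage_file` and the example facet list are hypothetical. It copies rather than moves, so the original output stays untouched.

```python
import os
import shutil


def stage_file(src, root, facet_dirs):
    """Copy src into root/<facet_dirs...>/, creating the directories if needed.

    Returns the path of the copied file.
    """
    dest_dir = os.path.join(root, *facet_dirs)
    os.makedirs(dest_dir, exist_ok=True)  # no error if the directory already exists
    return shutil.copy2(src, dest_dir)    # copy2 also preserves timestamps
```

A call might look like `stage_file("/path/to/output/file.nc", "/path/to/root", ["E3SM", "1_0", "piControl"])`, with the list extended to cover every level of your directory format. Swap `shutil.copy2` for `shutil.move` if you do not want to keep two copies of the data.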
...
For this step you will need the esgprep utility. You can either install it with pip, using ```pip install esgprep``` or ```pip install -e git://github.com/ESGF/esgf-prepare.git@master#egg=esgprep```, or, if you're running on any of the LLNL machines, make it available by running ```source /usr/local/conda/bin/activate esgf-pub```.
- First, get the ESGF ini files from the GitHub repo by running ```esgprep fetch-ini -i /path/to/my/ini/files```. Note that the path /path/to/my/ini/files must already exist.
- Next, place your ini file with your specified facets into that directory with the name esg.e3sm.ini.
- Next, run the following command. I suggest using nohup or Slurm, since this process can take several hours.
Code Block |
---|
esgmapfile make --outdir <path_to_put_mapfiles> -i <path_to_ini_directory> --project e3sm --log /some/log/dir --max-processes <number_of_cores> <path_to_your_new_data_set> |
Step 4: Indexing and publication
- Next, log into the data node you're going to publish to. Make sure you have an OpenID account for this node and that your account has the "publisher" attribute. Note that this server needs access to both the mapfiles and the data directories. If you staged the data on another server, you'll need to copy it over, in the correct structure, to the ESGF node you're publishing to.
- Ensure that /esg/config/esgcet/esg.e3sm.ini is correct and that there is an entry for e3sm in the projects table of /esg/config/esgcet/esg.ini.
- Source the conda environment.
- Store your myproxy credentials locally (this will store your credentials at ~/.globus/certificate-file and activate them for the next 72 hours):
Code Block |
---|
myproxy-logon -s <your_esgf_identity_node_hostname> -l <your_myproxy_username> -o ~/.globus/certificate-file -t 72 |
- If you are publishing an experiment for the first time, run "esginitialize -c".
- Run the following commands in the given order:
- esgpublish --project e3sm --map /path/to/where/you/want/your/mapfiles/<your_first_mapfile>.map --service fileservice
- esgpublish --project e3sm --map /path/to/where/you/want/your/mapfiles/<your_first_mapfile>.map --service fileservice --noscan --thredds
- esgpublish --project e3sm --map /path/to/where/you/want/your/mapfiles/<your_first_mapfile>.map --service fileservice --noscan --publish
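Because the three esgpublish calls must run in order, and a later one should not run if an earlier one failed, it can help to wrap them in a small driver. The sketch below only assembles the same three command lines shown above and runs them with `subprocess`; the function names are hypothetical, and `dry_run=True` prints the commands instead of executing them.

```python
import subprocess


def build_publish_commands(mapfile, project="e3sm", service="fileservice"):
    """Return the three esgpublish invocations, in the order they must run."""
    base = ["esgpublish", "--project", project, "--map", mapfile,
            "--service", service]
    return [
        base,                             # first pass over the mapfile
        base + ["--noscan", "--thredds"],  # second pass, THREDDS
        base + ["--noscan", "--publish"],  # third pass, publish
    ]


def run_publish(mapfile, dry_run=True):
    """Run (or just print) the publish steps, stopping at the first failure."""
    for cmd in build_publish_commands(mapfile):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError on a non-zero exit status,
            # which prevents the remaining steps from running.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_publish("/path/to/where/you/want/your/mapfiles/example.map")
```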
Step 5: Verification
Your data should now be available on the given node. You can verify by opening a browser window and going to https://<your_esgf_node>/esg-search/search?project=e3sm
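If you want to check several nodes or narrow the search, the URL can be built programmatically. This is a minimal sketch; the helper name `build_search_url` and the host name in the example are placeholders, and passing additional facets such as `experiment` as query parameters is an assumption about how your node's search is configured.

```python
from urllib.parse import urlencode


def build_search_url(node, project="e3sm", **extra):
    """Build the esg-search URL for a node, with optional extra query parameters."""
    params = {"project": project}
    params.update(extra)  # extra facets are appended after the project
    return "https://%s/esg-search/search?%s" % (node, urlencode(params))


if __name__ == "__main__":
    # Placeholder host name; substitute your own ESGF node.
    print(build_search_url("esgf-node.example.org"))
    print(build_search_url("esgf-node.example.org", experiment="piControl"))
```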
...
Code Block |
---|
<dataset_id> | optional_key1=value1 | optional_key2=value2 | etc. |
For example, the first round of publication used the following:
Code Block |
---|
E3SM.1_0.piControl.1deg_atm_60-30km_ocean.atmos.129x256.climo.monClim.ens1 | science_driver=Water Cycle | land_grid_resolution=1deg | seaice_grid_resolution=60-30km
E3SM.1_0.piControl.1deg_atm_60-30km_ocean.atmos.129x256.time-series.mon.ens1 | science_driver=Water Cycle | land_grid_resolution=1deg | seaice_grid_resolution=60-30km | period=Perpetual 1850
E3SM.1_0.piControl.1deg_atm_60-30km_ocean.land.native.model-output.mon.ens1 | science_driver=Water Cycle | land_grid_resolution=1deg | seaice_grid_resolution=60-30km | period=Perpetual 1850 |
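Before handing such a file to the publisher, it can be worth checking that every line splits cleanly into a dataset ID and `key=value` pairs. The parser below is a sketch based only on the line format shown above; the function name `parse_facet_line` is hypothetical, and the publisher's own parsing rules may be stricter or looser.

```python
def parse_facet_line(line):
    """Split '<dataset_id> | key=value | ...' into (dataset_id, {key: value})."""
    fields = [field.strip() for field in line.split("|")]
    fields = [field for field in fields if field]  # tolerate a trailing '|'
    if not fields:
        raise ValueError("empty line")
    dataset_id, pairs = fields[0], fields[1:]
    facets = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % pair)
        facets[key.strip()] = value.strip()
    return dataset_id, facets


if __name__ == "__main__":
    example = (
        "E3SM.1_0.piControl.1deg_atm_60-30km_ocean.atmos.129x256.climo.monClim.ens1"
        " | science_driver=Water Cycle | land_grid_resolution=1deg"
        " | seaice_grid_resolution=60-30km"
    )
    print(parse_facet_line(example))
```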
After the new mapfile has been created, invoke the esgadd_facetvalues command:
Code Block |
---|
esgadd_facetvalues --project e3sm --map /path/to/your/additional/facets/map --noscan --thredds --service fileservice |
Finally, publish the newly updated dataset:
Code Block |
---|
esgpublish --project e3sm --map /path/to/your/map/files/ --noscan --publish --service fileservice |