This is a guide to ESGF publication of E3SM data. It is not meant to work for publication to CMIP6.


This guide assumes you have your model data ready.

Step 1: Data Facets

The first step is to decide what data you want to publish. For the piControl run (the first large E3SM publication) we decided to publish the following files:

  1. Atmosphere

  • Native cam.h0
  • Regridded cam.h0
  • Regridded climatologies
  • Regridded time series (of select variables)
  2. Land
  • Native clm2.h0
  • Regridded clm2.h0
  3. Ocean
  • Native mpaso.hist.am.timeSeriesStatsMonthly
  4. Sea-Ice
  • Native mpascice.hist.am.timeSeriesStatsMonthly


Once you've decided what data to publish, generate the post-processed data to meet your requirements (time series, regridded data, climos, etc.).

The E3SM ini file can be found here: /wiki/spaces/WORKFLOW/pages/650707592 and on GitHub: https://github.com/ESGF/config/blob/devel/publisher-configs/ini/esg.e3sm.ini

The format for this file is fairly straightforward. The first section defines what the data facet options are, the second section implements the facet options, and the final section uses those facets to lay out the directory format and dataset ID format.


For each <category> listed in the "categories" section, there should be a "<category>_options" section that defines what those options can be. Each option can have one or more comma-separated values.

categories =
        project | string | false | true | 0
        experiment | enum | true | true | 1
        realm | enum | true | true | 2

experiment_options =
        e3sm, piControl, Pre-industrial Control

realm_options =
        atmos, land, ocean, sea-ice
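As a rough illustration of how the categories and their options relate, the sketch below reads a trimmed copy of the snippet above with Python's configparser. The section header [project:e3sm] and the reading of the second categories field as the facet type are assumptions made for this example, and the helper names are hypothetical; this is not how esgprep itself parses the file.

```python
import configparser

# Trimmed copy of the snippet above. The [project:e3sm] header is an
# assumption added so that configparser accepts the text.
INI_TEXT = """
[project:e3sm]
categories =
        project | string | false | true | 0
        experiment | enum | true | true | 1
        realm | enum | true | true | 2

realm_options =
        atmos, land, ocean, sea-ice
"""


def _load(ini_text):
    # Interpolation is disabled so that %(...)s patterns are left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser


def enum_categories(ini_text, section):
    """Return category names whose second field is 'enum' (assumed to be the type)."""
    names = []
    for line in _load(ini_text).get(section, "categories").splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) >= 2 and fields[1] == "enum":
            names.append(fields[0])
    return names


def enum_options(ini_text, section, category):
    """Return the comma-separated values of <category>_options as a list."""
    raw = _load(ini_text).get(section, category + "_options")
    return [value.strip() for value in raw.split(",") if value.strip()]


if __name__ == "__main__":
    print(enum_categories(INI_TEXT, "project:e3sm"))
    print(enum_options(INI_TEXT, "project:e3sm", "realm"))
```

A check like this can catch a category that was added to "categories" without a matching "<category>_options" entry, since configparser raises NoOptionError in that case.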


The two most important values are the directory_format and dataset_id options. Note that they can both include hardcoded strings to build directory names, as long as each option is in the format %(some_option)s. The %(root)s option is the path given to the esgprep command used later.

directory_format = %(root)s/%(source)s/%(model_version)s/%(experiment)s/%(atmos_grid_resolution)s_atm_%(ocean_grid_resolution)s_ocean/%(realm)s/%(regridding)s/%(data_type)s/%(time_frequency)s/%(ensemble_member)s
dataset_id = %(source)s.%(model_version)s.%(experiment)s.%(atmos_grid_resolution)s_atm_%(ocean_grid_resolution)s_ocean.%(realm)s.%(regridding)s.%(data_type)s.%(time_frequency)s.%(ensemble_member)s
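The %(name)s placeholders follow the same syntax as Python's %-style string formatting with a dictionary, so the two patterns can be previewed with a few lines of Python. This is only a sketch for checking your facet values by hand; the facet values are taken from the piControl example later in this guide, and the root path /p/user_pub/work is an assumption inferred from that example.

```python
# The two patterns, copied from the ini file above.
DIRECTORY_FORMAT = (
    "%(root)s/%(source)s/%(model_version)s/%(experiment)s/"
    "%(atmos_grid_resolution)s_atm_%(ocean_grid_resolution)s_ocean/"
    "%(realm)s/%(regridding)s/%(data_type)s/%(time_frequency)s/%(ensemble_member)s"
)
DATASET_ID = (
    "%(source)s.%(model_version)s.%(experiment)s."
    "%(atmos_grid_resolution)s_atm_%(ocean_grid_resolution)s_ocean."
    "%(realm)s.%(regridding)s.%(data_type)s.%(time_frequency)s.%(ensemble_member)s"
)

# Example facet values; "root" is an assumed staging path.
facets = {
    "root": "/p/user_pub/work",
    "source": "E3SM",
    "model_version": "1_0",
    "experiment": "piControl",
    "atmos_grid_resolution": "1deg",
    "ocean_grid_resolution": "60-30km",
    "realm": "atmos",
    "regridding": "129x256",
    "data_type": "climo",
    "time_frequency": "monClim",
    "ensemble_member": "ens1",
}

# Extra keys (such as "root" for the dataset ID) are simply ignored.
directory = DIRECTORY_FORMAT % facets
dataset_id = DATASET_ID % facets

if __name__ == "__main__":
    print(directory)
    print(dataset_id)
```

A missing facet raises a KeyError, which makes this a quick way to spot a placeholder that has no value yet.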

========= BUG WARNING ===========

DO NOT use option values that contain a 'v' followed by one or more digits. For example, no 'fv129x256' or 'v1_0' for the model version.

This will break the version parser in esgprep and cause it to not match any of your data.

=================================
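Before generating mapfiles, it may help to scan your facet values for this pattern. The sketch below assumes the problematic pattern is simply a lowercase 'v' followed by at least one digit; the exact expression used inside esgprep is not reproduced here, and the function name is hypothetical.

```python
import re

# Assumed approximation of the problem: a 'v' immediately followed by digits.
VERSION_LIKE = re.compile(r"v\d+")


def risky_facet_values(facets):
    """Return the subset of facets whose value contains a 'v<digits>' substring."""
    return {key: value for key, value in facets.items() if VERSION_LIKE.search(value)}


if __name__ == "__main__":
    candidate = {
        "model_version": "v1_0",    # flagged
        "regridding": "fv129x256",  # flagged
        "realm": "atmos",           # fine
    }
    print(risky_facet_values(candidate))
```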

Step 2: Data formatting

Once you've defined the structure of the dataset, the next step is fairly straightforward. Simply create the directories in the structure you defined, and place the correct data into the structure where it should go.

For example, using the esg.e3sm.ini file linked above, the layout looks like the following. Note that a representative file for each leaf directory is shown in brackets.

$ tree --filelimit 5 /p/user_pub/work/E3SM/
1_0
└── piControl
    └── 1deg_atm_60-30km_ocean
        ├── atmos
        │   ├── 129x256
        │   │   ├── climo
        │   │   │   ├── monClim
        │   │   │   │   └── ens1
        │   │   │   │       └── v1 [20180129.DECKv1b_piControl.ne30_oEC.edison_01_000101_000501_climo.nc]
        │   │   │   └── seasonClim
        │   │   │       └── ens1
        │   │   │           └── v1 [20180129.DECKv1b_piControl.ne30_oEC.edison_01_AAN_climo.nc]
        │   │   ├── model-output
        │   │   │   └── mon
        │   │   │       └── ens1
        │   │   │           └── v1 [20180129.DECKv1b_piControl.ne30_oEC.edison.cam.h0.0001-01.nc]
        │   │   └── time-series
        │   │       └── mon
        │   │           └── ens1
        │   │               └── v1 [FSNTOA_000101_050012.nc]
        │   └── native
        │       └── model-output
        │           └── mon
        │               └── ens1
        │                   └── v1 [20180129.DECKv1b_piControl.ne30_oEC.edison.cam.h0.0001-01.nc]
        ├── land
        │   ├── 129x256
        │   │   └── model-output
        │   │       └── mon
        │   │           └── ens1
        │   │               └── v1 [20180129.DECKv1b_piControl.ne30_oEC.edison.clm2.h0.0001-01.nc]
        │   └── native
        │       └── model-output
        │           └── mon
        │               └── ens1
        │                   └── v1 [20180129.DECKv1b_piControl.ne30_oEC.edison.clm2.h0.0001-01.nc]
        ├── ocean
        │   └── native
        │       └── model-output
        │           └── mon
        │               └── ens1
        │                   └── v1 [mpaso.hist.am.timeSeriesStatsMonthly.0001-01-01.nc]
        └── sea-ice
            └── native
                └── model-output
                    └── mon
                        └── ens1
                            └── v1 [mpascice.hist.am.timeSeriesStatsMonthly.0001-01-01.nc]
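The empty directory skeleton shown above can be created with a short script before the data is copied in. This is a minimal sketch that hardcodes the piControl facet values from the tree; the LEAVES list and the function names are hypothetical, and moving the actual files into place is left out.

```python
import tempfile
from pathlib import Path

# (realm, regridding, data_type, time_frequency) for each leaf in the tree above.
LEAVES = [
    ("atmos", "129x256", "climo", "monClim"),
    ("atmos", "129x256", "climo", "seasonClim"),
    ("atmos", "129x256", "model-output", "mon"),
    ("atmos", "129x256", "time-series", "mon"),
    ("atmos", "native", "model-output", "mon"),
    ("land", "129x256", "model-output", "mon"),
    ("land", "native", "model-output", "mon"),
    ("ocean", "native", "model-output", "mon"),
    ("sea-ice", "native", "model-output", "mon"),
]


def leaf_dir(root, realm, regridding, data_type, time_frequency,
             ensemble="ens1", version="v1"):
    """Build the path of one leaf directory under the given root."""
    return (Path(root) / "E3SM" / "1_0" / "piControl" / "1deg_atm_60-30km_ocean"
            / realm / regridding / data_type / time_frequency / ensemble / version)


def make_tree(root):
    """Create every leaf directory and return the list of created paths."""
    created = []
    for leaf in LEAVES:
        path = leaf_dir(root, *leaf)
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # A temporary directory stands in for the real publication root.
    with tempfile.TemporaryDirectory() as tmp:
        for path in make_tree(tmp):
            print(path.relative_to(tmp))
```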


Step 3: Mapfile generation

For this step you will need the esgprep utility. You can either run
```
pip install -e git://github.com/ESGF/esgf-prepare.git@master#egg=esgprep
```
or, if you're running from any of the LLNL machines, it should be available by running ```source /usr/local/conda/bin/activate esgf-pub```

  1. First get the ESGF ini files from the GitHub repo. Run ```esgprep fetch-ini -i /path/to/my/ini/files```. Note that the path /path/to/my/ini/files must already exist.
  2. Next, place your ini file with your specified facets into that directory with the name esg.e3sm.ini.
  3. Next, run the following command. I suggest using nohup or slurm since this process can take several hours.
esgmapfile make --outdir <path_to_your_output> -i <path_to_ini_directory> --project e3sm --max-processes <number_of_cores> <path_to_your_new_data_set>



Step 4: Indexing and publication

  1. Next, log into the data node you're going to be publishing to. Make sure you have an OpenID account for this node, and that your account has the "publisher" attribute. Note that this server needs to have access to both the mapfiles and the data directories. If you staged the data on another server, you'll need to copy it over in the correct structure to the ESGF node you're publishing to.
  2. Ensure that /esg/config/esgcet/esg.e3sm.ini is correct and that there is an entry in /esg/config/esgcet/esg.ini for e3sm in the projects table.
  3. Source the conda environment.
  4. Store your myproxy credentials locally (this will store your credentials at ~/.globus/certificate-file and activate them for the next 72 hours)
myproxy-logon -s <your_esgf_identity_node_hostname> -l <your_myproxy_username> -o ~/.globus/certificate-file -t 72


  5. If you are publishing an experiment for the first time, run "esginitialize -c".
  6. Run the following commands in the given order:
  • esgpublish --project e3sm --map /path/to/where/you/want/your/mapfiles/<your_first_mapfile>.map --service fileservice
  • esgpublish --project e3sm --map /path/to/where/you/want/your/mapfiles/<your_first_mapfile>.map --service fileservice --noscan --thredds
  • esgpublish --project e3sm --map /path/to/where/you/want/your/mapfiles/<your_first_mapfile>.map --service fileservice --noscan --publish
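When there are several mapfiles, the three esgpublish calls have to be repeated for each one, in the same order. The sketch below only builds and prints the command lines as a dry run, using exactly the flags listed above; the function name and the example mapfile path are hypothetical.

```python
import shlex

# Extra flags for each of the three stages, in the order given above.
STAGE_FLAGS = [
    [],                        # first pass
    ["--noscan", "--thredds"],
    ["--noscan", "--publish"],
]


def publish_commands(mapfile, project="e3sm"):
    """Return the three esgpublish argument lists for one mapfile, in order."""
    base = ["esgpublish", "--project", project,
            "--map", str(mapfile), "--service", "fileservice"]
    return [base + flags for flags in STAGE_FLAGS]


if __name__ == "__main__":
    # Hypothetical mapfile path, printed as a dry run.
    for command in publish_commands("/path/to/mapfiles/example.map"):
        print(shlex.join(command))
```

To actually run the commands, each list could be passed to subprocess.run(command, check=True), which stops at the first stage that fails.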

Step 5: Verification

Your data should now be available on the given node. You can verify by opening a browser window and going to https://<your_esgf_node>/esg-search/search?project=e3sm

This should give you an XML file with all the datasets with project=e3sm that are available on the given node. Check that your new dataset is listed.
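The same check can be scripted. The sketch below builds the search URL and looks for a dataset ID inside the returned XML; it assumes the response places IDs in <str> elements, and the sample response and the host name esgf.example.org are hypothetical stand-ins rather than real output.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def search_url(node, project="e3sm"):
    """Build the search URL described above for the given node host name."""
    query = urllib.parse.urlencode({"project": project})
    return f"https://{node}/esg-search/search?{query}"


def dataset_listed(xml_text, dataset_id):
    """Return True if any <str> element in the response mentions dataset_id."""
    root = ET.fromstring(xml_text)
    return any(dataset_id in (elem.text or "") for elem in root.iter("str"))


# Hypothetical, heavily trimmed response used only to exercise the function.
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">E3SM.1_0.piControl.1deg_atm_60-30km_ocean.atmos.129x256.climo.monClim.ens1.v1|esgf.example.org</str>
    </doc>
  </result>
</response>"""

if __name__ == "__main__":
    print(search_url("esgf.example.org"))
    wanted = "E3SM.1_0.piControl.1deg_atm_60-30km_ocean.atmos.129x256.climo.monClim.ens1"
    print(dataset_listed(SAMPLE_RESPONSE, wanted))
```

In practice the XML text would come from fetching search_url(<your_esgf_node>), for example with urllib.request.urlopen.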


Step 6: Additional facets


Optional data facet values must be added after the initial publication step. A new mapfile must be generated, with one line per dataset in the format

<dataset_id> | optional_key1=value1 | optional_key2=value2 | ...

For example, the first round of publication used the following:

E3SM.1_0.piControl.1deg_atm_60-30km_ocean.atmos.129x256.climo.monClim.ens1 | science_driver="Water Cycle" | land_grid_resolution="1deg" | seaice_grid_resolution="60-30km"
E3SM.1_0.piControl.1deg_atm_60-30km_ocean.atmos.129x256.time-series.mon.ens1 | science_driver="Water Cycle" | land_grid_resolution="1deg" | seaice_grid_resolution="60-30km" | period="Perpetual 1850"
E3SM.1_0.piControl.1deg_atm_60-30km_ocean.land.native.model-output.mon.ens1 | science_driver="Water Cycle" | land_grid_resolution="1deg" | seaice_grid_resolution="60-30km" | period="Perpetual 1850"
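Lines in this format are easy to generate from a dictionary of facet values, which avoids typos when many datasets share the same facets. This is a minimal sketch that follows the quoting used in the examples above; the function name is hypothetical.

```python
def facet_line(dataset_id, facets):
    """Format one facet mapfile line: <dataset_id> | key="value" | ..."""
    parts = [dataset_id] + [f'{key}="{value}"' for key, value in facets.items()]
    return " | ".join(parts)


if __name__ == "__main__":
    shared = {
        "science_driver": "Water Cycle",
        "land_grid_resolution": "1deg",
        "seaice_grid_resolution": "60-30km",
    }
    dataset = "E3SM.1_0.piControl.1deg_atm_60-30km_ocean.atmos.129x256.climo.monClim.ens1"
    print(facet_line(dataset, shared))
```

The facets appear in the order they were added to the dictionary, so the output matches the first example line above.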

After the new mapfile has been created, invoke the esgadd_facetvalues command:

esgadd_facetvalues --project e3sm --map /path/to/your/additional/facets/map --noscan --thredds --service fileservice

Finally, publish the newly updated dataset:

esgpublish --project e3sm --map /path/to/your/map/files/ --noscan --publish --service fileservice