How to Continue a Run in a Different Directory/Machine

Sometimes you need to continue a run you've started, but with something changed. For example, maybe you want to add some new output variables. Or maybe the person who was doing the run went on vacation and you're supposed to continue their run. Or maybe you simply want to start your new simulation from a spun-up state. Or maybe you are trying to efficiently reproduce a bug someone else reported. Or maybe the machine you were using is going down and you need to move your code somewhere else.

Doing a bit-for-bit restart with a new directory/machine/executable is called "branching" in CESM/ACME nomenclature. See the startup/branch/hybrid section of the CESM1.0 documentation for a description. Note that one can also "clone" a case, which is like branching but preserves local modifications to the code base you're using (see the section on cloning in the CESM1.x User's Guide). Since all good ACME coders check all their changes into git before doing a run, and because cloning is complicated, we shouldn't need to clone. The CESM2 documentation on branching explains how to set up a branch run. Below are two methods for continuing a run in a different directory or machine.

Continuing a run using NCAR's "branch" utility and the run_acme.csh script:

In order to create a branch run, you must have an already-completed run you want to branch from. This run must have run long enough to produce restart files and rpointer files. You can branch from this run using the "run_acme.csh" style script that Peter Caldwell and Philip Cameron-Smith created. Here are the steps for using run_acme.csh to do so:

  1. git clone https://github.com/ACME-Climate/SimulationScripts.git
  2. cd SimulationScripts/templates/releases/
  3. modify run_acme.template.csh as needed:
    1. replace "template" in the name with a better descriptor,
    2. copy the file to a non-git directory (if desired),
    3. change settings in the file as needed. I can't say what you will want to change since each simulation is different, but at least think about:
      1. change run_name
      2. change run length/restart freq/output freq. By default the code you just downloaded runs 5 days with daily atm output, then stops without writing any restart files.
      3. change model_start_type to branch
      4. change restart_files_dir to the absolute path where the restart files for the run you want to branch from are located (note: restart_files_dir should include the rpointer.* and *.r.* files).
  4. execute run_acme...csh and make sure it completes without errors and submits your job to the queue.
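Before running the script, it can be worth confirming that restart_files_dir really holds what step 3.3.4 asks for. The sketch below is a hypothetical helper (check_restart_dir is not part of SimulationScripts). It assumes the rpointer.* and *.r.* naming mentioned above and that the first line of an rpointer file names a restart file; because some components may store something other than a file name there, it only checks entries that end in ".nc".

```python
"""Hypothetical preflight check for a branch run's restart_files_dir."""
import glob
import os


def check_restart_dir(restart_dir):
    """Return a list of problems found in restart_dir (empty list = none found)."""
    if not os.path.isdir(restart_dir):
        return [f"{restart_dir} is not a directory"]

    problems = []
    rpointers = sorted(glob.glob(os.path.join(restart_dir, "rpointer.*")))
    restarts = sorted(glob.glob(os.path.join(restart_dir, "*.r.*")))
    if not rpointers:
        problems.append("no rpointer.* files found")
    if not restarts:
        problems.append("no *.r.* restart files found")

    for rp in rpointers:
        with open(rp) as f:
            lines = [ln.strip() for ln in f if ln.strip()]
        if not lines:
            problems.append(f"{os.path.basename(rp)} is empty")
            continue
        # Heuristic: only entries that look like netCDF file names are checked.
        target = os.path.basename(lines[0])
        if target.endswith(".nc") and not os.path.exists(os.path.join(restart_dir, target)):
            problems.append(f"{os.path.basename(rp)} points to missing file {target}")
    return problems
```

For example, `for p in check_restart_dir("/abs/path/to/restarts"): print(p)` prints nothing when the directory looks complete.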

Continuing a run without using the "branch" utility:

NOTE: this method only works if the run you're copying from has exactly the same name as the run you're continuing, because the restart netcdf files contain variables (e.g. nhfil in cam.r files and locfnh, locfnhr in clm.r files) that store the names of the files to use for restarting. I started modifying cp_continue_run.sh (described below) to fix these issues, but the result seems too brittle (i.e. dependent on strings having certain hardcoded lengths) to be worth the effort.
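A cheap way to catch a name mismatch early is to compare the restart file names with the case name you plan to use. The sketch below is a hypothetical helper, not part of SimulationScripts. It assumes the usual `<case>.<component>.r.<date>.nc` file naming and only looks at file names; it does not open the netcdf files to read nhfil, locfnh or locfnhr, so an empty result is necessary but not sufficient.

```python
"""Hypothetical check that restart files carry the same case name as the new run."""
import glob
import os


def foreign_case_files(restart_dir, case):
    """Return sorted *.r.* file names in restart_dir that do not start with '<case>.'."""
    names = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(restart_dir, "*.r.*"))
    )
    # Any name listed here was written by a differently named case.
    return [n for n in names if not n.startswith(case + ".")]
```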

  1. Create a new case/build a new executable. I recommend using the "run_acme.csh" style script that Peter Caldwell and Philip Cameron-Smith created for this:
    1. git clone https://github.com/ACME-Climate/SimulationScripts.git
    2. cd SimulationScripts/serving_as_a_template/recommended
    3. modify run_acme.template.csh as needed:
      1. replace "template" in the name with a better descriptor,
      2. copy the file to a non-git directory (if desired),
      3. change settings in the file as needed. I can't say what you will want to change since each simulation is different, but at least think about:
        1. change run_name
        2. in particular, change "submit_run" to false so you can set the code up for a restart before submitting it
        3. change run length/restart freq/output freq. By default the code you just downloaded runs 5 days with daily atm output, then stops without writing any restart files.
    4. execute run_acme...csh and make sure it completes without errors
  2. Copy restart files from the previous run to ${run_root_dir}/run, where ${run_root_dir} is the root directory where your run will occur (it will have subdirectories called run, case_scripts, build, etc.).
    1. I wrote a script called cp_continue_run.sh to do this (based on input from Chris Golaz). It is in the same git repo cloned in step 1, under SimulationScripts/serving_as_a_template/recommended/. In cp_continue_run.sh:
      1. change TIME1 to the day you want to restart from
      2. change TIME2 to the month before the day you want to restart on
      3. set CASE to the name of the case you are trying to copy data from
      4. set CASE_OUT to the name you want to use for the run you're trying to submit (NOTE: switching run names is complicated because the old name is embedded inside the netcdf files used for the restart. cp_continue_run.sh attempts to handle this, but its behavior is brittle. If possible, set CASE_OUT=CASE. If problems occur, naming conflicts should be one of the first things you check.)
      5. set INDIR to the absolute path of the restart files you want to copy
      6. set OUTDIR to the absolute path of ${run_root_dir}/run.
    2. Execute cp_continue_run.sh and make sure the copied files exist in ${run_root_dir}/run as expected.
  3. Edit each of the rpointer files in ${run_root_dir}/run so that they all point to restart files for the same desired start date (if they don't already).
  4. Change CONTINUE_RUN to TRUE in ${run_root_dir}/case_scripts/env_run.xml.
  5. Submit the run by typing "./$CASE.submit" (where $CASE is the name of your case) in the ${run_root_dir}/case_scripts/ subdirectory.
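The copy in step 2 can be sketched in a few lines. This is not a transcription of cp_continue_run.sh, whose contents are not reproduced here; copy_restart_set is a hypothetical helper that assumes every file needed for the restart has the TIME1 stamp (for example 0001-01-06-00000) somewhere in its name, and that the rpointer.* files should come along too. It ignores TIME2 and does no CASE to CASE_OUT renaming. Date stamps can differ between components, so check the returned list against what you expect.

```python
"""Hypothetical sketch of copying one restart set into ${run_root_dir}/run."""
import glob
import os
import shutil


def copy_restart_set(indir, outdir, time1):
    """Copy files whose names contain time1, plus rpointer.* files, from indir to outdir.

    Returns the sorted list of copied file names.
    """
    os.makedirs(outdir, exist_ok=True)
    wanted = set(glob.glob(os.path.join(indir, f"*{time1}*")))
    wanted |= set(glob.glob(os.path.join(indir, "rpointer.*")))
    copied = []
    for src in sorted(wanted):
        if os.path.isfile(src):
            shutil.copy2(src, outdir)  # copy2 also preserves modification times
            copied.append(os.path.basename(src))
    return copied
```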
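Step 3 (making the rpointer files agree on a start date) is a small text edit, since each rpointer file names the restart file a component should read. The sketch below is a hypothetical helper that assumes a YYYY-MM-DD-SSSSS date stamp on the first line of each rpointer file. Files whose first line has no such stamp are left untouched for manual inspection, and the helper does not check that the newly referenced restart files actually exist.

```python
"""Hypothetical sketch: point every rpointer.* file at the same restart date."""
import glob
import os
import re

# Assumed restart date stamp format: YYYY-MM-DD-SSSSS (e.g. 0001-01-06-00000).
STAMP = re.compile(r"\d{4}-\d{2}-\d{2}-\d{5}")


def set_rpointer_date(run_dir, stamp):
    """Rewrite the date stamp on the first line of each rpointer.* file in run_dir.

    Returns {rpointer file name: new first line} for the files that were changed.
    """
    if not STAMP.fullmatch(stamp):
        raise ValueError(f"unexpected stamp format: {stamp!r}")
    changed = {}
    for rp in sorted(glob.glob(os.path.join(run_dir, "rpointer.*"))):
        with open(rp) as f:
            lines = f.read().splitlines()
        if not lines or not STAMP.search(lines[0]):
            continue  # different format: leave it for a human to check
        lines[0] = STAMP.sub(lambda m: stamp, lines[0], count=1)
        with open(rp, "w") as f:
            f.write("\n".join(lines) + "\n")
        changed[os.path.basename(rp)] = lines[0]
    return changed
```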
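Step 4 only changes one attribute in env_run.xml. The case_scripts directory normally also provides an xmlchange utility, which is the safer route when available. The sketch below is a hypothetical alternative that does a targeted text substitution rather than an XML parse, so comments and layout in the file are preserved. It assumes the entry looks like `<entry id="CONTINUE_RUN" value="...">`, with id before value in the same tag.

```python
"""Hypothetical sketch: set CONTINUE_RUN in env_run.xml without reformatting the file."""
import re

# Assumes id="CONTINUE_RUN" is followed by value="..." in the same <entry> tag.
_CONTINUE_RUN = re.compile(r'<entry\s+id="CONTINUE_RUN"\s+value="([^"]*)"')


def set_continue_run(env_run_xml, value="TRUE"):
    """Set CONTINUE_RUN's value and return the previous one.

    Raises ValueError if no matching entry is found.
    """
    with open(env_run_xml) as f:
        text = f.read()
    match = _CONTINUE_RUN.search(text)
    if match is None:
        raise ValueError(f"no CONTINUE_RUN entry found in {env_run_xml}")
    # Replace only the characters inside value="..."; everything else is untouched.
    new_text = text[:match.start(1)] + value + text[match.end(1):]
    with open(env_run_xml, "w") as f:
        f.write(new_text)
    return match.group(1)
```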