...

Several aspects of model I/O performance can be controlled at runtime by setting the PIO* parameters in env_run.xml. A model developer can start by tuning the number of PIO I/O tasks (processes that perform I/O) and the stride between the I/O tasks, set by PIO_NUMTASKS and PIO_STRIDE respectively in env_run.xml. By default PIO_STRIDE is 4 and PIO_NUMTASKS is (total number of tasks / PIO_STRIDE), so as you increase the total number of tasks, more tasks try to write to the filesystem.
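As a rough illustration of how the default scales, the following Python sketch computes the default PIO_NUMTASKS for a few task counts. The function name default_pio_layout is hypothetical, and two details are assumptions rather than documented PIO behavior: that the division rounds down with a minimum of one I/O task, and that the I/O tasks start at rank 0.

```python
def default_pio_layout(total_tasks, stride=4):
    """Return (numtasks, io_ranks) for the default described above.

    Hypothetical helper for illustration; not part of PIO or the model scripts.
    """
    if total_tasks < 1 or stride < 1:
        raise ValueError("total_tasks and stride must be positive")
    # Assumption: integer division, with at least one I/O task.
    numtasks = max(1, total_tasks // stride)
    # Assumption: I/O tasks start at rank 0 and are spaced `stride` ranks apart.
    io_ranks = [i * stride for i in range(numtasks)]
    return numtasks, io_ranks


if __name__ == "__main__":
    for total in (16, 512, 4096):
        numtasks, io_ranks = default_pio_layout(total)
        print(total, numtasks, io_ranks[:4])
```

With the default stride, a 4096-task run ends up with 1024 I/O tasks under these assumptions, which shows why the defaults usually need adjusting at scale.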

You can get 'out of memory' errors in PIO if the number of I/O tasks is too small. If the number of I/O tasks is too large, you can also get 'out of memory' errors in the MPI library itself, because there are too many messages. At the same time, I/O performance for large data sets is better with more I/O tasks.

Users typically have to start with the default configuration and try a couple of PIO_NUMTASKS and PIO_STRIDE settings to find an optimal PIO layout for a model configuration on a particular machine. On large machines, we have found empirically that the number of PIO I/O tasks (PIO_NUMTASKS) needs to be kept below 128 to prevent out of memory errors, but this is also system specific. The other rule of thumb is to assign no more than one PIO I/O task to each compute node, to minimize MPI overhead and to improve parallel I/O performance. This occurs naturally if PIO_NUMTASKS is sufficiently smaller than the total number of tasks.
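The two rules of thumb above can be checked before submitting a run. The Python sketch below is one possible way to do that, not a tool shipped with the model: the function names are hypothetical, it assumes I/O tasks start at rank 0 and that MPI ranks are packed onto nodes in order, and the default cap of 64 is only an illustrative value below 128 that should be tuned per system.

```python
from collections import Counter


def max_io_tasks_per_node(numtasks, stride, tasks_per_node):
    """Largest number of I/O tasks that land on any single node.

    Assumptions: I/O ranks are 0, stride, 2*stride, ... and ranks are
    packed onto nodes in order (rank // tasks_per_node gives the node).
    """
    io_ranks = [i * stride for i in range(numtasks)]
    per_node = Counter(rank // tasks_per_node for rank in io_ranks)
    return max(per_node.values())


def candidate_layout(total_tasks, tasks_per_node, max_io_tasks=64):
    """Suggest a starting (PIO_NUMTASKS, PIO_STRIDE) pair to experiment with.

    max_io_tasks is an illustrative, system-specific cap (hypothetical default).
    """
    # Start with one I/O task per node.
    stride = tasks_per_node
    numtasks = max(1, total_tasks // stride)
    # If that exceeds the cap, use fewer I/O tasks spread over a wider stride.
    if numtasks > max_io_tasks:
        numtasks = max_io_tasks
        stride = total_tasks // max_io_tasks
    return numtasks, stride


if __name__ == "__main__":
    # Default layout (stride 4) on 4096 tasks with 32 tasks per node.
    print(max_io_tasks_per_node(1024, 4, 32))
    numtasks, stride = candidate_layout(4096, 32)
    print(numtasks, stride, max_io_tasks_per_node(numtasks, stride, 32))
```

A layout suggested this way is only a starting point; the text above still applies, and the final values should come from trying a few settings on the target machine.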

Examples:

For small core counts, it's usually most efficient to run all the processes sequentially with all processes using all available cores (though if one process scales poorly it may make sense to have it run on fewer processors while the remaining processors idle).  Here is an example of a serial PE layout:

...