Dave Norton of PGI asked me to see whether I could get an interactive debugger working with ACME. He asked for pgdbg specifically, but also wanted to know whether DDT would work. This page documents my efforts. I'm starting with a small test case:

./create_newcase -compset FC5AV1C-L -case enable_debugger -res ne4_ne4 -project stf006 --output-root /lustre/atlas1/stf006/scratch/imn
cd enable_debugger
./xmlchange DEBUG=TRUE

To have an actual error to debug, I also set the PGI compiler version to 16.10.0 in env_mach_specific.xml.

./case.setup
./case.build

To run the executable under the debugger, I then changed the "run_exe" value in env_mach_specific.xml from "${EXEROOT}/acme.exe" to "gdb ${EXEROOT}/acme.exe".

EXEROOT

On Titan, the location of EXEROOT matters: the executable can normally live in NFS file space, but when it is passed as an argument to a debugger it must live in Lustre space. EXEROOT in CIME currently defaults to $CIME_OUTPUT_ROOT/$CASE/bld, so you can move the executable simply by specifying --output-root in the ./create_newcase command.
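As a rough illustration of the constraint above, the sketch below composes the default EXEROOT and checks whether it falls under Lustre. The helper names and the "/lustre/" prefix test are my own assumptions for illustration (the prefix is taken from the --output-root used earlier); none of this is part of CIME.

```python
import posixpath

# Assumption: Lustre scratch paths on Titan start with this prefix, as in the
# --output-root passed to ./create_newcase above.
LUSTRE_PREFIX = "/lustre/"


def default_exeroot(cime_output_root, case):
    """Compose the default EXEROOT: $CIME_OUTPUT_ROOT/$CASE/bld."""
    return posixpath.join(cime_output_root, case, "bld")


def is_on_lustre(path):
    """Return True if the normalized path sits under the assumed Lustre prefix."""
    return posixpath.normpath(path).startswith(LUSTRE_PREFIX)


exeroot = default_exeroot("/lustre/atlas1/stf006/scratch/imn", "enable_debugger")
print(exeroot)
print(is_on_lustre(exeroot))
```

A check like this could be run before launching a debugger to catch a case that was created without --output-root.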

Running Interactively

qsub -I -A stf006 -lwalltime=01:00:00,nodes=1 -q debug

Once inside the job, cd to the case directory and run:

source .env_mach_specific.sh
./case.submit --no-batch

If you're using a visual debugger, this may be enough. A command-line debugger such as gdb also needs interactive access to stdin. You can stop the output from being redirected to a file by removing the contents of "run_misc_suffix" in env_mach_specific.xml, but I don't yet know how to enable interactive stdin.

Allinea Forge DDT

I think the best way to run DDT on ACME is the reverse connect feature. Once you're inside the interactive job and have sourced .env_mach_specific.sh, also run "module load forge". Then, in env_mach_specific.xml, change the aprun command to "ddt --connect aprun". Before you run ./case.submit --no-batch, make sure a remote client is already connected to Titan, following the instructions at https://www.olcf.ornl.gov/kb_articles/using-the-ddt-remote-client/. The path to DDT given on that page is outdated; run "module show forge" to find the current path and use that when connecting. I've tested this method, and it worked for me.

I have a patch against CIME (an older version, I think) that goes as follows:

diff --git a/cime/utils/python/CIME/aprun.py b/cime/utils/python/CIME/aprun.py
index a01d6fb..95acda3 100755
--- a/cime/utils/python/CIME/aprun.py
+++ b/cime/utils/python/CIME/aprun.py
@@ -64,7 +64,7 @@ def _get_aprun_cmd_for_case_impl(ntasks, nthreads, rootpes, pstrids,
 
     # Compute task and thread settings for batch commands
     tasks_per_node, task_count, thread_count, max_thread_count, total_node_count, aprun = \
-        0, 1, maxt[0], maxt[0], 0, "aprun"
+        0, 1, maxt[0], maxt[0], 0, ""
     for c1 in xrange(1, total_tasks):
         if maxt[c1] != thread_count:
             tasks_per_node = min(pes_per_node, max_tasks_per_node / thread_count)
diff --git a/cime/utils/python/CIME/case.py b/cime/utils/python/CIME/case.py
index 33d589e..b078349 100644
--- a/cime/utils/python/CIME/case.py
+++ b/cime/utils/python/CIME/case.py
@@ -1098,8 +1098,8 @@ class Case(object):
         executable, args = env_mach_specific.get_mpirun(self, mpi_attribs, job=job)
 
         # special case for aprun
-        if executable == "aprun":
-            return get_aprun_cmd_for_case(self, run_exe)[0] + " " + run_misc_suffix
+        if "aprun" in executable:
+            return executable + " " + get_aprun_cmd_for_case(self, run_exe)[0] + " " + run_misc_suffix
         else:
             mpi_arg_string = " ".join(args.values())

With this patch, CIME runs the ddt --connect aprun command with the correct aprun arguments.
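To make the effect of the patch easier to see, here is a minimal standalone sketch of the patched branch. The function name build_run_command and the sample aprun arguments are hypothetical placeholders, not CIME's API or its real output. It assumes, as in the first hunk, that the computed aprun string no longer starts with the literal "aprun", so the launcher prefix comes entirely from the configured executable. Unlike the patch, which concatenates with plain spaces, the sketch skips empty pieces.

```python
def build_run_command(executable, aprun_args, run_misc_suffix):
    """Mimic the patched branch: keep the whole launcher string as the prefix.

    executable      -- launcher from env_mach_specific.xml, e.g. "ddt --connect aprun"
    aprun_args      -- arguments computed for aprun (placeholder values below)
    run_misc_suffix -- trailing redirection, possibly empty
    """
    # The patch tests for a substring rather than equality, so a launcher
    # that merely wraps aprun still takes the aprun code path.
    if "aprun" in executable:
        parts = [executable, aprun_args, run_misc_suffix]
        return " ".join(p for p in parts if p)
    raise ValueError("this sketch only covers aprun-based launchers")


# Placeholder arguments for illustration only.
print(build_run_command("ddt --connect aprun", "-n 16 ./acme.exe", ""))
```

Because the test is a substring match, a plain "aprun" launcher goes through the same branch and produces the same command shape as before the patch.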

