Communicating between components

CanESM runs in a multiple-program, multiple-data (MPMD) paradigm in which each component (atmosphere, ocean, and coupler) runs its own executable and communicates with the others using MPI. All MPI tasks exist in the same MPI_COMM_WORLD. CanCPL itself runs on a single MPI task and has no further parallelization within it. This page details how fields are sent between components and the transformations that occur along the way. The overall MPI network topology is shown in the following figure.

_images/CanESM5-Communication.PNG

Demonstration of the MPI network topology used in CanESM. Each communicator contains the PEs of one component, and MPI_COMM_WORLD encompasses all PEs within the simulation. Communication between communicators occurs only between the lead task of the ocean or atmosphere and the coupler.

Coupler API

All three components compile com_cpl.F90, which contains routines that initialize MPI on every task, create the MPI communicators, define the interfaces to send and receive data via MPI (see the API reference for a full description), and set up the list of variables that will be passed through the coupler. The coupler API is accessed in only a handful of routines in each component:

AGCM

  • gcm18.F

  • mpi_getcpl2.F

  • mpi_putggb2.F

Ocean

  • cpl_cancpl.F90

  • sbccpl.F90

Organization of MPI communications

CanESM is run in a multiple-program, multiple-data (MPMD) paradigm where each component of the earth system (CanAM, CanCPL, and NEMO) has its own executable. Every MPI task exists within the default MPI_COMM_WORLD communicator. The parallelization strategies for CanAM and NEMO use MPI internally to exchange data between tasks. These calls are necessarily blocking, so MPI_COMM_WORLD must be split into a separate communicator for each component to allow the components to run in parallel.

These communicators are defined in the subroutine define_group in com_cpl.F90. During initialization, each component calls this subroutine with a three-character identifier (cpl, atm, or ocn) that states which component it is. These character strings are then mapped onto pre-defined integer parameters in cpl_types.F90 to avoid string comparisons. The integer parameters are then used to create the MPI communicator for each group. Additional information about each group is stored in the group_info array, which contains the ‘leader’ of each group (the task with the lowest MPI rank in the group) and the ranks associated with each communicator. Some of this information is then broadcast from the leader of each group to ensure that every MPI task has the same information.
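The grouping logic described above can be modelled without MPI. The following is a minimal sketch in plain Python, not the code in com_cpl.F90: world ranks are ordinary integers, the values in GROUP_IDS are made-up stand-ins for the integer parameters in cpl_types.F90, and the function name define_groups and the layout of its result are assumptions chosen for illustration.

```python
# Pure-Python model of splitting MPI_COMM_WORLD into component groups.
# No MPI library is used; world ranks are plain integers.

# Hypothetical integer identifiers standing in for the parameters
# defined in cpl_types.F90 (the real values are not given here).
GROUP_IDS = {"cpl": 0, "atm": 1, "ocn": 2}


def define_groups(component_of_rank):
    """Build per-group information from a list mapping world rank -> identifier.

    Returns {group_id: {"leader": lowest world rank, "ranks": world ranks}}.
    """
    group_info = {}
    for world_rank, name in enumerate(component_of_rank):
        # Map the three-character string to an integer once, so that
        # everything afterwards compares integers rather than strings.
        group_id = GROUP_IDS[name]
        info = group_info.setdefault(group_id, {"leader": world_rank, "ranks": []})
        info["ranks"].append(world_rank)
        # The leader is the task with the lowest world rank in the group.
        info["leader"] = min(info["leader"], world_rank)
    return group_info


# Example layout (assumed): 1 coupler task, 3 atmosphere tasks, 4 ocean tasks.
layout = ["cpl"] + ["atm"] * 3 + ["ocn"] * 4
group_info = define_groups(layout)
```

In a real MPI program each task would obtain only its own group's communicator from the split; the dictionary here plays the role of the information that is broadcast afterwards so that every task knows each group's leader and ranks.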

Exchanging coupled fields

The transfer of fields between communicators is handled only by the lead (master) tasks. These fields comprise the entire global array, not just the subdomain (in the ocean) or latitude bands (in the atmosphere) associated with an individual task. After receiving a field, the lead task scatters the global array to all other tasks within its communicator.

The following describes the communication pathway using sea surface temperature as an example:

  1. The lead ocn task constructs the global SST array from the subdomain of every other ocn task.

  2. The lead ocn task sends SST to the main (and only) cpl task.

  3. The cpl task remaps SST from the ocean grid to the atmospheric grid.

  4. The cpl task sends the global SST array to the lead atm task.

  5. The lead atm task scatters the global SST array to every other atm task.

  6. Each atmospheric task copies only the part of the global array that it needs.
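The six steps above can be traced with a small model. This is a sketch in plain Python under stated assumptions, not the coupler's implementation: fields are one-dimensional lists, sends and receives are modelled as list copies, the function names are hypothetical, and the remap is a simple nearest-neighbour sampling used only as a placeholder for the coupler's actual remapping.

```python
def gather(subdomains):
    """Step 1: the lead task assembles the global array from every subdomain."""
    return [value for sub in subdomains for value in sub]


def remap(field, n_target):
    """Step 3: placeholder remap that samples the source field at n_target points."""
    n_source = len(field)
    return [field[(i * n_source) // n_target] for i in range(n_target)]


def scatter_and_copy(global_field, n_tasks):
    """Steps 5-6: each task receives the global array and keeps only its slice.

    Assumes len(global_field) is divisible by n_tasks.
    """
    chunk = len(global_field) // n_tasks
    return [global_field[t * chunk:(t + 1) * chunk] for t in range(n_tasks)]


# Four ocean tasks, each holding two SST values (kelvin, made-up numbers).
ocn_subdomains = [[271.0, 272.0], [273.0, 274.0], [275.0, 276.0], [277.0, 278.0]]

sst_ocn_global = gather(ocn_subdomains)            # step 1: gather on lead ocn task
sst_at_cpl = list(sst_ocn_global)                  # step 2: lead ocn -> cpl
sst_atm_global = remap(sst_at_cpl, 4)              # step 3: ocean grid -> atmosphere grid
sst_at_lead_atm = list(sst_atm_global)             # step 4: cpl -> lead atm
atm_local = scatter_and_copy(sst_at_lead_atm, 2)   # steps 5-6: two atm tasks
```

Note that the whole global array passes through a single task at steps 1, 2, 4, and 5, which is the pattern the enhancements listed at the end of this page aim to reduce.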

Example: Including a new component

The following example provides a qualitative description of how to connect a new component to the coupler. As a prerequisite, the following are assumed:

  • The component can be run in an MPMD-like mode

  • The following locations to inject code have been identified:

    • Initialize the MPI communications

    • Send fields to the coupler

    • Receive fields from the coupler

Rough Procedure

  1. In the portion of the code responsible for initializing communications, add the call to define_group found in com_cpl.F90. Note that this routine can be used to initialize MPI, but MPI may also already have been initialized before that call.

  2. Define new variables in subroutine define_cpl_var_list in com_cpl.F90.

  3. Define new events that make up the receive and send steps between the coupler and the new component, e.g. create analogues of add_events_nemo_to_cpl and add_events_cpl_to_nemo.

  4. Add the calls to add_events_COMPONENT_to_cpl and add_events_cpl_to_COMPONENT to subroutine add_events_part1.

  5. Add a subroutine that reconstructs the global array from the subdomain arrays and calls send_data_rec.

  6. Add a subroutine that calls rcv_data_rec and scatters the global array to each subdomain.

Potential future enhancements

  • Eliminate the need for steps (4) and (5) in “Exchanging coupled fields” above by creating an intercommunicator between cpl and atm/ocn

  • Refactor coupler to support more tasks to enhance performance of the ESMF remapping

  • Avoid repeated code by refactoring the bcast routines

  • Generate mapping between processor domains to avoid full global gathers/scatters