The parallelization of the code has been achieved in modifying
the temporal loop to take into account the data domain
decomposition. Fig. 3 shows which routines of code have been
Three types of routines can be distinguished: routines to communicate fields between processors, routines allowing the treatment of physical boundary conditions and routines doing global operations. Others communications are required for distributed I/O and pressure solver and are described in other sections.
These exchanges are necessary to update the halo of all subdomains (routine UPDATE_HALO_ll) and to also update the second layer of the halo (routine UPDATE_HALO2). Communications concerning the second layer of the halo are connected to the 4th order numerical diffusion.
The following routines have been modified to include resfreshemnt of the halo:
Modifications in some Meso-NH routines have been brought to allow to identify the location of physical boundaries. The ComLib routines LNORTH_ll, LWEST_ll, LSOUTH_ll and LEAST_ll are called in Meso-NH routines ANTI_DIFF, BOUNDARIES, DFLUX_CORR, MPDATA, NUM_DIFF, PHYS_PARAM$n, RAD_BOUND, and TURB. In addition the routine GET_INTERSECTION_ll is called in RELAXATION to allow to know the relative location of the relaxation region.
Specific global functions (MIN_ll and SUM3D_ll) have been thus
RESOLVED_CLOUD routine. Other global operations are made in the Meso-NH model in the statistical analysis of Meso-NH (BUDGET and LES_SPECTRA parts) and will be further modified.
The LB fields for relaxation along the y-axis are constitued of two
slices along the South and North borders (Fig.4a). In the
non-parallellized version of the model, the two slices are stored in a
single array of dimension (IIU_ll,2*NRIMY +2,:); the southern slice is
stored in the section (1:IIU_ll,1:NRIMY+1,:) and the northern slice, in
(1:IIU_ll,NRIMY+2:2*NRIMY +2,:). NRIMY is the depth of the riming zone. In the FM-files, the two slices of the LB fields can be larger than that is needed in the model. The LB arrays are of dimension (1:IIU;2*ILBSIZEY+2,:) in the FM-file (see Fig.4a).
For the LB fields of variables located at a mass grid point or a u-grid
point, the two slices correspond to the areas (1:IIU_ll,1:1+NRIMY)
and (1:IIU_ll, IJU_ll - NRIMY: IJU_ll ) on the global domain
reference (Fig.4b) . For the LB fields of variables located
at a v-grid point, the two slices correspond to the areas
(1:IIU_ll, IJU_ll - NRIMY:IJU_ll ) on the global domain. If there is no relaxation, the LB fields are stored in a single array of dimension (IIU_ll,2,:) for variables located at a mass grid point or a u-grid point, and of dimension (IIU_ll,4,:) for variables located at a v-grid-point. Table 1 gives for all the possibilities considered in the model the location of the LB arrays in the global domain reference.
Fig 4b shows an example of decomposition in sub-domains : some sub-domains have empty intersection with the LB areas while others contain one or two pieces of the LB areas. Note that for domains with no-empty intersection with both the southern LB area and the northern area, the sizes along the x axis of the two intersections are the same, as we have rectangular sub-domains.
The distribution of the LB fields operates as following :
The routine GET_SIZEY_LB (GET_SIZEX_LB for west-east borders) provides the sizes of the LB arrays; given the location of the LB areas in the global domain reference (see Table 1), the location (IXORI:IXENDI,IYORI:IYENDI) of the intersections with the sub-domain in the local sub-domain reference is obtained by calling the interface routine GET_INTERSECTION_ll (Fig. 5). The size along the y-axis IJSIZEYF of the LB arrays is the size of the intersection of the sub-domain with the southern LB area, plus the size of the intersection of the sub-domain with the northern LB area. The sizes of the LB array can be zero if the two intersections are empty. Table 1 gives the sizes of the LB arrays for all the possibilities considered in the model.
New global variables NSIZELBX_ll, NSIZELBY_ll, NSIZELBXU_ll, NSIZELBYV_ll, NSIZELBXvar_ll and NSIZELBYvar_ll (var being R, TKE, EPS,SV) are initialized in INI_MODELn and stored in the module MODD_CPL; they are the total depths of the western and eastern LB areas and the total depths of the northern and southern LB areas for the total domain. Their values depend of the on/off relaxation switch (see Table 1). For the NSIZELBXvar_ll and NSIZELBYvar_ll, the on/off activation of the variable var is also taken into account.
The LB fields (or sources) can be initialized at three locations in the model : at the level of READ_FIELD for the initial time, at the level of INI_CPL for the first coupling file, and at the level of LS_COUPLING for the other coupling files. The routines SCALAR_LS_COUPLING and UVW_LS_COUPLING have been replaced by the routine LS_COUPLING in the parallel version. The routines READ_FIELD, INI_CPL and LS_COUPLING have also been reorganized in the parallel version in order to share the common code which has been grouped in the routines INI_LB and INI_LS.
In the routine INI_LB, the sizes of the LB areas in the FM-file and over the global domain are retrieved. Then, each LB fields are initialized by successive calls to the routine FMREAD_LB. Inside this routine, the LB fields are read in the FM-file and stored temporary in an array over the global domain (Step 1 in Fig. 6). Then, the distribution of the fields over each subdomain is performed : the local coordinates of the intersection of the LB areas with the subdomain are retrieved by successive calls to GET_INTERSECTION_ll (Steps 2ab in Fig. 6); then, the corresponding section in the temporary LB array over the global domain is determined; and finally, the I/O processor sends the corresponding section to each processor and initializes its own section (Step 3 in Fig. 6).
To parallelize the INIT block, different possibilities have been explored. The specific management of the maximum memory space during a parallel job have imposed the choice. This allocation is required once at the start of a job and no desallocation is possible during the life of the job. So INIT and the model temporal loop must use quite the same memory space.
The data domain decomposition is performed in INIT and all the routines work on sub-domain in a parallel way. The processings on global domain are exceptional (eigen value of the flat Laplacian operator in trid)
For the parallelization implementation, INIT can be divided in three parts (see Fig.7).
No change is needed, each process initializes the same variables.
This part gathers a large set of modifications required by the parallelization. Almost all the routines have been modified. A specific processing is required when one of these cases is encountered:
At the end of INI_MODELn, the update of the halos for the arrays listed in 2D-list and 3D-list are done. So all the arrays are correctly initialized over the extended sub-domain of each process at the end of INI_MODELn.
The global variables NIMAX_ll and NJMAX_ll which are the sizes of the global physical domain are read in FM-files. The local variables NIMAX and NJMAX are the sizes of the physical sub-domain. They are intialized by calling GET_DIM_PHYS_ll.
NB : the variables KINF,KISUP,KJINF,KJSUP have been removed in the parallel version of MESO-NH.
Note that the metric coefficients are now stored in a module, to avoid numerous exchanges of the halo of these coefficients in the temporal loop, needed after each call to METRICS.
The METRICS call has been shifted before the SET_DIRCOS call.
A set of specific routines allow the processing of sums and averages depending on the size of the destination variable (scalar, 1D, 2D), depending on the size of the source array (1D,2D,3D, global domain , sub-domain) and depending on the direction of this processing ( X or Y direction,Z direction). The REDUCESUM_ll routine is used to perform a global scalar sum, GET_SLICE_ll and SUM_1DFIELD_ll are used to perform global scalar average.
The initializations of the diagonal elements of a matrix in the different cases of lateral boundary conditions use global arrays ZEIGENX and ZEIGEN defined on the global spectral space. Each process initializes its sub-domain in the global matrix PBFY by using IORXY_ll, IORYY_ll to shift ZEIGEN in the sub-domain representation. This matrix is an y-slice distribution (see REMAP... routines in annexe). The indice bounds of this distribution are got from the GET_DIM_EXT_ll routine with the first argument equal to 'Y'.
Furthermore, this routine initializes the global domain arrays required for the FFT transform.
The processing is similar to the initializations of the LB: See above the LB arrays section for details.
The global variables IIMAX_ll and IJMAX_ll initialized by calling GET_GLOBALDIMS_ll, are used in the global sum computing ( REDUCE_SUM_ll routine performs this sum ).
Two options are available for the budgets :
The sources are always compressed along x and y directions, the compression along z direction can be choosen.
The results are stored in the XBURxx arrays of rank 3 or 4 : XBURxx (NBUKMAX, NBUWRNB, NBUMASK (,NBUPROCNBR)) where NBUKMAX is the vertical dimension, NBUMASK is the number of masks zones, NBUWRNB is the number of time steps stored between two writes, and NBUPROCNBR is the number of stored processes.
The first and the third dimensions have been permuted to optimize the transfer to the
The results are written with WRITE_DIACHRO every NBUWRNB computations.
The results are stored in the XBURxx arrays of rank 4 or 5 : XBURxx (NBUIMAX,NBUJMAX, NBUKMAX,NBUPROCNBR(,KSV)). The two first dimensions are linked to the horizontal grid and their values depend on the type of compression and NBUPROCNBR is for the number of stored processes.
The results are computed in the local domain of each processor. Just before writting, the local results are reduced on the global domain with the END_CART_COMPRESS and END_MASK_COMPRESS routines.
Figure 9 shows the flow diagram after parallelisation. The modifications are now described in detailed for each routine :
New dimensions and indices have been introduced. NBUIMAX_ll, NBUJMAX_ll are the sizes of the budget arrays for the whole geographical box (see Figure 10). NBUIMAX and NBUJMAX are the sizes of the budget arrays corresponding to the intersection of the geographical box with the local subdomain. The XBURxx arrays are allocated to these sizes.
NBUSIL,NBUSIH, NBUSJL,NBUSJH are the indices of the intersection of the geographical box with the physical subdomain (see Figure 10). NBUIL,NBUIH, NBUJL,NBUJH are still the indices of the geographical box in the global domain reference.
In case of void intersection of the geographical box with the local subdomain, the CBUTYPE variable is set to 'SKIP;, NBUIMAX and NBUJMAX are initialized to zero.
The two first dimensions of the arrays XBUSURF and LBU_MASK are linked to the horizontal grid, they are initialized with the GET_DIM_EXT_ll and the KIMAX,KJMAX arguments in INI_BUDGET call sequence are deleted.
XBUSURF and LBU_MASK are extended local subdomain arrays in order to use the interface routines which expect halo arrays.
Three local variables have been introduced IBUDIM1,IBUDIM2,IBUDIM3 to allow the different allocations of XBURxx arrays with specific dimensions in CART and MASK cases.
The storage counter array XBUSURF is only updated on the physical subdomain. So, it is always equal to zero in the halo area.
If there is no compress along x and y (.NOT.LBU_ICP .AND. .NOT.LBU_JCP), the data are merged in WRITE_DIACHRO; the ZWORK and ZWORKT arrays have for their two first dimensions the sizes of the intersection of the geographical box with the local subdomain : (NBUIMAX,NBUJMAX)
If there is a compress in at least one horizontal direction (LBU_ICP .OR.
LBU_JCP), the local sums
are reduced by calling END_CART_COMPRESS; the first two
dimensions of ZWORK and ZWORKT are (NBUIMAX_ll,NBUJMAX_ll).
If the compress is performed along x and y directions, the local sums are reduced by calling REDUCESUM_ll; the local sums for processors with void intersection (CBUTYPE='SKIP') being set to zero.
If the compress is performed only along x (resp y) direction, the local sums are first transfered in an array of the same size along y (resp x) than the subdomain one. Outside the intersection, the local sums are initialized to zero. Then, the local sums are reduced by calling SUM_DIM1_ll and SUM_DIM2_ll. Finally, only the section corresponding to the whole geographical box is kept.
The logical mask array XBUSURF is the same for all the budget results buffered NBUWRNB times. It is written once at the beginning of WRITE_BUDGET with the FMWRIT routine which merges the data on the global domain .Then, the optional argument PMASK is not used in all the next WRITE_DIACHRO calls.
To be well considered in later READ_DIACHRO routine, the MENU_DIACHRO routine must be called after the FMWRIT one.
The budget results are named 'YGROUP_nnnn' where YGROUP is the name of the variable and nnnn is the sequence number of the write in the diachro file. The mask array is named 'MASK_nnnn.MASK'.
The local budget results are reduced by calling END_MASK_COMPRESS and are written by calling WRITE_DIACHRO without the PMASK optional argument.
Local ZWORK array is added to store the reduce sums of XBURHODJU, XBURHODJV, XBURHODJW,
XBURHODJ which are used several times.
This new function performs the global sum with the local sums by calling REDUCESUM_ll. To preserve the subdomain budget results, the local sum is transferred in a local arrays ZVAR2D (if the compression along z is performed) or ZVAR3D.
In standard configuration (optional PMASK argument not used but MASK case), the mask array is written outside the routine but the dimensions of this mask array are still initialized.
If the optional PMASK argument is used, the mask array is written in this routine and the FMWRIT merged the data on the global domain.