The code has been parallelized by modifying the temporal loop to take
the data domain decomposition into account. Fig. 3 shows which routines
of the code have been modified.

Three types of routines can be distinguished: routines that communicate fields between processors, routines that handle the physical boundary conditions, and routines that perform global operations. Other communications are required for distributed I/O and for the pressure solver; they are described in other sections.

These exchanges are necessary to update the halo of all sub-domains (routine UPDATE_HALO_ll) and to update the second layer of the halo (routine UPDATE_HALO2). The communications concerning the second layer of the halo are required by the fourth-order numerical diffusion.
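The principle of these halo exchanges can be sketched as follows (plain Python, purely illustrative: the names, the one-dimensional split and the depth-2 halo layout are assumptions for the sketch, not Meso-NH code). The second halo layer corresponds to the extra layer filled by UPDATE_HALO2:

```python
# Sketch (not Meso-NH code) of a halo exchange between two neighbouring
# sub-domains split along x. Each sub-domain stores its interior columns
# plus HALO ghost columns on each side; the exchange copies the
# neighbour's edge columns into those ghosts, as UPDATE_HALO_ll and
# UPDATE_HALO2 do for the first and second halo layers.
HALO = 2  # two layers: the 2nd one is needed by the 4th-order diffusion

def make_subdomain(ncols, nrows, value):
    """Interior filled with `value`, halo points initialised to None."""
    return [[value if HALO <= j < ncols + HALO else None
             for j in range(ncols + 2 * HALO)] for _ in range(nrows)]

def exchange_halo(left, right):
    """Copy edge interior columns of each sub-domain into the other's halo."""
    ncols_l = len(left[0]) - 2 * HALO
    for i in range(len(left)):
        for k in range(HALO):
            # right halo of `left` <- first interior columns of `right`
            left[i][ncols_l + HALO + k] = right[i][HALO + k]
            # left halo of `right` <- last interior columns of `left`
            right[i][k] = left[i][ncols_l + k]

a = make_subdomain(4, 3, 1.0)
b = make_subdomain(4, 3, 2.0)
exchange_halo(a, b)
```

After the exchange, both ghost layers of each sub-domain hold its neighbour's interior values, while the interiors are untouched.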

The following routines have been modified to include a refresh of the halo:

- EXCHANGE: all the prognostic variables at the end of the temporal loop (cyclic conditions are taken into account during this exchange)
- ADVECTION: the contravariant components of the wind used to express the advection operator (this exchange could be removed for a flat simulation)
- MPDATA: for each iteration greater than 1, the guess of the scalar fields at the future time
- DFLUX_CORR: the "outgoing limiting factor"; in the cyclic case, the boundary conditions on the advective fluxes are also applied
- PHYS_PARAM$n: the surface fluxes used as input to the turbulence scheme (this exchange could be removed for a flat simulation)
- TURB: the wind components parallel to the orography (2D fields)

Some Meso-NH routines have been modified to identify the location of the physical boundaries. The ComLib routines LNORTH_ll, LWEST_ll, LSOUTH_ll and LEAST_ll are called in the Meso-NH routines ANTI_DIFF, BOUNDARIES, DFLUX_CORR, MPDATA, NUM_DIFF, PHYS_PARAM$n, RAD_BOUND, and TURB. In addition, the routine GET_INTERSECTION_ll is called in RELAXATION to determine the relative location of the relaxation region.

Specific global functions (MIN_ll and SUM3D_ll) have thus been
implemented in the RESOLVED_CLOUD routine. Other global operations are
performed in the statistical analysis of Meso-NH (BUDGET and
LES_SPECTRA parts) and will be modified later.

The LB fields for relaxation along the y-axis consist of two slices
along the southern and northern borders (Fig. 4a). In the
non-parallelized version of the model, the two slices are stored in a
single array of dimension (IIU_ll,2*NRIMY+2,:); the southern slice is
stored in the section (1:IIU_ll,1:NRIMY+1,:) and the northern slice in
the section (1:IIU_ll,NRIMY+2:2*NRIMY+2,:). NRIMY is the depth of the
riming zone. In the FM-files, the two slices of the LB fields can be
larger than what is needed in the model: the LB arrays are of dimension
(1:IIU,2*ILBSIZEY+2,:) in the FM-file (see Fig. 4a).
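The index arithmetic of this packed storage can be illustrated with a small sketch (illustrative Python with 1-based inclusive bounds, as in the text; `lb_y_slices` is a hypothetical helper, not a model routine):

```python
# Sketch of the serial LB storage for the y-direction described above:
# both slices live in one array of y-depth 2*NRIMY + 2.
def lb_y_slices(nrimy):
    """Return (depth, south_bounds, north_bounds) of the packed LB array,
    with 1-based inclusive bounds along y."""
    depth = 2 * nrimy + 2
    south = (1, nrimy + 1)        # stored in (:,1:NRIMY+1,:)
    north = (nrimy + 2, depth)    # stored in (:,NRIMY+2:2*NRIMY+2,:)
    return depth, south, north

depth, south, north = lb_y_slices(3)   # NRIMY = 3 (illustrative)
```

The two slices tile the packed array exactly: the northern slice starts right after the southern one ends.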

For the LB fields of variables located at a mass grid point or a u-grid
point, the two slices correspond to the areas (1:IIU_ll,1:1+NRIMY)
and (1:IIU_ll,IJU_ll-NRIMY:IJU_ll) in the global domain
reference (Fig. 4b). For the LB fields of variables located
at a v-grid point, the two slices correspond to the areas
(1:IIU_ll,2:2+NRIMY) and (1:IIU_ll,IJU_ll-NRIMY:IJU_ll) in the
global domain reference. If there is no relaxation, the LB fields are
stored in a single array of dimension (IIU_ll,2,:) for variables
located at a mass grid point or a u-grid point, and of dimension
(IIU_ll,4,:) for variables located at a v-grid point. Table 1 gives,
for all the possibilities considered in the model, the location of the
LB arrays in the global domain reference.

Fig. 4b shows an example of decomposition into sub-domains: some sub-domains have an empty intersection with the LB areas, while others contain one or two pieces of them. Note that for sub-domains with a non-empty intersection with both the southern and the northern LB areas, the sizes along the x-axis of the two intersections are the same, since the sub-domains are rectangular.
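The intersection test underlying this decomposition can be sketched as follows (illustrative Python; `intersect` mimics the role of GET_INTERSECTION_ll on rectangles with 1-based inclusive bounds, but is not the actual interface routine):

```python
# Sketch of the rectangle intersection used to locate the pieces of the
# LB areas owned by each sub-domain. Rectangles are (i1, i2, j1, j2)
# in the global reference, 1-based inclusive bounds.
def intersect(a, b):
    i1, j1 = max(a[0], b[0]), max(a[2], b[2])
    i2, j2 = min(a[1], b[1]), min(a[3], b[3])
    if i1 > i2 or j1 > j2:
        return None                 # empty intersection
    return (i1, i2, j1, j2)

# A 10 x 12 global domain; southern LB area of depth NRIMY + 1 = 4 rows.
south_lb = (1, 10, 1, 4)
sub_inner = (4, 7, 6, 9)    # sub-domain away from the southern border
sub_south = (1, 5, 1, 6)    # sub-domain touching the southern border
```

A sub-domain in the interior of the domain gets an empty intersection, while a border sub-domain gets the piece of the LB area it owns.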

The distribution of the LB fields proceeds as follows:

- Allocation of the LB arrays (in INI_MODELn):
The routine GET_SIZEY_LB (GET_SIZEX_LB for the west-east borders) provides the sizes of the LB arrays. Given the location of the LB areas in the global domain reference (see Table 1), the location (IXORI:IXENDI,IYORI:IYENDI) of the intersections with the sub-domain, in the local sub-domain reference, is obtained by calling the interface routine GET_INTERSECTION_ll (Fig. 5). The size IJSIZEYF of the LB arrays along the y-axis is the size of the intersection of the sub-domain with the southern LB area, plus the size of the intersection with the northern LB area. The sizes of the LB arrays can be zero if the two intersections are empty. Table 1 gives the sizes of the LB arrays for all the possibilities considered in the model.

**Figure 5:** Determination of the sizes of the South-North LB arrays on each sub-domain, for variables at mass points and with relaxation.

- Initialisation of the LB arrays:
New global variables NSIZELBX_ll, NSIZELBY_ll, NSIZELBXU_ll, NSIZELBYV_ll, NSIZELBX*var*_ll and NSIZELBY*var*_ll (*var* being R, TKE, EPS, SV) are initialized in INI_MODELn and stored in the module MODD_CPL; they are the total depths of the western and eastern LB areas and the total depths of the northern and southern LB areas for the whole domain. Their values depend on the on/off relaxation switch (see Table 1). For NSIZELBX*var*_ll and NSIZELBY*var*_ll, the on/off activation of the variable *var* is also taken into account.

The LB fields (or sources) can be initialized at three places in the model: in READ_FIELD for the initial time, in INI_CPL for the first coupling file, and in LS_COUPLING for the other coupling files. The routines SCALAR_LS_COUPLING and UVW_LS_COUPLING have been replaced by the routine LS_COUPLING in the parallel version. The routines READ_FIELD, INI_CPL and LS_COUPLING have also been reorganized in the parallel version in order to share the common code, which has been grouped in the routines INI_LB and INI_LS.

In the routine INI_LB, the sizes of the LB areas in the FM-file and over the global domain are retrieved. Then each LB field is initialized by successive calls to the routine FMREAD_LB. Inside this routine, the LB fields are read from the FM-file and stored temporarily in an array covering the global domain (Step 1 in Fig. 6). Then the fields are distributed over the sub-domains: the local coordinates of the intersection of the LB areas with each sub-domain are retrieved by successive calls to GET_INTERSECTION_ll (Steps 2ab in Fig. 6); the corresponding section in the temporary global LB array is determined; and finally, the I/O processor sends the corresponding section to each processor and initializes its own section (Step 3 in Fig. 6).
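The distribution step can be sketched in a simplified 2D setting (illustrative Python; `scatter_lb` is a hypothetical stand-in for the send/initialize logic of FMREAD_LB, with fields stored as dictionaries for brevity and the actual sends replaced by assignments):

```python
# Sketch of the FMREAD_LB distribution step: the I/O process holds the
# LB field over the global domain and hands each sub-domain the section
# matching its intersection with the LB area (Steps 2-3 of Fig. 6).
def scatter_lb(global_field, subdomains, lb_area):
    """global_field: dict {(i, j): value}; subdomains and lb_area are
    rectangles (i1, i2, j1, j2) with 1-based inclusive bounds.
    Returns one dict per sub-domain holding only its LB section."""
    def intersect(a, b):
        r = (max(a[0], b[0]), min(a[1], b[1]),
             max(a[2], b[2]), min(a[3], b[3]))
        return r if r[0] <= r[1] and r[2] <= r[3] else None
    out = []
    for sub in subdomains:
        sec = intersect(sub, lb_area)
        if sec is None:
            out.append({})           # empty intersection: nothing to send
        else:
            out.append({(i, j): global_field[(i, j)]
                        for i in range(sec[0], sec[1] + 1)
                        for j in range(sec[2], sec[3] + 1)})
    return out

# 6 x 4 global field; two sub-domains split along x; LB area = 2 south rows.
field = {(i, j): 10 * i + j for i in range(1, 7) for j in range(1, 5)}
parts = scatter_lb(field, [(1, 3, 1, 4), (4, 6, 1, 4)], (1, 6, 1, 2))
```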

To parallelize the INIT block, different possibilities have been explored. The specific management of the maximum memory space during a parallel job dictated the choice: this space is allocated once at the start of a job and no deallocation is possible during the life of the job, so INIT and the model temporal loop must use roughly the same memory space.

The data domain decomposition is performed in INIT, and all the routines work on sub-domains in parallel. Processing on the global domain is exceptional (e.g. the eigenvalues of the flat Laplacian operator in TRID).

For the parallelization, INIT can be divided into three parts (see Fig. 7).

- Part 1: initialization of the physical constants (0D variables).
No change is needed; each process initializes the same variables.

- Part 2: namelist readings and controls (I/O and 0D variables). The
specific routines which manage the I/O dataflow allow a parallel
execution of this part.
- Part 3: allocation of the arrays and initialization of the different
variables used in the model temporal loop (I/O, nD variables). See
Fig. 8. This part gathers a large set of modifications required by the
parallelization; almost all the routines have been modified. A specific
processing is required when one of the following cases is encountered:

- sums, averages and extrema computations
- use of arrays which do not cover the whole global domain (LB and
relaxation areas)
- simultaneous use of different domains (sub-domain and global domain)
- the I/O routines OPEN, CLOSE and FMLOOK have been replaced by
OPEN_ll, CLOSE_ll and FMLOOK_ll; the I/O routine FMATTR has been
removed
- the arguments of the I/O routine FMREAD have been changed
- the specific routines GET_DIM_EXT_ll, GET_DIM_PHYS_ll and
GET_DIM_INDICE_ll are called to initialize, respectively, the size of
the extended sub-domain, the size of the physical sub-domain, and the
bounds of the physical sub-domain


At the end of INI_MODELn, the halos of the arrays listed in the 2D-list
and 3D-list are updated. **So all the arrays are correctly initialized
over the extended sub-domain of each process at the end of
INI_MODELn.**

- INI_MODELn: Two sets of dimensions are used: the dimensions
(IIU_ll,IJU_ll) of the extended global domain, and the dimensions
(IIU,IJU) of the extended sub-domain. The arrays are allocated at the
dimensions of the extended sub-domain, except for some arrays, such as
XTRIGSX and XTRIGSY, which are defined on the extended global domain in
spectral space. The LB arrays are discussed in a specific subsection
above.
- SET_DIM: The initialization of the interface package is performed
in this routine. It must be done before any call to an interface
routine.
The global variables NIMAX_ll and NJMAX_ll, the sizes of the global physical domain, are read from the FM-files. The local variables NIMAX and NJMAX are the sizes of the physical sub-domain; they are initialized by calling GET_DIM_PHYS_ll.

NB: the variables KINF, KISUP, KJINF, KJSUP have been removed in the parallel version of Meso-NH.

- INI_PARA_ll : see the user guide of interface routines.

- GET_SIZEX_LB and GET_SIZEY_LB :
see LB arrays section for a description of these routines.
- INI_BUDGET :
see below the Budget section for a description of this routine.
- SET_GRID: PLONOR_ll and PLATOR_ll are the longitude and
latitude of the south-western mass point of the global domain; they are
read from the FM-files. PLONOR and PLATOR are the longitude and
latitude of the south-western mass point of the sub-domain; they are
computed by calling the routine SM_LATLON with the corresponding x and
y positions of the south-western mass point of the sub-domain. As
SM_LATLON needs the arrays of x and y positions on the global domain, a
gathering operation is performed (1 communication of data).
- SM_GRIDCART: The halos of the arrays XDXHAT and XDYHAT are not
correctly initialized, because external points of the sub-domain are
used in their computation; as these arrays are used to compute the
Jacobian XJ, they are updated inside the routine just before the XJ
initialization (1 communication of data).
- SM_GRIDPROJ: The halos of the arrays ZXHATM, ZYHATM, XDXHAT and XDYHAT are not correctly initialized, because external points of the sub-domain are used in their computation; as these arrays are used to compute other variables (XJ, XLAT, XLON), they are updated inside the routine just after their initialization (2 communications of data).

- METRICS: There is no change in this routine, but the halos of
the metric coefficients XDXX, XDZX, XDYY, XDZY and XDZZ are not
correctly initialized, because external points of the sub-domain are
used in their computation. As they are used to compute other variables,
they are updated just after the call to METRICS (1 communication of
data).
Note that the metric coefficients are now stored in a module, to avoid the numerous halo exchanges of these coefficients in the temporal loop that would otherwise be needed after each call to METRICS.

The METRICS call has been moved before the SET_DIRCOS call.

- UPDATE_HALO_ll :
see the user guide of interface routines.
- SET_DIRCOS: The initialization of some arrays (XDIRCOSZW,
XDIRCOSXW, XDIRCOSYW, XCOSSLOPE, XSINSLOPE) is not correct on the
external points of the sub-domain. They are added to the list of arrays
updated at the end of INI_MODELn. A specific treatment is performed for
the sub-domains located on the borders of the domain in case of lateral
cyclic conditions.
- READ_FIELD: Two new routines (INI_LB, INI_LS) have been added to initialize the LS and LB fields (see the LB arrays section above for a description of INI_LB).
- READ_GR_FIELD: All the blocks that read the variables have been
modified.
- SET_REF : Three global sums from local values computed on each
sub-domain are performed to initialize the 0D variables XREFMASS,
XMASS_O_PHI0 and XLINMASS (3 communications of data). The
initialization of the lineic mass XLINMASS is done only for
sub-domains on the borders of the global domain.
- INI_TKE_EPS: The use of the Shuman operators on the PTKEM and
PTKET arrays requires adding them to the 3D-list of arrays that have to
be communicated at the end of INI_MODELn.
- INI_CPL : see LB arrays section for a description of this
routine.
- INI_DYNAMICS: The argument PBFY (elements of the tri-diagonal
matrix on a y-slice of the global physical domain) has replaced the PBF
argument, in order to agree with the calling arguments of the TRID
routine. The PDXHAT and PDYHAT arrays have replaced the PXHAT and PYHAT
arrays to avoid communication inside this routine.
- TRID: This routine uses arrays allocated on the sub-domain and
arrays allocated on the global spectral domain (such as PMAP, PTRIGSX,
PTRIGSY, PBFY). It is necessary to know the size, the beginning and the
end of each domain (global and sub-domain), and the location of the
sub-domain in the global domain: IIMAX_ll and IJMAX_ll (given by the
GET_GLOBALDIMS_ll routine) and IIB_ll, IIE_ll, IJB_ll, IJE_ll define
the global domain; IIMAX, IJMAX (given by the GET_DIM_PHYS_ll routine),
IIB, IIE, IJB, IJE (given by the GET_INDICE_ll routine), and IORXY_ll,
IORYY_ll (given by the GET_OR_ll routine) define the sub-domain and its
relative location.
A set of specific routines allows the processing of sums and averages, depending on the size of the destination variable (scalar, 1D, 2D), on the size of the source array (1D, 2D, 3D, global domain, sub-domain) and on the direction of the processing (x or y direction, z direction). The REDUCESUM_ll routine is used to perform a global scalar sum; GET_SLICE_ll and SUM_1DFIELD_ll are used to perform a global scalar average.

The initializations of the diagonal elements of a matrix, in the different cases of lateral boundary conditions, use the global arrays ZEIGENX and ZEIGEN defined on the global spectral space. Each process initializes its sub-domain in the global matrix PBFY, using IORXY_ll and IORYY_ll to shift ZEIGEN into the sub-domain representation. This matrix has a y-slice distribution (see the REMAP... routines in the annex). The index bounds of this distribution are obtained from the GET_DIM_EXT_ll routine with the first argument equal to 'Y'.

Furthermore, this routine initializes the global domain arrays required for the FFT transform.
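The reduction pattern used by these global sums can be sketched as follows (illustrative Python; a real implementation would perform an MPI all-reduce across processes, which is emulated here sequentially):

```python
# Sketch of a REDUCESUM_ll-style reduction: each process contributes the
# sum over its own sub-domain, and the global scalar sum is the
# reduction of those partial sums.
def local_sum(field):
    """Partial sum over one process's 2D sub-domain."""
    return sum(sum(row) for row in field)

def reducesum(partials):
    """Reduce the per-process partial sums to the global scalar sum."""
    return sum(partials)

# Two "processes", each owning half of a 2 x 4 field of ones.
p0 = [[1.0, 1.0], [1.0, 1.0]]
p1 = [[1.0, 1.0], [1.0, 1.0]]
total = reducesum([local_sum(p0), local_sum(p1)])
```

The same pattern (local computation, then a single collective reduction) applies to the global averages, with an extra division by the global point count.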

- RELAXDEF: This routine determines the sub-domain LB area, which
is the intersection of the global domain LB area with the physical
sub-domain: the routine GET_INTERSECTION_ll gives the sizes of the
intersection if it is not empty. The processing is similar to the
initialization of the LB fields: see the LB arrays section above for details.

- INI_CLOUD: The computation of the minimal value PDZMIN on the
domain, which involves communications (MIN_ll), is now performed once,
outside the two routines INI_CLOUD and INI_RAIN_ICE.
- INI_RADIATIONS: The use of the Shuman operators on the PSLOPANG
and PSLOPAZI arrays requires adding them to the 2D-list of arrays that
have to be communicated at the end of INI_MODELn.
The global variables IIMAX_ll and IJMAX_ll, initialized by calling GET_GLOBALDIMS_ll, are used in the global sum computation (the REDUCESUM_ll routine performs this sum).

Two options are available for the budgets:

- The MASK option: The sources of the budget are computed on several
horizontal zones defined by logical mask arrays. These mask arrays are
initialized at the beginning of the temporal loop in the SET_MASK
routine and may change during the run.
The sources are always compressed along the x and y directions; the compression along the z direction can be chosen.

The results are stored in the XBURxx arrays of rank 3 or 4: XBURxx(NBUKMAX,NBUWRNB,NBUMASK(,NBUPROCNBR)), where NBUKMAX is the vertical dimension, NBUMASK is the number of mask zones, NBUWRNB is the number of time steps stored between two writes, and NBUPROCNBR is the number of stored processes.

The first and third dimensions have been permuted to optimize the transfer to the WRITE_DIACHRO routine.

The results are written with WRITE_DIACHRO every NBUWRNB computations.
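The MASK compression can be sketched as follows (illustrative Python; `mask_compress` is a hypothetical stand-in that sums a 3D source over each mask zone, leaving the vertical dimension first, as in the permuted XBURxx layout):

```python
# Sketch of the MASK-option compression: for each mask zone the source
# is summed over x and y, leaving one vertical profile per zone. The
# result is indexed (k, zone), i.e. with z first, as described above.
def mask_compress(source, masks):
    """source[k][i][j]: 3D source field; masks: list of 2D boolean masks
    over (i, j). Returns profile[k][m] = sum over the m-th mask zone at
    level k."""
    nk = len(source)
    return [[sum(source[k][i][j]
                 for i in range(len(mask))
                 for j in range(len(mask[0])) if mask[i][j])
             for mask in masks]
            for k in range(nk)]

src = [[[1, 2], [3, 4]],          # level k = 1
       [[10, 20], [30, 40]]]      # level k = 2
zones = [[[True, False], [False, False]],   # zone 1: one point
         [[False, True], [True, True]]]     # zone 2: the rest
prof = mask_compress(src, zones)
```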

- The CART option: The sources of the budgets are computed on a
geographical box which is constant during the run. The sources can be
compressed in one, two or three of the x, y and z directions, so the
result may be a 3D, 2D, 1D or 0D array.
The results are stored in the XBURxx arrays of rank 4 or 5: XBURxx(NBUIMAX,NBUJMAX,NBUKMAX,NBUPROCNBR(,KSV)). The first two dimensions are linked to the horizontal grid and their values depend on the type of compression; NBUPROCNBR is the number of stored processes.

The results are computed on the local domain of each processor. Just before writing, the local results are reduced onto the global domain with the END_CART_COMPRESS and END_MASK_COMPRESS routines.

Figure 9 shows the flow diagram after parallelization. The modifications are now described in detail for each routine:

- INI_MODELn: The list of arguments in the INI_BUDGET call sequence has been modified.
- INI_BUDGET:
- The CART option:
New dimensions and indices have been introduced. NBUIMAX_ll and NBUJMAX_ll are the sizes of the budget arrays for the whole geographical box (see Figure 10). NBUIMAX and NBUJMAX are the sizes of the budget arrays corresponding to the intersection of the geographical box with the local subdomain. The XBURxx arrays are allocated to these sizes.

NBUSIL, NBUSIH, NBUSJL and NBUSJH are the indices of the intersection of the geographical box with the physical subdomain (see Figure 10). NBUIL, NBUIH, NBUJL and NBUJH are still the indices of the geographical box in the global domain reference.

If the intersection of the geographical box with the local subdomain is empty, the CBUTYPE variable is set to 'SKIP' and NBUIMAX and NBUJMAX are initialized to zero.

- The MASK option:
The first two dimensions of the arrays XBUSURF and LBU_MASK are linked to the horizontal grid; they are initialized with GET_DIM_EXT_ll, and the KIMAX and KJMAX arguments in the INI_BUDGET call sequence have been deleted.

XBUSURF and LBU_MASK are extended local subdomain arrays, in order to use the interface routines, which expect halo arrays.

Three local variables, IBUDIM1, IBUDIM2 and IBUDIM3, have been introduced to allow the different allocations of the XBURxx arrays with specific dimensions in the CART and MASK cases.

- MODELn: The initialization of the variable LBU_ENABLE is modified so that the BUDGET routine is not called for a subdomain with an empty intersection with the geographical box (case CBUTYPE='SKIP').
- SET_MASK: The arrays are defined on the extended local subdomain.
The bounds of this domain are set with the GET_INDICE_ll routine.
LBU_MASK is initialized on the physical subdomain and is set to FALSE in the halo area.
The storage counter array XBUSURF is only updated on the physical subdomain, so it is always equal to zero in the halo area.

- BUDGET : no change in this routine.
- BUDGET_CASE: The dimensions of the inout array argument are implicit.
- CART_COMPRESS: The indices of the storage array have been modified to consider only the intersection of the geographical box with the local subdomain, so that only local sums are performed in this routine.
- MASK_COMPRESS: The bounds of the physical local subdomain are initialized with the GET_INDICE_ll routine. The first and third dimensions of the arrays have been permuted: the first is now the dimension along z and the third is the number of mask zones.

- ENDSTEP_BUDGET: The routine WRITE_BUDGET is called by all the
processors (even if CBUTYPE is equal to 'SKIP'). The call sequence of
WRITE_BUDGET has been modified.
- WRITE_BUDGET:
The dimensions of the local subdomain are no longer used, so the
interface has been modified.
- The CART option:
If there is no compression along x and y (.NOT.LBU_ICP .AND. .NOT.LBU_JCP), the data are merged in WRITE_DIACHRO; the first two dimensions of the ZWORK and ZWORKT arrays are the sizes of the intersection of the geographical box with the local subdomain: (NBUIMAX,NBUJMAX).

If there is a compression in at least one horizontal direction (LBU_ICP .OR. LBU_JCP), the local sums are reduced by calling END_CART_COMPRESS; the first two dimensions of ZWORK and ZWORKT are (NBUIMAX_ll,NBUJMAX_ll).

END_CART_COMPRESS: If the compression is performed along both the x and y directions, the local sums are reduced by calling REDUCESUM_ll, the local sums for processors with an empty intersection (CBUTYPE='SKIP') being set to zero.

If the compression is performed only along the x (resp. y) direction, the local sums are first transferred into an array of the same size along y (resp. x) as the subdomain one. Outside the intersection, the local sums are initialized to zero. Then the local sums are reduced by calling SUM_DIM1_ll and SUM_DIM2_ll. Finally, only the section corresponding to the whole geographical box is kept.
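This zero-padding reduction can be sketched as follows (illustrative Python in one dimension; `pad_and_reduce` is a hypothetical stand-in for the zero-padding plus SUM_DIM1_ll/SUM_DIM2_ll step, with the cross-process sum emulated sequentially):

```python
# Sketch of the partial-compression reduction: each process pads its
# local sums with zeros outside its intersection with the box, the
# padded arrays are summed across processes, and the section covering
# the whole geographical box is kept.
def pad_and_reduce(local_sums, global_len):
    """local_sums: list of (offset, values) per process, 0-based offsets
    into the global box. Returns the length-`global_len` reduction."""
    out = [0.0] * global_len
    for offset, values in local_sums:
        for k, v in enumerate(values):
            out[offset + k] += v   # zero wherever a process has no data
    return out

# Box of length 6, split over two processes owning disjoint pieces.
reduced = pad_and_reduce([(0, [1.0, 2.0]), (2, [3.0, 4.0, 5.0, 6.0])], 6)
```

Because processes own disjoint pieces of the box, the padded arrays overlap only on zeros and the sum reassembles the full result.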

- The MASK option:
The logical mask array XBUSURF is the same for all the budget results buffered NBUWRNB times. It is written once, at the beginning of WRITE_BUDGET, with the FMWRIT routine, which merges the data on the global domain. The optional argument PMASK is then not used in the subsequent WRITE_DIACHRO calls.

To be handled correctly later by the READ_DIACHRO routine, the MENU_DIACHRO routine must be called after the FMWRIT one.

The budget results are named 'YGROUP_nnnn' where YGROUP is the name of the variable and nnnn is the sequence number of the write in the diachro file. The mask array is named 'MASK_nnnn.MASK'.

The local budget results are reduced by calling END_MASK_COMPRESS and are written by calling WRITE_DIACHRO without the PMASK optional argument.

A local ZWORK array has been added to store the reduced sums of XBURHODJU, XBURHODJV, XBURHODJW and XBURHODJ, which are used several times.

END_MASK_COMPRESS: This new function computes the global sum from the local sums by calling REDUCESUM_ll. To preserve the subdomain budget results, the local sum is transferred into a local array, ZVAR2D (if the compression along z is performed) or ZVAR3D.

- WRITE_DIACHRO:
Two options in the MASK case are considered.
In the standard configuration (MASK case but the optional PMASK argument not used), the mask array is written outside the routine, but the dimensions of this mask array are still initialized.

If the optional PMASK argument is used, the mask array is written in this routine and FMWRIT merges the data on the global domain.
