Encyclopaedia Index

The
Parallel PHOENICS 3.1
Companion
CHAM/TR316/PAR
Version 3.0


by N D Baltas


Apr 1997

TABLE OF CONTENTS

  1. Introduction

  2. What is Parallel PHOENICS

  3. Parallelisation Strategy

  4. How to run Parallel PHOENICS: a step-by-step guide

  5. New functionalities

  6. Running in sequential mode

  7. Functionalities not supported


1. Introduction

Computer simulations of real problems in the area of CFD are becoming increasingly important for industry.

For CFD predictions to be realistic, large amounts of computer power and memory are often required.

Parallel computers have become available at more affordable prices than the very expensive vector super-computers.

Parallel PHOENICS has now been ported to most MIMD parallel computers, with very impressive performance.

The figure below shows execution times for a typical industrial problem when parallel computers are used, and how they compare with the time required on a high-end workstation (HP735/99 MHz).

The two times shown for the HP735 runs, represent the execution using the Standard PHOENICS Linear Equation Solver and the Iterative (Conjugate Residual) solver.

This problem is the simulation of the turbulent flow around a ship hull.

The grid size is 30x30x155, and PHOENICS solved for pressure (P1), the velocities (U1,V1,W1) and the turbulence variables KE and EP.

In this document we describe what Parallel PHOENICS is, how to set up a problem and how to run it on a parallel machine.

It is presumed that the reader is familiar with PHOENICS as it runs on a sequential machine.

2. What is Parallel PHOENICS?

Parallel PHOENICS retains the sequential pre-processing and post-processing modules (SATELLITE and PHOTON) but runs EARTH in parallel, on many processors, using a domain-decomposition or grid-partitioning technique. You can also run EARTH on one processor, which is equivalent to the sequential version of EARTH. In brief, the following steps must be followed to perform a parallel run (more detailed instructions are given in section 4 of this document, and a consolidated command sequence is sketched after these steps):

Take an existing q1 file that runs on a sequential computer and define as many sub-domains as your hardware permits (note that the number of sub-domains must be equal to the number of processors available for the run). The sub-domains can be defined either explicitly, by defining regions with the PIL command PATCH, or by loading the user library U100 and specifying the number of sub-domains with a single command.

Run SATELLITE using the modified q1 file to produce an eardat for the whole domain; this contains additional information about the area each sub-domain occupies.

Set the environment variable PHOENPROCS to the number of processors you are going to use. Furthermore, check that the number of processors to be used matches the number of sub-domains defined in the q1 file.

Start PVM as usual (if in doubt refer to TR/110/PAR or section 4(c) of this manual). This step is skipped for MPI implementations.

Run EARTH by typing runear.

At the end of the run a single phi file is created which contains the field data for the whole domain.
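
For a PVM installation, the whole procedure therefore amounts to a short sequence of commands of the kind sketched below; the q1 editing is done beforehand, exact prompts and paths may differ on your system, and each step is described in detail in section 4:

    runsat               (run SATELLITE to produce the eardat file)
    setenv PHOENPROCS 6  (number of processors = number of sub-domains)
    pvm hostfile         (start PVM; type quit at the pvm> prompt to leave it running)
    runear               (run parallel EARTH; a single phi file is written at the end)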

3. Parallelisation Strategy

The EARTH module consumes most of the computing time, and it is therefore this part of PHOENICS which is ported to the parallel computer.

The most suitable strategy for parallelising EARTH is grid partitioning, where the computational grid is divided into subgrids.

The computational work related to each subgrid is then assigned to a processor.

A modified version of EARTH is replicated over all available processors and runs in parallel, exchanging boundary data at the appropriate times.

A single processor controls input/output, acting as a server to the other processors; this is usually the parent processor that starts execution and spawns earexe to the other processors.

The main input operation is to read the data-file (eardat) produced by the pre-processor which contains information about the simulation settings as well as the description of the subgrids; the server processor, after reading the file, passes the same file to the other processors.

In parallel, each processor extracts the information specific to its subgrid and continues with the solution procedure. At the end of the solution run, each processor produces field data which are assembled by the server processor, reconstructed as a single set of data for the entire grid and written to disk as one file, exactly as in a sequential run. Only one result file is produced, by the server processor.

Grid partitioning in Parallel-PHOENICS can be done in one or two dimensions (z and y direction). For the present installation only 1-D grid partitioning is supported in the z-direction.

The parallel version uses exactly the same algorithms as the sequential version, with the following exception: in the sequential version the integration of the velocities is performed on the current (x,y)-slab using the updated values from the previous slab, while in the parallel algorithm the required values (of u, v and w) are taken from the previous iteration.

The structure of the parallel algorithm is given below:

For the I/O server node
  READ and DISTRIBUTE geometry specifications and other parameters
For all other nodes
  RECEIVE geometry information and other parameters
Do for NSTEPS time-steps
  Do until convergence or maximum number of iterations reached
    Do for all (x,y)-slabs on this node in sequence
      Calculate sources, coefficients and solve for new V (2D-Solver)
      Calculate sources, coefficients and solve for new U (2D-Solver)
      Calculate sources, coefficients and solve for new W (2D-Solver)
    Exchange values for U, V, W at subgrid boundaries between processors
    Compute global residuals for U, V and W
    Assemble pressure-correction equation and solve (3D-Solver)
    Compute global residual for pressure
    Assemble equations and solve for other variables: KE, EP, H (3D-Solver)
For the I/O server node
  RECEIVE and PRINT results from this and other nodes
For all other nodes
  SEND results to the server
End

Another feature which differentiates sequential from Parallel-PHOENICS is the Linear Equation Solver (LES). The standard PHOENICS LES, with block-correction acceleration, is very efficient for sequential computers but its recursive structure poses difficulty for implementation on parallel machines.

The need for an easily parallelisable LES led to the development of a new solver which is almost trivially parallel but robust and with good convergence properties.

The new LES is based on Conjugate Gradient acceleration techniques and a simple diagonal preconditioner. Within the LES, boundary values are exchanged at each iteration and a number of global reductions take place to evaluate the required scalar products. Details of the parallel LES and of the methods used to port PHOENICS to MIMD computers can be found in Reference [1].

4. How to run Parallel-PHOENICS: a step-by-step guide

This section describes in detail how to set up a PHOENICS problem and run it on a parallel machine. If you followed the installation instructions (TR/110/PAR) you will have Parallel PHOENICS installed and ready to run.

Familiarity with a parallel system is necessary when writing your own code; otherwise no prior knowledge is required. You must have installed MPI or PVM (depending on the Parallel PHOENICS installation you have received) and know how to run them. Refer to TR/110/PAR for detailed instructions and also contact your system administrator for help.

(a) Where to begin

If the Parallel-PHOENICS system is installed in precise accordance with the instructions supplied by CHAM, the directory where your data files reside is called private (d_priv1).

You will find the d_priv1 sub-directory under the main directory PHOENICS.

(b) Defining the sub-domains

(i) Automatic, using library U100

Take a standard q1 file and add the following lines at the end (before the STOP command):

NOWIPE = T
LOAD(U100)
UWATCH = F
TSTSWP = 1

Note that the user library ULIBDA should reside in your working directory.

As an example, a number of test q1-files have been included in the d_priv1 directory, with the names test-[1-3][a-b].q1. For illustration purposes we will use here the file test-1a.q1.

This is the 'flow over a backward-facing step' case, modified according to the above instructions. Please list the file and examine its contents. You will also notice that after STOP there is an additional ENDPROP flag; this is used to work around certain system bugs that we encountered on some parallel systems while reading the file, and it has no effect on the simulation.

However, it should be included in every q1 file that is used for parallel runs. The same ENDPROP flag is included in the properties file props in the /PHOENICS/d_earth directory.

Some explanation of the above statements is now given:

The statement NOWIPE=T prevents SATELLITE from deleting the contents of q1 before loading U100.

For performance reasons the graphical interface has not been implemented, and the information regarding the solution must be displayed in text mode (TSTSWP=1). You can set UWATCH=T (the default) to display the monitor values at each sweep, or set UWATCH=F to print the residual values only.

To demonstrate how the sub-domains can be defined, run SATELLITE using test-1a.q1. First copy test-1a.q1 to q1 by typing:

cp test-1a.q1 q1

Now run SATELLITE by typing runsat and pressing RETURN. The following will be displayed on your screen:

The ID of this node is...52811045

This code is valid for a limited period.
For the exact date of expiry see the line below the CHAM logo which contains the text "The code expiry date is the end of : ".

To continue to use PHOENICS after that date, send a completed Unlocking Request Form (including the node ID shown above) to CHAM.


   ---------------------------------------------------------
  CCCC HHH        PHOENICS Version 3.1 - SATELLITE
      CCCCCCCC     H     (C) Copyright 1997
    CCCCCCC    See   H   Concentration Heat and Momentum Ltd
   CCCCCCC   the new  H  All rights reserved.
   CCCCCC     PLANT   H  Address:  Bakery House, 40 High St
   CCCCCCC   feature  H  Wimbledon, London, SW19 5AU
    CCCCCCC  PLNTDEM H   Tel:       0181-947-7651
      CCCCCCCC     H     Facsimile: 0181-879-3497
  CCCC HHH        E-mail:  phoenics@cham.co.uk
   ---------------------------------------------------------
   This program forms part of the PHOENICS installation for:
     CHAM
   The code expiry date is the end of : Sep 2007
   ---------------------------------------------------------
TITLE = Flow over back-facing step (K-E).

Two (2) sub-domains have been defined in z-direction
You can make new settings now or press RETURN to continue
Number of sub-domains ?

The program defines, as a default, two sub-domains and asks the user whether he/she wants to define more than two sub-domains for this case. There is a limit on how many sub-domains can be defined, depending on the grid size: each sub-domain must have at least 3 cells (or slabs, for the z-direction), otherwise a WARNING is printed.

If you press RETURN the default value of 2 is used, and therefore two sub-domains are defined. For this example let us define six sub-domains, by typing 6 and pressing RETURN, i.e.:

6

The following messages are printed, giving information on the six sub-domains defined.


Sub-domain number 1
IZF=1
IZL=8
Number of slabs in this sub-domain=8


Sub-domain number 2
IZF=9
IZL=16
Number of slabs in this sub-domain=8


Sub-domain number 3
IZF=17
IZL=23
Number of slabs in this sub-domain=7


Sub-domain number 4
IZF=24
IZL=30
Number of slabs in this sub-domain=7


Sub-domain number 5
IZF=31
IZL=37
Number of slabs in this sub-domain=7


Sub-domain number 6
IZF=38
IZL=45
Number of slabs in this sub-domain=8


For CONPOR patch STEP , zero VPOR is replaced by PRPS setting with value = 1.990E+02
EARDAT file written for RUN 1, Library Case=100.
NORMAL STOP REACHED IN PROGRAM

The first line states which sub-domain this is; IZF and IZL give the extent of the sub-domain in the z-direction; and the last line gives the number of slabs (z-wise cells) corresponding to this sub-domain.

Information on the other sub-domains is given in a similar manner. Note that since NZ=45 (NZ represents the global size of the grid in the z-direction), it is not possible to divide the grid equally; sub-domains 1, 2 and 6 have one extra slab (8) compared with the other sub-domains, which have 7 slabs.

Note that if you have set TALK=T at the top of your q1 file, you will be prompted by the following messages:

   ************* type M to access TOP-MENU *************

Next instruction, please; else M for menu, or END to end

Unless you wish to enter any other commands, type END:

end

************************************************************
EARDAT file written for RUN 1, Library Case=614.
************************************************************

Replace Q1 file by instruction stack? (Y/N) n

All instructions completed. Thank you.
NORMAL STOP REACHED IN PROGRAM

When loading U100, SATELLITE adds PATCHes with the special names S-010101, S-010102, S-010103, S-010104, S-010105, S-010106 (if 6 sub-domains are requested). These patches represent the area occupied by each sub-domain, where each processor is going to perform its computation. The name of a sub-domain patch must always begin with S- or s-, followed by six digits.

The six digits indicate the position of each sub-domain relative to the cartesian axis system (ie S-xxyyzz), and this convention accommodates 1-D, 2-D or 3-D domain decomposition (although only 1-D decomposition, in the z-direction, is supported here). For example, S-010104 denotes the sub-domain occupying the first block in x, the first block in y and the fourth block in z. The figure below illustrates this convention for the six sub-domains defined above.

By inspecting the eardat file you will see that 6 extra PATCHes have been defined with the names S-010101, S-010102, S-010103, etc.

(ii) Manual, explicit definition

The way U100 divides the whole domain into sub-domains in the z-direction may not be convenient for a particular problem, and the user often wishes to define the sub-domains explicitly in the q1 file. This can be done using the PATCH command, but care is needed so that:

There is no overlapping between different sub-domains;

The whole domain is covered by all sub-domains.

If the above rules are not satisfied the program will stop and an error message will be printed.

The following four PATCHes define four sub-domains incorrectly, since sub-domain S-010102 starts from IZ=6 and therefore overlaps with S-010101 (which ends at IZ=6):
PATCH(S-010101,VOLUME,1,NX,1,NY,1,6,1,1)
PATCH(S-010102,VOLUME,1,NX,1,NY,6,12,1,1)
PATCH(S-010103,VOLUME,1,NX,1,NY,13,18,1,1)
PATCH(S-010104,VOLUME,1,NX,1,NY,19,24,1,1)

The correct definition of S-010102 should be:

PATCH(S-010102,VOLUME,1,NX,1,NY,7,12,1,1)
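
For reference, and assuming the 24-slab grid implied by the example above, a consistent set of four sub-domain PATCHes satisfying both rules would be:

PATCH(S-010101,VOLUME,1,NX,1,NY,1,6,1,1)
PATCH(S-010102,VOLUME,1,NX,1,NY,7,12,1,1)
PATCH(S-010103,VOLUME,1,NX,1,NY,13,18,1,1)
PATCH(S-010104,VOLUME,1,NX,1,NY,19,24,1,1)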

Similarly, the arrangement shown below is incorrect, because the whole domain area is not covered by the sub-domains (shaded region in the diagram below).

When you define your sub-domains it is highly desirable to divide your domain equally, so that the same number of cells corresponds to each sub-domain and consequently to each processor. This is important in order to achieve good load-balancing. Load-balancing determines how efficiently a parallel code runs; good load-balancing is achieved by keeping all processors busy during execution, ie by distributing equal tasks to each processor.

(c) Running PVM

NOTE: Those who have an MPI implementation of Parallel-PHOENICS should skip this section.

Before you run your application on a parallel machine, you must first start PVM by executing pvm. Follow the instructions below, to run PVM.

Make sure that PVM has been installed properly (see TR/110/SGI). Create a hostfile in your d_priv1 directory, which describes your parallel machine (see the sample file in the ../PHOENICS/bin directory); make sure the paths to the executable are correct.
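
As an illustration only, a minimal hostfile contains one machine name per line; the host names and paths below are invented, and the ep= option (which tells PVM where to look for executables such as earexe) is needed only if the executables do not lie on the default PVM search path:

# hostfile describing a two-node virtual machine
node1  ep=/usr/local/phoenics/lp31/d_earth
node2  ep=/usr/local/phoenics/lp31/d_earth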

Start PVM by typing pvm hostfile.

The PVM console prints the prompt pvm> and it is ready to accept commands from the standard input.

Type conf to list the configuration of the virtual machine, showing hostname, pvmd task ID, etc. This will check whether the PVM daemon, pvmd, runs on each node of the virtual machine defined in the hostfile. If the conf command does not list any nodes you are not running PVM correctly, and you should contact your system administrator or another person with PVM experience.

If your virtual parallel machine has been configured properly, type quit to exit from the console, leaving the daemons and any PVM jobs running.

(d) Running Parallel EARTH (earexe) under PVM

After exiting from the PVM console, you are ready to run EARTH on the parallel machine. We have defined six sub-domains and our virtual machine must have at least six nodes available. In the directory ../PHOENICS/d_priv1, set a PHOENICS environment variable (PHOENPROCS) that specifies the number of nodes to be used; in this example we will set it to 6, i.e.

setenv PHOENPROCS 6
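
The setenv command assumes a csh-type shell; if your login shell is of the sh or ksh type, the equivalent setting is:

PHOENPROCS=6; export PHOENPROCS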

Note that each time you want to use a different number of processors, you have to halt PVM, set a new value for PHOENPROCS and then re-start PVM.

From the directory ../PHOENICS/d_priv1 and the UNIX prompt type runear to execute EARTH on six processors.

The message PHOENICS starts by calling PVM3IN... is printed six times, showing that all six nodes have started running PHOENICS. Note that on some systems this message will be printed only once, since only one processor has the console as its standard output; to check the execution of the other processors, look at the end of the PVM log file by typing the following command (from another window):

tail -f /tmp/pvml.


PHOENICS starts by calling PVM3IN...
PHOENICS starts by calling PVM3IN...
PHOENICS starts by calling PVM3IN...
PHOENICS starts by calling PVM3IN...
PHOENICS starts by calling PVM3IN...
PHOENICS starts by calling PVM3IN...

The next messages give information about the sub-domains during the splitting procedure. The program has detected six sub-domains in the z-direction and none in the other directions; in total, six sub-domains were detected.


There are no S-DOMAIN settings in X-direction.
There is no S-DOMAIN settings in Y-direction.
There are 6 sub-domains in Z- direction.
There are 6 S-DOMAINs in the Q1 file.

After the splitting of the whole domain into sub-domains, each processor starts the execution of EARTH as usual, using information and boundary conditions that correspond to its local domain. The following output is the standard EARTH output, but it is only Processor 0 (the parent processor) that performs the I/O.

The ID of this node is
This code is valid for a limited period
For the exact date of expiry see the line below the CHAM logo which contains the text "The code expiry date is the end of : ".
To continue to use PHOENICS after that date, send a completed Unlocking Request Form (including the node ID shown above) to CHAM.


   ---------------------------------------------------------
  CCCC HHH        PHOENICS Version 3.1 - EARTH
      CCCCCCCC     H     (C) Copyright 1997
    CCCCCCC    See   H   Concentration Heat and Momentum Ltd
   CCCCCCC   the new  H  All rights reserved.
   CCCCCC     PLANT   H  Address:  Bakery House, 40 High St
   CCCCCCC   feature  H  Wimbledon, London, SW19 5AU
    CCCCCCC  PLNTDEM H   Tel:       0181-947-7651
      CCCCCCCC     H     Facsimile: 0181-879-3497
  CCCC HHH        E-mail:  phoenics@cham.co.uk
   ---------------------------------------------------------
   This program forms part of the PHOENICS installation for:
     CHAM
   The code expiry date is the end of : Sep 2007
   ---------------------------------------------------------

PHOENICS PHOENICS PHOENICS PHOENICS PHOENICS


Version 3.1; 01 Oct 1997
EARDAT has been read for IRUN= 1 LIBREF= 100
GREX3 OF HAS BEEN CALLED
GROUND file is GROUND.F of: 140796
Number of F-array locations available is 8000000
Number used before BFC allowance is 7279
Number used after BFC allowance is 7279
--- INTEGRATION OF EQUATIONS BEGINS ---
TIME STEP = 1 SWEEP = 1
TOTAL RESIDUAL/( 3.253E-05) FOR P1 IS 4.047E+05
TOTAL RESIDUAL/( 4.333E-10) FOR V1 IS 1.552E-01
TOTAL RESIDUAL/( 2.233E-04) FOR W1 IS 2.009E+10
TOTAL RESIDUAL/( 1.374E-04) FOR KE IS 1.080E+10
TOTAL RESIDUAL/( 1.127E+05) FOR EP IS 9.859E+18
TIME STEP = 1 SWEEP = 2


TOTAL RESIDUAL/( 4.867E-04) FOR P1 IS 3.268E+04
TOTAL RESIDUAL/( 3.289E-04) FOR V1 IS 1.769E+11
TOTAL RESIDUAL/( 1.510E-03) FOR W1 IS 5.428E+05
TOTAL RESIDUAL/( 2.264E-04) FOR KE IS 7.394E+05
TOTAL RESIDUAL/( 2.055E+04) FOR EP IS 1.641E+05
TIME STEP = 1 SWEEP = 3
TOTAL RESIDUAL/( 7.029E-04) FOR P1 IS 2.642E+04
TOTAL RESIDUAL/( 8.457E-04) FOR V1 IS 2.349E+05
TOTAL RESIDUAL/( 2.595E-03) FOR W1 IS 2.025E+05
TOTAL RESIDUAL/( 2.809E-04) FOR KE IS 3.568E+05
TOTAL RESIDUAL/( 3.125E+02) FOR EP IS 1.379E+04
TIME STEP = 1 SWEEP = 4
TOTAL RESIDUAL/( 8.407E-04) FOR P1 IS 2.031E+04
TOTAL RESIDUAL/( 9.248E-04) FOR V1 IS 9.090E+04
TOTAL RESIDUAL/( 3.130E-03) FOR W1 IS 1.345E+05
TOTAL RESIDUAL/( 3.888E-04) FOR KE IS 4.224E+05
TOTAL RESIDUAL/( 2.256E+00) FOR EP IS 1.153E+04

After the last sweep has been reached, each processor sends the computed field data to the parent processor which assembles and writes them to the disk file phi.

The phi file contains the results from the whole domain.

(e) Running Parallel EARTH (earexe) under MPI

Running earexe under MPI may vary from system to system. The most standardised way of running it is by invoking the mpirun command. The format is as follows:

mpirun -np 4 lp31/d_earth/earexe

in order to run on 4 processors.
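
To match the six sub-domains defined for test-1a.q1 in section 4(b), and assuming the same installation path as above, the command would simply become:

mpirun -np 6 lp31/d_earth/earexe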

In d_priv1 there are the scripts runear (for single-processor runs) and runear.2 (for two-processor runs).

On some systems, such as the Fujitsu VPP300 or NEC SX-4, the job must be submitted to a queue using the sub command. Ask your system administrator how to submit jobs on your system.

(f) Recompiling to produce earexe under PVM

A new earexe can be produced by using the bldear script provided in the ../PHOENICS/d_earth directory. First create a link, using the script make_link, which can be copied to the current directory from /PHOENICS/d_modpri.

You can recompile main.f or ground.f by using the compf script supplied in /PHOENICS/d_earth.

Since Parallel EARTH makes calls to PVM functions and subroutines, the object files and libraries produced during compilation must be linked with the PVM libraries in order to create the executable earexe.

Each user must edit bldear so that it points to the directory where the PVM libraries reside. If you installed PVM according to the instructions in TR/110/PAR you need not change anything.
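
As a rough sketch only, the rebuild sequence described above is of the following form; the exact arguments expected by make_link and compf are not reproduced here, so inspect each script before running it:

cp /PHOENICS/d_modpri/make_link .
./make_link                (create links to the EARTH sources and objects)
/PHOENICS/d_earth/compf    (recompile main.f or ground.f; see the script for its arguments)
/PHOENICS/d_earth/bldear   (link the objects with the PVM libraries to produce earexe)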

(g) Recompiling to produce earexe under MPI

A new earexe can be produced by using the bldear script provided in the ../PHOENICS/d_earth directory. First create a link, using the script make_link, which can be copied to the current directory from /PHOENICS/d_modpri.

You can recompile main.F and ground.f by using the mpicompF and compf scripts respectively, supplied in /bin and also copied into /PHOENICS/d_earth.

Since Parallel EARTH makes calls to MPI functions and subroutines, the object files and libraries produced during compilation must be linked with the MPI libraries in order to create the executable earexe.

Each user must edit bldear so that it points to the directory where the MPI libraries reside.

Furthermore, due to differences between systems, users are advised to check the options used in the bldear, mpicompF and compf scripts supplied with this installation to ensure that they are correct for their system.

Some known differences are listed below:

(I) For HP K260 systems

Since the installation supplied was created on an HP D270, users are advised to check whether the option +U77 is compatible with their K260 system.

HP has confirmed that the systems are binary compatible and users should not experience any problems.

Users are advised to check that the MPI is installed properly on their K260 systems.


   #!/bin/sh
   echo 'Building PHOENICS EARTH executable...         '
   echo '-------------------------------------         '
   echo '                                              '
   #
   #
   mpif77  +U77 \
   main.o \
   ground.o \
   lp31/d_earth/d_core/*.o  \
   lp31/d_earth/d_spe/specdum.o  \
   lp31/d_earth/d_opt/d_advmph/*.o \
   lp31/d_earth/d_opt/d_bfc/*.o  \
   lp31/d_earth/d_opt/d_mfm/*.o  \
   lp31/d_earth/d_opt/d_asap/*.o  \
   lp31/d_earth/d_opt/d_chem/*.o \
   lp31/d_earth/d_opt/d_gentra/*.o  \
   lp31/d_earth/d_opt/d_gentra/genlib.a \
   lp31/d_earth/d_opt/d_mbfgem/*.o \
   lp31/d_earth/d_opt/d_numalg/*.o \
   lp31/d_earth/d_opt/d_rad/*.o \
   lp31/d_earth/d_opt/d_solstr/*.o \
   lp31/d_earth/d_opt/d_turb/*.o \
   lp31/d_earth/d_opt/d_twophs/*.o \
   lp31/d_chemkin/cklib.o \
   lp31/d_chemkin/dmath.o \
   lp31/d_chemkin/eqlib.o \
   lp31/d_chemkin/stanlib.o \
   lp31/d_chemkin/tranlib.o \
   lp31/d_earth/d_core/corlib.a \
   lp31/d_earth/d_core/mpilib.a \
   lp31/d_allpro/d_graphi/pgralib.a \
   lp31/d_allpro/d_earsat/pesalib.a \
   lp31/d_allpro/d_graphi/pdrilib.a \
   lp31/d_allpro/d_filsys/pfillib.a \
   lp31/d_allpro/d_graphi/pgralib.a \
   lp31/d_allpro/d_filsys/psyslib.a \
   -lX11 -lmpi \
   -o earexe
   chmod +x earexe

(II) For FUJITSU VPP300 systems

Since the installation supplied was created on a Fujitsu VX-4, users are advised to check whether the options shown below are compatible with their VPP300 system.

Fujitsu has confirmed that the systems are binary compatible and users should not experience any problems, although they need to check the options shown below and substitute the paths specified for the libraries used by PHOENICS.

Users are advised to check that the MPI is installed properly on their VPP300 systems.


   #!/bin/sh
   echo 'Building PHOENICS EARTH executable...         '
   echo '-------------------------------------         '
   echo '                                              '
   #
   #
   frt  -Wl, -P, -J, -dy -t -L/usr/lang/mpi/lib -L/opt/tools/lib  \
   main.o \
   ground.o \
   lp31/d_earth/d_core/*.o  \
   lp31/d_earth/d_spe/specdum.o  \
   lp31/d_earth/d_opt/d_advmph/*.o \
   lp31/d_earth/d_opt/d_bfc/*.o  \
   lp31/d_earth/d_opt/d_mfm/*.o  \
   ...........................................
   lp31/d_allpro/d_filsys/pfillib.a \
   lp31/d_allpro/d_graphi/pgralib.a \
   lp31/d_allpro/d_filsys/psyslib.a \
   -lX11 -lsocket -lnsl -lmpi -lmp2 -lelf -lpx \
   -o earexe
   chmod +x earexe

(h) Setting array dimensions and PARAMETERs in main.f

A number of arrays have been defined and used for the parallel version. For economy of memory, the dimensions of these arrays should be set according to the size of the problem you are planning to solve.

A description of these arrays is given below, together with the criteria for setting them to the appropriate size.

NDD - This indicates how much computer memory must be reserved for the variables used in the Parallel Linear Equation Solver in double precision (associated array FDP).

The user may alter it according to the maximum number of grid cells in use per sub-domain, including the overlapped cells, ie NDD = 10 + 9 * NSX * NSY * NSZ.

Note that NSX, NSY, NSZ are the number of cells in x-, y- and z-directions respectively, for each sub-domain, including the overlapped cells.

Since these numbers may not be the same for all sub-domains we use the maximum NSX, NSY and NSZ.

Example:

For the case test-1a.q1 running on six processors NSX=NX=1, NSY=NY=20.

For NSZ we take the largest number of slabs corresponding to a single sub-domain (in this case 8, for sub-domains 1, 2 and 6; see section 4(b)) and add another four slabs required by the overlap cells; therefore NSZ=8+4=12. We can then set NDD = 10 + 9*1*20*12 = 2170.

NGX, NGY, NGZ - These represent the global size of the grid; for the case test-1a.q1, NGX=NX=1, NGY=NY=20, NGZ=NZ=45.

NBFC - This parameter is used only when a BFC problem is considered, and it represents the total number of corner coordinates ( NBFC = (NGX+1) * (NGY+1) * (NGZ+1) ). If you run a non-BFC problem and you wish to modify NBFC, you should set it to at least NBFC=8.

NFSAT - represents the dimension of the auxiliary array FSAT, which holds data read from the eardat file and used during the domain splitting. The size of NFSAT has been set to 50000 and should not be reduced below this value, even if a smaller value would suffice. NFSAT can be estimated as NFSAT = MAXTCV + MAXFRC, where MAXTCV and MAXFRC are set in satlit.f.

NSD,NXP,NYP, NZP- These parameters should not be changed from the set values (NSD=512, NXP=200, NYP=200, NZP=200)

NEXZ - represents the number of overlapped cells in the z-direction which are used for the data exchange between neighbouring processors in that direction (ie NEXZ = 2*NSX*NSY). Example: For the case test-1a.q1, we can set NEXZ = 2*1*20=40.

NEXY - represents the number of overlapped cells in the y-direction which are used for the data exchange between neighbouring processors in that direction (ie NEXY = 2*NSX*NSZ).

NBUF - This is the size of the buffers used for the data exchange between neighbouring processors for those variables solved slabwise only. The parameter NBUF can be set according to the formula NBUF = NEXZ * (Number of Variables solved Slabwise).

5. New functionalities

(a) phi files

The phi files created during a parallel run are sequential files. Make sure that the option PHIDA=F is set in the config file, in the directory ../PHOENICS/d_allpro. The reason for not using direct-access (phida) files is to avoid conflicts between different architectures when a heterogeneous system is used.
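
One quick way to confirm the setting, assuming config is an ordinary text file (as it normally is), is:

grep PHIDA ../PHOENICS/d_allpro/config

which should display the line containing PHIDA=F.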

(b) RESTRT

For the parallel version the RESTRT option functions slightly differently, in order to accommodate the creation of large phi files. For very large problems the phi file, created from the assembly of data from a number of sub-domains, can be over 100 Mbytes in size.

Such a file, or an even larger one, will not fit in the memory of a single processor during a restart run, where the initial fields of the specified variables are read from the phi file. In the sequential version, restart files (phi or phida) are created in runs when SAVE=T (which is the default).

For the parallel version, restart files are created for each sub-domain by each individual processor when LG(1)=T is set in the q1 file or in the MAIN program.

This flag forces each processor to create its own restart file (phi file only) for its own sub-domain with a distinct name specifying the processor that created it.

After saving the restart files, we can use the RESTRT command as usual.

If you want to save the field data in one phi file for the entire domain, set the flag LG(1)=F in the q1 file or remove it altogether (LG(1)=F is the default).

Example

If there are 6 processors available for a run, 6 phi files will be created, with the names phi001, phi002, phi003, phi004, phi005, phi006. The numbers are used so that each processor knows which file to read when entering a RESTART run.

Hence, phi001 was created by processor 0, phi002 by processor 1, and phi006 by processor 5. The numbers do not necessarily correspond to the respective sub-domain number.
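
Putting the above together, a minimal sketch of the PIL settings involved is: on the saving run add, before STOP in the q1 file,

LG(1)=T

so that each processor writes its own numbered phi file (phi001, phi002, ...), and on the subsequent run use the RESTRT command as usual, for example

RESTRT(ALL)

to read the saved fields back as initial values.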

(c) TSTSWP

For performance purposes, the graphical interface has not been implemented.

You should always set TSTSWP so that the text mode is in use (ie TSTSWP=1 or 2 etc.).

Also set UWATCH=F for better performance.

(d) result file

The result file created during a multi-processor run contains information only for the first sub-domain (processor 0) and not for the entire domain covered by all processors.

However, the RESIDUALs and MONITOR POINT plots/tables refer to the entire domain. In order to print information for the whole domain, the AUTOPS facility can be used.

For more details of the "autopsy" mode refer to TR200a. A q1 file is modified below to illustrate how one can create a RESULT file with tabulated data for the whole domain.

Example

If you have followed the instructions of this manual, you will have in your private directory a phi file from the earlier run.

The result file however contains tabulated data from the first domain only (corresponding to Processor 0).

First save the current result file, if you wish to keep the residual and monitor point plots.

To run EARTH in the autopsy mode, add the following line at the end of your q1 file, before the STOP command:

AUTOPS=T;RESTRT(ALL);SAVE=F

Next run SATELLITE by typing runsat.

To run EARTH using one processor only, follow the instructions of Section 6.

EARTH will perform two sweeps and print the data, read from the phi file, on the new result file in tabulated form. Profile and contour plots can also be created by suitable insertion of PATCH and PLOT commands.
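
In outline, and taking the PVM single-processor route of section 6, the whole autopsy procedure is therefore of the following form (result.par is an arbitrary name for the saved copy):

cp result result.par   (keep the residual and monitor-point plots from the parallel run)
(edit the q1 file: add AUTOPS=T;RESTRT(ALL);SAVE=F before STOP)
runsat                 (re-run SATELLITE)
setenv PHOENPROCS 1    (then halt and re-start PVM, and adjust ENDPROP, as described in section 6)
runear                 (EARTH reads the phi file and writes the full-domain result file)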

(e) Writing your own GROUND coding

Advice on what you need to consider when writing your own code and the list of communication routines which can be used, is under preparation and will be sent to the users as soon as it becomes available.

6. Running in sequential mode

The Parallel version does not support all the functionalities of the sequential version (a list is given in section 7 below); however, the Parallel version is equivalent to the sequential one when it runs on one processor, and it therefore supports all the options.

PVM.

It is demonstrated below how to run in sequential mode when a PVM implementation is used.

  1. Go to the directory ../PHOENICS/d_priv1 and set the environment variable PHOENPROCS to 1, i.e.

    setenv PHOENPROCS 1

    Make sure that you halt PVM first and then re-start after you have set the new value for PHOENPROCS.

  2. Examine the properties table props and, if the last line is ENDPROP, remove it from the file. Please note that you should put this flag back when you wish to run on more than one processor again, otherwise you cannot run in parallel.

  3. Examine your q1 file in ../PHOENICS/d_priv1 and, if the last line is ENDPROP, remove it from the file. Similarly, note that you should put this flag back when you wish to run on more than one processor again, otherwise you cannot run in parallel.

  4. Run EARTH as usual from your private directory, using the same script (ie runear).

MPI.

For MPI, just use the runear script, in d_priv1, which should contain the following command line:

mpirun -np 1 lp31/d_earth/earexe

7. Functionalities not supported

The following options are not supported currently by Parallel PHOENICS:

• multi-blocking and FGE
• advanced numerical algorithms (CCM, HOCS)
• advanced multi-phase flows
• GENTRA

CHAM is working towards supporting these options.

References

[1] N.D. Baltas and D.B. Spalding, "MIMD PHOENICS: Porting a Computational Fluid Dynamics Application to a Distributed Memory MIMD Computer", in Massively Parallel Processing Applications and Development, Eds. L. Dekker et al., Elsevier, Amsterdam, 1994.
