Here I'll post my notes on compiling SIESTA on an AMD Opteron machine using the Portland compiler (pgf90/pgf95). You may also find them useful when compiling with another compiler or on another architecture. I don't guarantee anything except that everything written below worked for me; this doesn't mean it will work for you. Moreover, if your hard disk somehow gets formatted after following my instructions, don't blame me: you are using my advice at your own risk. I also have an unfinished version of similar notes on compiling SIESTA on an Intel machine with ifort (an extended version of the ones by Sebastien Le Roux), but I still haven't found the time to finish and publish them. If I do, I'll post the link here as well. No instructions on compiling MPICH are given because I assume you're using a cluster which already has MPICH installed. If this is not the case, I refer you to the MPICH documentation.
What is needed to compile parallel SIESTA?
SIESTA distribution
Compatible Fortran and C compilers (Intel ifort and icc for an Intel machine, pgf95/pgcc for AMD, etc.)
MPI (Open MPI / MPICH / LAM, etc.), compiled with the same compiler that will be used for the SIESTA compilation. It is usually already installed on clusters and supercomputers, so check which MPI implementations and supporting compilers are available (see the sketch right after this list). If both mpich and mpich2 are available, I'd recommend mpich over mpich2; I remember having some problems with the latter.
BLAS and LAPACK, which can be downloaded from netlib.org, or (better) an optimized replacement such as ACML or Intel MKL. You may also try GotoBLAS, but I haven't managed to compile SIESTA with it.
BLACS - should be compiled from scratch with the same options and the same compiler as SIESTA.
SCALAPACK – should be compiled from scratch after compilation of LAPACK/BLAS and BLACS.
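As a quick sanity check before you start, you can verify which compilers and MPI wrappers are actually visible on your machine (a minimal sketch assuming PGI and an MPICH-style wrapper; adjust the names to your own setup):
which pgf95 pgcc          # are the PGI compilers in your PATH?
pgf95 -V                  # print the compiler version
which mpif90 mpicc        # MPI compiler wrappers, if any
mpif90 -show              # for MPICH wrappers: shows the underlying compiler and flags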
The examples below refer to an Opteron machine with the Portland compiler (accessible as pgf95 for Fortran and pgcc for C), but the same ideas carry over to other compilers and architectures.
Option 1. Compiling BLAS and LAPACK from scratch
This works almost everywhere, but it is less efficient than using libraries optimized for a specific processor (such as ACML or Intel MKL). This option is also described in Sebastien Le Roux's HOW-TO, but for an Intel machine. I repeat almost the same steps for an AMD machine, which makes little difference. Some things from that HOW-TO didn't work for me, so I'll mention those as well.
To compile BLAS and LAPACK you only need the lapack.tar.gz archive downloadable from netlib.org; a reference BLAS is included in that package.
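For example (a sketch only; the exact tarball name and the unpacked directory depend on the LAPACK version you pick):
cd ~
wget http://www.netlib.org/lapack/lapack.tgz      # or the specific lapack-x.y.z tarball you chose
tar xzf lapack.tgz
cd lapack-*                                       # then create make.inc as described below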
BLAS AND LAPACK
Here is my make.inc (the original comments are skipped; my own comments follow the settings they refer to):
PLAT = _LINUX
FORTRAN = pgf95 The compiler that will be used for the SIESTA compilation. On an Intel machine it could be ifort; you can also use pgf90 instead.
OPTS = -g -O0 This means we do not want any optimization. Generally these options should match the ones used for the SIESTA compilation, although that is not strictly required. You may want to change this to -fastsse, -O2 or something else listed in man pgf95. I do not recommend the -fast flag here, although you may try it; in any case never use it when compiling SIESTA itself, use -fastsse instead. I'll come back to this later.
DRVOPTS = $(OPTS)
NOOPT =
LOADER = pgf95 Generally the loader should be the same compiler as the one given in FORTRAN above.
LOADOPTS =
ARCH = ar
ARCHFLAGS= cr
RANLIB = ranlib
BLASLIB = ../../blas$(PLAT).a
LAPACKLIB = lapack$(PLAT).a
TMGLIB = tmglib$(PLAT).a
EIGSRCLIB = eigsrc$(PLAT).a
LINSRCLIB = linsrc$(PLAT).a
Variables without comments were generally left at their default values. To compile, type:
make blaslib
make lapacklib
After that the .a libraries will appear in the current directory. I suggest renaming them to libblas.a and liblapack.a and moving them to something like a ~/siesta/lib directory (which you of course have to create first).
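With the make.inc above (PLAT = _LINUX) the resulting files are blas_LINUX.a and lapack_LINUX.a in the LAPACK top directory, so the renaming boils down to something like:
mkdir -p ~/siesta/lib
cp blas_LINUX.a ~/siesta/lib/libblas.a
cp lapack_LINUX.a ~/siesta/lib/liblapack.a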
BLACS
Take one of the MPI files from the BMAKES directory and use it as a template for Bmake.inc. I'll post here only the lines that have to be modified:
BTOPdir = ~/BLACS The directory where BLACS was unpacked
COMMLIB = MPI
PLAT = LINUX
MPIdir = /opt/mpich/pgi62_64 Nota bene! MPICH must be compiled with the same compiler version and architecture that we will use, so keep this in mind if you ever have to compile MPICH yourself!
MPIdev = ch_p4
MPIplat = LINUX
MPILIBdir = $(MPIdir)/$(MPIdev)/lib Check carefully where your libmpich.a (or its equivalent, such as libmpi.a) is located; this parameter should point to the directory where it is stored (see the sketch after this listing for one way to find it).
MPIINCdir = $(MPIdir)/$(MPIdev)/include Path to the MPI include directory containing mpi.h and the like.
MPILIB = $(MPILIBdir)/libmpich.a Full path to libmpich.a; check that it is correct and that such a library actually exists.
INTFACE = -Dadd_ This sets the Fortran-to-C name-mangling convention (whether an underscore is appended to symbol names); -Dadd_ worked for me every time.
F77 = pgf95
F77NO_OPTFLAGS = -g -O0 Compiler flags for files which should not be compiled with optimization
F77FLAGS = $(F77NO_OPTFLAGS) This can and should be changed; I'm just doing test compilations without any optimization. As with LAPACK, these flags should preferably (but not necessarily) match the ones used for SIESTA. In my benchmark tests, changing these flags in BLACS didn't influence the overall SIESTA performance at all, but that doesn't mean you shouldn't try to optimize BLACS.
F77LOADER = $(F77)
F77LOADFLAGS =
CC = pgcc The C compiler; it should be fully compatible with the Fortran compiler. For Portland that means the same architecture and version as pgf95, and the same holds for the Intel compilers.
CCFLAGS = $(F77FLAGS) In general the C compiler flags should duplicate the Fortran ones, but you may play with them as well.
CCLOADER = $(CC)
CCLOADFLAGS =
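If you are not sure where your MPI installation lives (the /opt/mpich/pgi62_64 path above is simply what my cluster uses), something along these lines helps to locate the library and the headers:
find /opt -name 'libmpich*' 2>/dev/null     # where is the MPI library?
find /opt -name mpi.h 2>/dev/null           # where are the MPI headers?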
So to compile just type: make mpi.
NOTA BENE: usually the command make clean removes the previous build so you can compile from scratch if you want to; this should be done after every change to the makefiles and before recompiling. In the case of BLACS, however, you should type make mpi what=clean, because a plain make clean won't work.
After compilation you'll find the compiled libraries in the LIB subdirectory. I suggest moving them into something like ~/siesta/lib as well and renaming them to libblacsF77init.a, libblacsCinit.a and libblacs.a.
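The exact file names in LIB depend on your Bmake.inc settings (with the values above they carry an _MPI-LINUX suffix), so treat this as a sketch and adjust it to whatever you actually find there:
cp LIB/blacsF77init_MPI-LINUX-*.a ~/siesta/lib/libblacsF77init.a
cp LIB/blacsCinit_MPI-LINUX-*.a ~/siesta/lib/libblacsCinit.a
cp LIB/blacs_MPI-LINUX-*.a ~/siesta/lib/libblacs.a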
SCALAPACK
Here is my SLmake.inc (its editable part, actually) with comments:
home = ~/scalapack-1.7.5 The path where SCALAPACK was unpacked
PLAT = LINUX
BLACSdir = ~/siesta/lib The directory where the previously compiled BLACS libraries are located.
USEMPI = -DUsingMpiBlacs
SMPLIB = /opt/mpich/pgi62_64/lib/libmpich.a Full path to libmpich.a
I once set a wrong path to the BLACS libraries (three lines below) and everything still worked for me, so it seems ScaLAPACK either does not strictly require these entries or somehow finds the libraries automatically via BLACSdir.
BLACSFINIT = $(BLACSdir)/libblacsF77init.a
BLACSCINIT = $(BLACSdir)/libblacsCinit.a
BLACSLIB = $(BLACSdir)/libblacs.a
For the compilers and options, see the corresponding comments for the BLACS and LAPACK compilation above.
F77 = pgf95
CC = pgcc
NOOPT = -g -O0
F77FLAGS = $(NOOPT)
DRVOPTS = $(F77FLAGS)
CCFLAGS = $(NOOPT)
SRCFLAG =
F77LOADER = $(F77)
CCLOADER = $(CC)
F77LOADFLAGS =
CCLOADFLAGS =
CDEFS = -DAdd_ -DNO_IEEE $(USEMPI)
BLASLIB = ~/siesta/lib/libblas.a Path to previously compiled BLAS lib.
After running make you'll find libscalapack.a, which I would also copy to ~/siesta/lib or so.
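For example:
cp libscalapack.a ~/siesta/lib/
ar t ~/siesta/lib/libscalapack.a | head     # optional quick check that the archive is not empty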
At last: compiling SIESTA
So here is my arch.make:
SIESTA_ARCH=pgf95-natanzon
FC=pgf95
FFLAGS= -g -O0 We are compiling SIESTA without any optimization. You can (and I advise you to) play with optimization flags and not stop at these options: they are slow, but they guarantee a successful compilation, which is more important at this stage. You can also set this parameter to -fastsse or similar (see man pgf95 for details). Problems have been reported with the -fast flag, so I DO NOT recommend it: it may cause crashes in SIESTA runs.
FPPFLAGS= -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
FFLAGS_DEBUG= -g -O0 This normally should not be changed – some SIESTA files should be compiled without optimization
RANLIB=echo
#
NETCDF_LIBS=
NETCDF_INTERFACE=
DEFS_CDF=
#
MPI_INTERFACE=libmpi_f90.a This also normally should not be changed
MPI_INCLUDE=/opt/mpich/pgi62_64/include Path to the MPI include directory
MPI_LIBS=/opt/mpich/pgi62_64/lib The path where libmpich.a is located
DEFS_MPI=-DMPI Needed to compile the parallel version of SIESTA. Necessary!
#
HOME_LIB=~/siesta/lib Path where the libraries we compiled earlier (BLAS, LAPACK, BLACS, ScaLAPACK) are located
LAPACK_LIBS= -llapack There are two ways of specifying a library: give the full filename such as liblapack.a, or use the form shown here, where the linker automatically replaces -l with the lib prefix and adds the .a extension, searching the directories given with -L.
BLAS_LIBS=-lblas
BLACS_LIBS = -lblacsCinit -lblacsF77init -lblacs
SCALAPACK_LIBS = -lscalapack
LIBS= -L$(HOME_LIB) $(SCALAPACK_LIBS) $(BLACS_LIBS) $(LAPACK_LIBS) $(BLAS_LIBS) $(NETCDF_LIBS)
SYS=bsd
DEFS= $(DEFS_CDF) $(DEFS_MPI)
#
.F.o:
$(FC) -c $(FFLAGS) $(INCFLAGS) $(DEFS) $<
.f.o:
$(FC) -c $(FFLAGS) $(INCFLAGS) $<
.F90.o:
$(FC) -c $(FFLAGS) $(INCFLAGS) $(DEFS) $<
.f90.o:
$(FC) -c $(FFLAGS) $(INCFLAGS) $<
To compile SIESTA just type make. I expect everything should be OK.
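Once the binary is built, a minimal parallel test run looks something like this (test.fdf and the process count are placeholders; on a real cluster you would normally go through the queueing system and perhaps add a -machinefile option):
mpirun -np 4 ./siesta < test.fdf > test.out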
What else? Adding optimization
This could be done from the very beginning, but an over-optimized SIESTA sometimes doesn't work, so it is safer to compile without any optimization first and make sure that everything works. For this I recommend running a few SCF-only jobs (no cell relaxation, 0 CG moves) to check that the parallelization works; this is also how I ran my own benchmarks.
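A crude way to check that the parallelization actually helps is to time the same small SCF job on different process counts and compare the wall-clock times (again, test.fdf is just a placeholder for your own input):
for np in 1 2 4; do
  ( time mpirun -np $np ./siesta < test.fdf > test_np$np.out ) 2> time_np$np.log
done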
To optimize, replace the optimization flags (OPTS, F77FLAGS, CCFLAGS, FFLAGS) everywhere in arch.make and in the makefiles of the supporting libraries, and recompile everything from the beginning. For the pgf compiler I recommend this one:
F77FLAGS = -fastsse
I think this is the best that can be done. On an Intel machine a flag like -O3 -tpp7 (or so) should be used instead; it gives the best optimization. You can also try the -fast flag on an Intel machine, where the compiler is supposed to pick the best optimization flags automatically, but it does something wrong for PGF. So never use -fast with pgf; this is, say, the N-th time I'm repeating this. Some additional changes should also be made in arch.make:
atom.o: This file and the following ones should still be compiled without optimization
$(FC) -c $(FFLAGS_DEBUG) atom.f
#
dfscf.o:
$(FC) -c $(FFLAGS_DEBUG) $(INCFLAGS) dfscf.f
#
#
electrostatic.o:
$(FC) -c $(FFLAGS_DEBUG) electrostatic.f
#
.F.o:
$(FC) -c $(FFLAGS) $(INCFLAGS) $(DEFS) $<
.f.o:
$(FC) -c $(FFLAGS) $(INCFLAGS) $<
.F90.o:
$(FC) -c $(FFLAGS) $(INCFLAGS) $(DEFS) $<
.f90.o:
$(FC) -c $(FFLAGS) $(INCFLAGS) $<
Option 2. Linking SIESTA with external libraries (ACML, Intel MKL)
What for? My benchmarks show that using the ACML library on an AMD64 machine noticeably improves the overall SIESTA performance, so it's worth considering linking external libraries like ACML, Intel MKL, GotoBLAS, etc.
Note: I haven't managed to link Intel MKL so far, but that doesn't mean it can't be done; keep trying. So here I just share my experience of linking ACML. I think other libraries can be linked in the same way.
What is needed?
ACML library installed. If you don't have it, just download and unpack it and run the install-blablabla.sh script, which will do everything automatically; you only have to provide the path where to install. Below I assume my ACML is installed in ~/acml.
BLACS precompiled according to the instructions in the Option 1 chapter.
The sources of SCALAPACK and, of course, SIESTA; these two should be recompiled with ACML support.
No LAPACK/BLAS are needed at this time, because ACML will replace them.
SCALAPACK
Compared to the previous chapter, only one change has to be made in the SLmake.inc file, namely the path to BLAS:
BLASLIB = ~/acml/pgi64/lib/libacml.a The full path to libacml.a. Make sure you pick the correct version (32- or 64-bit) compatible with your compiler.
SIESTA
Here is my arch.make file again, but comments are given only for the lines that have changed.
SIESTA_ARCH=pgf95-matterhorn
#
FC=pgf95
FC_ASIS=$(FC)
#
FFLAGS= -fastsse -Wl,-R/opt/pgi/linux86-64/6.2/libso/ -Wl,-R~/acml/pgi64/lib/
As you can see, additional flags are passed to the compiler: -fastsse turns on optimization, while the two -Wl,-R options add the PGI and ACML library directories to the runtime search path; both are needed for correct linking.
FFLAGS_DEBUG= -g -O0
LDFLAGS= -fastsse -Bstatic The second flag tells the linker to link the libraries statically, so you won't have to bother setting LD_LIBRARY_PATH before running SIESTA. If you have trouble linking, try removing the -Bstatic flag and recompiling; but in that case you have to execute
export LD_LIBRARY_PATH=/opt/pgi/linux86-64/6.2/libso
before running SIESTA, otherwise you'll get errors like: cannot load shared libraries XXX.so
Note that the -Bstatic flag is called -static for ifort. Linking can also fail for the Intel compiler with such a flag, but for pgf it works fine.
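After the build you can check whether the binary still depends on shared libraries, i.e. whether -Bstatic did what you wanted:
ldd siesta     # 'not a dynamic executable' (or at least no PGI/ACML entries) means the static linking worked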
RANLIB=echo
COMP_LIBS= #dc_lapack.a Do not use SIESTA's internal LAPACK!
MPI_ROOT=/opt/mpich/pgi62_64/ch_p4
MPI_LIBS= -L$(MPI_ROOT)/lib -lmpich
ACML = -mp -L~/acml/pgi64/lib -lacml -Wl,-R/opt/pgi/linux86-64/6.2/libso/ -Wl,-R~/acml/pgi64/lib/
The -L.../-lacml pair points to the ACML library, and the two -Wl,-R options simply repeat the compiler flags above. The -mp flag is PGI's OpenMP switch; I don't know whether it is strictly needed here, but I left it in.
MPI_INTERFACE=libmpi_f90.a
MPI_INCLUDE=/opt/mpich/pgi62_64/ch_p4/include
DEFS_MPI=-DMPI
#
SCALAPACK = -L~/siesta/lib -lscalapack
BLACS = -L~/siesta/lib -lblacs -lblacsF77init -lblacsCinit -lblacsF77init -lblacs
NUMA = -L/opt/pgi/linux86-64/6.2/libso -lnuma This library is needed for static linking. If you link dynamically (no -Bstatic flag) you do not need this library.
LIBS= $(SCALAPACK) $(BLACS) $(ACML) $(MPI_LIBS) $(NUMA)
SYS=cpu_time
DEFS= $(DEFS_CDF) $(DEFS_MPI)
#
#
# Important (at least for V5.0-1 of the pgf90 compiler...)
# Compile atom.f and electrostatic.f without optimization.
#
atom.o:
$(FC) -c $(FFLAGS_DEBUG) atom.f
#
dfscf.o:
$(FC) -c $(FFLAGS_DEBUG) $(INCFLAGS) dfscf.f
#
#
electrostatic.o:
$(FC) -c $(FFLAGS_DEBUG) electrostatic.f
#
.F.o:
$(FC) -c $(FFLAGS) $(INCFLAGS) $(DEFS) $<
.f.o:
$(FC) -c $(FFLAGS) $(INCFLAGS) $<
.F90.o:
$(FC) -c $(FFLAGS) $(INCFLAGS) $(DEFS) $<
.f90.o:
$(FC) -c $(FFLAGS) $(INCFLAGS) $<
So, voilà! That's all I can tell you about compiling parallel SIESTA. I've also done some benchmarks, but decided not to include them here; just ask in the comments if you want a PDF of these notes with the parallelization benchmark tests.