Monday, November 12, 2012

Triangular region in LAMMPS

I tried to use Steve Plimpton's code for a triangular region style; however, it doesn't seem to work with recent versions of LAMMPS. I made minor modifications and publish them here.

src/region_triangle.h
#ifdef REGION_CLASS
RegionStyle(triangle,RegTriangle)
#else

#ifndef LMP_REGION_TRIANGLE_H
#define LMP_REGION_TRIANGLE_H

#include "region.h"

namespace LAMMPS_NS {

class RegTriangle : public Region {
 public:
  RegTriangle(class LAMMPS *, int, char **);
  ~RegTriangle();
  int inside(double, double, double);
  int surface_interior(double *, double);
  int surface_exterior(double *, double);

 private:
  double x1,y1,x2,y2,x3,y3;
};

}

#endif
#endif


src/region_triangle.cpp
#include "stdlib.h"
#include "string.h"
#include "region_triangle.h"
#include "error.h"

using namespace LAMMPS_NS;

#define MIN(a,b) ((a) < (b) ? (a) : (b))
#define MAX(a,b) ((a) > (b) ? (a) : (b))

/* ---------------------------------------------------------------------- */

RegTriangle::RegTriangle(LAMMPS *lmp, int narg, char **arg) :
  Region(lmp, narg, arg)
{
  options(narg-8,&arg[8]);

  x1 = xscale*atof(arg[2]);
  y1 = yscale*atof(arg[3]);
  x2 = xscale*atof(arg[4]);
  y2 = yscale*atof(arg[5]);
  x3 = xscale*atof(arg[6]);
  y3 = yscale*atof(arg[7]);

  // extent of 2d triangle

  extent_xlo = MIN(x1,x2);
  extent_xlo = MIN(extent_xlo,x3);
  extent_xhi = MAX(x1,x2);
  extent_xhi = MAX(extent_xhi,x3);

  extent_ylo = MIN(y1,y2);
  extent_ylo = MIN(extent_ylo,y3);
  extent_yhi = MAX(y1,y2);
  extent_yhi = MAX(extent_yhi,y3);

  extent_zlo = -0.5;
  extent_zhi = 0.5;

  cmax = 1;
  contact = new Contact[cmax];
}

 RegTriangle::~RegTriangle()
{
  delete [] contact;
}
 
/* ---------------------------------------------------------------------- */
 
int RegTriangle::inside(double x, double y, double z)
{
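  // Each side* value below is the z-component of the cross product
  // (p - v_i) x (v_(i+1) - v_i) for one edge of the triangle; the point
  // is counted as inside only if all three values are positive, which
  // requires the vertices to be supplied in a consistent order
  // (clockwise, as in the example input at the end of this post).
  // Points exactly on an edge return 0.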
  double side1 = (x-x1)*(y2-y1) - (y-y1)*(x2-x1);
  double side2 = (x-x2)*(y3-y2) - (y-y2)*(x3-x2);
  double side3 = (x-x3)*(y1-y3) - (y-y3)*(x1-x3);
 
  if (side1 > 0.0 && side2 > 0.0 && side3 > 0.0) return 1;
  return 0;
}

/* ------------------------------------------------------------------------ */
int RegTriangle::surface_interior(double *x, double cutoff)
{
  if (RegTriangle::inside(x[0], x[1], x[2])==1)
   {
    return 1;
   }
  return 0;
}

/* ------------------------------------------------------------------------ */

int RegTriangle::surface_exterior(double *x, double cutoff)
{
  if (RegTriangle::inside(x[0], x[1], x[2])==1)
   {
    return 0;
   }
  return 1;
}


Notes and compilation
The functions surface_interior() and surface_exterior() are provided only for compatibility and don't make much sense in their present form. They are needed for wall fixes, so you should modify them if you want to use such a fix. If not (as in my case), just don't worry about them.

Compilation is straightforward: put these two files in the src directory and recompile LAMMPS. No further modification of the code is needed.
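For example, assuming the LAMMPS source tree lives in ~/lammps (adjust the path and the make target to whatever you normally build with):

cp region_triangle.h region_triangle.cpp ~/lammps/src/
cd ~/lammps/src
make openmpi        # or make serial, make g++, ...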

Example of use
Hex lattice with triangular void:

dimension       2
boundary        f m p

atom_style      atomic
lattice         hex 0.93
region          box block 0 300 0 500 -0.25 0.25
create_box      1 box
create_atoms    1 box
mass            1 1.0
region          lower block INF 18 INF INF INF INF units lattice
region          upper block 282 INF INF INF INF INF units lattice
group           lower region lower
group           upper region upper
group           boundary union lower upper
group           mobile subtract all boundary

# void
region          notch triangle 150 0 153 150 156 0
delete_atoms    region notch

Monday, October 1, 2012

Compiling LAMMPS with Intel Composer and OpenMPI

A typical problem that one faces when trying to compile LAMMPS with Intel Composer XE 2011 is the following link error:
write_restart.o: In function `LAMMPS_NS::WriteRestart::command(int, char**)':
write_restart.cpp:(.text+0x2b9): undefined reference to `__intel_sse2_strchr'
write_restart.cpp:(.text+0x4b1): undefined reference to `__intel_sse2_strchr'
write_restart.cpp:(.text+0x586): undefined reference to `__intel_sse2_strchr'
write_restart.cpp:(.text+0x10e3): undefined reference to `__intel_sse2_strchr'
write_restart.cpp:(.text+0x125a): undefined reference to `__intel_sse2_strcpy'
write_restart.o: In function `LAMMPS_NS::WriteRestart::write(char*)':
write_restart.cpp:(.text+0x28d7): undefined reference to `__intel_sse2_strchr'
write_restart.cpp:(.text+0x29ab): undefined reference to `__intel_sse2_strchr'
write_restart.cpp:(.text+0x34fe): undefined reference to `__intel_sse2_strchr'


And so on. This error was discussed on the Intel forums, so I followed the advice by _samuel_ and modified src/MAKE/Makefile.openmpi as follows:

LINKFLAGS = -O -L/opt/intel/composer_xe_2011_sp1.7.256/compiler/lib/intel64 -lifcoremt -Bstatic -lirc -Bdynamic

This solved the problem, and after running make openmpi I finally got a binary. If you don't really know where your Composer is installed, this command will (probably) show the path:
$which icc
/opt/intel/composer_xe_2011_sp1.7.256/bin/intel64/icc

Installing MEAM
To use the Modified Embedded Atom Method, one should compile LAMMPS with the MEAM package:
make yes-meam

After that, go to ../lib/meam and compile the MEAM library:
make -f Makefile.ifort

After successful compilation, modify Makefile.lammps to specify the proper libraries and paths:
meam_SYSINC = 
meam_SYSLIB = -lifcore -lsvml -liompstubs5 -limf
meam_SYSPATH = -L/opt/intel/composer_xe_2011_sp1.7.256/compiler/lib/intel64


Only after that can you compile LAMMPS once again:
make clean-openmpi
make openmpi
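
To summarize, the whole sequence looks roughly like this (run from the LAMMPS src directory; paths as in a standard LAMMPS tree):

make yes-meam
cd ../lib/meam
make -f Makefile.ifort
# edit Makefile.lammps here as shown above
cd ../../src
make clean-openmpi
make openmpi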

Wednesday, May 2, 2012

Installing xmovie tool

xmovie is a simple visualisation tool that comes with the LAMMPS molecular dynamics software.

To install it, one just needs to type make in the tools/xmovie directory. However, you may get an error:


xmovie.c:12:28: error: X11/StringDefs.h: No such file or directory
xmovie.c:14:27: error: X11/Intrinsic.h: No such file or directory
xmovie.c:15:26: error: X11/Xaw/Form.h: No such file or directory
xmovie.c:16:31: error: X11/Xaw/Cardinals.h: No such file or directory
In file included from xmovie.c:18:
xmovie.h:65: error: expected specifier-qualifier-list before ‘Bool’
xmovie.h:98: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘TopLevel’
xmovie.h:101: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘CreateScene’
xmovie.h:102: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘CreateControl’
xmovie.h:105: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘ReadProc’
xmovie.h:118: error: expected ‘)’ before ‘*’ token
xmovie.h:122: error: expected ‘)’ before ‘w’
xmovie.h:125: error: expected ‘)’ before ‘w’
xmovie.h:132: error: expected ‘)’ before ‘*’ token
xmovie.h:133: error: expected ‘)’ before ‘*’ token
xmovie.h:134: error: expected ‘)’ before ‘bg’
...

This means that the X11 development packages are not installed. On Ubuntu/Debian systems, you can install them by typing:
sudo apt-get install libx11-dev libxaw7-dev

Then modify the Makefile:
XLIBDIR = -L/usr/lib

The path can be different in your distribution; look for the location of the libX11.a file. After that, type make and you get the executable.
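On a Debian/Ubuntu box the whole procedure is roughly the following (the XLIBDIR value may differ on your system):

sudo apt-get install libx11-dev libxaw7-dev
cd tools/xmovie
# set XLIBDIR = -L/usr/lib in the Makefile (or wherever libX11 lives)
make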

Tuesday, June 22, 2010

Using Phonopy with Wien2k

In this post I will describe how to calculate various phonon properties from Wien2k calculations using Atsushi Togo's Phonopy program. As an example, I will use lithium hydride (LiH), which has a simple NaCl structure with a lattice constant of 4.01 A (optimized within the GGA approximation).

1. LiH has an fcc structure, but phonopy requires P1 symmetry for Wien2k. So we convert LiH.struct to P1 by running x supercell and creating a 1x1x1 cell of type P. This cell should contain 8 atoms: 4 Li and 4 H.

2. Create INPHON file with the following content:
ATOM_NAME = Li H
NDIM = 1 1 1
LSUPER = .TRUE.

The supercell dimensions should normally be larger (e.g. NDIM = 2 2 2), but for testing purposes I've set them to 1 1 1. Then run:
phonopy --wien2k=LiH.struct

This creates the DISP file and several struct files with the supercells (in this particular case LiH.structS, LiH.structS-001 and LiH.structS-002). Have a look at the DISP file:
1 1 0 0
5 1 0 0

Looking at the LiH.structS-* files, one can see that atom 1 (Li) and atom 5 (H) are displaced by 0.01 A in the x direction. You can change the displacement manually if you want.

3. The next step is to run two Wien2k calculations, one for each of the LiH.structS-001 and LiH.structS-002 structure files, and save the resulting *.scf files.

Important! The initialization for these calculations should begin from symmetry, e.g.:
init_lapw -b -s symmetry -vxc 13 -numk 100 -sp

It is important to start from symmetry because sgroup could change the symmetry, which would make phonopy unable to create a proper FORCES file. Also, the RMT spheres can overlap (check this with x nn!) and should be reduced if needed.

4. Once you have LiH001.scf and LiH002.scf from the above calculations, copy them into the directory with the INPHON and DISP files, and run the following to generate FORCES:
phonopy --wien2k=LiH.struct -f LiH001.scf LiH002.scf

The order of the scf files in this command should match the order of the atoms in DISP! If everything is OK, then the FORCES file will be saved and you can calculate various properties.

5. Back up INPHON to INPHON.1 and modify it for your needs as described in the phonopy manual. It is important that you comment out or remove the LSUPER = .TRUE. line. To calculate the phonon DOS, the new INPHON may look as follows:
ATOM_NAME = Li H
NDIM = 1 1 1
MP = 21 21 21

The phonon DOS can be calculated by invoking the command:
phonopy --wien2k=LiH.struct -p
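
To summarize the whole session (file names as in this example; the two Wien2k runs are done in their own case directories):

phonopy --wien2k=LiH.struct                           # step 2: creates DISP and LiH.structS-00*
# step 3: run a full Wien2k SCF for LiH.structS-001 and LiH.structS-002,
#         initialized with: init_lapw -b -s symmetry -vxc 13 -numk 100 -sp
phonopy --wien2k=LiH.struct -f LiH001.scf LiH002.scf  # step 4: creates FORCES
# step 5: edit INPHON (remove LSUPER), then:
phonopy --wien2k=LiH.struct -p                        # phonon DOS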

References
1. Phonopy web page, where you may find the documentation.
2. Using phonopy with VASP (in Chinese, but readable with Google translate for non-speakers).

Sunday, March 7, 2010

Parallel compilation of Wien2k on Intel Itanium architecture

Here I briefly describe how to compile Wien2k on a supercomputer with Intel Itanium processors. I assume that you have read the Wien2k UserGuide and know how to compile the serial version of the program.

What we need:
Intel Fortran Compiler (ifort); I use version 11.0
GCC or the Intel C/C++ compiler (icc)
Intel CMKL libraries (I used version 10.1.1.019)
FFTW 2.1.5 (not 3.x!) compiled with MPI support
MPI (I use MPICH2)

Compiling serial version
For the serial version, the compilation options given to siteconfig are the following:

System type: K Linux (Intel ifort 11.x compiler + mkl )
Fortran compiler: ifort
C compiler: cc
Compiler options, BLAS and LAPACK:
O Compiler options: -FR -mp1 -w -prec_div -pc80 -pad -align -DINTEL_VML -traceback
L Linker Flags: $(FOPT) -L/opt/intel/cmkl/10.1.1.019/lib/64 -lguide -pthread
P Preprocessor flags '-DParallel'
R R_LIB (LAPACK+BLAS): $(FOPT) -L/opt/intel/mkl/10.1.1.019/lib/64 -lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lguide -pthread

Note: replace the path with your own, and mind that the lib/64 libraries are needed for Itanium. If you have an Intel Xeon or similar processor, use the lib/em64t libraries instead.

Compiling FFTW
This is needed only for the parallel compilation. Download FFTW 2.1.5, unpack it, and use the following commands to compile it:

export CC=icc
export F77=ifort
./configure --enable-mpi --enable-threads
make

Copy the libraries from fftw-2.1.5/fftw/.libs and fftw-2.1.5/mpi/.libs somewhere like /opt/fftw, or leave them where they are (as I did).

Compiling in parallel
I have mpich2 installed in /opt/mpich2, so I add /opt/mpich2/bin to the PATH:

export PATH="$PATH:/opt/mpich2/bin"

Check if mpif90 command works:

$mpif90 -v
mpif90 for 1.0.8p1
Version 11.0

The compilation options will be the following:
Compiler: mpif90

RP RP_LIB(SCALAPACK+PBLAS): -L/opt/intel/mkl/10.1.1.019/lib/64 -lmkl_intel_lp64 -lmkl_scalapack_lp64 -lmkl_blacs_lp64 -lmkl_sequential -lmkl -L/opt/mpich2/lib -lmpich -L/home/hartree-fock/fftw-2.1.5/mpi/.libs -lfftw_mpi -L/home/hartree-fock/fftw-2.1.5/fftw/.libs -lfftw
FP FPOPT(par.comp.options): $(FOPT) -I/opt/mpich2/include
MP MPIRUN commando : mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_

That's it! Don't forget to add the necessary lines to .bashrc (LD_LIBRARY_PATH etc).

Friday, November 27, 2009

Intel MKL compilation options

Intel has published the Math Kernel Library Link Line Advisor, which helps you find the link line to use for a specific compiler, math library and MPI version.

For example, for Linux on the IA-64 architecture with Intel compilers, ScaLAPACK and MPICH2, it generates the following link line:

-L$MKLPATH -lmkl_scalapack_ilp64 $MKLPATH/libmkl_solver_ilp64.a -Wl,--start-group -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -Wl,--end-group -openmp -lpthread

Monday, June 22, 2009

Magic command to prevent Wien2k crash

Sometimes the Wien2k program crashes without any apparent reason, and the *.error files are empty or contain only a line like "Error in LAPW0". In that case, run the following command before running Wien2k:

ulimit -s unlimited

It sets the maximum stack size for the current user to unlimited. On clusters and supercomputers you need to add this line to the script that runs Wien2k.
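For example, in a PBS-type job script (just a sketch; the Wien2k run command and the requested resources depend on your own setup) the line goes right before the command that starts Wien2k:

#!/bin/sh
#PBS -l nodes=1:ppn=8
#PBS -l walltime=72:00:00
cd $PBS_O_WORKDIR
ulimit -s unlimited     # must be set in the same shell that launches Wien2k
run_lapw                # or runsp_lapw for a spin-polarized case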

Friday, February 6, 2009

Wien2k Fedora 10 issue

In Fedora 10, the Wien2k program doesn't work at all because of csh/tcsh compatibility issues, giving an error message like this:
argv1: Subscript out of range
The solution to the problem is described on the Wien2k mailing list. You either have to install the tcsh package from Fedora 8 (you may find one on rpmfind.net), or do the following:
1. Edit the x_lapw script and find the following line (it should be around lines 238-248):
echo "`date`> ($name) $command $argv1[2-]" >> $log
2. Comment it out and replace it with this one:
echo "`date`> ($name) $argv1" >> $log
Now it should work.

Monday, January 12, 2009

Calculate DOS after band structure calculations in Wien2k

The density of states and the band structure are calculated on different k-point meshes in Wien2k. One might notice that you cannot calculate the density of states right after a band structure calculation, because you get the following error:

forrtl: severe (24): end-of-file during read, unit 29
...
Image PC Routine Line Source
lapw2 0820F1E4 Unknown Unknown Unknown
lapw2 0820D97D Unknown Unknown Unknown
lapw2 081E1CFB Unknown Unknown Unknown
lapw2 081B840A Unknown Unknown Unknown
lapw2 081CC100 Unknown Unknown Unknown
lapw2 080624C0 fermi_ 67 fermi_tmp_.F
lapw2 080881DD MAIN__ 258 lapw2_tmp_.F
lapw2 0804B906 Unknown Unknown Unknown
libc.so.6 00BA1390 Unknown Unknown Unknown
lapw2 0804B811 Unknown Unknown Unknown
0.215u 0.000s 0:00.21 100.0% 0+0k 0+120io 0pf+0w
error: command /home/hartree-fock/wien2k/source/lapw2 uplapw2.def failed


But there is a way to fix this: run lapw1 without the -band switch. For a spin-polarized calculation run:

x lapw1 -up
x lapw1 -dn

If you are running a non-spin-polarized case, then just run "x lapw1". Also, don't forget the "-orb" switch if you're using orbital-dependent potentials.

After that you'll be able to calculate DOS without any problems.

Monday, March 31, 2008

How to calculate elastic constants with SIESTA?

The answer to this question can of course be found in the SIESTA mailing list archives. Let me repeat my own answer, which was posted there recently:
SIESTA calculates the stress tensor, and you know the stress/strain relationship from textbooks. You have to relax your system properly so that the stress tensor is close to zero. Then manually deform the lattice vectors in selected directions and obtain the stress. For example, if you want to calculate C11, you have to deform the lattice vector a (I assume it is (a,0,0)) to a(1+e), where e is your value of strain. Then you run the calculation relaxing the atomic positions but NOT the lattice vectors (VariableCell false), obtain the stress tensor Sigma_ij, and find C11:

C11 = Sigma11/e
This gives you the main idea. Below I describe the procedure in more detail and focus on how to overcome some common problems.

Problem 1. Which value of strain to choose?
It depends on the material. You have to be sure that you're in the linear regime where Hooke's law holds, so the smaller the strain, the better the chance that you won't be mistaken. BUT! Small values of strain mean less accuracy (I'll explain why), so the strain shouldn't be too small either. I recommend a value of 0.5% or 1%. You may also (if you want) perform a convergence test of the elastic constant versus the value of strain. In my experience, values in the range of 0.3%-1% give good results.

Problem 2. How to choose the components of strain tensor?
That's simple. If you want to calculate C11, the strain tensor eij should be
e 0 0
0 0 0
0 0 0
where e is the value of strain (typically 0.01). The same deformed cell can also be used for computing C12 and C13 (from Sigma22 and Sigma33). For C22 the strain tensor should be
0 0 0
0 e 0
0 0 0
And the same (with e in the third diagonal position) for C33. The strain tensors for the shear elastic constants (C44, C55 and C66) have off-diagonal elements instead. For example, for C66 it will be:
0 e 0
0 0 0
0 0 0
Then, to calculate the elastic constant, deform the lattice matrix a0 (the 3x3 matrix whose rows are the three lattice vectors) by the matrix product
a = a0(1 + e), i.e. a_ij = sum_k a0_ik (delta_kj + e_kj),
and run the calculation with relaxation of the atomic positions only (VariableCell false). SIESTA gives the values of the stress tensor. Suppose, for example, you want to calculate C66. Only e12 is nonzero, so the expression for Sigma12 will be:
Sigma12 = C1212 e12 = C66 e,
and therefore C66 = Sigma12/e.
This is the ideal case, assuming that the initial lattice was fully relaxed and the stress tensor of the relaxed cell was exactly zero. But that is usually not the case, is it?

Problem 3. I've relaxed the cell, but the stress tensor is not zero. Does this influence the results?

Yes, but not much. You may improve the results by applying a "zero-stress correction". For example, for the C66 elastic constant:
C66 = (Sigma12 - Sigma0_12)/e,
where Sigma0_ij is the stress tensor of the relaxed cell. It should be zero, but it isn't. This correction works only if Sigma0_12 << Sigma12! If not, I advise increasing e so that it does. Note also that if you use geometry constraints, you will not be able to compute elastic constants, or the error will be large!
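
In practice, the whole procedure for one elastic constant boils down to something like the following sketch (the file names are made up, and the exact label of the stress block in the SIESTA output may differ between versions):

# 1. reference run on the fully relaxed cell -> gives Sigma0_ij
siesta < relaxed.fdf  > relaxed.out
# 2. strained run: same system, lattice vectors multiplied by (1+e), VariableCell false
siesta < strained.fdf > strained.out
# 3. read both stress tensors and compute C = (Sigma - Sigma0)/e by hand
grep -A4 "Stress tensor" relaxed.out
grep -A4 "Stress tensor" strained.out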

Problem 4. When I apply deformations of +1% and -1%, I obtain different values of the stress tensor.
Yes, this happens. See Problem 3 for the solution: introducing the zero-stress correction will improve the results (maybe :). I advise always doing two calculations, for e>0 and e<0. In the ideal case the results will be the same, but in the real world you'll get a difference. Taking the average of the two elastic constants and introducing the zero-stress correction will improve the accuracy.

Last but not least: even if you have a cubic system with only C11, C12 and C44 as independent elastic constants, compute all of them! Calculate C11, C22, etc. to reduce the error, if you have time to do so.

I hope this short note helps. Feel free to correct me if I'm wrong or to ask questions in the comments (you don't need to register!).
 

Sunday, February 25, 2007

Compiling SIESTA in parallel with Portland compiler

Here I'll post my notes on compiling SIESTA on an AMD Opteron machine using the Portland compiler (pgf90/pgf95). However, you may also find them useful when compiling with another compiler or architecture. I do not guarantee anything, only that everything written below worked for me; this doesn't mean it will work for you. Moreover, if after following my instructions your hard disk somehow gets formatted, don't blame me: you use my advice at your own risk. I also have an unfinished version of similar notes on compiling SIESTA on an Intel machine with ifort (an extended version of the ones by Sebastien Le Roux), but I still haven't had time to finish and publish them. If I do, I'll post the link here as well. No instructions on compiling MPICH are given, because I assume you're using a cluster that already has MPICH installed. If this is not the case, I refer you to Sebastien Le Roux's HOW-TO, which can be found in the SIESTA mailing list archives.

What is needed to compile parallel SIESTA?

  • SIESTA distribution

  • Compatible Fortran and C compilers (Intel ifort and icc for an Intel machine, pgf95/pgcc for AMD, etc.)

  • MPI (Open MPI, MPICH, LAM, etc.), compiled with the same compiler that will be used for the SIESTA compilation. Usually it is already installed on clusters and supercomputers, so you should check which MPI implementations and supporting compilers are available. If both mpich2 and mpich are available, I'd recommend using mpich instead of mpich2; I remember having some problems with the latter.

  • BLAS and LAPACK, which can be downloaded from netlib.org, or (better) use ACML or Intel MKL instead. You may also try GotoBLAS, but I haven't managed to compile SIESTA with it.

  • BLACS - should be compiled from scratch with the same options and the same compiler as SIESTA.

  • SCALAPACK – should be compiled from scratch after compilation of LAPACK/BLAS and BLACS.

The examples below relate to an Opteron machine with the Portland compiler (accessible as pgf95 for Fortran and pgcc for C), but the idea can be transferred to other compilers and architectures.

Option 1. Compiling BLAS and LAPACK from scratch

This works almost everywhere, but it is less efficient than using libraries optimized for specific processors (like ACML or Intel MKL). This option is also described in Sebastien Le Roux's HOW-TO, but for an Intel machine. I repeat almost the same steps for an AMD machine, which makes almost no difference. Some things from Sebastien's HOW-TO didn't work for me, so I'll mention those as well.

To compile BLAS and LAPACK you only need the lapack.tar.gz file, since BLAS is included in this package, downloadable from netlib.org.

BLAS AND LAPACK

Here is my make.inc (the original comments are skipped; my own comments follow each setting):

PLAT = _LINUX

FORTRAN = pgf95 The compiler that will be used for the SIESTA compilation. On an Intel machine it could be ifort. You can also use pgf90 instead.

OPTS = -g -O0 This indicates that we do not want optimization. Generally, the options should be the same as those used for the SIESTA compilation, but this is not necessary. You may want to change this to -fastsse, -O2 or something else available via the man pgf95 command. I do not recommend using the -fast flag here, but you may try. In any case, never use it for the SIESTA compilation; better use -fastsse instead, as I'll explain later.

DRVOPTS = $(OPTS)

NOOPT =

LOADER = pgf95 Generally, the loader should be the same as the compiler indicated in the FORTRAN variable.

LOADOPTS =

ARCH = ar

ARCHFLAGS= cr

RANLIB = ranlib

BLASLIB = ../../blas$(PLAT).a

LAPACKLIB = lapack$(PLAT).a

TMGLIB = tmglib$(PLAT).a

EIGSRCLIB = eigsrc$(PLAT).a

LINSRCLIB = linsrc$(PLAT).a

Variables without comments were generally left at their defaults. To compile, type:

make blaslib

make lapacklib

After that, the .a libraries will be found in the current directory. I suggest renaming them to libblas.a and liblapack.a and moving them to somewhere like the ~/siesta/lib directory (which, of course, you should create).

BLACS

Take one of the MPI files from the BMAKES directory and use it as a template for Bmake.inc. I'll post here only the lines that have to be modified:

BTOPdir = ~/BLACS The directory where BLACS was unpacked.

COMMLIB = MPI

PLAT = LINUX

MPIdir = /opt/mpich/pgi62_64 Nota bene! MPICH should be compiled with the same compiler version and architecture as we will use, so keep that in mind if you have to compile MPICH yourself!

MPIdev = ch_p4

MPIplat = LINUX

MPILIBdir = $(MPIdir)/$(MPIdev)/lib Check carefully where your libmpich.a (or its equivalent, such as libmpi.a) is located; this parameter should indicate the directory where it is stored.

MPIINCdir = $(MPIdir)/$(MPIdev)/include The path to the MPI include files, such as mpi.h.

MPILIB = $(MPILIBdir)/libmpich.a The full path to libmpich.a; check that it is correct and that the library exists.

INTFACE = -Dadd_ I don't really know what this option is responsible for, but setting it to -Dadd_ has always worked for me.

F77 = pgf95

F77NO_OPTFLAGS = -g -O0 Compiler flags for files which should not be compiled with optimization

F77FLAGS = $(F77NO_OPTFLAGS) This can and should be changed; I'm just doing test compilations without any optimization. As with LAPACK, these flags should ideally (but not necessarily) be the same as for the SIESTA compilation. In my benchmark tests, changing these flags in BLACS didn't influence the overall SIESTA performance at all, but that doesn't mean you shouldn't try to optimize BLACS.

F77LOADER = $(F77)

F77LOADFLAGS =

CC = pgcc The C compiler; it should be fully compatible with the Fortran compiler. For Portland it should be of the same architecture and version as pgf95; the same is true for the Intel compilers.

CCFLAGS = $(F77FLAGS) In general, the C compiler flags should duplicate the Fortran flags, but you may play with them too.

CCLOADER = $(CC)

CCLOADFLAGS =

So to compile just type: make mpi.

NOTA BENE: Generally, the command make clean removes the previous compilation, so you can compile from scratch if you want to. This should be done after every change to the makefiles and before the next recompilation. But in the case of BLACS you should type make mpi what=clean instead of make clean, which won't work.

After compilation you'll find the compiled libraries in the LIB subdirectory. I also suggest moving them into something like ~/siesta/lib and renaming them to libblacsF77init.a, libblacsCinit.a and libblacs.a.

SCALAPACK

Here is my SLmake.inc (its editable part, actually) with comments:

home = ~/scalapack-1.7.5 A path where SCALAPACK was unpacked

PLAT = LINUX

BLACSdir = ~/siesta/lib The directory where the previously compiled BLACS libraries are located.

USEMPI = -DUsingMpiBlacs

SMPLIB = /opt/mpich/pgi62_64/lib/libmpich.a Full path to mpich

I once set the wrong path to the BLACS libraries (three lines below), but everything still worked for me! So it seems ScaLAPACK either does not strictly require these settings or somehow finds the libraries automatically via BLACSdir.

BLACSFINIT = $(BLACSdir)/libblacsF77init.a

BLACSCINIT = $(BLACSdir)/libblacsCinit.a

BLACSLIB = $(BLACSdir)/libblacs.a

For the compilers and options, see the corresponding comments in the BLACS and LAPACK sections above.

F77 = pgf95

CC = pgcc

NOOPT = -g -O0

F77FLAGS = $(NOOPT)

DRVOPTS = $(F77FLAGS)

CCFLAGS = $(NOOPT)

SRCFLAG =

F77LOADER = $(F77)

CCLOADER = $(CC)

F77LOADFLAGS =

CCLOADFLAGS =

CDEFS = -DAdd_ -DNO_IEEE $(USEMPI)

BLASLIB = ~/siesta/lib/libblas.a Path to previously compiled BLAS lib.

After running make you'll find libscalapack.a, which I would also copy to ~/siesta/lib.

At last: compiling SIESTA

So here is my arch.make :

SIESTA_ARCH=pgf95-natanzon

FC=pgf95

FFLAGS= -g -O0 We are compiling SIESTA without any optimization. You can (and I advise you to) play with the optimization flags and not stop at these options: they are slow, but they guarantee a successful compilation, which is more important at this stage. You can also set this parameter to -fastsse or similar (see man pgf95 for details). Problems have been reported with the -fast flag, so I DO NOT recommend using it; it may cause crashes in SIESTA runs.

FPPFLAGS= -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT

FFLAGS_DEBUG= -g -O0 This normally should not be changed; some SIESTA files must be compiled without optimization.

RANLIB=echo

#

NETCDF_LIBS=

NETCDF_INTERFACE=

DEFS_CDF=

#

MPI_INTERFACE=libmpi_f90.a This also normally should not be changed

MPI_INCLUDE=/opt/mpich/pgi62_64/include path to MPI include dir

MPI_LIBS=/opt/mpich/pgi62_64/lib path, where libmpich.a is located

DEFS_MPI=-DMPI Option for compiling parallel version of SIESTA. Necessary!

#

HOME_LIB=~/siesta/lib The path where the libraries we compiled earlier (BLAS, LAPACK, BLACS, ScaLAPACK) are located.

LAPACK_LIBS= -llapack There are two ways of specifying a library: give the full filename, like liblapack.a, or write it in the form presented here. The linker automatically replaces -l with lib and adds the .a extension.

BLAS_LIBS=-lblas

BLACS_LIBS = -lblacsCinit -lblacsF77init -lblacs

SCALAPACK_LIBS = -lscalapack

LIBS= -L$(HOME_LIB) $(SCALAPACK_LIBS) $(BLACS_LIBS) $(LAPACK_LIBS) $(BLAS_LIBS) $(NETCDF_LIBS)

SYS=bsd

DEFS= $(DEFS_CDF) $(DEFS_MPI)

#

.F.o:

$(FC) -c $(FFLAGS) $(INCFLAGS) $(DEFS) $<

.f.o:

$(FC) -c $(FFLAGS) $(INCFLAGS) $<

.F90.o:

$(FC) -c $(FFLAGS) $(INCFLAGS) $(DEFS) $<

.f90.o:

$(FC) -c $(FFLAGS) $(INCFLAGS) $<

To compile SIESTA, just type make. I expect everything should be OK.

What else? Adding optimization

This could be done from the very beginning, but sometimes an over-optimized SIESTA doesn't work, so it is safer to compile without any optimization first and make sure that everything works. For this I recommend running some SCF tasks (without cell relaxation, 0 CG moves) to be sure the parallelization works.

To optimize, you should change F77FLAGS and CCFLAGS everywhere in arch.make and in the supporting libraries, and recompile everything from scratch. For the PGI compiler I recommend this:

F77FLAGS = -fastsse

I think this is the best that can be done. On an Intel machine, -O3 -tpp7 (or similar) should be used instead; it gives the best optimization. You can also try the -fast flag on an Intel machine, where the compiler should automatically pick the best optimization flags, but it does something wrong with PGF, so never use it with pgf (this is the N-th time I'm repeating this). Some additional changes should also be made to arch.make:

atom.o: This file and the following ones should still be compiled without optimization.

$(FC) -c $(FFLAGS_DEBUG) atom.f

#

dfscf.o:

$(FC) -c $(FFLAGS_DEBUG) $(INCFLAGS) dfscf.f

#

#

electrostatic.o:

$(FC) -c $(FFLAGS_DEBUG) electrostatic.f

#

.F.o:

$(FC) -c $(FFLAGS) $(INCFLAGS) $(DEFS) $<

.f.o:

$(FC) -c $(FFLAGS) $(INCFLAGS) $<

.F90.o:

$(FC) -c $(FFLAGS) $(INCFLAGS) $(DEFS) $<

.f90.o:

$(FC) -c $(FFLAGS) $(INCFLAGS) $<


Option 2. Linking SIESTA with external libraries (ACML, Intel MKL)

What for? My benchmarks show that using the ACML library on an AMD64 machine noticeably improves the overall SIESTA performance, so it's worth linking against external libraries like ACML, Intel MKL, GotoBLAS, etc.

Note: I haven't managed to link Intel MKL so far, but that doesn't mean it can't be done; keep trying. So I just share my experience of linking ACML here. I think other libraries can be linked in the same way.

What is needed?

  • The ACML library installed. If you don't have it, just download and unpack it, then run the install-blablabla.sh script, which does everything automatically; you only have to provide an installation path. I assume my ACML is installed in ~/acml.

  • Precompiled BLACS according to instructions in Option 1 chapter.

  • The sources of SCALAPACK and, of course, SIESTA; these two should be recompiled with ACML support.

  • No LAPACK/BLAS are needed at this time, because ACML will replace them.

SCALAPACK

Compared to the previous section, only one change should be made in the SLmake.inc file, namely the path to BLAS:

BLASLIB = ~/acml/pgi64/lib/libacml.a The full path to libacml.a. Make sure to use the correct version (32- or 64-bit) compatible with your compiler.

SIESTA

Here is my arch.make file again; comments are given only for the lines that have changed.

SIESTA_ARCH=pgf95-matterhorn

#

FC=pgf95

FC_ASIS=$(FC)

#

FFLAGS= -fastsse -Wl,-R/opt/pgi/linux86-64/6.2/libso/ -Wl,-R~/acml/pgi64/lib/

As you can see, additional flags are passed to the compiler: -fastsse turns on optimization, while the two -Wl,-R paths embed the ACML and PGI library locations, both of which are necessary for correct linking.

FFLAGS_DEBUG= -g -O0

LDFLAGS= -fastsse -Bstatic The second flag tells the linker to link the libraries statically, so you won't have trouble setting LD_LIBRARY_PATH before a SIESTA run. If you have trouble with linking, try deleting the -Bstatic flag and recompiling. But in that case you should execute

export LD_LIBRARY_PATH=/opt/pgi/linux86-64/6.2/libso

before the SIESTA run, otherwise you'll get errors like: cannot load shared libraries XXX.so

Note that the -Bstatic flag is called -static for ifort. The linking can also fail with the Intel compiler when this flag is used, but for pgf it works fine.

RANLIB=echo

COMP_LIBS= #dc_lapack.a Do not use internal LAPACK of SIESTA!

MPI_ROOT=/opt/mpich/pgi62_64/ch_p4

MPI_LIBS= -L$(MPI_ROOT)/lib -lmpich

ACML = -mp -L~/acml/pgi64/lib -lacml -Wl,-R/opt/pgi/linux86-64/6.2/libso/ -Wl,-R~/acml/pgi64/lib/

The first parameter is the path to ACML, and the other two simply repeat the compiler flags. I don't know what -mp is for.

MPI_INTERFACE=libmpi_f90.a

MPI_INCLUDE=/opt/mpich/pgi62_64/ch_p4/include

DEFS_MPI=-DMPI

#

SCALAPACK = -L~/siesta/lib -lscalapack

BLACS = -L~/siesta/lib -lblacs -lblacsF77init -lblacsCinit -lblacsF77init -lblacs

NUMA = -L/opt/pgi/linux86-64/6.2/libso -lnuma This library is needed for static linking. If you link dynamically (no -Bstatic flag) you do not need this library.

LIBS= $(SCALAPACK) $(BLACS) $(ACML) $(MPI_LIBS) $(NUMA)

SYS=cpu_time

DEFS= $(DEFS_CDF) $(DEFS_MPI)

#

#

# Important (at least for V5.0-1 of the pgf90 compiler...)

# Compile atom.f and electrostatic.f without optimization.

#

atom.o:

$(FC) -c $(FFLAGS_DEBUG) atom.f

#

dfscf.o:

$(FC) -c $(FFLAGS_DEBUG) $(INCFLAGS) dfscf.f

#

#

electrostatic.o:

$(FC) -c $(FFLAGS_DEBUG) electrostatic.f

#

.F.o:

$(FC) -c $(FFLAGS) $(INCFLAGS) $(DEFS) $<

.f.o:

$(FC) -c $(FFLAGS) $(INCFLAGS) $<

.F90.o:

$(FC) -c $(FFLAGS) $(INCFLAGS) $(DEFS) $<

.f90.o:

$(FC) -c $(FFLAGS) $(INCFLAGS) $<

So, voila! That's all I can tell you about parallel SIESTA compilation. I've also done some benchmarks, but I decided not to include them here. Just ask in the comments if you want a PDF of these notes with the parallelization benchmark tests.

Sunday, February 4, 2007

How to calculate Density of States with SIESTA

Here I'll briefly explain how to calculate the density of states with SIESTA.
For this you need to run your calculation and obtain a SystemLabel.EIG file in the working directory. Take a look at this file. This is what a typical output looks like:
       -3.4561                                                                                                                          
196     1 40                                                                                                                                          
    1   -52.65773   -52.61839   -52.61035   -52.59173   -31.48590   -31.40974   -31.38261   -31.22230   -31.10230   -31.05041                               
        -30.99880   -30.97822   -30.88169   -30.85002   -30.80920   -30.74749   -22.38872   -21.66313   -21.52788   -21.47900                               
        -21.10694   -20.81325   -20.77569   -20.74793   -10.80684   -10.31129   -10.14217   -10.05258    -9.95418    -9.79580                               
         -8.93664    -8.90326    -8.39606    -8.29401    -8.20815    -7.69086    -7.60359    -7.48728   
           ....                                                           
    2   -52.66887   -52.60975   -52.60674   -52.59284   -31.51384   -31.50220   -31.45885   -31.18657   -31.02913   -30.99967                               
        -30.99402   -30.96087   -30.86182   -30.84823   -30.79439   -30.72731   -22.64175   -21.65435   -21.63041   -21.51864                               
           ....                                                         
    3   -52.67693   -52.60817   -52.59919   -52.59389   -31.58918   -31.51596   -31.50736   -31.14978   -31.05660   -31.00531        
          ....
The numbers in the header are very important, so here is the explanation of their meaning:
-3.4561 eV is the Fermi energy of the system, which has 196 bands, 1 spin component (so it is spin-unpolarized) and 40 k-points (the k-point index is the first number of each block). After the Fermi energy, please add the following numbers:
eta - I'd call it the precision (broadening); set it to less than 1.
Ne - the number of eigenvalues; I advise setting it large.
Emin and Emax - the interval of energies (in eV) for which the DOS will be calculated.

Put these values on the first line, right after the Fermi energy. I did it this way:
-3.4561 0.05 1000 -10 10

Close the file and run the eig2dos utility:
    eig2dos < SystemLabel.EIG | tee dos.dat
The eig2dos utility is located in the Utils directory of SIESTA. You can compile it with f95 or another Fortran compiler:
    f95 -o eig2dos eig2dos.f
You may plot the results using gnuplot or the Grace program:
    xmgrace dos.dat

Friday, January 19, 2007

Notes on using PBS systems on clusters and supercomputers

These short notes may help you (and me, after I forget it all :) ) to make use of large supercomputers with PBS systems installed. I'll explain how to submit, monitor and delete tasks using PBS. At the moment I'm using a computer located at ACK CYFRONET AGH in Kraków, where Torque PBS is installed, and a cluster with OpenPBS located at the Interdisciplinary Center for Computer Modelling of Warsaw University. Drawing on my (short) experience with PBS, I'll explain how to put tasks in a queue, monitor and delete them, and how to change the resources requested for a task; a sample of a typical PBS script is also provided. It is assumed that you already know the basics of Unix/Linux and have an account on a computer with a PBS system installed.


The Portable Batch System is a queue system that allows optimal management of computer resources (memory, CPU, etc.) on supercomputers and clusters, which are used by many users for large calculations. Most big computers consist of several nodes, each of which has several processor units. Normally one runs a command from a workstation or terminal just by typing the command name and waiting until it finishes. This command will use 99% of the CPU time and will run on one processor unit. If one needs to run two commands at the same time, one opens two terminal windows and runs two commands, each of which will use 50% of the CPU. When we want to run parallel calculations on several nodes and CPUs, special software is needed, and that is what PBS provides.

It works like the cron daemon or the Task Scheduler in Windows. You specify a program to be run and the resources to be used (i.e. the number of nodes and processors, the amount of memory, etc.), and put it into a queue with the following command:

qsub script.sh

where script.sh is a shell script that contains the path to the program and the resources to be used. I'll post an example of such a script below.
You may also specify resources on the command line. For details, see:

man qsub
man pbs_resources

Sometimes you have to specify the name of the queue:

qsub -q queue_name script.sh

For details, consult the documentation provided by your system administrator. After submitting a job you'll get an output like:

98982.supercomp

Here 98982 is the job identifier and "supercomp" is the hostname of the server. You can run the following command to see the status of this job:

qstat -f 98982

To see the status of all jobs just type

qstat -a

Or you may want to see only your own jobs:

qstat -u your_loginname

In order to delete the job with id=98982 you may type:

qdel 98982

To change the resources given in the script run.sh, you may use the qalter command. I will not discuss all the possible options of the qsub, qalter, qdel and qstat commands, because you can find them by typing man command_name. I'll give you the minimum skills required and recommend reading the manuals and literature for further details. Let us look inside the script run.sh:

#!/bin/sh
#PBS -N WaterStone
#PBS -l nodes=1:ppn=8
#PBS -l walltime=72:00:00
cd $PBS_O_WORKDIR/
~/siesta/Src/siesta < input.fdf > screen.out

This is a usual Unix shell script, where the special directives recognized by the PBS system are placed in comments beginning with #PBS. The second line specifies the name of the task (WaterStone), which will appear in the statistics (qstat). The third line gives the number of nodes (1) and the number of processors per node (8). You should consult the documentation provided by your system administrator if you want to know how to set these parameters. For example, on the computers I'm using at present I always have to set nodes=1, because I have access to only one node, and the maximum number of processors is restricted to 8 for my account. Note also that on most servers, tasks that request fewer resources (less memory and fewer processors) have priority, so sometimes it's useful to set ppn=2 or ppn=1, because such a task will start earlier than one with ppn=8. Of course, this depends on the policy of the system administrators.

The fourth line specifies the wall-clock time needed for the task. After this period the task will be automatically terminated, so if you need more time you should change this parameter later with the qalter command, for example as shown below.
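
For example, to extend the wall time of job 98982 from the example above (the exact resource syntax may depend on your PBS flavour):

qalter -l walltime=144:00:00 98982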

The fifth line forces the job to use the directory where the script run.sh is located (more precisely, the directory from which the job was submitted) as the working directory, where all the program output will be stored, as well as the files WaterStone.o<task_id> and WaterStone.e<task_id>. The former contains all the "screen" output, and possible errors are stored in the latter. I strongly recommend neither using spaces in the name of the working directory nor creating the working directory inside a folder that contains spaces in its name or path! This may cause problems, for example the job may not start.

The last line is the command to run, with input read from input.fdf and output stored in screen.out. Note that in this case no output will be written to WaterStone.o<task_id>, because the command puts nothing on the "screen". The WaterStone.o<task_id> file is created only after the task has finished, whereas output to screen.out is written continuously, so I recommend redirecting all output to a file as I did, so that you can monitor it at any time.

Of course, there are many other options that can be defined in the script. For example, you may specify the amount of memory for the program:

#PBS -l mem=1gb

Or ask the PBS system to notify you (by e-mail) about the status of the job:

#PBS -M noddeat@leavejournal.com

Not all of the other options may work; this depends on the settings of the particular system.

NOTE: This article is a draft. I'll add more information here as soon as I have time. You may ask questions or share your own experience on PBS here in the comments.

Sorry for my bad English; you may correct this article and post a corrected version in the comments. I also ask that you post comments in English only, because I may share a link to this article within the English-speaking scientific community.