4.4 Advanced usage

4.4.1 Self-interaction Correction

The self-interaction correction (SIC) included in the CP package is based on the Constrained Local-Spin-Density approach proposed by F. Mauri and coworkers (M. D'Avezac et al. PRB 71, 205210 (2005)). It was used for the first time in QUANTUM ESPRESSO by F. Baletto, C. Cavazzoni and S. Scandolo (PRL 95, 176801 (2005)).

This approach is a simple way to treat ONE, and only one, excess charge. It is moreover necessary to check a priori, for the corresponding neutral system, that the spin-up and spin-down eigenvalues are not too different, working in the Local-Spin-Density Approximation (setting nspin = 2). If these two conditions are satisfied and you are interested in charged systems, you can apply the SIC. This approach is an on-the-fly method to correct the self-interaction of the excess charge with itself.

Briefly, both the Hartree and the XC part have been corrected to avoid the interaction of the excess charge with itself.

For example, for the boron atom, where we have an odd number of electrons (valence electrons = 3), the parameters for working with the SIC are:

           &system
           nbnd= 2,
           tot_magnetization=1,
           sic_alpha = 1.d0,
           sic_epsilon = 1.0d0,
           sic = 'sic_mac',
           force_pairing = .true.,

The two main parameters are:

force_pairing = .true., which forces the paired electrons to be the same;
sic='sic_mac', which instructs the code to use Mauri's correction.

Warning: This approach has known problems for dissociation mechanisms driven by excess electrons.

Comment 1: Two parameters, sic_alpha and sic_epsilon, have been introduced following the suggestion of M. Sprik (ICR(05)) to treat the radical (OH)-H₂O. In any case, a complete ab-initio approach is followed using sic_alpha=1, sic_epsilon=1.

Comment 2: When you apply this SIC scheme to a molecule or to an atom, which are neutral, remember to add the correction to the energy level as proposed by Landau: in a neutral system, subtracting the self-interaction, the unpaired electron feels a charged system, even if using a compensating positive background. For a cubic box, the correction term due to the Madelung energy is approx. given by 1.4186/L_box -1.047/(L_box)³, where L_box is the linear dimension of your box (=celldm(1)). The Madelung coefficient is taken from I. Dabo et al. PRB 77, 115139 (2007). (info by F. Baletto, francesca.baletto@kcl.ac.uk)
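The correction term quoted in Comment 2 is easy to tabulate for a few box sizes. The following is a minimal sketch, not part of the CP package: the function name and the sample box sizes are illustrative assumptions, and L_box is taken in the same units as celldm(1).

```python
def sic_madelung_correction(l_box):
    """Approximate Madelung correction 1.4186/L - 1.047/L**3 for a cubic box.

    l_box is the linear dimension of the box (= celldm(1)); the function
    name is hypothetical and only mirrors the formula quoted in the text.
    """
    if l_box <= 0.0:
        raise ValueError("l_box must be positive")
    return 1.4186 / l_box - 1.047 / l_box**3


# Sample box sizes, chosen only for illustration.
for l_box in (10.0, 20.0, 30.0):
    print(f"L_box = {l_box:5.1f}   correction = {sic_madelung_correction(l_box):.6f}")
```

The cubic term decays much faster than the 1/L_box term, so for large boxes the correction is dominated by the leading Madelung contribution.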

4.4.2 ensemble-DFT

The ensemble-DFT (eDFT) is a robust method to simulate metals in the framework of ab-initio molecular dynamics. It was introduced in 1997 by Marzari et al.

The eDFT-specific subroutines are in CPV/src/ensemble_dft.f90, where all the quantities of interest are defined. The subroutine in CPV/src/inner_loop_cold.f90, called by cg_sub.f90, controls the inner loop, that is, the minimization of the free energy A with respect to the occupation matrix.

To select an eDFT calculation, the user has to set:

            calculation = 'cp'
            occupations= 'ensemble' 
            tcg = .true.
            passop= 0.3
            maxiter = 250

to use the CG procedure. In eDFT, CG is also the outer loop, in which the energy is minimized with respect to the wavefunctions while the occupation matrix is kept fixed. The inner loop has its own specific parameters. Since eDFT was designed to treat metals, keep in mind that the aim is to describe the broadening of the occupations around the Fermi energy. The new input parameters are listed below.

smearing: selects the occupation distribution; there are two options: Fermi-Dirac (smearing='fd') and cold smearing (smearing='cs', recommended).
degauss: is the electronic temperature; it controls the broadening of the occupation numbers around the Fermi energy.
n_inner: is the number of iterative cycles in the inner loop, done to minimize the free energy A with respect to the occupation numbers. The typical range is 2-8.
conv_thr: is the threshold value to stop the search of the 'minimum' free energy.
niter_cold_restart: controls the frequency at which a full iterative inner cycle is done. It is in the range 1 to n_inner. It is a trick to speed up the calculation.
lambda_cold: is the step length along the search line for the best value of A, when the iterative cycle is not performed. The value is close to 0.03, smaller for large and complicated metallic systems.

NOTE: degauss is in Hartree, while in PWscf it is in Ry (!!!). The typical range is 0.01-0.02 Ha.
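Because the unit differs between the two codes, a value copied from a PWscf input has to be converted before it is used in CP. A minimal sketch, assuming only the standard relation 1 Ha = 2 Ry; the function names are illustrative and not part of either code:

```python
RY_PER_HA = 2.0  # 1 Hartree = 2 Rydberg


def degauss_ry_to_ha(degauss_ry):
    """Convert a PWscf-style degauss (Ry) to the CP convention (Ha)."""
    return degauss_ry / RY_PER_HA


def degauss_ha_to_ry(degauss_ha):
    """Convert a CP-style degauss (Ha) to the PWscf convention (Ry)."""
    return degauss_ha * RY_PER_HA


# The typical CP range quoted in the text, expressed in Ry.
for degauss_ha in (0.01, 0.02):
    print(f"{degauss_ha:.3f} Ha = {degauss_ha_to_ry(degauss_ha):.3f} Ry")
```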

The input for an Al surface is:

            &CONTROL
             calculation = 'cp',
             restart_mode = 'from_scratch',
             nstep  = 10,
             iprint = 5,
             isave  = 5,
             dt    = 125.0d0,
             prefix = 'Aluminum_surface',
             pseudo_dir = '~/UPF/',
             outdir = '/scratch/'
             ndr=50
             ndw=51
            /
            &SYSTEM
             ibrav=  14,
             celldm(1)= 21.694d0, celldm(2)= 1.00D0, celldm(3)= 2.121D0,
             celldm(4)= 0.0d0,   celldm(5)= 0.0d0, celldm(6)= 0.0d0,
             nat= 96,
             ntyp= 1,
             nspin=1,
             ecutwfc= 15,
             nbnd=160,
             input_dft = 'pbe'
             occupations= 'ensemble',
             smearing='cs',
             degauss=0.018,
            /
            &ELECTRONS
             orthogonalization = 'Gram-Schmidt',
             startingwfc = 'random',
             ampre = 0.02,
             tcg = .true.,
             passop= 0.3,
             maxiter = 250,
             emass_cutoff = 3.00,
             conv_thr=1.d-6
             n_inner = 2,
             lambda_cold = 0.03,
             niter_cold_restart = 2,
            /
            &IONS
             ion_dynamics  = 'verlet',
             ion_temperature = 'nose'
             fnosep = 4.0d0,
             tempw = 500.d0
            /
            ATOMIC_SPECIES
             Al 26.98 Al.pbe.UPF

NOTE 1: remember that the time step is used to integrate the ionic dynamics, so you can choose something in the range of 1-5 fs.
NOTE 2: with eDFT you are simulating metals or systems for which the occupation number is also fractional, so the number of bands, nbnd, has to be chosen so as to have some empty states. As a rule of thumb, start with an initial occupation number of about 1.6-1.8. The more bands you consider, the more accurate the calculation is, but it also takes longer: the CPU time scales almost linearly with the number of bands.
NOTE 3: the parameter emass_cutoff is used in the preconditioning and it has a completely different meaning with respect to plain CP. It ranges between 4 and 7.

All the other parameters have the same meaning as in the usual CP input, and they are discussed above.
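The first two notes above can be checked against the Al surface input with a few lines of arithmetic. This is a sketch under stated assumptions: dt is taken to be in Hartree atomic units of time (1 a.u. = 2.4189e-17 s), the Al pseudopotential is assumed to carry 3 valence electrons per atom, and the function names are illustrative.

```python
AU_TIME_FS = 0.02418884  # one Hartree atomic unit of time, in femtoseconds


def dt_in_fs(dt_au):
    """Convert a time step from Hartree atomic units to femtoseconds."""
    return dt_au * AU_TIME_FS


def mean_occupation(nat, valence_per_atom, nbnd):
    """Average number of electrons per band for a spin-unpolarized run."""
    return nat * valence_per_atom / nbnd


# Values from the Al surface input: dt = 125, nat = 96, nbnd = 160.
# valence_per_atom = 3 is an assumption about the pseudopotential.
print(f"dt = {dt_in_fs(125.0):.2f} fs")
print(f"mean occupation = {mean_occupation(96, 3, 160):.2f}")
```

With these assumptions the time step falls inside the 1-5 fs window and the average occupation sits at the upper end of the 1.6-1.8 rule of thumb.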

4.4.3 Free-energy surface calculations

Once CP is patched with the PLUMED plug-in, it becomes possible to turn on most of the PLUMED functionalities by running CP as: ./cp.x -plumed plus the other usual CP arguments. The PLUMED input file has to be located in the specified outdir with the fixed name plumed.dat.

4.4.4 Treatment of USPPs

The cutoff ecutrho defines the resolution of the real-space FFT mesh (expressed by nr1, nr2 and nr3, which the code sets automatically if left on its own). In the USPP case we refer to this mesh as the "hard" mesh, since it is denser than the smooth mesh that is needed to represent the square of the non-norm-conserving wavefunctions.

On this "hard", fine-spaced mesh, you need to determine the size of the cube that will encompass the largest of the augmentation charges; this is what nr1b, nr2b, nr3b are. They are independent of the system size, but dependent on the size of the augmentation charge (an atomic property that doesn't vary that much for different systems) and on the real-space resolution needed by augmentation charges (rule of thumb: ecutrho is between 6 and 12 times ecutwfc).

The small boxes should be set as small as possible, but large enough to contain the core of the largest element in your system. The formula for estimating the box size is quite simple:

nr1b = (2 R_cut / L_x) * nr1

and the like, where R_cut is the largest cut-off radius among the various atom types present in the system and L_x is the physical length of your box along the x axis. You have to round your result up to the next integer. In practice, nr1b etc. are often in the region of 20-24-28; testing seems again a necessity.
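The estimate above, including the rounding up, can be written as a short helper. This is a minimal sketch: the function name is illustrative, and the sample values of R_cut and nr1 are hypothetical (only L_x is borrowed from celldm(1) of the Al surface input).

```python
import math


def small_box_points(r_cut, l_box, nr):
    """Estimate nr1b (or nr2b, nr3b) as 2*R_cut/L * nr, rounded up.

    r_cut and l_box must be in the same length unit; nr is the number of
    points of the dense FFT mesh along the same direction.
    """
    if r_cut <= 0.0 or l_box <= 0.0 or nr <= 0:
        raise ValueError("r_cut, l_box and nr must be positive")
    return math.ceil(2.0 * r_cut / l_box * nr)


# Hypothetical values: R_cut = 1.8, nr1 = 120; L_x = 21.694 as in the Al example.
print("nr1b estimate:", small_box_points(r_cut=1.8, l_box=21.694, nr=120))
```

The result is only a starting point; as the text notes, the final box size should still be tested.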

The core charge is in principle finite only in the core region (as defined by some R_rcut) and vanishes outside the core. Numerically the charge is represented by a Fourier series, which may give rise to small charge oscillations outside the core and even to negative charge density, but only if the cut-off is too low. Having these small boxes removes the charge-oscillation problem (at least outside the box) and also offers some numerical advantages in going to higher cut-offs. (info by Nicola Marzari)

4.4.5 Hybrid functional calculations using maximally localized Wannier functions

In this section, we illustrate some guidelines to perform exact exchange (EXX) calculations using Wannier functions efficiently.

The references for this algorithm are:

(i) Theory: X. Wu , A. Selloni, and R. Car, Phys. Rev. B 79, 085102 (2009).

(ii) Implementation: H.-Y. Ko, B. Santra, R. A. DiStasio, L. Kong, Z. Li, X. Wu, and R. Car, arxiv.

The parallelization scheme in this algorithm is based upon the number of electronic states. In the current implementation, there are certain restrictions on the choice of the number of MPI tasks. Slightly different algorithms are also employed depending on whether the number of MPI tasks used in the calculation is greater or less than the number of electronic states. We strongly recommend that users follow the notes below. The algorithm is most efficient when the electronic states are uniformly distributed over the MPI tasks. For a system having N electronic states, the optimum numbers of MPI tasks (nproc) are the following:

(a) In case of nproc <= N, the optimum choices are N/m, where m is any positive integer.

Robustness: Can be used for odd and even number of electronic states.

OpenMP threads: Can be used.

Taskgroup: Only the default value of the task group (-ntg 1) is allowed.

(b) In case of nproc > N, the optimum choices are N*m, where m is any positive integer.

Robustness: Can be used for even number of electronic states.

Largest value of m: m can be increased as long as nj_max (see output) is greater than 1; beyond m=8, however, the scaling may become poor. The scaling should be tested by users.

OpenMP threads: Can be used and highly recommended. We have tested thread counts from 2 up to 64. More threads are also allowed. For very large calculations (nproc > 1000) efficiency can largely depend on the computer architecture and the balance between the MPI tasks and the OpenMP threads. Users should test for an optimal balance. Reasonably good scaling can be achieved by using m=6-8 and OpenMP threads=2-16.

Taskgroup: Can be greater than 1 and users should choose the largest possible value for ntg. To estimate ntg, find the value of nr3x in the output and compute nproc/nr3x and take the integer value. We have tested the value of ntg as 2^m, where m is any positive integer. Other values of ntg should be used with caution.

Ndiag: Use -ndiag X option in the execution of cp.x. Without this option jobs may crash on certain architectures. Set X to any perfect square number which is equal to or less than N.

DEBUG: The EXX calculations always work when the number of MPI tasks = number of electronic states. In case of any uncertainty, the EXX energy computed using different numbers of MPI tasks can be checked by performing test calculations using number of MPI tasks = number of electronic states.
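The rules in (a) and (b), together with the -ntg and -ndiag hints, can be collected in a small planning helper. This is a sketch under stated assumptions, not part of the distribution: the function names are illustrative, N/m is restricted to exact divisors of N (assuming only integer task counts are meaningful), m is capped at 8 following the scaling remark, and the sample values of N and nr3x are hypothetical.

```python
import math


def nproc_candidates(n_states, m_max=8):
    """Return candidate MPI task counts for N = n_states electronic states.

    below: nproc <= N, taken as N/m for every m that divides N exactly.
    above: nproc > N, taken as N*m for m = 2..m_max; left empty for odd N,
           since the text marks case (b) as suitable for even N.
    """
    below = [n_states // m for m in range(1, n_states + 1) if n_states % m == 0]
    above = [n_states * m for m in range(2, m_max + 1)] if n_states % 2 == 0 else []
    return below, above


def largest_square_leq(n_states):
    """Largest perfect square <= N, one valid choice for -ndiag."""
    return math.isqrt(n_states) ** 2


def estimate_ntg(nproc, nr3x):
    """Integer part of nproc/nr3x, never below the default value of 1."""
    return max(1, nproc // nr3x)


n_states = 128  # illustrative value, not taken from the text
below, above = nproc_candidates(n_states)
print("nproc <= N:", below)
print("nproc >  N:", above)
print("-ndiag    :", largest_square_leq(n_states))
print("-ntg      :", estimate_ntg(nproc=1024, nr3x=128))  # nr3x is hypothetical
```

The -ntg estimate should still be compared with the tested values 2^m mentioned above, and any choice with nproc > N should be validated against a run with nproc = N as described in the DEBUG note.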

An example input follows:

&CONTROL
  calculation       = 'cp-wf',
  title             = "(H2O)32 Molecule: electron minimization PBE0",
  restart_mode      = "from_scratch",
  pseudo_dir        = './',
  outdir            = './',
  prefix            = "water",
  nstep             = 220,
  iprint            = 100,
  isave             = 100,
  dt                = 4.D0,
  ekin_conv_thr     = 1.D-5,
  etot_conv_thr     = 1.D-5,
/
&SYSTEM
  ibrav             = 1,
  celldm(1)         = 18.6655, 
  nat               = 96,
  ntyp              = 2,
  ecutwfc           = 85.D0,
  input_dft         = 'pbe0',
/
&ELECTRONS
  emass             = 400.D0,
  emass_cutoff      = 3.D0,
  ortho_eps         = 1.D-8,
  ortho_max         = 300,
  electron_dynamics = "damp",
  electron_damping  = 0.1D0,
/
&IONS
  ion_dynamics      = "none", 
/
&WANNIER
  nit               = 60,
  calwf             = 3,
  tolw              = 1.D-6,
  nsteps            = 20,
  adapt             = .FALSE.
  wfdt              = 4.D0,
  wf_q              = 500,
  wf_friction       = 0.3D0,
  exx_neigh         = 60,     ! exx related optional
  exx_dis_cutoff    = 8.0D0,  ! exx related optional
  exx_ps_rcut_self  = 6.0D0,  ! exx related optional
  exx_ps_rcut_pair  = 5.0D0,  ! exx related optional
  exx_me_rcut_self  = 9.3D0,  ! exx related optional
  exx_me_rcut_pair  = 7.0D0,  ! exx related optional
  exx_poisson_eps   = 1.D-6,  ! exx related optional
/
ATOMIC_SPECIES
O 16.0D0 O_HSCV_PBE-1.0.UPF
H  2.0D0 H_HSCV_PBE-1.0.UPF
