**Introduction to Traditional Optimization Methods**

INTRODUCTION

Optimization of the structures of large
molecules by quantum chemical methods requires stable, efficient, and
therefore elaborate optimization algorithms^{1a,b},
usually in the framework of an over-complete, redundant internal
coordinate system^{2,3c}.
Since the derivatives of the energy are calculated with respect to
the Cartesian coordinates^{4}
of the nuclei, they must be transformed into the internal coordinate
system^{5}
to carry out the optimization process.

Quasi-Newton algorithms are very efficient for
finding minima, particularly when inexpensive gradients (first
derivatives) are available. These methods employ a quadratic model of
the potential energy surface to take a series of steps that converges
to a minimum. The optimization is generally started with an
approximate Hessian (second derivative matrix), which is updated at
each step using the computed gradients. The stability and rate of
convergence of quasi-Newton methods can be improved by controlling
the step size, using methods such as rational function
optimization^{6}
(RFO) or the trust radius model^{1,7}
(TRM). In this chapter we briefly review various aspects of current
optimization methods^{1,7}:

a) the quadratic approximation to the potential energy (hyper)surface (PES),

b) the use of a complete (and usually redundant) set of internal coordinates^{2,3},

c) the BFGS update for the approximate second derivative (Hessian) matrix,

d) *rational function optimization*^{6} (RFO) and the *trust radius model* (TRM)^{1,7},

e) *geometry optimization using direct inversion in the iterative subspace*^{8} (GDIIS).

The quadratic approximation

Quadratic hyper-surfaces, like paraboloids,
saddle surfaces, etc., can provide a satisfactory description of
the local properties of molecular potential energy surfaces
(PESs) around their critical points. Gradient optimization
methods^{1,7}
are based on a quadratic approximation to the PES:

$$E^{*} = E - \mathbf{f}^{T}\mathbf{s} + \tfrac{1}{2}\,\mathbf{s}^{T}\mathbf{H}\,\mathbf{s} \qquad [1]$$

where *E*^{*} is the predicted
energy at a step **s** from the current point, *E* and **f**
are the energy and force (negative of the gradient) calculated at the
current point and **H** is the (approximate) Hessian which is
updated during the optimization. The related linear approximation to
the forces gives the prediction for the force, **f**^{*},
at a step **s** from the current point:

$$\mathbf{f}^{*} = \mathbf{f} - \mathbf{H}\,\mathbf{s} \qquad [2]$$

At a stationary point (e.g. minimum, transition state, etc.), the forces vanish. Within the quadratic approximation, the step to the stationary point is:

$$\mathbf{s}^{NR} = \mathbf{H}^{-1}\,\mathbf{f} \qquad [3]$$

Equation [3]
is the definition of the well-known Newton-Raphson (NR) or
quasi-Newton optimization step, **s**^{NR}, which
is repeated until the optimization converges. If the quadratic
approximation described the PES accurately and the exact Hessian were
used, then eq. [3]
would reach the stationary point in one step. However, the PES is not
quadratic and the actual Hessian provides a good approximation to the
PES only in the vicinity of the current point, which may be far from
a stationary point (it may not even have the correct structure for
the desired optimization - zero negative eigenvalues for
minimization, one negative eigenvalue for transition structure
searches). Furthermore, the calculation of the Hessian can be very
expensive, especially for large systems. Consequently, a quasi-Newton
optimization usually starts with an approximate Hessian matrix that
has the required structure, and then improves the Hessian with an
updating procedure during the course of the optimization.
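
To make the recipe above concrete, eqs. [1]-[3] can be collected into a few lines of Python/numpy. This is an illustrative sketch only; the function names are hypothetical and do not come from the text:

```python
import numpy as np

def newton_raphson_step(f, H):
    """Quasi-Newton (NR) step of eq. [3]: s = H^{-1} f.

    f : force vector (negative gradient) at the current point
    H : current (approximate) Hessian
    """
    # Solve H s = f rather than forming the explicit inverse.
    return np.linalg.solve(H, f)

def predicted_energy(E, f, H, s):
    """Quadratic model of eq. [1]: E* = E - f.s + (1/2) s.H.s."""
    return E - f @ s + 0.5 * s @ H @ s

def predicted_force(f, H, s):
    """Linear model of eq. [2]: f* = f - H s."""
    return f - H @ s
```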

Regular coordinate transformations (O(N^{3}))

The performance of the optimization process depends highly on the
choice of the coordinate system. In most cases the optimization is
carried out in a complete internal coordinate system using the
derivatives computed with respect to Cartesian coordinates. The use
of internal coordinates for geometry optimization necessitates the
transformation of the forces or gradients, calculated in Cartesian
coordinates, into internal coordinates and the back-transformation of
the internal coordinate optimization step into Cartesians. Usually
both transformations are carried out with the Wilson B-matrix^{9}
used in vibrational spectroscopy. The connection between the
Cartesian and internal coordinate spaces is given by

$$d\mathbf{q}_{int} = \mathbf{B}\,d\mathbf{x} \qquad [4]$$

where $d\mathbf{q}_{int}$ and $d\mathbf{x}$ are the displacements in
internal and Cartesian coordinates, respectively.

The forces are the negative of the potential energy gradients.
Since the Wilson B-matrix is not square, the transformation of the
Cartesian force vector, **f**_{x},
into the internal coordinate space is usually written in terms of the
inverse of the matrix

$$\mathbf{G} = \mathbf{B}\,\mathbf{B}^{T} \qquad [5]$$

$$\mathbf{f}_{int} = \mathbf{G}^{-}\,\mathbf{B}\,\mathbf{f}_{x} \qquad [6]$$

A generalized inverse should be used, since **G** is singular
if the internal coordinate system contains redundancy^{2,3c}.
The generalized inverse can be calculated via diagonalization, which
requires a number of O(N^{3})
operations.

The forces in internal coordinates are used to update the Hessian
and to predict the change in geometry using either a Newton-Raphson
or rational function optimization (RFO)^{6} step. The
transformation of the optimization step in internal coordinates
(Δ**q**_{int})
back to Cartesian coordinates (Δ**x**) uses the same generalized inverse:

$$\Delta\mathbf{x} = \mathbf{B}^{T}\,\mathbf{G}^{-}\,\Delta\mathbf{q}_{int} \qquad [7]$$

In practice, the inverse of **G** is typically formed only once
and is used to iterate eq. [7], since the internal coordinates are
curvilinear, until Δ**q**_{int} and/or Δ**x** become negligibly small.
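
A minimal sketch of these transformations, assuming the Wilson B-matrix has already been constructed, is given below; `q_of_x`, a routine that evaluates the internal coordinates at a Cartesian geometry, is a hypothetical callback introduced here only for illustration:

```python
import numpy as np

def generalized_inverse(G, tol=1e-8):
    """Generalized inverse of G = B B^T via diagonalization (O(N^3)).

    Eigenvectors with (near-)zero eigenvalues span the redundant part
    of the internal coordinate space and are projected out.
    """
    w, V = np.linalg.eigh(G)
    w_inv = np.where(np.abs(w) > tol, 1.0 / w, 0.0)
    return V @ np.diag(w_inv) @ V.T

def forces_to_internal(B, f_x):
    """Eq. [6]: f_int = G^- B f_x."""
    G_inv = generalized_inverse(B @ B.T)
    return G_inv @ (B @ f_x)

def step_to_cartesian(B, dq_int, x, q_of_x, max_iter=25, thresh=1e-10):
    """Iterative back-transformation of an internal step (eq. [7]).

    Because the internal coordinates are curvilinear, the linear
    transformation Dx = B^T G^- Dq is iterated until the remaining
    internal displacement is negligible.
    """
    G_inv = generalized_inverse(B @ B.T)  # formed once, then reused
    q_target = q_of_x(x) + dq_int
    for _ in range(max_iter):
        dq = q_target - q_of_x(x)
        if np.linalg.norm(dq) < thresh:
            break
        x = x + B.T @ (G_inv @ dq)
    return x
```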

Hessian update

For minimizations within the quadratic model,
the Hessian must be positive definite, for which there are a number
of suitable updating methods^{7}.
One of the most successful schemes is the BFGS Hessian update^{10}:

$$\mathbf{H}^{new} = \mathbf{H} + \frac{\Delta\mathbf{g}\,\Delta\mathbf{g}^{T}}{\Delta\mathbf{g}^{T}\Delta\mathbf{q}} - \frac{\mathbf{H}\,\Delta\mathbf{q}\,\Delta\mathbf{q}^{T}\,\mathbf{H}}{\Delta\mathbf{q}^{T}\,\mathbf{H}\,\Delta\mathbf{q}} \qquad [8]$$

where $\Delta\mathbf{q}$ and $\Delta\mathbf{g} = -\Delta\mathbf{f}$ are the changes in the coordinates and in the gradient, respectively.

However, the BFGS update converges to the
actual Hessian only if accurate line searches are used^{7d}.
The steps in typical molecular geometry optimizations usually do not
meet this criterion and, therefore, inaccuracies in the Hessian may
lead to reduced efficiency near the end of an optimization. A
possible remedy for this behavior is to occasionally restart the
update^{7}.
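
In practice the update is a few lines of linear algebra; a sketch of eq. [8] in Python/numpy (in terms of the gradient change Δg = -Δf) could read:

```python
import numpy as np

def bfgs_update(H, dq, df):
    """BFGS update of eq. [8].

    dq : geometry change q_new - q_old
    df : force change f_new - f_old
    """
    dg = -df  # gradient change
    Hdq = H @ dq
    # The update preserves positive definiteness when dg.dq > 0.
    return (H
            + np.outer(dg, dg) / (dg @ dq)
            - np.outer(Hdq, Hdq) / (dq @ Hdq))
```
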
Alternatively, the symmetric rank one (SR1) update (also known as the
Murtagh-Sargent update)^{11},

$$\mathbf{H}^{new} = \mathbf{H} + \frac{(\Delta\mathbf{g} - \mathbf{H}\Delta\mathbf{q})\,(\Delta\mathbf{g} - \mathbf{H}\Delta\mathbf{q})^{T}}{(\Delta\mathbf{g} - \mathbf{H}\Delta\mathbf{q})^{T}\,\Delta\mathbf{q}} \qquad [9]$$

converges to the correct Hessian on a quadratic surface without exact line searches. However, this update is unsuitable if the denominator becomes too small.

For transition state searches, the update should not force the
Hessian to be positive definite. The SR1 update is appropriate, as is
the Powell-symmetric-Broyden^{12}
(PSB) update:

$$\mathbf{H}^{new} = \mathbf{H} + \frac{(\Delta\mathbf{g} - \mathbf{H}\Delta\mathbf{q})\,\Delta\mathbf{q}^{T} + \Delta\mathbf{q}\,(\Delta\mathbf{g} - \mathbf{H}\Delta\mathbf{q})^{T}}{\Delta\mathbf{q}^{T}\Delta\mathbf{q}} - \frac{\left[\Delta\mathbf{q}^{T}(\Delta\mathbf{g} - \mathbf{H}\Delta\mathbf{q})\right]\Delta\mathbf{q}\,\Delta\mathbf{q}^{T}}{(\Delta\mathbf{q}^{T}\Delta\mathbf{q})^{2}} \qquad [10]$$

Bofill developed an improved update^{13}
by combining PSB with SR1 in a manner that avoids the problem with
the denominator of the latter:

$$\mathbf{H}^{new} = \phi\,\mathbf{H}^{SR1} + (1 - \phi)\,\mathbf{H}^{PSB} \qquad [11]$$

$$\phi = \frac{\left[(\Delta\mathbf{g} - \mathbf{H}\Delta\mathbf{q})^{T}\Delta\mathbf{q}\right]^{2}}{\left|\Delta\mathbf{g} - \mathbf{H}\Delta\mathbf{q}\right|^{2}\,\left|\Delta\mathbf{q}\right|^{2}} \qquad [12]$$
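
The SR1, PSB, and Bofill updates can be sketched in the same style; the small-denominator guard in `sr1_update` below is an illustrative choice, not a prescription from the text:

```python
import numpy as np

def sr1_update(H, dq, df, eps=1e-8):
    """Symmetric rank one (SR1, Murtagh-Sargent) update, eq. [9]."""
    xi = -df - H @ dq          # dg - H dq, with dg = -df
    denom = xi @ dq
    if abs(denom) < eps * np.linalg.norm(xi) * np.linalg.norm(dq):
        return H               # denominator too small: skip the update
    return H + np.outer(xi, xi) / denom

def psb_update(H, dq, df):
    """Powell-symmetric-Broyden (PSB) update, eq. [10]."""
    xi = -df - H @ dq
    dq2 = dq @ dq
    return (H
            + (np.outer(xi, dq) + np.outer(dq, xi)) / dq2
            - (dq @ xi) * np.outer(dq, dq) / dq2**2)

def bofill_update(H, dq, df):
    """Bofill's mixed update, eqs. [11]-[12]."""
    xi = -df - H @ dq
    phi = (xi @ dq) ** 2 / ((xi @ xi) * (dq @ dq))
    dH_sr1 = sr1_update(H, dq, df) - H
    dH_psb = psb_update(H, dq, df) - H
    return H + phi * dH_sr1 + (1.0 - phi) * dH_psb
```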

*Trust radius model*^{7} (TRM) and *rational function optimization*^{6} (RFO)

A simple quasi-Newton optimization based on a quadratic model with updated Hessians and Newton-Raphson steps can readily encounter difficulties. Even when the Hessian has the right structure (e.g. positive definite for minimizations), care must be taken not to step too far, since the quadratic model is accurate for only a small region of the potential energy surface. Both RFO and TRM can be regarded as methods for controlling the step size and have proven to be valuable components of optimization algorithms. TRM and RFO based minimization algorithms can be regarded as Newton-Raphson methods with a corrected Hessian:

$$(\mathbf{H} + \lambda\,\mathbf{S})\,\mathbf{s}^{\lambda} = \mathbf{f} \qquad [13]$$

where λ is a non-negative scalar, **S** is usually simply a constant
scalar times the unit matrix, and **s**^{λ} is the resulting corrected step.

*Trust radius model (TRM)*. The goal of
TRM in minimization is to find the lowest energy point within a
suitable trust radius, in the framework of the quadratic
approximation. This condition is equivalent to finding a step, **s**^{λ},
such that it is parallel to the predicted force, **f**^{λ},
points in the same direction (λ ≥ 0), and
has a length no greater than the trust radius, τ:

$$\mathbf{f}^{\lambda} = \mathbf{f} - \mathbf{H}\,\mathbf{s}^{\lambda} \qquad [14]$$

$$\mathbf{f}^{\lambda} = \lambda\,\mathbf{s}^{\lambda}, \quad \lambda \ge 0 \qquad [15]$$

and

$$\left|\mathbf{s}^{\lambda}\right| \le \tau \qquad [16]$$

Note that eq. [15]
is equivalent to eq. [13]
if **S** is replaced by the unit matrix. Equations [14-16]
have only one non-negative solution for λ
when the Hessian is positive definite (λ
is zero if the length of the Newton-Raphson step is less than the
trust radius). If the Hessian has one or more negative
eigenvalues, the only acceptable solution for minimization is always
larger than the negative of the lowest eigenvalue, ε_{lowest}.
Combining these conditions yields:

$$\lambda \ge \max\left(0,\; -\varepsilon_{lowest}\right) \qquad [17]$$

The TRM method usually results in more efficient step size control than a simple scaling of the Newton-Raphson step.
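
Since the step length decreases monotonically with λ above the bound of condition [17], the TRM equations can be solved by a simple one-dimensional search. The following is a sketch under the assumption **S** = **I**, using bisection on the step length (the tolerances and iteration counts are illustrative):

```python
import numpy as np

def trm_step(f, H, tau):
    """Trust radius step: solve (H + lambda*I) s = f with |s| <= tau.

    lambda is found by bisection on the step length, subject to
    condition [17]: lambda >= max(0, -eps_lowest).
    """
    eps_lowest = np.linalg.eigvalsh(H)[0]
    lam_lo = max(0.0, -eps_lowest) + 1e-6
    I = np.eye(len(f))
    s = np.linalg.solve(H + lam_lo * I, f)
    if np.linalg.norm(s) <= tau:
        return s               # quasi-NR step already inside trust radius
    lam_hi = lam_lo + 1.0
    # Bracket: |s(lambda)| decreases monotonically with lambda.
    while np.linalg.norm(np.linalg.solve(H + lam_hi * I, f)) > tau:
        lam_hi *= 2.0
    for _ in range(100):       # bisection on the step length
        lam = 0.5 * (lam_lo + lam_hi)
        s = np.linalg.solve(H + lam * I, f)
        if np.linalg.norm(s) > tau:
            lam_lo = lam
        else:
            lam_hi = lam
    return s
```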

*Rational function optimization (RFO)*. In
the RFO approach for minimization, the quadratic model is modified so
that a suitably controlled step toward the minimum is obtained. By
including a step size dependent scaling denominator, the RFO method
contains a self-consistent trust radius:

$$E^{*} = E + \frac{-\mathbf{f}^{T}\mathbf{s} + \tfrac{1}{2}\,\mathbf{s}^{T}\mathbf{H}\,\mathbf{s}}{1 + \mathbf{s}^{T}\mathbf{S}\,\mathbf{s}} \qquad [18]$$

where **S** is a symmetric, usually diagonal
scaling matrix. The RFO approximation for the PES can be expressed
more compactly in terms of the augmented Hessian:

$$E^{*} = E + \frac{\tfrac{1}{2}\begin{pmatrix}1 & \mathbf{s}^{T}\end{pmatrix}\begin{pmatrix}0 & -\mathbf{f}^{T}\\ -\mathbf{f} & \mathbf{H}\end{pmatrix}\begin{pmatrix}1\\ \mathbf{s}\end{pmatrix}}{1 + \mathbf{s}^{T}\mathbf{S}\,\mathbf{s}} \qquad [19]$$

The expression of the force vector in terms of the RFO model is:

$$\mathbf{f}^{*} = \frac{\mathbf{f} - \mathbf{H}\,\mathbf{s} - \lambda\,\mathbf{S}\,\mathbf{s}}{1 + \mathbf{s}^{T}\mathbf{S}\,\mathbf{s}} \qquad [20]$$

For the RFO step, **s**^{RFO},
which satisfies the stationary condition, we obtain

$$(\mathbf{H} + \lambda\,\mathbf{S})\,\mathbf{s}^{RFO} = \mathbf{f} \qquad [21]$$

where λ = **f**^{T}**s**^{RFO}.

Equation [21]
is in implicit form; therefore, the solution can only be obtained by
iterating until λ
converges. For minimizations, λ
should obey condition [17]
to ensure the positive definiteness of **H** + λ**S**.
Constraints on λ
for transition state optimizations are discussed elsewhere^{6b}.
Some important features of the RFO approach for minimization are:

a) smooth convergence to the quadratic approximation in the vicinity of critical points,

b) automatic scaling down of displacements corresponding to small eigenvalues of a positive definite Hessian,

c) scaling up of displacement(s) corresponding to negative eigenvalues of the Hessian and directing the step toward a minimum,

d) avoiding problems when the Hessian is nearly singular, such as in the vicinity of inflection points.
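With **S** = **I**, the RFO step can also be obtained from the lowest eigenpair of the augmented Hessian of eq. [19], which avoids the explicit iteration of eq. [21]. A minimal sketch, assuming this special case:

```python
import numpy as np

def rfo_step(f, H):
    """RFO minimization step via the augmented Hessian (S = I).

    The lowest eigenvalue of the augmented Hessian equals -lambda of
    eq. [21]; the corresponding eigenvector, scaled so that its first
    component is 1, contains the step.
    """
    n = len(f)
    aug = np.zeros((n + 1, n + 1))
    aug[1:, 1:] = H
    aug[0, 1:] = -f            # gradient = -f in this chapter's convention
    aug[1:, 0] = -f
    w, V = np.linalg.eigh(aug)
    v = V[:, 0]                # eigenvector of the lowest eigenvalue
    return v[1:] / v[0]        # s^RFO; lambda = -w[0] = f . s
```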

*Geometry optimization using direct inversion in the iterative subspace*^{8} (GDIIS) method

Quadratic potential energy surfaces (PESs) have some
interesting features, which are exploited by *multi-dimensional search*
(MDS) methods like GDIIS. A linear search can use data (energy and
derivatives) computed at only two points, yet more efficiently than
the quadratic approximation (e.g. by fitting a quartic polynomial).
Current MDS methods, like GDIIS, are based on the quadratic
approximation and use only the previous forces and geometry points,
with or without an approximate Hessian. We assume in the following
discussion that the forces are available for *k* (*k* > 1)
earlier points on a quadratic PES, described by eqs. [1] and [2]. The *k*
geometry points (**q**_{i}, i = 1..*k*) span a subspace, and any point
**q**^{*} of this subspace can be expressed as a linear combination of
the position vectors:

$$\mathbf{q}^{*} = \sum_{i=1}^{k} c_{i}\,\mathbf{q}_{i} \qquad [22]$$

where the c_{i} coefficients are the expansion weights. For **q**^{*}
to lie in the subspace of the *k* points, the coefficients must
satisfy

$$\sum_{i=1}^{k} c_{i} = 1 \qquad [24]$$
The force (**f**^{*}) at point **q**^{*} can also be expressed using the same coefficients:

$$\mathbf{f}^{*} = \sum_{i=1}^{k} c_{i}\,\mathbf{f}_{i} \qquad [25]$$

Similar expressions can be constructed for any data whose changes are linearly related to the changes in the coordinates, like the first derivatives of a quadratic PES. The converse statements of the previous discussion are also valid:

a) When *k* points are given in a space, any linear combination of their position vectors whose coefficients sum to 1 (condition [24]) leads to a point in the subspace spanned by the given points (eq. [22]).

b) Any data whose change is linearly related to the change of the coordinates can be expressed as the same linear combination of the corresponding data at the given points (see eq. [25]).

The goal of the GDIIS method is to find the coefficients which
satisfy condition [24] and minimize the predicted data in a given
measure, like the absolute value in the case of the forces. The GDIIS
based optimization methods in quantum chemistry use more efficient
*error vectors*, instead of the gradients or forces, to benefit
from the information stored in the approximate Hessian. The original
GDIIS technique uses the Newton-Raphson (NR) steps (eq. [3])
calculated for the stored geometry points as error vectors:

$$\mathbf{e}_{i} = \mathbf{H}^{-1}\,\mathbf{f}_{i} \qquad [26]$$

The appropriate linear combination of these error vectors leads to the prediction of the NR step from the corresponding point:

$$\sum_{i=1}^{k} c_{i}\,\mathbf{e}_{i} = \mathbf{H}^{-1}\sum_{i=1}^{k} c_{i}\,\mathbf{f}_{i} = \mathbf{H}^{-1}\,\mathbf{f}^{*} \qquad [27]$$

The task is to find the proper c_{i}
coefficients which lead to the minimal length error vector, the
*residuum vector*, **r**, and hold condition [24]:

$$\mathbf{r} = \sum_{i=1}^{k} c_{i}\,\mathbf{e}_{i}, \qquad \left|\mathbf{r}\right|^{2}\ \text{is minimal} \qquad [28]$$

and

$$\sum_{i=1}^{k} c_{i} = 1 \qquad [29]$$
The solution of this problem can be reached using the method of least squares by solving the following system of linear equations:

$$\begin{pmatrix}\mathbf{A} & \mathbf{1}\\ \mathbf{1}^{T} & 0\end{pmatrix}\begin{pmatrix}\mathbf{c}\\ -\lambda\end{pmatrix} = \begin{pmatrix}\mathbf{0}\\ 1\end{pmatrix} \qquad [31]$$

where λ
is the Lagrangian multiplier and the matrix **A** is the overlap
matrix of the error vectors,
$A_{ij} = \mathbf{e}_{i}^{T}\mathbf{e}_{j}$.

The regular GDIIS algorithm begins
with a Newton-Raphson step (eq. [3]).
Subsequently the method produces the DIIS coefficients by solving the
system of equations [31],
using the set of estimated error vectors calculated from the earlier
*k* force vectors (eq. [26])
and inverting the DIIS matrix. Applying the resulting DIIS coefficients
to the geometry vectors and to the error vectors, we get an intermediate
point (**q**^{*}) and the residuum vector, respectively.
With the current definition of the error vectors, the step by the residuum
vector from the intermediate point gives the new geometry point:

$$\mathbf{q}_{new} = \mathbf{q}^{*} + \mathbf{r} = \sum_{i=1}^{k} c_{i}\,(\mathbf{q}_{i} + \mathbf{e}_{i}) \qquad [32]$$

The new geometry point can be
regarded as the same linear combination of the predicted geometry
points (**q**_{i} + **e**_{i})
produced by Newton-Raphson steps from the corresponding points.
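
The complete cycle can be summarized in a short sketch, assuming the *k* stored geometries and forces and an approximate inverse Hessian are available (all names are illustrative):

```python
import numpy as np

def gdiis_step(q_list, f_list, H_inv):
    """One GDIIS geometry prediction from k stored points (eq. [32]).

    q_list, f_list : the k stored geometries and forces
    H_inv          : inverse of the (approximate) Hessian
    """
    k = len(q_list)
    # Error vectors: estimated NR steps from each stored point (eq. [26]).
    e = [H_inv @ f for f in f_list]
    # DIIS matrix: overlaps of the error vectors, A_ij = e_i . e_j,
    # bordered by the Lagrangian constraint sum(c) = 1 (eq. [31]).
    A = np.zeros((k + 1, k + 1))
    for i in range(k):
        for j in range(k):
            A[i, j] = e[i] @ e[j]
    A[:k, k] = 1.0
    A[k, :k] = 1.0
    rhs = np.zeros(k + 1)
    rhs[k] = 1.0
    c = np.linalg.solve(A, rhs)[:k]   # DIIS coefficients
    # Intermediate point and residuum vector (eqs. [22], [28]).
    q_star = sum(ci * qi for ci, qi in zip(c, q_list))
    r = sum(ci * ei for ci, ei in zip(c, e))
    return q_star + r                 # new geometry point, eq. [32]
```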

References

1
(a) H. B. Schlegel "Geometry optimization on potential energy
surfaces" in *Modern Electronic Structure Theory* D. R..
Yarkony, ed., (World Scientific Publishing, 459-500, 1995)

(b) H. B. Schlegel "Geometry optimization" in *Encyclopedia
of Computational Chemistry* (Wiley, New York, 1998)

2
C. Peng, P. Y. Ayala, H. B. Schlegel and M. J. Frisch, J. Comp. Chem.
**17**, 49-56 (1996)

3
(a) P. Pulay, G. Fogarasi, F. Pang and J. E. Boggs, J. Am. Chem. Soc.
**101**, 2550 (1979)

(b) G. Fogarasi, X. Zhou, P. W. Taylor and P. Pulay, J. Am. Chem.
Soc. **114**, 8191 (1992)

(c) P. Pulay and G. Fogarasi, J. Chem. Phys. **96**, 4 (1992)

4
P. Pulay, in *Ab initio Methods in Quantum Chemistry. Part II*,
Adv. Chem. Phys. **69**, (1986)

5
P. Pulay, Mol. Phys. **17**, 197 (1969)

6
(a) A. Banerjee, N. Adams, J. Simons and R. Shepard, J. Phys. Chem.
**89**, 52 (1985)

(b) J. Simons and J. Nichols, Int. J. Quantum Chem., Quantum Chem.
Symp. **24**, 263 (1990)

7
(a) R. Fletcher *Practical Methods of Optimization* (Wiley,
Chichester, 1981)

(b) W. Murray and M. H. Wright *Practical Optimization*
(Academic Press, New York, 1981)

(c) M. J. D. Powell, ed. *Non-linear Optimization, 1981*
(Academic Press, New York, 1982)

(d) J. E. Dennis and R. B. Schnabel *Numerical Methods for
Unconstrained Optimization and Non-Linear Equations* (Prentice
Hall, New Jersey, 1983)

(e) L. E. Scales *Introduction to Non-linear Optimization*
(Macmillan, Basingstoke, 1985)

8
(a) P. Pulay, Chem. Phys. Lett. **73**, 393 (1980)

(b) P. Pulay, J. Comp. Chem. **3**, 556 (1982)

(c) P. Császár and P. Pulay, J. Mol. Struct. (THEOCHEM)
**114**, 31 (1984)

9
E. B. Wilson, J. C. Decius and P. C. Cross, *Molecular vibrations*,
McGraw-Hill, New-York, (1955)

10
(a) C. G. Broyden, J. Inst. Math. Appl. **6**, 76 (1970)

(b) R. Fletcher, Comput. J. **13**, 317 (1970)

(c) D. Goldfarb, Math. Comput. **24**, 23 (1970)

(d) D. F. Shanno, Math. Comput. **24**, 647 (1970)

11
B. Murtagh and R. W. H. Sargent, Comput. J. **13**, 185 (1972)

12
(a) M. J. D. Powell, in *Nonlinear Programming*, eds. J. B.
Rosen, O. L. Mangasarian and K. Ritter (Academic, New York, 1970)

(b) M. J. D. Powell, Math. Prog. **1**, 26 (1971)

13
J. M. Bofill, J. Comp. Chem. **15**, 1 (1994)

14
J. Baker, J. Comp. Chem. **7**, 385 (1986)