Manual/BFGS_optimizer.html at main · openmopac/Manual · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
<html>

<head>

<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-60EDW3Y6ET"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', 'G-60EDW3Y6ET');
</script>


<title> The BFGS function optimizer</title>
</head>
<body bgcolor="white">


<h2><a name="SECTION001272000000000000000">
 The BFGS function optimizer</a>
</h2>
The alternative heat of formation minimization routine in  MOPAC is a modified
Broyden&nbsp;[<a href="references.html#bfgs1">19</a>]-Fletcher&nbsp;[<a href="references.html#bfgs2">20</a>]-Goldfarb&nbsp;[<a href="references.html#bfgs3">21</a>]-Shanno&nbsp;[<a href="references.html#bfgs4">22</a>]
or BFGS method. Minor changes were made necessary by the presence of phenomena
peculiar to chemical systems.
<p>
Starting with a user-supplied geometry <i>x</i><sub><i>o</i></sub>, MOPAC computes an estimate to the
inverse Hessian <i>H</i><sub><i>o</i></sub>. The geometry optimization proceeds by
<br/></p><p></p>
<div align="CENTER">
<img alt="\begin{displaymath}x_{k+1} = x_k+\alpha d_k,
\end{displaymath}" height="30" src="img1286.gif" width="104"/>
</div>
<br clear="ALL"/>
<p></p>
where
<br/><p></p>
<div align="CENTER">
<img alt="\begin{displaymath}d_k=H\,g_k ,
\end{displaymath}" height="30" src="img1287.gif" width="63"/>
</div>
<br clear="ALL"/>
<p></p>
and each element of <i>H</i> is defined by
<br/><p></p>
<div align="CENTER">
<img alt="\begin{displaymath}H_{k+1}=H_k-\frac{H\ y_k\ p_k^t + p_ky_k^tH}{S}+\frac{Q(p_k\ p_k^t)}{S},
\end{displaymath}" height="41" src="img1288.gif" width="270"/>
</div>
<br clear="ALL"/>
<p></p>
where
<br/><p></p>
<div align="CENTER">
<img alt="\begin{displaymath}Q=1+\frac{y_k^t\ H\ y_k}{p_k^t\ y_k},
\end{displaymath}" height="47" src="img1289.gif" width="105"/>
</div>
<br clear="ALL"/>
<p></p>
and <i>g</i><sub><i>k</i></sub> is the gradient vector on step <i>k</i>.
<p>
Although this expression for the update of
the Hessian matrix looks very complicated, the operation can be summarized as
follows:
</p><p>
The initial Hessian matrix used in geometry optimization is chosen as a
diagonal matrix, with the diagonal elements determined by a simple formula
based on the gradients at two geometries.  As the optimization proceeds, the
gradients at each point are used to improve the Hessian.  In particular, the
off-diagonal elements are assigned based on the old elements and the current
gradients.
</p><p>
Two different methods are used to calculate the displacement of <i>x</i> in the
direction <i>d</i>. During the initial stages of geometry optimization, a line
search is used. This proceeds as follows:
</p><p>
The geometry is displaced by
(<span class="b">&alpha;</span>/4)d and the energy evaluated via an SCF calculation. If this energy is lower than
the original value, then a second step of the same size is made. If it is
higher, then a step of -
(<span class="b">&alpha;</span>/4)d&nbsp;
is made. The energy is then re-evaluated.
Given the three energies, a prediction is made as to the value of
&alpha; which will yield the  minimum  value  of the energy in the direction <i>d</i>. Of
course, the size of the steps are constrained so that the system would not
suddenly become unrealistic (e.g., break bonds, superimpose atoms, etc.).
Similarly, the contingency in which the energy versus &alpha; function is
inverse parabolic is considered, as are rarely-encountered curves, e.g., almost
perfectly linear regressions. By default, Thiel's FSTMIN <a name="BM17999">&nbsp;</a> technique is used&nbsp;[<a href="references.html#fstmin">44</a>].  This uses gradient
information from the starting point of the search, and the calculated
<span class="b">&Delta;</span>H<sub>f</sub>, to decide when to end the line search.  If <tt><a href="nothiel.html">NOTHIEL</a></tt> is specified,
the older line-search is used, in which case the search is stopped when the
drop in energy on any step becomes less than 5% of the total drop or 0.5
kcal/mol, whichever is smaller.
</p><p>
An  important modification has  been made to the BFGS routine.  For the
line-search, Thiel's FSTMIN technique is used. This  modification make the
algorithm run faster most of the time.  However, one unfortunate result of
these changes is that there is no guarantee that as  the cycles increase, the
energy will drop monotonically.  If the calculation does not converge on a
stationary point, then re-run the job with <tt>NOTHIEL</tt>.
</p><p>
As the geometry converges on a local minimum, the prediction of the search
direction becomes less accurate. There are many reasons for this. For example,
the finite precision of the SCF calculation may lead to errors in the density
matrix, or finite step sizes in the derivative calculation (if analytical
derivatives are not used) may result in errors in the derivatives. For whatever
reason, the gradient norm and energy minimum may not coincide. The difference
is typically less than 0.00001 kcal/mol and less than 0.05 units of gradient
norm.
</p><p>
Normally, the initial guess to <i>H</i>, the inverse Hessian, is the unit matrix.
However, in chemical systems where the second derivatives are very large, use
of the unit matrix would result in large changes in the geometry. Thus a
slightly elongated bond length could, in the first step, change from 1.6&Aring;
to -6.5&Aring;. To prevent this catastrophe, the initial geometry is perturbed by
a small amount, thus
</p><p align="center">
x<sub>1</sub> = x<sub>0</sub> + 0.01*sign(g<sub>0</sub>)</p><p>from which a trial inverse Hessian can be constructed:
</p>
<p align="center">H<sub>1</sub>(i,i) = 0.01*sign(g<sub>0</sub>(i))/y<sub>1</sub>(i).<br clear="ALL"/>
</p>
<p></p>
A negative value for <i>H</i><sub>1</sub>(<i>i</i>,<i>i</i>) would lead to difficulties
within the BFGS optimization. To avoid this, <i>H</i><sub>1</sub>(<i>i</i>,<i>i</i>) is set
to 0.06/abs(g(i))&nbsp;
whenever the sign of
(<i>g</i><sub>0</sub>(<i>i</i>))/<i>y</i><sub>1</sub>(<i>i</i>) is negative.
<p>
As the optimization proceeds, the inverse Hessian matrix becomes more accurate.
However, as the geometry steadily changes, the inverse Hessian will contain
information which does not reflect the current point. This can lead to the
predicted search direction vector making an angle of more than
90<sup>o</sup> with the gradient vector. In other words, the search direction vector may point
uphill in energy. To guard against this, the inverse Hessian is re-initialized
whenever the cosine of the angle between the search direction and the gradient
vector drops below 0.05.
</p><p>
Originally the Davidson-Fletcher-Powell technique was used, but in rare
instances it failed to work satisfactorily. The BFGS formula appears to work as
well as or better than the DFP method most of the time. In the infrequent case
when the DFP is more efficient, the increase in efficiency of the DFP can
usually be traced to a fortuitous choice of a search direction. Small changes
in starting conditions can destroy this accidental increased efficiency and
make the BFGS method appear more efficient. A keyword, <tt>
<a href="dfp.html">DFP</a></tt>, is provided
to allow the DFP optimizer to be used.
</p></body>
</html>