Chapter 2
THEORETICAL BACKGROUND
2.1 The Fundamental Principle of MEM in Crystallography
The general principle of MEM analysis is to find the maximum of the information entropy, \(S\), under several constraints by iterative procedures. A few variations of the MEM formalism have hitherto been adopted in the literature [1,12,13]. Here, we will follow the formalism of Collins [1] based on Jaynes’s expression of \(S\) [14].
In MEM analysis from X-ray and neutron diffraction data, electron and nuclear densities in the unit cell are represented by those in voxels (parallelepipeds) whose numbers along the \(a\), \(b\), and \(c\) axes are \(N_a\), \(N_b\), and \(N_c\), respectively; the density within each voxel is regarded as uniform. Let \(N\) (= \(N_aN_bN_c\)) be the total number of voxels in the unit cell, \(\rho _k\) the normalized density at position \(r_k\) in the 3D gridded space, \(\tau _k\) the normalized density derived from prior information for \(r_k\), and \(\rho _k^*\) the (unnormalized) density at \(r_k\). Then, \(S\) is computed by \begin {equation} S = - \sum _{k=1}^N \rho _k \ln \left ( \frac {\rho _k}{\tau _k} \right ) \label {eq:S} \end {equation} with \begin {equation} \rho _k = \frac {\displaystyle \rho _k^*}{\displaystyle \sum _{k'=1}^N \rho _{k'}^*}. \label {eq:const_rho} \end {equation} \(S\) is maximized under constraints \(C\) by the method of undetermined Lagrange multipliers, \begin {equation} Q = S - \lambda C - \mu \left ( \sum _{k=1}^N \rho _k - 1 \right ), \label {eq:Q} \end {equation} where \(Q\) is maximized through iterative computations with respect to \(\rho \) and the two Lagrange multipliers, \(\lambda \) and \(\mu \).
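As a concrete illustration, the entropy of Eq. (2.1) under the normalization of Eq. (2.2) can be sketched in a few lines of NumPy (the array inputs are hypothetical; this is not ERIS code):

```python
import numpy as np

def entropy(rho_star, tau):
    """Information entropy S = -sum_k rho_k ln(rho_k / tau_k), Eq. (2.1),
    with rho_k obtained from the raw voxel densities rho_star via the
    normalization of Eq. (2.2). tau is the prior density."""
    rho = rho_star / rho_star.sum()   # normalization, Eq. (2.2)
    tau = tau / tau.sum()             # the prior is normalized the same way
    return -np.sum(rho * np.log(rho / tau))
```

Here \(S \le 0\), with equality exactly when \(\rho\) coincides with the prior \(\tau\); maximizing \(S\) therefore pulls the solution toward the prior wherever the data impose no constraint.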
At the optimum MEM solution, the partial derivatives of Eq. (2.3) with respect to every variable must be zero. Therefore, the entropy and constraint should fulfill the following condition: \begin {equation} \frac {\partial S}{\partial \rho _k} = \lambda \frac {\partial C}{\partial \rho _k} + \mu . \label {eq:dQdR} \end {equation} Alternatively, they should fulfill the equivalent set of equations in reciprocal space: \begin {equation} \frac {\partial S}{\partial F_i} = \lambda \frac {\partial C}{\partial F_i}. \label {eq:dQdF} \end {equation}
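For completeness, the left-hand side of Eq. (2.4) follows directly from Eq. (2.1): \begin {equation*} \frac {\partial S}{\partial \rho _k} = -\ln \left ( \frac {\rho _k}{\tau _k} \right ) - 1 , \end {equation*} so that Eq. (2.4) can be solved formally for \(\rho _k\), \begin {equation*} \rho _k = \tau _k \exp \left ( -1-\mu \right ) \exp \left ( -\lambda \, \frac {\partial C}{\partial \rho _k} \right ) , \end {equation*} an exponential fixed-point form. The ZSPA update described below is an iterative version of this relation, with the \(\mu \)-dependent prefactor absorbed into a normalization constant.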
2.2 The Maximum Entropy Patterson Method
2.3 Optimization Algorithms
The following MEM algorithms are implemented in ERIS:
- 0th-order single pixel approximation (ZSPA) algorithm [15].
- Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm [11].
Electron or nuclear densities determined with these algorithms are similar to each other in most cases. However, solutions obtained with the ZSPA algorithm do not satisfy the true maximum-entropy (MaxEnt) conditions [16], whereas the L-BFGS algorithm always gives solutions close to the true MaxEnt ones. Another algorithm, the Cambridge algorithm, also gives solutions close to the true MaxEnt conditions [16]. Although a variant of the Cambridge algorithm was implemented in Dysnomia, it was removed in ERIS because the L-BFGS algorithm is much faster than the Cambridge algorithm while the two algorithms give the same solution.
2.3.1 ZSPA algorithm
In the ZSPA algorithm, \(\rho _k\) in the \((i+1)\)-th iteration is approximated by \begin {equation} \rho _k^{(i+1)} = \frac {\rho _k^{(i)}}{Z} \exp \left ( -\lambda \cdot \frac {\partial C}{\partial \rho _k^{(i)}}\right ) \label {eq:rhonk} \end {equation} with \begin {equation} Z = \sum _{k=1}^{N_\mathrm {v}} \exp \left ( -\lambda \cdot \frac {\partial C}{\partial \rho _k^{(i)}}\right ). \label {eq:Z} \end {equation} The convergence criterion of the ZSPA algorithm is \(C = 0\). Since this algorithm is quite simple and easy to implement, it is widely utilized in almost all programs for MEM analysis of electron/nuclear densities from diffraction data. Note, however, that this algorithm does not actually maximize the information entropy \(S\).
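A minimal NumPy sketch of one ZSPA iteration (the constraint gradient is passed in as a hypothetical callable, and the division by \(Z\) is written as an explicit renormalization so that \(\sum _k \rho _k = 1\)):

```python
import numpy as np

def zspa_step(rho, grad_C, lam):
    """One ZSPA iteration in the spirit of Eqs. (2.6)-(2.7):
    exponential update of each voxel density followed by renormalization.
    grad_C(rho) returns dC/drho_k at the current densities (a hypothetical
    callable standing in for the constraint of a real analysis)."""
    w = rho * np.exp(-lam * grad_C(rho))
    return w / w.sum()   # renormalization playing the role of Z
```

When the constraint gradient vanishes (i.e., the data are fitted), the update leaves \(\rho \) unchanged, consistent with \(C = 0\) as the convergence criterion.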
2.3.2 L-BFGS algorithm
The L-BFGS method [11] is a variant of the quasi-Newton algorithms, requiring significantly less memory than other quasi-Newton methods. Instead of storing a dense \(n \times n\) approximation (\(n\): the number of variables in the problem) to the inverse Hessian matrix, L-BFGS stores only a few vectors that represent the approximation implicitly. Let \(d_k = H_k g_k\) be the search direction at the \(k\)-th iteration, where \(H_k\) is the approximate inverse Hessian matrix of the function, \(Q(\rho )\), to be maximized, \(g_k \equiv \nabla Q(\rho )\) is the gradient of \(Q\), \(s_k = \rho _{k+1} - \rho _k\), \(y_k = g_{k+1} - g_k\), and \(r_k=1/y_k^T s_k\). Then, \(d_k\) is computed by the standard two-loop recursion over the most recently stored pairs \((s_j, y_j)\).
The initial approximation to the inverse Hessian matrix, \(H_k^0\), is represented as a diagonal matrix. The Lagrange multipliers, \(\lambda \) and \(\mu \) in Eq. (2.3), cannot be optimized simultaneously with \(\rho \) during the L-BFGS iterations. Instead, \(\lambda \) and \(\mu \) are gradually adjusted, and L-BFGS optimizations are repeated until the two constraints, i.e., Eqs. (2.2) and (3.8), are fulfilled to achieve convergence. Convergence of the L-BFGS algorithm is judged by \begin {equation} \sqrt { \sum _{k=1}^{N_\mathrm {v}} \rho _k \left ( \frac {\partial Q}{\partial \rho _k} \right )^2 } < \varepsilon \left \langle \frac {\partial S}{\partial \rho _k} \right \rangle , \label {eq:dQdr2} \end {equation} where \(\varepsilon \) is a small threshold value, which is set at \(1 \times 10^{-3}\) by default.
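The two-loop recursion that forms \(d_k = H_k g_k\) without ever materializing \(H_k\) can be sketched as follows (a generic textbook version, not the ERIS implementation; the diagonal \(H_k^0\) is taken as \(\gamma I\)):

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list, gamma=1.0):
    """Two-loop recursion: apply the implicit inverse-Hessian
    approximation built from the stored pairs (s_j, y_j) to the
    gradient g. s_list/y_list are ordered oldest to newest;
    gamma scales the diagonal initial matrix H^0 = gamma * I."""
    alphas, q = [], g.copy()
    for s, y in zip(reversed(s_list), reversed(y_list)):  # newest first
        r = 1.0 / (y @ s)
        a = r * (s @ q)
        alphas.append((a, r, s, y))
        q -= a * y
    d = gamma * q                          # apply the diagonal H^0
    for a, r, s, y in reversed(alphas):    # oldest first
        b = r * (y @ d)
        d += (a - b) * s
    return d                               # = H_k g_k
```

Only the stored vector pairs enter the computation, which is what keeps the memory footprint far below that of a dense \(n \times n\) matrix.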
2.3.3 Cambridge algorithm
A variant of the Cambridge algorithm [10] was implemented in Dysnomia but removed from ERIS in favor of the L-BFGS algorithm. Both algorithms converge to the optimal MEM solution, but the L-BFGS algorithm is much faster. A brief summary of the Cambridge algorithm is given below for the record.
The Cambridge algorithm adopts a local quadratic approximation to Eq. (2.3). For simplicity, Dysnomia utilized only the diagonal elements of the Hessian, \(\nabla ^2 Q\), and approximates \(Q(\rho + \Delta \rho )\) by \begin {equation} Q(\rho + \Delta \rho ) = Q(\rho ) + \Delta \rho \left ( \frac {\partial Q}{\partial \rho } \right ) + \frac {1}{2} \Delta \rho ^2 \left ( \frac {\partial ^2 Q}{\partial \rho ^2} \right ). \label {eq:quadQ} \end {equation} Then, \(Q\) can be optimized by \begin {equation} \Delta \rho _k = -\left ( \frac {\partial Q}{\partial \rho _k} \right ) \left ( \frac {\partial ^2 Q}{\partial \rho _k^2} + \alpha \cdot \frac {\partial ^2 S}{\partial \rho _k^2} \right )^{-1} , \label {eq:drho} \end {equation} where \(\alpha \) (\(\ge 0 \)) is a damping factor ensuring that \(\rho + \Delta \rho \) remains in the trust region of the quadratic approximation [10]. Convergence is judged by comparing the gradient of Eq. (2.3) with a small threshold value, \(\varepsilon \): \begin {equation} \frac {1}{N_\mathrm {v}} \sum _{k=1}^{N_\mathrm {v}} \left ( \frac {\partial Q}{\partial \rho _k} \right )^2 < \varepsilon . \label {eq:dQdr} \end {equation}
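The damped diagonal Newton step of Eq. (2.10) and the convergence test of Eq. (2.11) are simple enough to sketch directly (per-voxel derivative arrays are assumed as inputs; this is not Dysnomia code):

```python
import numpy as np

def cambridge_step(dQ, d2Q, d2S, alpha):
    """Damped diagonal Newton step, Eq. (2.10). dQ and d2Q hold the
    per-voxel first and second derivatives of Q, d2S the second
    derivative of S; alpha >= 0 stiffens the denominator so that
    rho + drho stays inside the trust region."""
    return -dQ / (d2Q + alpha * d2S)

def converged(dQ, eps):
    """Convergence test, Eq. (2.11): mean squared gradient below eps."""
    return np.mean(dQ ** 2) < eps
```

Since \(Q\) is concave and \(\partial ^2 S / \partial \rho _k^2 = -1/\rho _k < 0\), both curvature terms are negative, so increasing \(\alpha \) shrinks the step length.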
2.4 Fourier Transform Algorithms
In MEM analysis, the \(F( \bm {h}_j )\)’s are calculated from electron or nuclear densities by Fourier transform, which is very time-consuming; consequently, the CPU time for MEM analysis generally depends on both \(N\) and \(N_F\). FFT makes it possible to process huge amounts of data with very large values of \(N\) and \(N_F\) at high speed. ERIS therefore benefits MEM analyses of crystalline materials with large unit cells and low symmetry. ERIS is also well suited to analyzing neutron diffraction data, where nuclear densities tend to be confined in narrower regions.
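To illustrate the Fourier-transform step, structure factors on the voxel grid can be obtained with NumPy’s FFT; the \(+2\pi i\) crystallographic sign convention and the \(V/N\) scaling are assumptions of this sketch and may differ from ERIS’s internal conventions:

```python
import numpy as np

def structure_factors(rho, V):
    """F(h) = (V/N) * sum_k rho_k exp(+2*pi*i h.r_k) on an
    Na x Nb x Nc density grid with unit-cell volume V.
    np.fft.fftn uses the exp(-2*pi*i ...) convention, so for a real
    density grid its complex conjugate yields the + sign."""
    return (V / rho.size) * np.conj(np.fft.fftn(rho))
```

For a uniform density the only nonzero coefficient is \(F(000)\), the total scattering content of the cell.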
In ERIS, FFT is selected if \begin {equation} N_\mathrm {a}N_F n_\mathrm {c} > \frac {100N}{\ln N} , \end {equation} where \(N_\mathrm {a}\) is the total number of voxels in the asymmetric unit, and \(n_\mathrm {c}\) is a factor to be selected depending on the presence or absence of an inversion center at the origin (centrosymmetric: 1, noncentrosymmetric: 2).
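The selection rule amounts to comparing a DFT work estimate against an FFT work estimate, which can be written as a one-line predicate (identifier names are illustrative; the constant 100 is taken as printed above):

```python
import numpy as np

def use_fft(N_asym, N_F, N, centrosymmetric):
    """Cost-based switch between conventional DFT and FFT: choose FFT
    when the DFT work estimate N_asym * N_F * n_c exceeds 100*N/ln(N).
    N_asym: voxels in the asymmetric unit; N_F: number of reflections;
    N: voxels in the unit cell; n_c = 1 (centrosymmetric) or 2."""
    n_c = 1 if centrosymmetric else 2
    return N_asym * N_F * n_c > 100 * N / np.log(N)
```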
2.4.1 Parallel computation
For small-scale problems, a DFT routine making full use of multi-threaded parallel processing with OpenMP is automatically selected to avoid the overhead caused by FFT. Thus, with both FFT and DFT, the processing speed of ERIS increases nearly linearly with the number of cores on recent multi-core CPUs.

