Draft:Compact representation (optimization)

Submitted 21 days ago by Johannesbrust · Last edited 11 days ago by Johannesbrust

The compact representation for quasi-Newton methods is a matrix decomposition that is typically used in gradient-based optimization algorithms or for solving nonlinear systems. The decomposition uses a low-rank representation for the direct and/or inverse Hessian or the Jacobian of a nonlinear system. Because only the low-rank factors need to be stored, the compact representation is particularly suited for large problems and constrained optimization.

Figure: The compact representation (right) of a dense Hessian approximation (left) is an initial matrix (typically diagonal) plus a low-rank decomposition. It has a small memory footprint (shaded areas) and enables efficient matrix computations.

Definition

The compact representation of a quasi-Newton matrix for the inverse Hessian $H_{k}$ or direct Hessian $B_{k}$ of a nonlinear objective function $f(x):\mathbb {R} ^{n}\to \mathbb {R}$ expresses a sequence of recursive rank-1 or rank-2 matrix updates as a single rank- $k$ or rank- $2k$ update of an initial matrix ^[1]. Because it is derived from quasi-Newton updates, its definition uses differences of iterates and of gradients $\nabla f(x_{k})=g_{k}$ , namely $\{s_{i-1}=x_{i}-x_{i-1},y_{i-1}=g_{i}-g_{i-1}\}_{i=1}^{k}$ . In particular, for $r=k$ or $r=2k$ the rectangular $n\times r$ matrices $U_{k},J_{k}$ and the symmetric $r\times r$ matrices $M_{k},N_{k}$ depend on the $s_{i},y_{i}$ 's and define the quasi-Newton representations

$H_{k}=H_{0}+U_{k}M_{k}^{-1}U_{k}^{T},\quad {\text{ and }}\quad B_{k}=B_{0}+J_{k}N_{k}^{-1}J_{k}^{T}$

Applications

Because of its special matrix structure, the compact representation is implemented in state-of-the-art optimization software ^[2]^[3]^[4]. When combined with limited-memory techniques, it is a popular technique for constrained optimization with gradients ^[5]. Linear algebra operations such as matrix-vector products, solves or eigendecompositions can be done efficiently. It can be combined with line-search and trust-region techniques, and the representation has been developed for many quasi-Newton updates. For instance, the matrix-vector product of the direct quasi-Newton Hessian $B_{k}$ with an arbitrary vector $g\in \mathbb {R} ^{n}$ is computed as:

${\begin{aligned}p_{k}^{(0)}&=J_{k}^{T}g\\{\text{solve}}\quad N_{k}p_{k}^{(1)}&=p_{k}^{(0)}\quad \quad {\text{(}}N_{k}{\text{ is small)}}\\p_{k}^{(2)}&=J_{k}p_{k}^{(1)}\\p_{k}^{(3)}&=B_{0}g\\p_{k}^{\phantom {(4)}}&=p_{k}^{(2)}+p_{k}^{(3)}\end{aligned}}$
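The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from any of the cited packages: the function name `compact_matvec`, the diagonal initial matrix stored as the vector `B0_diag`, and the random test data are all assumptions made for the example. The initial-matrix term is the product of the initial direct Hessian $B_{0}$ with $g$, in line with $B_{k}=B_{0}+J_{k}N_{k}^{-1}J_{k}^{T}$.

```python
import numpy as np

def compact_matvec(B0_diag, J, N, g):
    """Return (B0 + J N^{-1} J^T) g without forming the n-by-n matrix.

    B0_diag holds the diagonal of B0 (an assumption of this sketch).
    """
    p0 = J.T @ g                 # project g onto the r stored directions
    p1 = np.linalg.solve(N, p0)  # small r-by-r solve
    p2 = J @ p1                  # map back to an n-vector
    p3 = B0_diag * g             # diagonal initial matrix times g
    return p2 + p3

# Illustrative data: n is large compared with r.
rng = np.random.default_rng(0)
n, r = 200, 6
J = rng.standard_normal((n, r))
A = rng.standard_normal((r, r))
N = A @ A.T + np.eye(r)          # symmetric and nonsingular
B0_diag = np.full(n, 2.0)
g = rng.standard_normal(n)

# Dense reference, formed only to check the result.
dense = np.diag(B0_diag) + J @ np.linalg.solve(N, J.T)
matvec_error = np.linalg.norm(compact_matvec(B0_diag, J, N, g) - dense @ g)
```

The compact product costs on the order of $nr$ operations plus one $r\times r$ solve, whereas the dense reference needs $n^{2}$ storage.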

Background

Walker^[6] showed that a product of Householder transformations (each an identity plus a rank-1 matrix) can be expressed as a compact matrix formula. This result led the authors of ^[5] to derive an explicit matrix expression for the product of $k$ identity-plus-rank-1 matrices. Specifically, define ${\textstyle S_{k}={\begin{bmatrix}s_{0}&s_{1}&\ldots &s_{k-1}\end{bmatrix}},}$ $~Y_{k}={\begin{bmatrix}y_{0}&y_{1}&\ldots &y_{k-1}\end{bmatrix}},$ the upper triangular matrix $~(R_{k})_{ij}=s_{i-1}^{T}y_{j-1}$ for $1\leq i\leq j\leq k$ (with zeros below the diagonal), $~\rho _{i}=1/s_{i}^{T}y_{i}$ and ${\textstyle ~V_{i}=I-\rho _{i}y_{i}s_{i}^{T}}$ for $0\leq i\leq k-1$ . Then the product of $k$ rank-1 updates to the identity is

$\prod _{i=0}^{k-1}V_{i}=\left(I-\rho _{0}y_{0}s_{0}^{T}\right)\cdots \left(I-\rho _{k-1}y_{k-1}s_{k-1}^{T}\right)=I-Y_{k}R_{k}^{-1}S_{k}^{T}$

The BFGS update can be expressed in terms of products of the $V_{i}$ 's, which have a compact matrix formula. Therefore, the BFGS recursion can exploit these block matrix representations:

${\begin{aligned}H_{k}&=V_{k-1}^{T}H_{k-1}V_{k-1}+\rho _{k-1}s_{k-1}s_{k-1}^{T}\\&=\left(V_{k-1}^{T}\cdots V_{1}^{T}V_{0}^{T}\right)H_{0}\left(V_{0}V_{1}\cdots V_{k-1}\right)+\\&{\phantom {=}}\rho _{0}\left(V_{k-1}^{T}\cdots V_{1}^{T}\right)s_{0}s_{0}^{T}\left(V_{1}\cdots V_{k-1}\right)+\\&{\phantom {=}}\quad \vdots \\&{\phantom {=}}\rho _{k-2}V_{k-1}^{T}s_{k-2}s_{k-2}^{T}V_{k-1}+\\&{\phantom {=}}\rho _{k-1}s_{k-1}s_{k-1}^{T}\end{aligned}}$

(1)
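The product identity $I-Y_{k}R_{k}^{-1}S_{k}^{T}$ that underlies (1) can be checked numerically. The sketch below is an illustration on random data (the sizes and the way `Y` is generated are assumptions): it multiplies the $k$ factors $I-\rho \,y\,s^{T}$, with $\rho =1/s^{T}y$, in order from the oldest pair to the newest and compares the result with the compact formula, where `R` is the upper triangular part of $S_{k}^{T}Y_{k}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 4
S = rng.standard_normal((n, k))
Y = S + 0.1 * rng.standard_normal((n, k))   # keeps every s_i^T y_i away from zero

# Upper triangular R with entries s_i^T y_j for i <= j, zeros below the diagonal.
R = np.triu(S.T @ Y)

# Explicit product of the rank-1 modifications of the identity, oldest pair first.
product = np.eye(n)
for i in range(k):
    rho = 1.0 / (S[:, i] @ Y[:, i])
    product = product @ (np.eye(n) - rho * np.outer(Y[:, i], S[:, i]))

# Compact formula: one small triangular solve instead of k matrix products.
compact = np.eye(n) - Y @ np.linalg.solve(R, S.T)

product_error = np.linalg.norm(product - compact)
```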

Recursive quasi-Newton updates

A parametric family of quasi-Newton updates includes many of the best-known formulas ^[7]. For arbitrary vectors $v_{k}$ and $c_{k}$ such that $v_{k}^{T}y_{k}\neq 0$ and $c_{k}^{T}s_{k}\neq 0$ , the general recursive update formulas for the inverse and direct Hessian estimates are

$H_{k+1}=H_{k}+{\frac {(s_{k}-H_{k}y_{k})v_{k}^{T}+v_{k}(s_{k}-H_{k}y_{k})^{T}}{v_{k}^{T}y_{k}}}-{\frac {(s_{k}-H_{k}y_{k})^{T}y_{k}}{(v_{k}^{T}y_{k})^{2}}}v_{k}v_{k}^{T}$

(2)

$B_{k+1}=B_{k}+{\frac {(y_{k}-B_{k}s_{k})c_{k}^{T}+c_{k}(y_{k}-B_{k}s_{k})^{T}}{c_{k}^{T}s_{k}}}-{\frac {(y_{k}-B_{k}s_{k})^{T}s_{k}}{(c_{k}^{T}s_{k})^{2}}}c_{k}c_{k}^{T}$

(3)

By making specific choices for the parameter vectors $v_{k}$ and $c_{k}$ , well-known methods are recovered:

Table 1: Quasi-Newton updates parametrized by the vectors $v_{k}$ (inverse update (2)) and $c_{k}$ (direct update (3))
$v_{k}$	Method	$c_{k}$	Method
$s_{k}$	BFGS	$s_{k}$	PSB (Powell Symmetric Broyden)
$y_{k}$	Greenstadt's	$y_{k}$	DFP
$s_{k}-H_{k}y_{k}$	SR1	$y_{k}-B_{k}s_{k}$	SR1
		$P_{k}^{\text{S}}s_{k}$ ^[8]	MSS (Multipoint-Symmetric-Secant)
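As a small illustration of the inverse update (2), the sketch below implements one step for an arbitrary parameter vector and tries the three choices of $v_{k}$ from Table 1. The function names and the random data are assumptions made for the example. Every choice satisfies the secant equation $H_{k+1}y_{k}=s_{k}$ , and the choice $v_{k}=s_{k}$ reproduces the familiar BFGS product form.

```python
import numpy as np

def general_inverse_update(H, s, y, v):
    """One step of the parametrized inverse update (2); requires v^T y != 0."""
    w = s - H @ y
    vy = v @ y
    return (H + (np.outer(w, v) + np.outer(v, w)) / vy
            - (w @ y) / vy**2 * np.outer(v, v))

def bfgs_inverse_update(H, s, y):
    """Textbook BFGS inverse update, written in product form for comparison."""
    rho = 1.0 / (s @ y)
    V = np.eye(len(s)) - rho * np.outer(y, s)
    return V.T @ H @ V + rho * np.outer(s, s)

rng = np.random.default_rng(2)
n = 6
H = 0.5 * np.eye(n)
s = rng.standard_normal(n)
y = s + 0.1 * rng.standard_normal(n)

# Parameter choices from Table 1: BFGS, Greenstadt's method, and SR1.
choices = {"BFGS": s, "Greenstadt": y, "SR1": s - H @ y}
secant_residuals = {
    name: np.linalg.norm(general_inverse_update(H, s, y, v) @ y - s)
    for name, v in choices.items()
}
```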

Compact Representations

Since many quasi-Newton methods follow from the general updates in (2) and (3), the compact representations of these updates give the representation of most quasi-Newton formulas. Define

$S_{k}={\begin{bmatrix}s_{0}&s_{1}&\ldots &s_{k-1}\end{bmatrix}},$ $Y_{k}={\begin{bmatrix}y_{0}&y_{1}&\ldots &y_{k-1}\end{bmatrix}},$ $V_{k}={\begin{bmatrix}v_{0}&v_{1}&\ldots &v_{k-1}\end{bmatrix}},$ $C_{k}={\begin{bmatrix}c_{0}&c_{1}&\ldots &c_{k-1}\end{bmatrix}},$

upper triangular

${\big (}R_{k}{\big )}_{ij}:={\big (}R_{k}^{\text{SY}}{\big )}_{ij}=s_{i-1}^{T}y_{j-1},\quad {\big (}R_{k}^{\text{VY}}{\big )}_{ij}=v_{i-1}^{T}y_{j-1},\quad {\big (}R_{k}^{\text{CS}}{\big )}_{ij}=c_{i-1}^{T}s_{j-1},\quad \quad {\text{ for }}1\leq i\leq j\leq k$

lower triangular

${\big (}L_{k}{\big )}_{ij}:={\big (}L_{k}^{\text{SY}}{\big )}_{ij}=s_{i-1}^{T}y_{j-1},\quad {\big (}L_{k}^{\text{VY}}{\big )}_{ij}=v_{i-1}^{T}y_{j-1},\quad {\big (}L_{k}^{\text{CS}}{\big )}_{ij}=c_{i-1}^{T}s_{j-1},\quad \quad {\text{ for }}1\leq j<i\leq k$

and diagonal

$(D_{k})_{ij}:={\big (}D_{k}^{\text{SY}}{\big )}_{ij}=s_{i-1}^{T}y_{j-1},\quad \quad {\text{ for }}1\leq i=j\leq k$

With these definitions, the compact representations of the general rank-2 updates in (2) and (3) are ^[9]

$H_{k}=H_{0}+U_{k}M_{k}^{-1}U_{k}^{T},$

(4)

$U_{k}={\begin{bmatrix}V_{k}&S_{k}-H_{0}Y_{k}\end{bmatrix}}$

$M_{k}={\begin{bmatrix}0_{k\times k}&R_{k}^{\text{VY}}\\{\big (}R_{k}^{\text{VY}}{\big )}^{T}&R_{k}+R_{k}^{T}-(D_{k}+Y_{k}^{T}H_{0}Y_{k})\end{bmatrix}}$

and the formula for the direct Hessian is

$B_{k}=B_{0}+J_{k}N_{k}^{-1}J_{k}^{T},$

(5)

$J_{k}={\begin{bmatrix}C_{k}&Y_{k}-B_{0}S_{k}\end{bmatrix}}$

$N_{k}={\begin{bmatrix}0_{k\times k}&R_{k}^{\text{CS}}\\{\big (}R_{k}^{\text{CS}}{\big )}^{T}&D_{k}+L_{k}+L_{k}^{T}-S_{k}^{T}B_{0}S_{k}\end{bmatrix}}$

For instance, when $V_{k}=S_{k}$ the representation in (4) is the compact formula for the BFGS recursion in (1). The validity of these expressions can be checked numerically by comparing, e.g., the matrix produced by $k$ recursive updates (2) with the compact formula (4) (or (3) with (5)) for the same $k$ .
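The numerical check mentioned above can be sketched as follows for the inverse formulas, i.e. $k$ applications of the recursion (2) against the compact form (4). The helper names, problem sizes and random data are assumptions of this illustration. Two parameter choices are tried, $V_{k}=S_{k}$ (BFGS) and $V_{k}=Y_{k}$ (Greenstadt's method).

```python
import numpy as np

def general_inverse_update(H, s, y, v):
    """One step of the recursion (2)."""
    w = s - H @ y
    vy = v @ y
    return (H + (np.outer(w, v) + np.outer(v, w)) / vy
            - (w @ y) / vy**2 * np.outer(v, v))

def recursive_inverse(H0, S, Y, V):
    """Apply the recursion (2) column by column, oldest pair first."""
    H = H0.copy()
    for i in range(S.shape[1]):
        H = general_inverse_update(H, S[:, i], Y[:, i], V[:, i])
    return H

def compact_inverse(H0, S, Y, V):
    """Compact form (4): H0 + U M^{-1} U^T."""
    k = S.shape[1]
    SY = S.T @ Y
    R = np.triu(SY)                 # upper triangle of S^T Y
    D = np.diag(np.diag(SY))        # diagonal of S^T Y
    R_vy = np.triu(V.T @ Y)         # upper triangle of V^T Y
    U = np.hstack([V, S - H0 @ Y])
    M = np.block([[np.zeros((k, k)), R_vy],
                  [R_vy.T, R + R.T - (D + Y.T @ H0 @ Y)]])
    return H0 + U @ np.linalg.solve(M, U.T)

rng = np.random.default_rng(3)
n, k = 10, 4
S = rng.standard_normal((n, k))
Y = S + 0.1 * rng.standard_normal((n, k))
H0 = 0.5 * np.eye(n)

differences = {
    name: np.linalg.norm(recursive_inverse(H0, S, Y, V) - compact_inverse(H0, S, Y, V))
    for name, V in {"BFGS": S, "Greenstadt": Y}.items()
}
```

For parameter vectors that depend on the current matrix, such as the SR1 choice, the columns of $V_{k}$ have to be collected while running the recursion; that case is left out here to keep the sketch short.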

Specific Representations

Without the parametrized updates (2) and (3), which cover many well-known formulas, the compact representation is specific to each quasi-Newton recursion. However, all formulas are expressed as an initial matrix plus a low-rank update, and they are equivalent to the compact forms (4) or (5) whenever the underlying recursive update coincides with (2) or (3).

BFGS

A compact representation for the BFGS (Broyden-Fletcher-Goldfarb-Shanno) update was derived in ^[5]. Together with the SR1 representation, it was among the first compact formulas known. In particular, the inverse representation is given by

$H_{k}=H_{0}+U_{k}M_{k}^{-1}U_{k}^{T},\quad U_{k}={\begin{bmatrix}S_{k}&H_{0}Y_{k}\end{bmatrix}},\quad M_{k}^{-1}=\left[{\begin{smallmatrix}R_{k}^{-T}(D_{k}+Y_{k}^{T}H_{0}Y_{k})R_{k}^{-1}&-R_{k}^{-T}\\-R_{k}^{-1}&0\end{smallmatrix}}\right]$

After simplification, this formula is exactly (4) with $V_{k}=S_{k}$ .

The direct Hessian approximation can be found by applying the Sherman-Morrison-Woodbury identity to the inverse Hessian:

$B_{k}=B_{0}+J_{k}N_{k}^{-1}J_{k}^{T},\quad J_{k}={\begin{bmatrix}B_{0}S_{k}&Y_{k}\end{bmatrix}},\quad N_{k}=-\left[{\begin{smallmatrix}S_{k}^{T}B_{0}S_{k}&L_{k}\\L_{k}^{T}&-D_{k}\end{smallmatrix}}\right]$

SR1

The SR1 (Symmetric Rank-1) compact representation was first proposed in ^[5]. Using the definitions of $D_{k},L_{k}$ and $R_{k}$ from above, the inverse Hessian formula is given by

$H_{k}=H_{0}+U_{k}M_{k}^{-1}U_{k}^{T},\quad U_{k}=S_{k}-H_{0}Y_{k},\quad M_{k}=R_{k}+R_{k}^{T}-D_{k}-Y_{k}^{T}H_{0}Y_{k}$

The direct Hessian is obtained by the Sherman-Morrison-Woodbury identity and has the form

$B_{k}=B_{0}+J_{k}N_{k}^{-1}J_{k}^{T},\quad J_{k}=Y_{k}-B_{0}S_{k},\quad N_{k}=D_{k}+L_{k}+L_{k}^{T}-S_{k}^{T}B_{0}S_{k}$
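The direct SR1 formula can be compared with the SR1 recursion $B_{i+1}=B_{i}+w_{i}w_{i}^{T}/(w_{i}^{T}s_{i})$ , $w_{i}=y_{i}-B_{i}s_{i}$ , which is (3) with $c_{i}=y_{i}-B_{i}s_{i}$ . The following sketch uses random data and variable names chosen for illustration only; `D` and `L` are the diagonal and the strictly lower triangular part of $S_{k}^{T}Y_{k}$ .

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 10, 4
S = rng.standard_normal((n, k))
Y = 2.0 * S + 0.1 * rng.standard_normal((n, k))
B0 = np.eye(n)

# Recursive SR1 updates of the direct Hessian approximation.
B_recursive = B0.copy()
for i in range(k):
    s, y = S[:, i], Y[:, i]
    w = y - B_recursive @ s
    B_recursive = B_recursive + np.outer(w, w) / (w @ s)

# Compact SR1: B0 + J N^{-1} J^T with a k-by-k middle matrix.
SY = S.T @ Y
D = np.diag(np.diag(SY))     # diagonal part of S^T Y
L = np.tril(SY, -1)          # strictly lower triangular part of S^T Y
J = Y - B0 @ S
N = D + L + L.T - S.T @ B0 @ S
B_compact = B0 + J @ np.linalg.solve(N, J.T)

sr1_difference = np.linalg.norm(B_recursive - B_compact)
```

The recursion is only well defined when every denominator $w_{i}^{T}s_{i}$ is nonzero; practical SR1 codes usually skip an update when this quantity is too small, which this sketch does not do.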

MSS

The multipoint symmetric secant (MSS) method aims to satisfy multiple secant equations. The recursive update formula was originally developed by Burdakov ^[10]. The compact representation for the direct Hessian was derived in ^[9]:

$B_{k}=B_{0}+J_{k}N_{k}^{-1}J_{k}^{T},\quad J_{k}={\begin{bmatrix}S_{k}&Y_{k}-B_{0}S_{k}\end{bmatrix}},\quad N_{k}=\left[{\begin{smallmatrix}W_{k}(S_{k}^{T}B_{0}S_{k}-(D_{k}+L_{k}+L_{k}^{T}))W_{k}&W_{k}\\W_{k}&0\end{smallmatrix}}\right]^{-1},\quad W_{k}=(S_{k}^{T}S_{k})^{-1}$

The inverse representation can be obtained by applying the Sherman-Morrison-Woodbury identity.

DFP

Since the DFP (Davidon-Fletcher-Powell) update is the dual of the BFGS formula (i.e., it is obtained by swapping $H_{k}\leftrightarrow B_{k}$ , $H_{0}\leftrightarrow B_{0}$ and $y_{k}\leftrightarrow s_{k}$ in the BFGS update), the compact representation for DFP can be obtained immediately from the one for BFGS ^[11].

PSB

The PSB (Powell-Symmetric-Broyden) compact representation was developed for the direct Hessian approximation^[12]. It is equivalent to substituting $C_{k}=S_{k}$ in (5), so that $R_{k}^{\text{CS}}$ becomes the upper triangular matrix ${\big (}R_{k}^{\text{SS}}{\big )}_{ij}=s_{i-1}^{T}s_{j-1}$ for $1\leq i\leq j\leq k$ :

$B_{k}=B_{0}+J_{k}N_{k}^{-1}J_{k}^{T},\quad J_{k}={\begin{bmatrix}S_{k}&Y_{k}-B_{0}S_{k}\end{bmatrix}},\quad N_{k}=\left[{\begin{smallmatrix}0&R_{k}^{\text{SS}}\\(R_{k}^{\text{SS}})^{T}&D_{k}+L_{k}+L_{k}^{T}-S_{k}^{T}B_{0}S_{k}\end{smallmatrix}}\right]$

Reduced BFGS

The reduced compact representation (RCR) of BFGS is for linear equality constrained optimization ${\text{ minimize }}f(x){\text{ subject to: }}Ax=b$ , where the system $Ax=b$ is underdetermined ( $A$ has fewer rows than columns). In addition to the matrices $S_{k},Y_{k}$ the RCR also stores the projections of the $y_{i}$ 's onto the nullspace of $A$

$Z_{k}={\begin{bmatrix}z_{0}&z_{1}&\cdots &z_{k-1}\end{bmatrix}},\quad z_{i}=Py_{i},\quad P=I-A^{T}(AA^{T})^{-1}A,\quad 0\leq i\leq k-1$

Let $B_{k}$ be the compact representation of the BFGS matrix, initialized with a multiple of the identity $B_{0}$ . Then the (1,1) block of the inverse KKT matrix has the compact representation^[13]

$K_{k}={\begin{bmatrix}B_{k}&A^{T}\\A&0\end{bmatrix}},\quad B_{0}=\gamma _{k}I,\quad H_{0}={\frac {1}{\gamma _{k}}}I,\quad \gamma _{k}>0$

${\big (}K_{k}^{-1}{\big )}_{11}=H_{0}+U_{k}M_{k}^{-1}U_{k}^{T},\quad U_{k}={\begin{bmatrix}A^{T}&S_{k}&Z_{k}\end{bmatrix}},\quad M_{k}=\left[{\begin{smallmatrix}-\gamma _{k}AA^{T}&\\&G_{k}\end{smallmatrix}}\right],\quad G_{k}=\left[{\begin{smallmatrix}R_{k}^{-T}(D_{k}+Z_{k}^{T}H_{0}Z_{k})R_{k}^{-1}&-H_{0}R_{k}^{-T}\\-H_{0}R_{k}^{-1}&0\end{smallmatrix}}\right]^{-1}$

Limited Memory

The most common use of the compact representations is in the limited-memory setting, where $m\ll n$ denotes the memory parameter, with typical values around $m\in [5,12]$ (see e.g., ^[13]^[5]). Instead of storing the history of all vectors, one keeps only the $m$ most recent pairs $\{(s_{i},y_{i})\}_{i=k-m}^{k-1}$ and possibly $\{v_{i}\}_{i=k-m}^{k-1}$ or $\{c_{i}\}_{i=k-m}^{k-1}$ . Further, the initialization is typically chosen as an adaptive multiple of the identity, $H_{k}^{(0)}=\gamma _{k}I$ with $\gamma _{k}=y_{k-1}^{T}s_{k-1}/y_{k-1}^{T}y_{k-1}$ , and $B_{k}^{(0)}={\frac {1}{\gamma _{k}}}I$ . Limited-memory methods are frequently used for large-scale problems with many variables (i.e., $n$ can be large), in which the limited-memory matrices $S_{k}\in \mathbb {R} ^{n\times m}$ and $Y_{k}\in \mathbb {R} ^{n\times m}$ (and possibly $V_{k},C_{k}$ ) are tall and very skinny: $S_{k}={\begin{bmatrix}s_{k-m}&\ldots &s_{k-1}\end{bmatrix}}$ and $Y_{k}={\begin{bmatrix}y_{k-m}&\ldots &y_{k-1}\end{bmatrix}}$ .
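A minimal sketch of the limited-memory idea, assuming the compact BFGS inverse formula from the BFGS section and the scaling $H_{k}^{(0)}=\gamma _{k}I$ : a fixed-length buffer keeps the $m$ most recent pairs, and the product $H_{k}g$ is formed from the tall matrices and small $m\times m$ solves only. The function name `lbfgs_inverse_matvec`, the buffer handling and the random data are illustrative assumptions, not the interface of any cited solver.

```python
import numpy as np
from collections import deque

def lbfgs_inverse_matvec(pairs, g):
    """Apply the limited-memory BFGS inverse to g using the compact formula."""
    S = np.column_stack([s for s, _ in pairs])
    Y = np.column_stack([y for _, y in pairs])
    s_last, y_last = pairs[-1]
    gamma = (y_last @ s_last) / (y_last @ y_last)   # H_0 = gamma * I
    SY = S.T @ Y
    R = np.triu(SY)
    D = np.diag(np.diag(SY))
    Sg = S.T @ g
    Yg = gamma * (Y.T @ g)
    # Middle matrix applied blockwise:
    # [[R^{-T}(D + gamma Y^T Y)R^{-1}, -R^{-T}], [-R^{-1}, 0]] [Sg; Yg]
    t = np.linalg.solve(R, Sg)
    top = np.linalg.solve(R.T, (D + gamma * (Y.T @ Y)) @ t - Yg)
    bottom = -t
    return gamma * g + S @ top + gamma * (Y @ bottom)

rng = np.random.default_rng(5)
n, m = 50, 5
memory = deque(maxlen=m)           # the oldest pair is dropped automatically
for _ in range(8):                 # 8 iterations, only the last m pairs survive
    s = rng.standard_normal(n)
    y = s + 0.1 * rng.standard_normal(n)
    memory.append((s, y))

g = rng.standard_normal(n)
direction = -lbfgs_inverse_matvec(memory, g)   # quasi-Newton search direction
```

Storage is two $n\times m$ arrays, and the dense $n\times n$ matrix is never formed.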

Implementations

Open source implementations include:

ACM TOMS algorithm 1030 implements an L-SR1 solver ^[14] ^[15]
R's general-purpose optimizer routine optim offers the L-BFGS-B method.
SciPy's optimization module's minimize method also includes an option to use L-BFGS-B.
IPOPT, when used with first-order information
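As a usage illustration for one of the listed packages, the sketch below calls SciPy's `minimize` with `method="L-BFGS-B"` on a small bound-constrained quadratic. The test function, bounds and starting point are made up for the example; `maxcor` is the SciPy option that sets the number of stored correction pairs, i.e. the memory parameter $m$ .

```python
import numpy as np
from scipy.optimize import minimize

n = 20
d = np.linspace(1.0, 100.0, n)       # curvature of the illustrative quadratic

def objective(x):
    return 0.5 * np.sum(d * (x - 1.0) ** 2)

def gradient(x):
    return d * (x - 1.0)

x0 = np.full(n, -1.0)
result = minimize(
    objective,
    x0,
    jac=gradient,
    method="L-BFGS-B",
    bounds=[(-2.0, 2.0)] * n,        # simple box constraints
    options={"maxcor": 10},          # keep 10 correction pairs
)
```

The unconstrained minimizer of this quadratic is the vector of ones, which lies inside the box, so `result.x` should end up close to it.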

Non-open-source implementations include:

Artelys Knitro nonlinear programming (NLP) solvers use compact quasi-Newton matrices ^[2]
L-BFGS-B (ACM TOMS algorithm 778)^[16]

Works cited

[1] Nocedal, J.; Wright, S. J. (2006). Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer New York, NY. doi:10.1007/978-0-387-40065-5. ISBN 978-0-387-30303-1.
[2] Byrd, R. H.; Nocedal, J.; Waltz, R. A. (2006). "KNITRO: An integrated package for nonlinear optimization". In: Di Pillo, G.; Roma, M. (eds.). Large-Scale Nonlinear Optimization. Nonconvex Optimization and Its Applications. Vol. 83. Springer, Boston, MA. pp. 35–59. doi:10.1007/0-387-30065-1_4. ISBN 978-0-387-30063-4.
[3] Zhu, C.; Byrd, R. H.; Lu, P.; Nocedal, J. (1997). "Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization". ACM Transactions on Mathematical Software (TOMS). 23 (4): 550–560. doi:10.1145/279232.279236.
[4] Wächter, A.; Biegler, L. T. (2006). "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming". Mathematical Programming. 106: 25–57. doi:10.1007/s10107-004-0559-y.
[5] Byrd, R. H.; Nocedal, J.; Schnabel, R. B. (1994). "Representations of Quasi-Newton Matrices and their use in Limited Memory Methods". Mathematical Programming. 63 (4): 129–156. doi:10.1007/BF01582063. S2CID 5581219.
[6] Walker, H. F. (1988). "Implementation of the GMRES Method Using Householder Transformations". SIAM Journal on Scientific and Statistical Computing. 9 (1): 152–163. doi:10.1137/0909010.
[7] Dennis, J. E., Jr.; Moré, J. J. (1977). "Quasi-Newton methods, motivation and theory". SIAM Review. 19 (1): 46–89. doi:10.1137/1019005. hdl:1813/6056.
[8] $S_{k}={\begin{bmatrix}s_{0}&\ldots &s_{k-1}\end{bmatrix}},~P_{k}^{\text{S}}=I-S_{k}(S_{k}^{T}S_{k})^{-1}S_{k}^{T}$ (the projection uses only the steps preceding $s_{k}$ , so that $P_{k}^{\text{S}}s_{k}\neq 0$ )
[9] Burdakov, O. P.; Martínez, J. M.; Pilotta, E. A. (2002). "A limited-memory multipoint symmetric secant method for bound constrained optimization". Annals of Operations Research. 117: 51–70. doi:10.1023/A:1021561204463.
[10] Burdakov, O. P. (1983). "Methods of the secant type for systems of equations with symmetric Jacobian matrix". Numerical Functional Analysis and Optimization. 6 (2): 1–18. doi:10.1080/01630568308816160.
[11] Erway, J. B.; Jain, V.; Marcia, R. F. (2013). "Shifted limited-memory DFP systems". In 2013 Asilomar Conference on Signals, Systems and Computers. IEEE. pp. 1033–1037.
[12] Kanzow, C.; Steck, D. (2023). "Regularization of limited memory quasi-Newton methods for large-scale nonconvex minimization". Mathematical Programming Computation. 15 (3): 417–444. doi:10.1007/s12532-023-00238-4.
[13] Brust, J. J.; Marcia, R. F.; Petra, C. G.; Saunders, M. A. (2022). "Large-scale optimization with linear equality constraints using reduced compact representation". SIAM Journal on Scientific Computing. 44 (1): A103–A127. arXiv:2101.11048. Bibcode:2022SJSC...44A.103B. doi:10.1137/21M1393819.
[14] "Collected Algorithms of the ACM (CALGO)". calgo.acm.org.
[15] "TOMS Alg. 1030". calgo.acm.org/1030.zip.
[16] Zhu, C.; Byrd, R. H.; Lu, P.; Nocedal, J. (1997). "Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization". ACM Transactions on Mathematical Software. 23 (4): 550–560. doi:10.1145/279232.279236. S2CID 207228122.
