
Oracle complexity (optimization)


In mathematical optimization, oracle complexity is a standard theoretical framework to study the computational requirements for solving classes of optimization problems. It is suitable for analyzing iterative algorithms which proceed by computing local information about the objective function at various points (such as the function's value, gradient, Hessian, etc.). The framework has been used to provide tight worst-case guarantees on the number of required iterations for several important classes of optimization problems.

Formal description


Consider the problem of minimizing some objective function $f$ (over some domain $\mathcal{X}$), where $f$ is known to belong to some family of functions $\mathcal{F}$. Rather than direct access to $f$, it is assumed that the algorithm can obtain information about $f$ via an oracle $\mathcal{O}$, which given a point $x$ in $\mathcal{X}$, returns some local information about $f$ in the neighborhood of $x$. The algorithm begins at some initialization point $x_1$, uses the information provided by the oracle to choose the next point $x_2$, uses the additional information to choose the following point $x_3$, and so on.
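To make this query protocol concrete, here is a minimal sketch in Python; the class name `FirstOrderOracle` is hypothetical, chosen only for this illustration:

```python
class FirstOrderOracle:
    """Sketch of an oracle revealing first-order local information about f.

    The algorithm never sees f itself: each call at a query point x returns
    only the value f(x) and the gradient of f at x, and each call counts as
    one unit of cost in the oracle-complexity model.
    """

    def __init__(self, f, grad_f):
        self._f = f            # hidden from the algorithm
        self._grad_f = grad_f  # hidden from the algorithm

    def __call__(self, x):
        return self._f(x), self._grad_f(x)
```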

To give a concrete example, suppose that $\mathcal{X} = \mathbb{R}^d$ (the $d$-dimensional Euclidean space), and consider the gradient descent algorithm, which initializes at some point $x_1$ and proceeds via the recursive equation

$x_{t+1} = x_t - \eta \nabla f(x_t),$

where $\eta > 0$ is some step size parameter. This algorithm can be modeled in the framework above, where given any $x_t$, the oracle returns the gradient $\nabla f(x_t)$, which is then used to choose the next point $x_{t+1}$.
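As a sketch under the same assumptions (reusing the hypothetical `FirstOrderOracle` above; the quadratic objective is an arbitrary illustrative choice), gradient descent touches $f$ only through oracle calls:

```python
import numpy as np

def gradient_descent(oracle, x1, eta, num_iters):
    """Iterates x_{t+1} = x_t - eta * grad f(x_t), one oracle call per step."""
    x = x1
    for _ in range(num_iters):
        _, grad = oracle(x)  # the algorithm's only access to f
        x = x - eta * grad
    return x

# Example: f(x) = 0.5 * ||x||^2 has gradient x and global minimizer 0.
oracle = FirstOrderOracle(f=lambda x: 0.5 * float(x @ x), grad_f=lambda x: x)
x_T = gradient_descent(oracle, x1=np.ones(5), eta=0.1, num_iters=100)
```

The iteration count `num_iters` needed to reach a given accuracy is exactly the quantity that oracle complexity measures for such an algorithm.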

In this framework, for each choice of function family $\mathcal{F}$ and oracle $\mathcal{O}$, one can study how many oracle calls/iterations are required to guarantee some optimization criterion (for example, ensuring that the algorithm produces a point $x$ such that $f(x) - \min_{x' \in \mathcal{X}} f(x') \le \epsilon$ for some $\epsilon > 0$). This is known as the oracle complexity of this class of optimization problems: namely, the number of iterations such that, on the one hand, there is an algorithm that provably requires only this many iterations to succeed (for any function in $\mathcal{F}$), and on the other hand, there is a proof that no algorithm can succeed with fewer iterations, uniformly over all functions in $\mathcal{F}$.
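One standard way to write this minimax quantity formally (a common formalization, supplied here for concreteness) is

$\mathrm{compl}(\mathcal{F}, \mathcal{O}, \epsilon) = \inf_{\mathsf{A}} \sup_{f \in \mathcal{F}} \min\{T : f(x_T) - \min_{x \in \mathcal{X}} f(x) \le \epsilon\},$

where the infimum is over algorithms $\mathsf{A}$ that access $f$ only through $\mathcal{O}$, and $x_T$ is the point such an algorithm produces after $T$ oracle calls.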

The oracle complexity approach is inherently different from computational complexity theory, which relies on the Turing machine to model algorithms, and requires the algorithm's input (in this case, the function $f$) to be represented as a string of bits in memory. In the oracle framework, the algorithm is not computationally constrained, but its access to the function is. This means that, on the one hand, oracle complexity results apply only to specific families of algorithms, which access the function in a certain manner, and not to arbitrary algorithms as in computational complexity theory. On the other hand, the results apply to most if not all iterative algorithms used in practice, do not rely on any unproven assumptions, and lead to a nuanced understanding of how the function's geometry and the type of information used by the algorithm affect practical performance.

Common settings


Oracle complexity has been applied to quite a few different settings, depending on the optimization criterion, the function class $\mathcal{F}$, and the type of oracle $\mathcal{O}$.

In terms of optimization criterion, by far the most common one is finding a near-optimal point, namely making $f(x) - \min_{x' \in \mathcal{X}} f(x') \le \epsilon$ for some small $\epsilon > 0$. Other criteria include finding an approximately-stationary point ($\|\nabla f(x)\| \le \epsilon$), or finding an approximate local minimum.

There are many function classes that have been studied. Some common choices include convex vs. strongly convex vs. non-convex functions, smooth vs. non-smooth functions (say, in terms of Lipschitz properties of the gradients or higher-order derivatives), domains $\mathcal{X}$ with bounded dimension $d$ vs. domains with unbounded dimension, and sums of two or more functions with different properties.
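For concreteness, the standard definitions of the parameters used below (supplied here for the reader): a differentiable function $f$ is $\lambda$-strongly convex if

$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\lambda}{2} \|y - x\|^2 \quad \text{for all } x, y,$

and has $L$-Lipschitz gradient if $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y$; Lipschitz function values and Lipschitz Hessians are defined analogously.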

In terms of the oracle $\mathcal{O}$, it is common to assume that given a point $x$, it returns the value of the function at $x$, as well as its derivatives up to some order (say, value only; value and gradient; value, gradient and Hessian; etc.). Sometimes, one studies more complicated oracles. For example, a stochastic oracle returns function values and derivatives corrupted by some random noise, and is useful for studying stochastic optimization methods.[1] Another example is a proximal oracle, which given a point $x$ and a parameter $\lambda > 0$, returns the point $y$ minimizing $f(y) + \frac{\lambda}{2} \|y - x\|^2$.
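As a rough sketch of these two oracle types in Python (the helper names are hypothetical, and Gaussian noise and a general-purpose numerical solver are stand-ins chosen only for illustration):

```python
import numpy as np
from scipy.optimize import minimize

def stochastic_gradient_oracle(grad_f, sigma, rng=None):
    """Oracle returning the true gradient corrupted by Gaussian noise of
    standard deviation sigma (one possible noise model among many)."""
    rng = rng if rng is not None else np.random.default_rng()
    def oracle(x):
        return grad_f(x) + rng.normal(scale=sigma, size=np.shape(x))
    return oracle

def proximal_oracle(f, lam):
    """Oracle returning (approximately) argmin_y f(y) + (lam/2)*||y - x||^2,
    computed here with a generic solver as a stand-in for an exact oracle."""
    def oracle(x):
        x = np.asarray(x, dtype=float)
        obj = lambda y: f(y) + 0.5 * lam * float((y - x) @ (y - x))
        return minimize(obj, x0=x).x
    return oracle
```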

Examples of oracle complexity results


The following are a few known oracle complexity results (up to numerical constants), for obtaining optimization error at most $\epsilon$ for some small enough $\epsilon > 0$, over the domain $\mathcal{X} = \mathbb{R}^d$, where the dimension $d$ is not fixed and can be arbitrarily large (unless stated otherwise). We also assume that the initialization point $x_1$ satisfies $\|x_1 - x^*\| \le B$ for some parameter $B > 0$, where $x^*$ is some global minimizer of the objective function.

Function class | Oracle | Oracle complexity
Convex, $L$-Lipschitz, fixed dimension $d$ | Value + gradient | $d \log(L B / \epsilon)$ [2]
Convex, $L$-Lipschitz | Value + gradient | $(L B / \epsilon)^2$ [2]
Convex, $L$-Lipschitz gradient | Value + gradient | $\sqrt{L B^2 / \epsilon}$ [2]
$\lambda$-Strongly convex, $L$-Lipschitz gradient | Value + gradient | $\sqrt{L / \lambda} \cdot \log(L B^2 / \epsilon)$ [2]
Convex, $L$-Lipschitz Hessian | Value + gradient + Hessian | $(L B^3 / \epsilon)^{2/7}$ [3]
$\lambda$-Strongly convex, $L$-Lipschitz Hessian | Value + gradient + Hessian | $(L B / \lambda)^{2/7} + \log \log(\lambda^3 / (L^2 \epsilon))$ [3]

References

  1. ^ Agarwal, Alekh; Bartlett, Peter; Ravikumar, Pradeep; Wainwright, Martin (May 2012). "Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization". IEEE Transactions on Information Theory. 58 (5): 3235–3249. arXiv:1009.0571. doi:10.1109/TIT.2011.2182178. S2CID 728066.
  2. ^ a b c d Nesterov, Yurii (2018). Lectures on Convex Optimization. Springer. ISBN 978-3-319-91578-4.
  3. ^ a b Arjevani, Yossi; Shamir, Ohad; Shiff, Ron (28 May 2018). "Oracle complexity of second-order methods for smooth convex optimization". Mathematical Programming. 178 (1–2): 327–360. arXiv:1705.07260. doi:10.1007/s10107-018-1293-1. S2CID 28260226.

Further reading

  • Nemirovski, Arkadi; Yudin, David (1983). Problem Complexity and Method Efficiency in Optimization. John Wiley and Sons.
  • Bubeck, Sébastien (2015). "Convex Optimization: Algorithms and Complexity". Foundations and Trends in Machine Learning. 8 (3–4): 231–357. arXiv:1405.4980. doi:10.1561/2200000050.