# On Graduated Optimization for Stochastic Non-Convex Problems

@article{Hazan2016OnGO, title={On Graduated Optimization for Stochastic Non-Convex Problems}, author={Elad Hazan and Kfir Yehuda Levy and Shai Shalev-Shwartz}, journal={ArXiv}, year={2016}, volume={abs/1503.03712} }

The graduated optimization approach, also known as the continuation method, is a popular heuristic for solving non-convex problems that has received renewed interest over the last decade. Despite its popularity, very little is known in terms of theoretical convergence analysis. In this paper we describe a new first-order algorithm based on graduated optimization and analyze its performance. We characterize a parameterized family of non-convex functions for which this algorithm provably…
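The general graduated-optimization template can be sketched as follows. This is a minimal illustration of the idea described above, not the paper's exact algorithm: the toy objective `f`, the Gaussian-smoothing schedule, the Monte Carlo gradient estimator, and all parameter values are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Toy non-convex objective: sinusoidal ripples on a quadratic bowl.
    return x ** 2 + 2.0 * np.sin(5.0 * x)

def smoothed_grad(x, sigma, n_samples=64):
    # Unbiased Monte Carlo estimate of the gradient of the Gaussian-smoothed
    # objective f_sigma(x) = E[f(x + sigma * u)], u ~ N(0, 1):
    #   grad f_sigma(x) = E[(f(x + sigma * u) - f(x)) * u] / sigma
    # (subtracting f(x) as a baseline reduces the estimator's variance).
    u = rng.standard_normal(n_samples)
    return np.mean((f(x + sigma * u) - f(x)) * u) / sigma

def graduated_optimization(x0, sigmas=(2.0, 1.0, 0.5, 0.1), steps=400, lr=0.01):
    # Optimize a sequence of progressively less-smoothed surrogates,
    # warm-starting each stage from the previous stage's solution.
    x = x0
    for sigma in sigmas:
        for _ in range(steps):
            x -= lr * smoothed_grad(x, sigma)
    return x

x_star = graduated_optimization(x0=3.0)
```

Plain gradient descent from `x0 = 3.0` would typically get trapped in one of the ripples; coarse smoothing first exposes the underlying bowl, and the decreasing schedule then tracks the solution toward a good minimum of the original objective.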

#### Supplemental Code

GitHub repo (via Papers with Code): Graduated optimization on neural networks via adjustment of activation functions

#### 69 Citations

Stochastic Variance Reduction Gradient for a Non-convex Problem Using Graduated Optimization

- Computer Science, Mathematics
- ArXiv
- 2017

Experimental results illustrate that the new algorithms, with similar performance, can converge to "global" optima of non-convex problems, and that they converge faster than GradOpt and non-convex proximal SVRG.

SVRG for a non-convex problem using graduated optimization algorithm

- Computer Science
- J. Intell. Fuzzy Syst.
- 2018

Experimental results illustrate that two new algorithms with similar performance can converge to the "global" optima of a non-convex problem comparatively faster than GradOpt or non-convex Prox-SVRG.

Convergence Analysis of Homotopy-SGD for non-convex optimization

- Computer Science, Mathematics
- ArXiv
- 2020

This work presents a first-order stochastic algorithm based on a combination of homotopy methods and SGD, called Homotopy-Stochastic Gradient Descent (H-SGD), and finds interesting connections with heuristics proposed in the literature, e.g. optimization by Gaussian continuation, training by diffusion, and mollifying networks.

GS-OPT: A new fast stochastic algorithm for solving the non-convex optimization problem

- Computer Science
- 2020

A new algorithm, GS-OPT (General Stochastic OPTimization), is proposed that is effective for solving non-convex problems by combining two stochastic bounds of the objective function constructed via a common discrete probability distribution, the Bernoulli distribution.

On Nonconvex Decentralized Gradient Descent

- Mathematics, Computer Science
- IEEE Transactions on Signal Processing
- 2018

Somewhat surprisingly, the decentralized consensus algorithms, DGD and Prox-DGD, retain most other properties that are known in the convex setting, and can take the constraint to a nonconvex set with an easy projection.

Convex Optimization with Unbounded Nonconvex Oracles using Simulated Annealing

- Computer Science, Mathematics
- COLT
- 2018

This paper studies the more general case when the noise has magnitude $\alpha F(x) + \beta$ for some $\alpha, \beta > 0$ and presents a polynomial time algorithm that finds an approximate minimizer of $F$ for this noise model.

Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis

- Mathematics, Computer Science
- COLT
- 2017

The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks.
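The SGLD update analyzed in that line of work is the standard Langevin step: a (stochastic) gradient step plus Gaussian noise scaled by the inverse temperature. A minimal sketch, where the toy objective, gradient, step size `eta`, and inverse temperature `beta` are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def sgld_step(x, grad, eta, beta):
    # Stochastic Gradient Langevin Dynamics update:
    #   x_{k+1} = x_k - eta * grad(x_k) + sqrt(2 * eta / beta) * N(0, I)
    # The injected noise lets the iterate escape shallow local minima.
    noise = rng.standard_normal(x.shape)
    return x - eta * grad(x) + np.sqrt(2.0 * eta / beta) * noise

# Toy usage: non-convex double-well objective f(x) = sum(x^4 - x^2),
# whose global minimizers are at +/- 1/sqrt(2) in each coordinate.
grad = lambda x: 4.0 * x ** 3 - 2.0 * x
x = np.full(2, 2.0)
for _ in range(2000):
    x = sgld_step(x, grad, eta=1e-3, beta=50.0)
```

At low temperature (large `beta`) the iterates concentrate near the minimizers while still carrying enough noise to cross small barriers, which is the mechanism behind the finite-time guarantees mentioned above.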

Convex Relaxation Regression: Black-Box Optimization of Smooth Functions by Learning Their Convex Envelopes

- Computer Science, Mathematics
- UAI
- 2016

Convex Relaxation Regression (CoRR) is introduced, an approach for learning convex relaxations for a class of smooth functions, and it is proved that with probability greater than $1-\delta$, the solution of the algorithm converges to the global optimizer of $f$ with error.

Homotopy Analysis for Tensor PCA

- Mathematics, Computer Science
- COLT
- 2017

Homotopy or continuation methods for global optimization of nonconvex functions start from an objective function that is efficient to optimize and progressively modify it to obtain the required objective, with solutions passed along the homotopy path.

DynaNewton - Accelerating Newton's Method for Machine Learning

- Computer Science
- ArXiv
- 2016

A novel continuation method to accelerate Newton's method, in which a family of objectives is defined over increasing sample sizes and decreasing regularization strength: training data are subsampled and strong convexity is increased through regularization.

#### References

Showing 1-10 of 26 references

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

- Mathematics, Computer Science
- ICML
- 2012

This paper investigates the optimality of SGD in a stochastic setting and shows that for smooth problems the algorithm attains the optimal O(1/T) rate; however, for non-smooth problems the convergence rate with averaging might really be Ω(log(T)/T), and this is not just an artifact of the analysis.

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

- Computer Science, Mathematics
- NIPS
- 2014

This paper proposes a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods; it applies this algorithm to deep and recurrent neural network training and provides numerical evidence for its superior optimization performance.

A Theoretical Analysis of Optimization by Gaussian Continuation

- Computer Science
- AAAI
- 2015

A theoretical analysis is provided that bounds the end-point solution of the continuation method, and it is shown that this characteristic can be analytically computed when the objective function is expressed in suitable basis functions.

On the Link between Gaussian Homotopy Continuation and Convex Envelopes

- Mathematics, Computer Science
- EMMCVPR
- 2014

It is proved that Gaussian smoothing emerges from the best affine approximation to Vese's nonlinear PDE, hence providing the optimal convexification.

A Path Following Algorithm for the Graph Matching Problem

- Computer Science, Medicine
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2009

This method allows information on graph label similarities to be easily integrated into the optimization problem and therefore performs labeled weighted graph matching; it is compared with some of the best-performing graph matching methods on four data sets.

A GNC Algorithm for Deblurring Images with Interacting Discontinuities

- 2007

In this paper we present a Graduated Non-Convexity (GNC) algorithm for reconstructing images. We assume that the data images are blurred and corrupted by white Gaussian noise. Geometric features of…

Online convex optimization in the bandit setting: gradient descent without a gradient

- Mathematics, Computer Science
- SODA '05
- 2005

It is possible to use gradient descent without seeing anything more than the value of the function at a single point, and the guarantees hold even in the most general case: online against an adaptive adversary.

Smoothing a Program Soundly and Robustly

- Computer Science
- CAV
- 2011

The foundations of smooth interpretation are studied, a recently proposed program approximation scheme that facilitates the use of local numerical search techniques in program analysis and synthesis, and the design of an approximate smooth interpreter is described that provably satisfies soundness, robustness, and β-robustness.

Numerical continuation methods - an introduction

- Mathematics, Computer Science
- Springer series in computational mathematics
- 1990

This book discusses continuation methods, Newton's method and orthogonal decompositions revisited, and update methods and their numerical stability.

The Effective Energy Transformation Scheme as a Special Continuation Approach to Global Optimization with Application to Molecular Conformation

- Mathematics, Computer Science
- SIAM J. Optim.
- 1996

This paper discusses a generalization of the function transformation scheme used by Coleman, Shalloway, and Wu for global energy minimization applied to the molecular conformation problem, and shows that the method can transform a nonlinear objective function into a class of gradually deformed but "smoother" or "easier" functions.