Quantitative convergence estimates of deterministic and stochastic methods for optimization and minimax problems

Axel Böhm

doi:10.25365/thesis.65002

You are here: University of Vienna PHAIDRA Detail o:1389810

Title (eng)

Quantitative convergence estimates of deterministic and stochastic methods for optimization and minimax problems

Parallel title (deu)

Konvergenzraten deterministischer und stochastischer Verfahren für Sattelpunkt- und Optimierungsprobleme

Author

Axel Böhm

Advisor

Radu Ioan Bot

Co-Advisor

Peter Richtárik

Assessor

Simon Lacoste-Julien

Hermann Schichl

Abstract (deu)

Nichtdifferenzierbarkeit spielt eine kritische Rolle in vielen verschiedenen Optimierungsproblemen. Manchmal wird sie künstlich hinzugefügt um wünschenswerte Eigenschaften in der Lösung zu erzeugen. Dies ist der Fall bei den Problemen denen wir uns in der ersten Hälfte dieser Arbeit widmen. Obwohl die involvierten nichtglatten Funktionen typischerweise simpel sind, entsteht die Schwierigkeit dadurch, dass sie mit einem anderen Operator hintereinander ausgeführt werden. Wir betrachten solche Probleme in einer klassischen konvexen Formulierung und stellen ein neues randomisiertes Verfahren vor, welches wir in numerischen Experimenten in der Bildverarbeitung und Matrixzerlegung testen. Zusätzlich stellen wir eine komplexere nichtkonvexe Version desselben Problems, gemeinsam mit einem neuen Verfahren, vor. In anderen Fällen hingegen, entsteht Nichtdifferenzierbarkeit ganz natürlich, beispielsweise dadurch, dass die Zielfunktion einem inneren Maximierungsproblem entstammt. Wir behandeln solche Sattelpunktprobleme in der zweiten Hälfte der Arbeit. Solche Formulierungen haben ihren Ursprung, unter anderem in Nullsummenspielen zweier konkurrierender Parteien. Wir hingegen legen besonderes Augenmerk auf Anwendungen im Maschinellen Lernen, insbesondere sogenannte generative adversarial networks (GANs). Im konvexen Fall stellen wir eine Modifikation von dem bekannten Verfahren von Tseng vor, während wir im Nichtkonvexen ein simples Gradientenverfahren analysieren. Im Allgemeinen konzentrieren wir uns auf Splitting-Methoden, die sich dadurch auszeichnen, dass die Nichtglattheit mittels des Proximalpunktoperators behandelt wird und aufwendige Subroutinen vermieden werden. Diese Verfahren erreichen zwar nicht immer die besten theoretischen Konvergenzraten, sind aber dennoch sehr beliebt aufgrund ihrer Einfachheit und Kompetitivität in praxisrelevanten Anwendungen.

Abstract (eng)

Nonsmoothness plays a critical role in many optimization problems. Sometimes it is put into the model purposely to induce desirable properties in the solution, most notably sparsity, as it is the case with the composite models we study in the first half of this thesis. Although, the used nonsmooth functions tend to be simple, difficulty arises through the composition with another operator. We study such problems in a classical convex setting by proposing a randomized method and testing it on numerical experiments in image denoising and deblurring as well as completely positive matrix factorization. Additionally we propose a more sophisticated nonconvex formulation together with a novel method including convergence analysis for this setting. In either case, our approach is heavily inspired by a smoothing strategy via the Moreau envelope. Other times the nonsmoothness originates naturally, for example due to the fact that the objective is derived from an auxiliary maximization problem. We study such minimax problems in the second half in a convex and nonconvex setting. While these types of problems also arise from two-player zero-sum games we emphasize applications in machine learning, in particular generative adversarial networks (GANs). In the convex setting we propose a modification of Tseng's method while for the nonconvex problem we prove novel convergence rates for the well established gradient descent ascent method (GDA). In general we focus on full splitting methods which aim to tackle the nonsmoothness via the proximal operator and avoid convoluted inner loops or the need for subproblems. Similarly, only first-order information and preferably even only stochastic estimators of the involved gradients. These methods do not always achieve the best theoretical convergence rates but are nevertheless highly popular due to their simplicity and because they also tend to be very competitive in practice. For all presented methods we provide a rigorous analysis in terms of convergence rates.

Keywords (eng)

minimaxsaddle point problemproximal gradient methodsstochastic gradient descentMoreau envelope

Keywords (deu)

SattelpunktproblemProximalpunktverfahrenstochastischer GradientMoreau Einhüllende

Subject (deu)

Angewandte Mathematik

Subject (deu)

Analysis: Sonstiges

Type (deu)

Dissertation

Persistent identifier

https://phaidra.univie.ac.at/o:1389810