A Mathematical Model Analysis of Optimization Algorithms in Deep Learning
SAMUEL OKON ESSANG*
Department of Mathematics and Computer Science, Arthur Jarvis University, Akpabuyo, Nigeria.
DENIS UNDIUKEYE ASHISHIE
Department of Computer Science, University of Calabar, Nigeria.
DAVID OBOBOHO EGETE
Department of Computer Science, University of Calabar, Nigeria.
JOHN ADINYA ODEY
Department of Computer Science, University of Calabar, Nigeria.
BASSEY IGBO ELE
Department of Computer Science, University of Calabar, Nigeria.
AUGUSTINE OGBAJI OTOBI
Department of Computer Science, University of Calabar, Nigeria.
JACKSON EFIONG ANTE
Department of Mathematics, Topfaith University, Mkpatak, Nigeria.
MARTIN OMINI ARIKPO
University of Calabar, Nigeria and University of Calabar, Federal Polytechnic UGEP, Nigeria.
KOMOMMO WILLIE IWARA
College of Engineering and Computing, Hillside University of Science and Technology (HUST), Okemesi, Ekiti State, Nigeria.
ANIETIE OKPAN CLEOPAS
Department of Computer Science, University of Calabar, Nigeria.
BENEDICT ISEROM ITA
Department of Chemistry, University of Calabar, Calabar, Nigeria.
SYLVIA ADAOBI AKPORTUZOR
Department of Mathematics and Computer Science, Arthur Jarvis University, Akpabuyo, Nigeria.
OLAMIDE KOLAWOLE MICHAEL
Department of Mathematics and Computer Science, Arthur Jarvis University, Akpabuyo, Nigeria.
RAPHAEL DOMINIC EFFIONG
Department of Mathematics, University of Calabar, Nigeria.
*Author to whom correspondence should be addressed.
Abstract
This paper presents a rigorous mathematical analysis of optimization algorithms central to deep learning, including Gradient Descent (GD), Stochastic Gradient Descent (SGD), Momentum, Adam, and AMSGrad. We compare the update rules of these algorithms and examine the mathematical tools underlying their analysis, such as Taylor expansions for approximating loss functions and gradients, and the theory of dynamical systems for understanding acceleration. We prove convergence under standard assumptions, including convexity, smoothness (Lipschitz continuity of gradients), and strong convexity. We further analyze convergence rates in several scenarios, such as O(1/t) for GD on convex, smooth functions and O(1/√t) for stochastic methods in non-convex settings. We also consider the impact of bounded gradients in stochastic settings and the use of Lyapunov functions for proving convergence. Through this analysis, we aim to bridge the gap between theory and practice and to offer insights into the design and application of optimization algorithms in deep learning.
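To make the five update rules named above concrete, the following Python sketch implements each of them on a small convex problem. It is illustrative only and is not taken from the paper. The test function f(x) = ½(x₁² + 10x₂²), the step sizes, the β values, the noise level for SGD, and the decaying schedule η_t = lr/√t are all assumptions chosen for the example. For AMSGrad, the sketch takes the running maximum over the raw second-moment estimate and then applies Adam's bias correction; references differ on this detail, so treat it as one possible variant. The last loop compares GD with step size 1/L against the standard convex, smooth bound f(x_t) − f* ≤ L‖x₀ − x*‖²/(2t), which is the O(1/t) rate mentioned in the abstract.

```python
import math
import random

# Hypothetical test problem (not from the paper): f(x) = 0.5 * sum_i a_i * x_i^2.
# It is convex and L-smooth with L = max_i a_i = 10, and its minimizer is x* = 0.
A = [1.0, 10.0]
L_SMOOTH = max(A)


def loss(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(A, x))


def grad(x):
    return [a * xi for a, xi in zip(A, x)]


def gd(x, lr, steps):
    # Gradient descent: x_{t+1} = x_t - eta * grad f(x_t)
    for _ in range(steps):
        x = [xi - lr * gi for xi, gi in zip(x, grad(x))]
    return x


def sgd(x, lr, steps, sigma=0.1, seed=0):
    # SGD with an unbiased noisy gradient (true gradient + zero-mean Gaussian noise)
    # and an assumed decaying step size eta_t = lr / sqrt(t).
    rng = random.Random(seed)
    for t in range(1, steps + 1):
        g = [gi + rng.gauss(0.0, sigma) for gi in grad(x)]
        eta = lr / math.sqrt(t)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


def momentum(x, lr, steps, beta=0.9):
    # Heavy-ball momentum: v_{t+1} = beta * v_t + grad f(x_t); x_{t+1} = x_t - eta * v_{t+1}
    v = [0.0] * len(x)
    for _ in range(steps):
        v = [beta * vi + gi for vi, gi in zip(v, grad(x))]
        x = [xi - lr * vi for xi, vi in zip(x, v)]
    return x


def adam(x, lr, steps, beta1=0.9, beta2=0.999, eps=1e-8, amsgrad=False):
    m = [0.0] * len(x)      # first-moment (mean) estimate of the gradient
    v = [0.0] * len(x)      # second-moment (uncentered variance) estimate
    v_max = [0.0] * len(x)  # running maximum of v, used only by AMSGrad
    for t in range(1, steps + 1):
        g = grad(x)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        if amsgrad:
            # AMSGrad: the second-moment estimate used in the step never decreases.
            v_max = [max(p, q) for p, q in zip(v_max, v)]
            second = v_max
        else:
            second = v
        m_hat = [mi / (1 - beta1 ** t) for mi in m]       # bias correction
        v_hat = [si / (1 - beta2 ** t) for si in second]  # bias correction
        x = [xi - lr * mh / (math.sqrt(vh) + eps)
             for xi, mh, vh in zip(x, m_hat, v_hat)]
    return x


if __name__ == "__main__":
    x0 = [1.0, 1.0]
    results = {
        "GD": gd(x0, 1.0 / L_SMOOTH, 2000),
        "SGD": sgd(x0, 0.1, 2000),
        "Momentum": momentum(x0, 0.01, 2000),
        "Adam": adam(x0, 0.01, 2000),
        "AMSGrad": adam(x0, 0.01, 2000, amsgrad=True),
    }
    for name, x in results.items():
        print(f"{name:9s} final loss = {loss(x):.3e}")

    # GD with step 1/L on a convex L-smooth f: f(x_t) - f* <= L * ||x_0 - x*||^2 / (2t)
    dist_sq = sum(xi * xi for xi in x0)
    for t in (1, 10, 100):
        bound = L_SMOOTH * dist_sq / (2 * t)
        print(f"t={t:4d}  f(x_t)={loss(gd(x0, 1.0 / L_SMOOTH, t)):.3e}  bound={bound:.3e}")
```

Running the script prints the final loss for each method and, for GD, the loss next to the O(1/t) bound at a few iteration counts. On this convex quadratic all five methods drive the loss toward zero; the example says nothing about the non-convex settings that the paper also analyzes.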
Keywords: Optimization algorithms, deep learning, gradient descent, stochastic gradient descent, momentum, Adam, AMSGrad, convergence analysis, convexity, smoothness