<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://robustlybeneficial.org/wiki/index.php?action=history&amp;feed=atom&amp;title=Convexity</id>
	<title>Convexity - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://robustlybeneficial.org/wiki/index.php?action=history&amp;feed=atom&amp;title=Convexity"/>
	<link rel="alternate" type="text/html" href="https://robustlybeneficial.org/wiki/index.php?title=Convexity&amp;action=history"/>
	<updated>2026-04-28T13:44:37Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.34.0</generator>
	<entry>
		<id>https://robustlybeneficial.org/wiki/index.php?title=Convexity&amp;diff=204&amp;oldid=prev</id>
		<title>Lê Nguyên Hoang: /* Neural networks and convexity */</title>
		<link rel="alternate" type="text/html" href="https://robustlybeneficial.org/wiki/index.php?title=Convexity&amp;diff=204&amp;oldid=prev"/>
		<updated>2020-02-11T15:50:08Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Neural networks and convexity&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 15:50, 11 February 2020&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l11&quot; &gt;Line 11:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 11:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[http://papers.nips.cc/paper/8076-neural-tangent-kernel-convergence-and-generalization-in-neural-networks.pdf JHG][https://dblp.org/rec/bibtex/conf/nips/JacotHG18 18] showed that, in the infinite-width limit, the training of neural networks can be regarded as a convex optimization problem in function space [https://www.youtube.com/watch?v=7WmOSqcBDr0 ZettaBytes19].&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[http://papers.nips.cc/paper/8076-neural-tangent-kernel-convergence-and-generalization-in-neural-networks.pdf JHG][https://dblp.org/rec/bibtex/conf/nips/JacotHG18 18] showed that, in the infinite-width limit, the training of neural networks can be regarded as a convex optimization problem in function space [https://www.youtube.com/watch?v=7WmOSqcBDr0 ZettaBytes19].&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;On another note, the recent discovery of [[overfitting|double descent]] [https://openreview.net/pdf?id=Sy8gdB9xx ZBHRV][https://dblp.org/rec/bibtex/conf/iclr/ZhangBHRV17 17] [https://arxiv.org/pdf/1912.02292 NKBYBS][https://dblp.org/rec/bibtex/journals/corr/abs-1912-02292 19] suggests that overparameterized neural networks are better at learning and generalization. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;It may or may not be that the &lt;/del&gt;learning of overparameterized neural networks &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;is &lt;/del&gt;&amp;quot;nearly convex&amp;quot;. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;A conjecture we might make is that, with high probability, any random initialization of a very overparameterized neural network lies in a region where the loss function of the learning problem is convex and its minimum is zero&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;On another note, the recent discovery of [[overfitting|double descent]] [https://openreview.net/pdf?id=Sy8gdB9xx ZBHRV][https://dblp.org/rec/bibtex/conf/iclr/ZhangBHRV17 17] [https://arxiv.org/pdf/1912.02292 NKBYBS][https://dblp.org/rec/bibtex/journals/corr/abs-1912-02292 19] suggests that overparameterized neural networks are better at learning and generalization. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;The &lt;/ins&gt;learning of overparameterized neural networks &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;then seems &lt;/ins&gt;&amp;quot;nearly convex&amp;quot; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[https://arxiv&lt;/ins&gt;.&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;org/pdf/1811.08888 ZCZG][https://dblp.org/rec/bibtex/journals/corr/abs-1811-08888 18] [https://openreview.net/pdf?id=BylIciRcYQ ZYZLT][https://dblp.org/rec/bibtex/conf/iclr/ZhouYZLT19 19] [http://proceedings.mlr.press/v97/allen-zhu19a/allen-zhu19a.pdf ZLS][https://dblp.org/rec/bibtex/conf/icml/Allen-ZhuLS19 19] [http://proceedings.mlr.press/v97/du19c/du19c.pdf DLLWZ][https://dblp.org/rec/bibtex/conf/icml/DuLL0Z19 19]&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Lê Nguyên Hoang</name></author>
		
	</entry>
	<entry>
		<id>https://robustlybeneficial.org/wiki/index.php?title=Convexity&amp;diff=202&amp;oldid=prev</id>
		<title>Lê Nguyên Hoang: Created page with &quot;A convex set is one such that the segment between any two points of the set still belongs to the set. In other words, if &lt;math&gt;x,y \in S&lt;/math&gt; and &lt;math&gt;\lambda \in [0,1]&lt;/ma...&quot;</title>
		<link rel="alternate" type="text/html" href="https://robustlybeneficial.org/wiki/index.php?title=Convexity&amp;diff=202&amp;oldid=prev"/>
		<updated>2020-02-11T08:10:10Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;A convex set is one such that the segment between any two points of the set still belongs to the set. In other words, if &amp;lt;math&amp;gt;x,y \in S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\lambda \in [0,1]&amp;lt;/ma...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;A convex set is one such that the segment between any two points of the set still belongs to the set. In other words, if &amp;lt;math&amp;gt;x,y \in S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\lambda \in [0,1]&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;\lambda x + (1-\lambda) y \in S&amp;lt;/math&amp;gt;. A convex function &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; is one such that &amp;lt;math&amp;gt;f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y)&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;x, y&amp;lt;/math&amp;gt; in its (convex) domain and all &amp;lt;math&amp;gt;\lambda \in [0,1]&amp;lt;/math&amp;gt;. Put differently, the image of the average is below the average of the images.&lt;br /&gt;
&lt;br /&gt;
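For instance, &amp;lt;math&amp;gt;f(x) = x^2&amp;lt;/math&amp;gt; is convex: for any &amp;lt;math&amp;gt;\lambda \in [0,1]&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\lambda x^2 + (1-\lambda) y^2 - (\lambda x + (1-\lambda) y)^2 = \lambda (1-\lambda) (x - y)^2 \geq 0&amp;lt;/math&amp;gt;, so the image of the average is indeed below the average of the images.&lt;br /&gt;
&lt;br /&gt;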
Convexity plays a central role in optimization, because the minimization of convex functions over convex sets can be done efficiently, for instance using (variants of) [[stochastic gradient descent|gradient descent]]. Yet, strikingly, even though neural network learning is not a convex problem, it has become the most successful approach to machine learning.&lt;br /&gt;
&lt;br /&gt;
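For instance, here is a minimal sketch of gradient descent on a one-dimensional convex function (the function, starting point and step size below are illustrative choices, not taken from a reference):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import math&lt;br /&gt;
&lt;br /&gt;
# Gradient descent on the convex function f(x) = exp(x) + exp(-x),&lt;br /&gt;
# whose unique global minimum is at x = 0.&lt;br /&gt;
x, lr = 2.0, 0.1&lt;br /&gt;
for _ in range(200):&lt;br /&gt;
    grad = math.exp(x) - math.exp(-x)  # f'(x)&lt;br /&gt;
    x -= lr * grad&lt;br /&gt;
print(x)  # close to 0.0: on a convex function, gradient descent finds the global minimum&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;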
== Examples of convex optimization ==&lt;br /&gt;
&lt;br /&gt;
Classical examples include least-squares (MSE) linear regression, logistic regression, cross-entropy minimization for linear models, and support vector machines (SVMs).&lt;br /&gt;
&lt;br /&gt;
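To make the first example concrete, here is a minimal NumPy sketch of MSE linear regression solved by gradient descent (the synthetic data and step size are illustrative); since the loss is convex in the weights, the iterates approach a global minimizer:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
# MSE linear regression: minimize f(w) = ||Xw - y||^2 / (2n), convex in w.&lt;br /&gt;
rng = np.random.default_rng(0)&lt;br /&gt;
X = rng.normal(size=(100, 3))&lt;br /&gt;
w_true = np.array([1.0, -2.0, 0.5])&lt;br /&gt;
y = X @ w_true&lt;br /&gt;
&lt;br /&gt;
w = np.zeros(3)&lt;br /&gt;
lr = 0.1&lt;br /&gt;
for _ in range(1000):&lt;br /&gt;
    grad = X.T @ (X @ w - y) / len(y)  # gradient of the MSE loss&lt;br /&gt;
    w -= lr * grad&lt;br /&gt;
print(w)  # approximately w_true, the global minimizer&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;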
== Neural networks and convexity ==&lt;br /&gt;
&lt;br /&gt;
[http://papers.nips.cc/paper/8076-neural-tangent-kernel-convergence-and-generalization-in-neural-networks.pdf JHG][https://dblp.org/rec/bibtex/conf/nips/JacotHG18 18] showed that, in the infinite-width limit, the training of neural networks can be regarded as a convex optimization problem in function space [https://www.youtube.com/watch?v=7WmOSqcBDr0 ZettaBytes19].&lt;br /&gt;
&lt;br /&gt;
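For intuition, the neural tangent kernel of a network &amp;lt;math&amp;gt;f_\theta&amp;lt;/math&amp;gt; is the inner product of parameter gradients, &amp;lt;math&amp;gt;\Theta(x, x') = \nabla_\theta f_\theta(x) \cdot \nabla_\theta f_\theta(x')&amp;lt;/math&amp;gt;. Here is a minimal sketch that computes one kernel entry for a one-hidden-layer tanh network (the width, scaling and inputs are illustrative choices):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
# Empirical neural tangent kernel of f(x) = a . tanh(W x).&lt;br /&gt;
rng = np.random.default_rng(1)&lt;br /&gt;
d, m = 2, 1000  # input dimension, hidden width&lt;br /&gt;
W = rng.normal(size=(m, d)) / np.sqrt(d)&lt;br /&gt;
a = rng.normal(size=m) / np.sqrt(m)&lt;br /&gt;
&lt;br /&gt;
def grads(x):&lt;br /&gt;
    h = np.tanh(W @ x)  # hidden activations&lt;br /&gt;
    g_a = h  # df/da&lt;br /&gt;
    g_W = np.outer(a * (1 - h**2), x)  # df/dW, using tanh'(z) = 1 - tanh(z)^2&lt;br /&gt;
    return np.concatenate([g_a, g_W.ravel()])&lt;br /&gt;
&lt;br /&gt;
x1, x2 = rng.normal(size=d), rng.normal(size=d)&lt;br /&gt;
print(grads(x1) @ grads(x2))  # the kernel entry Theta(x1, x2)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;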
On another note, the recent discovery of [[overfitting|double descent]] [https://openreview.net/pdf?id=Sy8gdB9xx ZBHRV][https://dblp.org/rec/bibtex/conf/iclr/ZhangBHRV17 17] [https://arxiv.org/pdf/1912.02292 NKBYBS][https://dblp.org/rec/bibtex/journals/corr/abs-1912-02292 19] suggests that overparameterized neural networks are better at learning and generalization. It may or may not be that the learning of overparameterized neural networks is &amp;quot;nearly convex&amp;quot;. A conjecture we might make is that, with high probability, any random initialization of a very overparameterized neural network lies in a region where the loss function of the learning problem is convex and its minimum is zero.&lt;/div&gt;</summary>
		<author><name>Lê Nguyên Hoang</name></author>
		
	</entry>
</feed>