Convexity
A convex set is one such that the segment between any two points of the set still belongs to the set. In other words, if [math]x,y \in S[/math] and [math]\lambda \in [0,1][/math], then [math]\lambda x + (1-\lambda) y \in S[/math]. A convex function [math]f[/math] is one such that [math]f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y)[/math] for all such [math]x[/math], [math]y[/math] and [math]\lambda[/math]. Put differently, the image of the average is below the average of the images.
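As a concrete illustration (not part of the original definitions), the convexity inequality above can be spot-checked numerically at random points. The Python sketch below does this for [math]x^2[/math] (convex) and [math]\sin x[/math] (not convex); passing random trials is evidence, not a proof, and all names and ranges here are arbitrary choices.

<syntaxhighlight lang="python">
import numpy as np

# Spot-check the convexity inequality
#   f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)
# at random points (illustrative only; not a proof of convexity).

def satisfies_convexity(f, x, y, lam):
    return f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-12

rng = np.random.default_rng(0)
f = lambda x: x ** 2        # convex
g = lambda x: np.sin(x)     # not convex

violations = 0
for _ in range(1000):
    x, y = rng.uniform(-5, 5, size=2)
    lam = rng.uniform(0, 1)
    assert satisfies_convexity(f, x, y, lam)
    violations += not satisfies_convexity(g, x, y, lam)

print(f"x^2: 0 violations; sin: {violations} violations out of 1000 trials")
</syntaxhighlight>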
Convexity plays a central role in optimization, because convex functions can be optimized efficiently over convex sets, for instance using (variants of) gradient descent. Yet, surprisingly, even though neural network learning is not a convex problem, it has become the most successful approach in machine learning.
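For instance, here is a minimal sketch (with arbitrary problem sizes) of gradient descent on the convex quadratic [math]f(w) = \|Aw - b\|^2[/math]; because the problem is convex, gradient descent with a suitable step size reaches the same solution as a direct least-squares solver.

<syntaxhighlight lang="python">
import numpy as np

# Gradient descent on the convex quadratic f(w) = ||Aw - b||^2.
# Convexity guarantees convergence to a global minimum when the
# step size is below 2/L, where L is the gradient's Lipschitz constant.

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))
b = rng.normal(size=50)

L = 2 * np.linalg.norm(A, 2) ** 2   # Lipschitz constant: 2 * sigma_max(A)^2
lr = 1.0 / L

w = np.zeros(10)
for _ in range(2000):
    grad = 2 * A.T @ (A @ w - b)    # gradient of ||Aw - b||^2
    w -= lr * grad

# Compare against the direct least-squares solution.
w_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(w, w_star, atol=1e-6))
</syntaxhighlight>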
Examples of convex optimization
Linear regression with mean squared error (MSE), logistic regression with the cross-entropy loss, and support vector machines (SVM) all lead to convex optimization problems.
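As a sketch of one of these examples, the snippet below trains logistic regression by gradient descent on synthetic data (the data and hyperparameters are illustrative assumptions); since the cross-entropy loss is convex in the weights, plain gradient descent has no bad local minima to fall into.

<syntaxhighlight lang="python">
import numpy as np

# Logistic regression trained with gradient descent on synthetic data.
# The mean cross-entropy loss is convex in w, so this converges to
# a global minimizer.

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
lr = 0.1
for _ in range(5000):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / n        # gradient of the mean cross-entropy
    w -= lr * grad

acc = np.mean((sigmoid(X @ w) > 0.5) == y)
print(f"training accuracy: {acc:.3f}")
</syntaxhighlight>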
Neural networks and convexity
JHG18 showed that, in the infinite-width limit, training a neural network can be regarded as a convex optimization problem in function space ZettaBytes19.
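To make this concrete, the sketch below computes one entry of the empirical neural tangent kernel [math]K(x, x') = \langle \nabla_\theta f(x), \nabla_\theta f(x') \rangle[/math] for a one-hidden-layer ReLU network at initialization; the width, depth and activation are illustrative assumptions, not details taken from JHG18.

<syntaxhighlight lang="python">
import numpy as np

# Empirical neural tangent kernel for f(x; theta) at initialization:
#   K(x, x') = <grad_theta f(x), grad_theta f(x')>.
# JHG18 show this kernel becomes fixed during training as width -> infinity,
# turning learning into (convex) kernel regression in function space.
# The width and ReLU choice below are arbitrary illustrations.

rng = np.random.default_rng(0)

def ntk_entry(x1, x2, width=10000, d=3):
    # NTK parameterization: f(x) = (1/sqrt(width)) * a . relu(W x)
    W = rng.normal(size=(width, d))
    a = rng.normal(size=width)
    def grads(x):
        h = W @ x
        dfa = np.maximum(h, 0.0) / np.sqrt(width)              # df/da
        dfW = ((a * (h > 0)) / np.sqrt(width))[:, None] * x    # df/dW
        return np.concatenate([dfa, dfW.ravel()])
    return grads(x1) @ grads(x2)

x1 = np.array([1.0, 0.5, -0.3])
x2 = np.array([0.2, -1.0, 0.7])
print(ntk_entry(x1, x2))  # approaches a fixed limit as width grows
</syntaxhighlight>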
On another note, the recent discovery of double descent ZBHRV17 NKBYBS19 suggests that overparameterized neural networks learn and generalize better. The training of overparameterized neural networks then appears to be "nearly convex" ZCZG18 ZYZLT19 ZLS19 DLLWZ19.
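A common toy illustration of double descent (a stand-in chosen here, not one of the cited papers' experiments) uses random ReLU features with minimum-norm least squares: test error typically peaks when the number of features matches the number of training points and drops again in the overparameterized regime.

<syntaxhighlight lang="python">
import numpy as np

# Toy double-descent experiment: random ReLU features fitted by
# minimum-norm least squares (a simple proxy for an overparameterized
# network; all sizes are arbitrary). Test error typically spikes near
# features ~ n_train, then decreases again with more features.

rng = np.random.default_rng(0)
n_train, n_test, d = 40, 500, 5

def make_data(n):
    X = rng.normal(size=(n, d))
    y = np.sin(X @ np.ones(d)) + 0.1 * rng.normal(size=n)
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)
W = rng.normal(size=(d, 2000))  # shared random feature directions

for m in [5, 20, 40, 80, 400, 2000]:
    phi_tr = np.maximum(X_tr @ W[:, :m], 0.0)
    phi_te = np.maximum(X_te @ W[:, :m], 0.0)
    # Minimum-norm solution via the pseudoinverse (interpolates when m >= n).
    w = np.linalg.pinv(phi_tr) @ y_tr
    err = np.mean((phi_te @ w - y_te) ** 2)
    print(f"features={m:5d}  test MSE={err:.3f}")
</syntaxhighlight>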