<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://robustlybeneficial.org/wiki/index.php?action=history&amp;feed=atom&amp;title=Convexity</id>
	<title>Convexity - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://robustlybeneficial.org/wiki/index.php?action=history&amp;feed=atom&amp;title=Convexity"/>
	<link rel="alternate" type="text/html" href="https://robustlybeneficial.org/wiki/index.php?title=Convexity&amp;action=history"/>
	<updated>2026-04-28T13:44:37Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.34.0</generator>
	<entry>
		<id>https://robustlybeneficial.org/wiki/index.php?title=Convexity&amp;diff=204&amp;oldid=prev</id>
		<title>Lê Nguyên Hoang: /* Neural networks and convexity */</title>
		<link rel="alternate" type="text/html" href="https://robustlybeneficial.org/wiki/index.php?title=Convexity&amp;diff=204&amp;oldid=prev"/>
		<updated>2020-02-11T15:50:08Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Neural networks and convexity&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 15:50, 11 February 2020&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l11&quot; &gt;Line 11:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 11:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[http://papers.nips.cc/paper/8076-neural-tangent-kernel-convergence-and-generalization-in-neural-networks.pdf JHG][https://dblp.org/rec/bibtex/conf/nips/JacotHG18 18] showed that, in the infinite-width limit, the training of neural networks can be regarded as a convex optimization problem in function space [https://www.youtube.com/watch?v=7WmOSqcBDr0 ZettaBytes19].&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[http://papers.nips.cc/paper/8076-neural-tangent-kernel-convergence-and-generalization-in-neural-networks.pdf JHG][https://dblp.org/rec/bibtex/conf/nips/JacotHG18 18] showed that, in the infinite-width limit, the training of neural networks can be regarded as a convex optimization problem in function space [https://www.youtube.com/watch?v=7WmOSqcBDr0 ZettaBytes19].&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;On another note, the recent discovery of [[overfitting|double descent]] [https://openreview.net/pdf?id=Sy8gdB9xx ZBHRV][https://dblp.org/rec/bibtex/conf/iclr/ZhangBHRV17 17] [https://arxiv.org/pdf/1912.02292 NKBYBS][https://dblp.org/rec/bibtex/journals/corr/abs-1912-02292 19] suggests that overparameterized neural networks are better at learning and generalization. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;It may or may not be that the &lt;/del&gt;learning of overparameterized neural networks &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;is &lt;/del&gt;&amp;quot;nearly convex&amp;quot;. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;A conjecture we might make is that, with high probability, any random initialization of a very overparameterized neural network lies in a region where the loss function of the learning problem is convex and its minimum is zero&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;On another note, the recent discovery of [[overfitting|double descent]] [https://openreview.net/pdf?id=Sy8gdB9xx ZBHRV][https://dblp.org/rec/bibtex/conf/iclr/ZhangBHRV17 17] [https://arxiv.org/pdf/1912.02292 NKBYBS][https://dblp.org/rec/bibtex/journals/corr/abs-1912-02292 19] suggests that overparameterized neural networks are better at learning and generalization. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;The &lt;/ins&gt;learning of overparameterized neural networks &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;then seems &lt;/ins&gt;&amp;quot;nearly convex&amp;quot; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[https://arxiv&lt;/ins&gt;.&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;org/pdf/1811.08888 ZCZG][https://dblp.org/rec/bibtex/journals/corr/abs-1811-08888 18] [https://openreview.net/pdf?id=BylIciRcYQ ZYZLT][https://dblp.org/rec/bibtex/conf/iclr/ZhouYZLT19 19] [http://proceedings.mlr.press/v97/allen-zhu19a/allen-zhu19a.pdf ZLS][https://dblp.org/rec/bibtex/conf/icml/Allen-ZhuLS19 19] [http://proceedings.mlr.press/v97/du19c/du19c.pdf DLLWZ][https://dblp.org/rec/bibtex/conf/icml/DuLL0Z19 19]&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Lê Nguyên Hoang</name></author>
		
	</entry>
	<entry>
		<id>https://robustlybeneficial.org/wiki/index.php?title=Convexity&amp;diff=202&amp;oldid=prev</id>
		<title>Lê Nguyên Hoang: Created page with &quot;A convex set is one such that the segment between any two points of the set still belongs to the set. In other words, if &lt;math&gt;x,y \in S&lt;/math&gt; and &lt;math&gt;\lambda \in [0,1]&lt;/ma...&quot;</title>
		<link rel="alternate" type="text/html" href="https://robustlybeneficial.org/wiki/index.php?title=Convexity&amp;diff=202&amp;oldid=prev"/>
		<updated>2020-02-11T08:10:10Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;A convex set is one such that the segment between any two points of the set still belongs to the set. In other words, if &amp;lt;math&amp;gt;x,y \in S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\lambda \in [0,1]&amp;lt;/ma...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;A convex set is one such that the segment between any two points of the set still belongs to the set. In other words, if &amp;lt;math&amp;gt;x,y \in S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\lambda \in [0,1]&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;\lambda x + (1-\lambda) y \in S&amp;lt;/math&amp;gt;. A convex function &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; is one such that &amp;lt;math&amp;gt;f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y)&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;x, y&amp;lt;/math&amp;gt; in its (convex) domain and all &amp;lt;math&amp;gt;\lambda \in [0,1]&amp;lt;/math&amp;gt;. Put differently, the image of the average is below the average of the images.&lt;br /&gt;
&lt;br /&gt;
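For instance, &amp;lt;math&amp;gt;f(x) = x^2&amp;lt;/math&amp;gt; is convex: for any &amp;lt;math&amp;gt;\lambda \in [0,1]&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\lambda x^2 + (1-\lambda) y^2 - (\lambda x + (1-\lambda) y)^2 = \lambda (1-\lambda) (x - y)^2 \geq 0&amp;lt;/math&amp;gt;, so the image of the average is indeed below the average of the images.&lt;br /&gt;
&lt;br /&gt;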
Convexity plays a central role in optimization, because the minimization of convex functions over convex sets can be done efficiently, for instance using (variants of) [[stochastic gradient descent|gradient descent]]. Yet, strikingly, even though neural network learning is not a convex problem, it has become the most successful approach to machine learning.&lt;br /&gt;
&lt;br /&gt;
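For instance, here is a minimal sketch of gradient descent on a one-dimensional convex function (the function, starting point and step size below are illustrative choices, not taken from a reference):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import math&lt;br /&gt;
&lt;br /&gt;
# Gradient descent on the convex function f(x) = exp(x) + exp(-x),&lt;br /&gt;
# whose unique global minimum is at x = 0.&lt;br /&gt;
x, lr = 2.0, 0.1&lt;br /&gt;
for _ in range(200):&lt;br /&gt;
    grad = math.exp(x) - math.exp(-x)  # f'(x)&lt;br /&gt;
    x -= lr * grad&lt;br /&gt;
print(x)  # close to 0.0: on a convex function, gradient descent finds the global minimum&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;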
== Examples of convex optimization ==&lt;br /&gt;
&lt;br /&gt;
Classical examples include least-squares (MSE) linear regression, logistic regression, cross-entropy minimization for linear models, and support vector machines (SVMs).&lt;br /&gt;
&lt;br /&gt;
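To make the first example concrete, here is a minimal NumPy sketch of MSE linear regression solved by gradient descent (the synthetic data and step size are illustrative); since the loss is convex in the weights, the iterates approach a global minimizer:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
# MSE linear regression: minimize f(w) = ||Xw - y||^2 / (2n), convex in w.&lt;br /&gt;
rng = np.random.default_rng(0)&lt;br /&gt;
X = rng.normal(size=(100, 3))&lt;br /&gt;
w_true = np.array([1.0, -2.0, 0.5])&lt;br /&gt;
y = X @ w_true&lt;br /&gt;
&lt;br /&gt;
w = np.zeros(3)&lt;br /&gt;
lr = 0.1&lt;br /&gt;
for _ in range(1000):&lt;br /&gt;
    grad = X.T @ (X @ w - y) / len(y)  # gradient of the MSE loss&lt;br /&gt;
    w -= lr * grad&lt;br /&gt;
print(w)  # approximately w_true, the global minimizer&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;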
== Neural networks and convexity ==&lt;br /&gt;
&lt;br /&gt;
[http://papers.nips.cc/paper/8076-neural-tangent-kernel-convergence-and-generalization-in-neural-networks.pdf JHG][https://dblp.org/rec/bibtex/conf/nips/JacotHG18 18] showed that, in the infinite-width limit, the training of neural networks can be regarded as a convex optimization problem in function space [https://www.youtube.com/watch?v=7WmOSqcBDr0 ZettaBytes19].&lt;br /&gt;
&lt;br /&gt;
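For intuition, the neural tangent kernel of a network &amp;lt;math&amp;gt;f_\theta&amp;lt;/math&amp;gt; is the inner product of parameter gradients, &amp;lt;math&amp;gt;\Theta(x, x') = \nabla_\theta f_\theta(x) \cdot \nabla_\theta f_\theta(x')&amp;lt;/math&amp;gt;. Here is a minimal sketch that computes one kernel entry for a one-hidden-layer tanh network (the width, scaling and inputs are illustrative choices):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
# Empirical neural tangent kernel of f(x) = a . tanh(W x).&lt;br /&gt;
rng = np.random.default_rng(1)&lt;br /&gt;
d, m = 2, 1000  # input dimension, hidden width&lt;br /&gt;
W = rng.normal(size=(m, d)) / np.sqrt(d)&lt;br /&gt;
a = rng.normal(size=m) / np.sqrt(m)&lt;br /&gt;
&lt;br /&gt;
def grads(x):&lt;br /&gt;
    h = np.tanh(W @ x)  # hidden activations&lt;br /&gt;
    g_a = h  # df/da&lt;br /&gt;
    g_W = np.outer(a * (1 - h**2), x)  # df/dW, using tanh'(z) = 1 - tanh(z)^2&lt;br /&gt;
    return np.concatenate([g_a, g_W.ravel()])&lt;br /&gt;
&lt;br /&gt;
x1, x2 = rng.normal(size=d), rng.normal(size=d)&lt;br /&gt;
print(grads(x1) @ grads(x2))  # the kernel entry Theta(x1, x2)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;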
On another note, the recent discovery of [[overfitting|double descent]] [https://openreview.net/pdf?id=Sy8gdB9xx ZBHRV][https://dblp.org/rec/bibtex/conf/iclr/ZhangBHRV17 17] [https://arxiv.org/pdf/1912.02292 NKBYBS][https://dblp.org/rec/bibtex/journals/corr/abs-1912-02292 19] suggests that overparameterized neural networks are better at learning and generalization. It may or may not be that the learning of overparameterized neural networks is &amp;quot;nearly convex&amp;quot;. A conjecture we might make is that, with high probability, any random initialization of a very overparameterized neural network lies in a region where the loss function of the learning problem is convex and its minimum is zero.&lt;/div&gt;</summary>
		<author><name>Lê Nguyên Hoang</name></author>
		
	</entry>
</feed>