
Week 10

Exercise 1

Exercise

Consider the following convex problem

\[\begin{split}\begin{array}{ll} \mbox{minimize} & f_0(x)\\ \mbox{subject to} & f_i(x)\leq0\,,\quad i=1,\ldots,m\,. \end{array}\end{split}\]

where all \(f_i\) are differentiable. Assume that \(x^*\in\mathbb{R}^n\) and \(\lambda^*\in\mathbb{R}^m\) satisfy the KKT conditions. Prove by using properties of convex functions and the KKT conditions that

\[\nabla f_0(x^*)^T(x-x^*)\geq0\]

for all feasible \(x\). (We have seen a similar result when \(x^*\) is the global minimum. Here you need to prove it for a point that satisfies the KKT conditions.)

Hint: Show that \((x^*,\lambda^*)\) satisfies

\[\sum_{i=1}^m \lambda_i^*(f_i(x^*)+\nabla f_i(x^*)^\top (x-x^*)) = -\nabla f_0(x^*)^\top (x-x^*)\]

for any \(x\), and use feasibility of \(x\) to prove that the left-hand side is nonpositive.

Since \(x\) is feasible we have \(f_i(x)\leq 0\), so by the first-order convexity inequality for \(f_i\):

\[f_i(x^*)+\nabla f_i(x^*)^\top (x-x^*) \leq f_i(x)\leq 0\]

In addition to primal feasibility, the KKT conditions tell us that the dual variable is feasible, \(\lambda^*\geq 0\), the Lagrangian gradient vanishes, \(\nabla L(x^*,\lambda^*)=0\), and complementary slackness holds, \(\sum_i \lambda^*_i f_i(x^*)=0\). Multiplying the inequality above by \(\lambda_i^*\geq 0\) and summing over \(i\) gives

\[0 \geq \sum_{i=1}^m \lambda_i^*(f_i(x^*)+\nabla f_i(x^*)^\top (x-x^*))\]

By complementary slackness, the sum \(\sum_i \lambda_i^* f_i(x^*)\) vanishes. Now note that

\[\sum_i\lambda_i^*\nabla f_i(x^*)^\top (x-x^*) = [\nabla L(x^*,\lambda^*)-\nabla f_0(x^*)]^\top (x-x^*)\]

Since the Lagrangian gradient vanishes, we obtain

\[0\geq -\nabla f_0(x^*)^\top (x-x^*),\]

as required.
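As a quick numerical sanity check (not part of the proof), here is a minimal numpy sketch on a toy instance of our own choosing: minimizing \(f_0(x)=\|x-c\|^2\) subject to \(f_1(x)=\|x\|^2-1\leq 0\), for which a KKT pair is known in closed form when \(\|c\|>1\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: minimize f0(x) = ||x - c||^2  subject to  f1(x) = ||x||^2 - 1 <= 0.
c = np.array([2.0, 1.0, -2.0])                # ||c|| = 3 > 1, so the constraint is active
x_star = c / np.linalg.norm(c)                # KKT point: projection onto the unit sphere
lam_star = np.linalg.norm(c) - 1.0            # dual variable, nonnegative since ||c|| > 1

grad_f0 = 2 * (x_star - c)
grad_f1 = 2 * x_star
assert np.allclose(grad_f0 + lam_star * grad_f1, 0)   # stationarity of the Lagrangian

# Check grad f0(x*)^T (x - x*) >= 0 on randomly sampled feasible points.
for _ in range(1000):
    x = rng.standard_normal(3)
    x *= rng.uniform(0, 1) / np.linalg.norm(x)        # rescale into the unit ball
    assert grad_f0 @ (x - x_star) >= -1e-12
print("variational inequality holds at all sampled points")
```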

Exercise 2: Inequalities for steepest descent

Exercise

In this exercise we will derive some inequalities that will be important for the analysis of gradient descent.

Recall that for \(\beta>0\) a function \(f\colon \mathbb R^n\to \mathbb R\) is called \(\beta\)-smooth if \(f\) is differentiable and for all \(x,y\in \mathbb R^n\) we have the following inequality:

\[\| \nabla f(x) - \nabla f(y)\|\leq \beta\|x-y\|\]

(in other words, the gradient is \(\beta\)-Lipschitz).

Exercise 2a)

Exercise

Let \(f\) be \(\beta\)-smooth, show that for all \(x,y\in \mathbb R^n\) we have

\[|f(x)-f(y)-\nabla f(y)^\top (x-y)|\leq \frac\beta2\|x-y\|^2\]

Hint: Write \(f(x)-f(y)\) as

\[\int_0^1 \frac{d}{dt} f(y+t(x-y))\,\mathrm dt= \int_0^1\nabla f(y+t(x-y))^\top (x-y)\,\mathrm dt,\]

Use this to write the entire expression under the absolute value signs as an integral over an inner product \(\int_0^1 a(t)^\top b(t)\,\mathrm dt\), and apply Cauchy-Schwarz to this inner product.

Remark: If we also assume that \(f\) is convex, then we can drop the absolute value signs, since this expression is always nonnegative for convex functions.

As in the hint, we write

\[\begin{split}\begin{align} |f(x)-f(y)-\nabla f(y)^\top (x-y)| &= \left|\int_0^1 \nabla f(y+t(x-y))^\top(x-y)-\nabla f(y)^\top (x-y)\,\mathrm dt\right|\\ &= \left|\int_0^1 \left\langle \nabla f(y+t(x-y))-\nabla f(y),\,x-y\right\rangle\,\mathrm dt\right|\\ &\leq \int_0^1 \|\nabla f(y+t(x-y))-\nabla f(y)\|\cdot\|x-y\|\,\mathrm dt \end{align}\end{split}\]

Now the first factor in the integrand can be bounded using \(\beta\)-smoothness by \(\|\nabla f(y+t(x-y))-\nabla f(y)\|\leq t\beta \|x-y\|\), giving

\[\begin{split}\begin{align} |f(x)-f(y)-\nabla f(y)^\top (x-y)| &\leq \int_0^1 t\beta \|x-y\|^2\,\mathrm dt\\ &= \frac\beta2\|x-y\|^2 \end{align}\end{split}\]
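A minimal numerical sketch of this bound, assuming only numpy: for the convex quadratic \(f(x)=\tfrac12 x^\top Qx\) (an arbitrary instance of our own choosing), the gradient is \(\beta\)-Lipschitz with \(\beta=\lambda_{\max}(Q)\), so the inequality of 2a) can be checked on random pairs \((x,y)\).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
Q = M.T @ M                                   # PSD, so f is convex
beta = np.linalg.eigvalsh(Q).max()            # the gradient x -> Qx is beta-Lipschitz

f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x

for _ in range(1000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    lhs = abs(f(x) - f(y) - grad(y) @ (x - y))
    assert lhs <= 0.5 * beta * np.sum((x - y) ** 2) + 1e-9
print("2a) bound verified on random samples")
```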

Exercise 2b)

Exercise

Assume \(f\) is convex and \(\beta\)-smooth. Show that for all \(x,y\in\mathbb R^n\) we have

\[f(x)-f(y)\leq \nabla f(x)^\top (x-y)-\frac1{2\beta} \|\nabla f(y)-\nabla f(x)\|^2\]

Hint: define \(z=y-\frac1\beta (\nabla f(y)-\nabla f(x))\) and consider

\[f(x)-f(y) = (f(x)-f(z))+(f(z)-f(y)).\]

Use convexity to bound the first term, and use the inequality of 2a) to bound the second term.

Remark: We only need \(\beta\)-smoothness here to use the inequality of 2a), so if \(f\) is convex and satisfies the inequality of 2a), it also satisfies the inequality of this exercise. This observation will be useful in 2d).

As suggested by the hint, we have by convexity that

\[f(x)-f(z)\leq \nabla f(x)^\top (x-z)\]

Then by 2a) we have that

\[f(z)-f(y)\leq \nabla f(y)^\top (z-y) +\frac{\beta}{2}\|z-y\|^2\]

Since \(z=y-\frac1\beta (\nabla f(y)-\nabla f(x))\) we have \(\frac\beta2\|z-y\|^2=\frac{1}{2\beta}\|\nabla f(x)-\nabla f(y)\|^2\). Bringing this together, we obtain

\[\begin{split}\begin{align} f(x)-f(y)&\leq \nabla f(x)^\top (x-z) +\nabla f(y)^\top (z-y)+\frac{1}{2\beta}\|\nabla f(x)-\nabla f(y)\|^2\\ &= \nabla f(x)^\top (x-y) +(\nabla f(x)-\nabla f(y))^\top (y-z)\\&\qquad\qquad+\frac{1}{2\beta}\|\nabla f(x)-\nabla f(y)\|^2\\ &= \nabla f(x)^\top (x-y) +(\nabla f(x)-\nabla f(y))^\top \frac1\beta(\nabla f(y)-\nabla f(x))\\&\qquad\qquad+\frac{1}{2\beta}\|\nabla f(x)-\nabla f(y)\|^2\\ &= \nabla f(x)^\top (x-y)- \frac{1}{2\beta}\|\nabla f(x)-\nabla f(y)\|^2 \end{align}\end{split}\]
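The inequality just derived can be sanity-checked numerically in the same way, on a convex quadratic with \(\beta=\lambda_{\max}(Q)\) (a numpy sketch; the instance is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
M = rng.standard_normal((n, n))
Q = M.T @ M                                   # convex quadratic f(x) = x^T Q x / 2
beta = np.linalg.eigvalsh(Q).max()

f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x

for _ in range(1000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    rhs = grad(x) @ (x - y) - np.sum((grad(y) - grad(x)) ** 2) / (2 * beta)
    assert f(x) - f(y) <= rhs + 1e-9
print("2b) bound verified on random samples")
```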

Exercise 2c)

Exercise

Prove the converse of 2b), that is, prove that if \(f:\mathbb R^n\to \mathbb R\) is differentiable and if for all \(x,y\in\mathbb R^n\) we have

\[f(x)-f(y)\leq \nabla f(x)^\top (x-y)-\frac1{2\beta} \|\nabla f(y)-\nabla f(x)\|^2\]

then \(f\) is convex and \(\beta\)-smooth.

Hint: To show \(\beta\)-smoothness, add the inequality of 2b) for the pair \((x,y)\) and for the pair \((y,x)\), to show the stronger result that

\[\|\nabla f(y)-\nabla f(x)\|^2\leq \beta (\nabla f(x)-\nabla f(y))^\top (x-y)\]

Convexity is obtained by dropping the nonpositive second term in the assumed inequality, leaving the first-order condition \(f(x)-f(y)\leq\nabla f(x)^\top(x-y)\).

Dropping the nonpositive second term of the assumed inequality yields \(f(x)-f(y)\leq \nabla f(x)^\top (x-y)\) for all \(x,y\), which is the first-order characterization of convexity. For \(\beta\)-smoothness we add the assumed inequality for \((x,y)\) and \((y,x)\) to obtain

\[0\leq (\nabla f(x)-\nabla f(y))^\top (x-y)-\frac{1}{\beta}\|\nabla f(y)-\nabla f(x)\|^2\]

Then by Cauchy-Schwarz we obtain

\[\frac1\beta\|\nabla f(y)-\nabla f(x)\|^2\leq (\nabla f(x)-\nabla f(y))^\top (x-y)\leq \|\nabla f(y)-\nabla f(x)\|\|y-x\|\]

If \(\nabla f(x)=\nabla f(y)\) the claim is trivial; otherwise, dividing by \(\|\nabla f(y)-\nabla f(x)\|\) gives \(\beta\)-smoothness:

\[\|\nabla f(y)-\nabla f(x)\|\leq \beta\|y-x\|\]
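The stronger inequality above (co-coercivity of the gradient) is again easy to test numerically on a convex quadratic with \(\beta=\lambda_{\max}(Q)\) (a numpy sketch on an arbitrary instance):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
M = rng.standard_normal((n, n))
Q = M.T @ M
beta = np.linalg.eigvalsh(Q).max()
grad = lambda x: Q @ x

for _ in range(1000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    g = grad(x) - grad(y)                     # gradient difference
    assert np.sum(g ** 2) <= beta * (g @ (x - y)) + 1e-9
print("co-coercivity bound of 2c) verified on random samples")
```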

Exercise 2d)

Exercise

Suppose that \(f\colon \mathbb R^n\to \mathbb R\) is \(\alpha\)-strongly convex, and \(\beta\)-smooth. Show that for all \(x,y\in\mathbb R^n\) we have

\[(\nabla f(x)-\nabla f(y))^\top (x-y) \geq \frac{\alpha\beta}{\beta+\alpha}\|x-y\|^2 +\frac{1}{\beta+\alpha}\|\nabla f(x) - \nabla f(y)\|^2\]

This result is useful for improving on a simpler convergence result for gradient descent.

Hint: Use \(\alpha\)-strong convexity to show that \(\phi(x):=f(x)-\frac\alpha2 \|x\|^2\) is convex and \((\beta-\alpha)\)-smooth. To show \((\beta-\alpha)\)-smoothness, rewrite the inequality of 2a) in terms of \(\phi\). Use this to show that \(\phi\) satisfies the inequality of 2c), and is hence \((\beta-\alpha)\)-smooth. Then finally apply the stronger inequality mentioned in 2c) to \(\phi\) to obtain the required result after some algebraic manipulation.

As per the hint, we consider \(\phi(x) = f(x)-\frac\alpha2\|x\|^2\). Recall that \(f\) is \(\alpha\)-strongly convex if

\[f(x)-f(y) \leq \nabla f(x)^\top (x-y) -\frac\alpha2\|x-y\|^2\]

Therefore \(\phi\) is convex precisely when \(f\) is \(\alpha\)-strongly convex. To show that \(\phi\) is \((\beta-\alpha)\)-smooth, we use the fact that \(f\) is \(\beta\)-smooth, together with the result of 2a):

\[|f(x)-f(y)-\nabla f(y)^\top (x-y)|\leq \frac\beta2\|x-y\|^2\]

By convexity of \(f\) we can drop the absolute value signs in this inequality. Now replacing \(f(x)\) by \(\phi(x)+\frac\alpha2\|x\|^2\) and \(\nabla f(x)\) by \(\nabla \phi(x)+\alpha x\), we get

\[\begin{split}\begin{align} \frac\beta2\|x-y\|^2&\geq \phi(x)-\phi(y)+\frac\alpha2(\|x\|^2-\|y\|^2)-\nabla \phi(y)^\top (x-y)-\alpha y^\top (x-y)\\ &=\phi(x)-\phi(y)+\frac\alpha2\left(\|x\|^2+\|y\|^2-2y^\top x\right)-\nabla \phi(y)^\top(x-y)\\ &=\phi(x)-\phi(y)-\nabla \phi(y)^\top(x-y)+\frac\alpha2\|x-y\|^2 \end{align}\end{split}\]

Therefore

\[\phi(x)-\phi(y)-\nabla \phi(y)^\top (x-y)\leq \frac{\beta-\alpha}{2}\|x-y\|^2\]

By the remark of 2b), this is all we need to show that \(\phi\) satisfies the inequality of 2b), and hence, by 2c), that it is \((\beta-\alpha)\)-smooth. The stronger inequality of 2c) then gives

\[\|\nabla \phi(y)-\nabla \phi(x)\|^2\leq (\beta-\alpha) (\nabla \phi(x)-\nabla \phi(y))^\top (x-y)\]

Rewriting this back in terms of \(f\) we get

\[\|\nabla f(y)-\nabla f(x)+\alpha(x-y)\|^2\leq (\beta-\alpha)(\nabla f(x)-\nabla f(y) +\alpha(y-x))^\top (x-y)\]

Hence

\[\|\nabla f(y)-\nabla f(x)+\alpha(x-y)\|^2 +\alpha(\beta-\alpha)\|x-y\|^2\leq (\beta-\alpha) (\nabla f(x)-\nabla f(y))^\top (x-y)\]

Now we expand the first term

\[\begin{split}\begin{align} \|\nabla f(y)-\nabla f(x)\|^2+(\alpha^2+\alpha(\beta-\alpha))&\|x-y\|^2-2\alpha (\nabla f(x)-\nabla f(y))^\top (x-y)\\ &\leq (\beta-\alpha) (\nabla f(x)-\nabla f(y))^\top (x-y) \end{align}\end{split}\]

Then finally, collecting terms and dividing by \(\alpha+\beta\), this gives

\[(\nabla f(x)-\nabla f(y))^\top (x-y) \geq \frac{1}{\alpha+\beta}\|\nabla f(y)-\nabla f(x)\|^2+\frac{\alpha\beta}{\alpha+\beta}\|x-y\|^2\]

as required.
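For a positive definite quadratic \(f(x)=\tfrac12 x^\top Qx\), the constants \(\alpha\) and \(\beta\) are the extreme eigenvalues of \(Q\), so the inequality of 2d) can be sanity-checked numerically (a numpy sketch on an arbitrary instance):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
Q = M.T @ M + np.eye(n)                       # positive definite
eigs = np.linalg.eigvalsh(Q)
alpha, beta = eigs.min(), eigs.max()          # strong convexity and smoothness constants
grad = lambda x: Q @ x

for _ in range(1000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    g, d = grad(x) - grad(y), x - y
    rhs = alpha * beta / (alpha + beta) * (d @ d) + np.sum(g ** 2) / (alpha + beta)
    assert g @ d >= rhs - 1e-9
print("2d) bound verified on random samples")
```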

Exercise 3: Boolean least squares

Exercise

Let \(A\in \mathbb R^{m\times n}\) and \(b\in \mathbb R^m\). We consider the Boolean least squares problem

\[\begin{split}\begin{array}{ll} \text{minimize} & \|Ax-b\|^2\\ \text{subject to} & x_i\in \{-1,\,1\}, \quad i=1,\dots,n \end{array}\end{split}\]

This is not a convex problem, so we relax it to a convex problem whose solution gives a useful lower bound.

Exercise 3a)

Exercise

Show that the Boolean least squares problem is equivalent to

\[\begin{split}\begin{array}{ll} \text{minimize} & \operatorname{tr}(A^\top A X)-2b^\top Ax+b^\top b \\ \text{subject to} & X = xx^\top\\ & X_{ii} = 1,\quad i=1,\dots,n \end{array}\end{split}\]

where the optimization variables are \(x\in \mathbb R^n\) and \(X\in\mathbb S(n)\), the space of symmetric \(n\times n\) matrices.

We expand the objective of the Boolean least squares problem,

\[\|Ax-b\|^2 = x^\top A^\top Ax-2b^\top Ax+b^\top b\]

Then, by the cyclic property of the trace,

\[x^\top A^\top A x = \operatorname{tr}(x^\top A^\top A x)=\operatorname{tr}( A^\top A xx^\top)\]

Under the constraint \(X=xx^\top\) we thus get an equivalence of the objectives of the two problems. Since \((xx^\top)_{ii}=x_i^2\), and \(x_i^2=1\) if and only if \(x_i\in\{-1,1\}\), we conclude the problems are equivalent.
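The trace identity used above is easy to confirm numerically (a tiny numpy sketch on an arbitrary instance):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))
x = rng.choice([-1.0, 1.0], size=4)           # a Boolean-feasible point

# x^T A^T A x  ==  tr(A^T A x x^T)  by the cyclic property of the trace
assert np.isclose(x @ A.T @ A @ x, np.trace(A.T @ A @ np.outer(x, x)))
```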

Exercise 3b)

Exercise

We want to write the objective function of 3a) as an SDP objective of form \(\operatorname{tr}(BY)\). To this end, let

\[\begin{split}Y = \begin{pmatrix} X & x \\ x^\top & 1 \end{pmatrix},\qquad B=\begin{pmatrix}C&d\\d^\top &\alpha\end{pmatrix}\end{split}\]

Find the symmetric block matrix \(B\) such that

\[\operatorname{tr}(A^\top A X)-2b^\top Ax+b^\top b = \operatorname{tr}(BY)\]

We compute

\[\begin{split}BY = \begin{pmatrix}C&d\\d^\top &\alpha\end{pmatrix}\begin{pmatrix}X & x \\ x^\top & 1\end{pmatrix}= \begin{pmatrix} CX+dx^\top & Cx+d \\ d^\top X+\alpha x^\top & d^\top x+\alpha \end{pmatrix}\end{split}\]

Taking the trace, and using \(\operatorname{tr}(dx^\top)=d^\top x\), we obtain

\[\operatorname{tr}(BY) = \operatorname{tr}(CX) + 2d^\top x+ \alpha\]

giving

\[C = A^\top A,\quad d=-A^\top b,\quad \alpha=b^\top b\]
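A quick numerical check of this block structure, assuming only numpy: assembling \(B\) and \(Y\) for a random instance and a Boolean \(x\), the trace \(\operatorname{tr}(BY)\) should reproduce \(\|Ax-b\|^2\).

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 6, 4
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
x = rng.choice([-1.0, 1.0], size=n)

# Assemble B (with C = A^T A, d = -A^T b, alpha = b^T b) and Y for this x.
d = -(A.T @ b)
B = np.block([[A.T @ A, d[:, None]], [d[None, :], np.array([[b @ b]])]])
Y = np.block([[np.outer(x, x), x[:, None]], [x[None, :], np.array([[1.0]])]])

assert np.isclose(np.trace(B @ Y), np.sum((A @ x - b) ** 2))
```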

Exercise 3c)

Exercise

Let \(B\) be as in the previous exercise. Show that the following SDP is a convex relaxation of the Boolean least squares problem, i.e. the solution gives a lower bound to the Boolean least squares problem:

\[\begin{split}\begin{array}{ll} \text{minimize} & \operatorname{tr}(BY) \\ \text{subject to} & Y\succeq0\\ & Y_{ii}=1,\quad i=1,\dots,n+1 \end{array}\end{split}\]

If we set

\[\begin{split}Y = \begin{pmatrix}xx^\top & x\\x^\top & 1\end{pmatrix}\end{split}\]

then by 3a) and 3b) the objective of the SDP reduces to that of the Boolean least squares problem. Moreover, if \(x\) is feasible for the Boolean least squares problem, then \(x_i^2 = (xx^\top)_{ii}=1\), hence \(Y_{ii}=1\). Finally, \(Y\) is positive semidefinite: if \(\xi=(y,t)\) with \(y\in \mathbb R^n\) and \(t\in \mathbb R\), then \(\xi^\top Y\xi = (y^\top x+t)^2\geq 0\). Every feasible point of the Boolean least squares problem thus corresponds to a feasible point of the SDP with the same objective value, so the feasible set of the SDP is larger and its optimal value is a lower bound on the Boolean optimum.
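The relaxation can be solved directly, for instance with the cvxpy package (a sketch, assuming cvxpy and a default SDP solver are available; on a small instance we can compare the bound against the exact Boolean optimum by brute force):

```python
import numpy as np
import cvxpy as cp
from itertools import product

rng = np.random.default_rng(7)
m, n = 8, 4
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)

# Block matrix B from 3b)
d = -(A.T @ b)
B = np.block([[A.T @ A, d[:, None]], [d[None, :], np.array([[b @ b]])]])

# SDP relaxation: minimize tr(BY) subject to Y >= 0 (PSD), Y_ii = 1.
Y = cp.Variable((n + 1, n + 1), PSD=True)
prob = cp.Problem(cp.Minimize(cp.trace(B @ Y)), [cp.diag(Y) == 1])
lower_bound = prob.solve()

# Exact Boolean optimum by brute force (feasible since n is small).
best = min(np.sum((A @ np.array(s) - b) ** 2) for s in product([-1.0, 1.0], repeat=n))
print(f"SDP lower bound {lower_bound:.4f} <= Boolean optimum {best:.4f}")
```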