
Week 4

Exercise 1

Exercise

Let \(g:\mathbb{R}\rightarrow \mathbb{R}\) be a convex, monotonically non-decreasing function and let \(f:C\subset\mathbb{R}^n\rightarrow \mathbb{R}\) be a convex function defined on the convex set \(C\subset \mathbb R^n\).

Exercise 1a)

Exercise

Show that the composition \(h(x)=g(f(x))\) is a convex function.

Recall that \(g\) non-decreasing means that if \(x\leq y\) then \(g(x)\leq g(y)\).

Let \(x,y\in C\) and consider \(\lambda\in[0,1]\). By convexity of \(f\):

\[f(\lambda x+(1 -\lambda )y)\leq \lambda f(x)+(1-\lambda)f(y).\]

Since \(g\) is non-decreasing:

\[\begin{split}\begin{align} g\left(f(\lambda x+(1 -\lambda )y)\right) &\leq g\left(\lambda f(x)+(1-\lambda)f(y)\right)\\ &\leq \lambda g(f(x))+(1-\lambda)g(f(y)), \end{align}\end{split}\]

where in the last step we used the convexity of \(g\). This tells us precisely that \(h=g\circ f\) is convex.
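Although no code is needed for the proof, the inequality is easy to sanity-check numerically. The sketch below (assuming a standard NumPy setup, with the illustrative choices \(g=\exp\) and \(f(x)=\|x\|^2\)) tests the convexity inequality for \(h=g\circ f\) on random segments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices: f convex, g convex and non-decreasing.
f = lambda x: np.dot(x, x)   # f(x) = ||x||^2
g = np.exp                   # convex and non-decreasing
h = lambda x: g(f(x))        # composition h = g o f

# Check h(l x + (1-l) y) <= l h(x) + (1-l) h(y) on random samples.
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    l = rng.uniform()
    lhs = h(l * x + (1 - l) * y)
    rhs = l * h(x) + (1 - l) * h(y)
    assert lhs <= rhs * (1 + 1e-9)  # small relative tolerance for rounding
```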

Exercise 1b)

Exercise

Use the previous result to show that the function

\[h\colon \mathbb R^n\to \mathbb R,\qquad x\mapsto \exp\left(\beta x^TQx\right),\]

where \(\beta>0\) and \(Q\) is a symmetric positive definite matrix, is a convex function.

Since \(Q\) is symmetric positive definite, so is \(\beta Q\), and hence the map \(x\mapsto \beta x^T Q x\) is convex. Note that \(\exp\) is convex and strictly increasing, hence in particular non-decreasing. Therefore, by a), \(h\) is convex.
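As a quick numerical sanity check (not part of the proof), we can test the convexity inequality for \(h(x)=\exp(\beta x^TQx)\) with an arbitrary \(\beta>0\) and a randomly generated symmetric positive definite \(Q\):

```python
import numpy as np

rng = np.random.default_rng(1)

beta = 0.5                       # any beta > 0
A = rng.normal(size=(3, 3))
Q = A @ A.T + 3 * np.eye(3)      # symmetric positive definite by construction

h = lambda x: np.exp(beta * x @ Q @ x)

# Convexity inequality on random segments.
for _ in range(1000):
    x, y = 0.3 * rng.normal(size=3), 0.3 * rng.normal(size=3)
    l = rng.uniform()
    lhs = h(l * x + (1 - l) * y)
    rhs = l * h(x) + (1 - l) * h(y)
    assert lhs <= rhs * (1 + 1e-9)  # relative tolerance for rounding
```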

Exercise 1c)

Exercise

If we only require \(g\) to be convex, the result of a) may no longer hold. Find an example of two convex functions \(g, f\) such that their composition is not convex.

Take any function \(f\) that is convex but not concave, e.g. \(f(x) = x^2\), and pick \(g(x) = -x\), which is linear and hence convex (but decreasing). Then \(h=g\circ f\), i.e. \(h(x)=-x^2\), is concave but not convex.
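The failure of convexity in this counterexample can be seen concretely at the midpoint of \([-1,1]\):

```python
# f(x) = x**2 is convex, g(x) = -x is convex (linear), but
# h = g o f, i.e. h(x) = -x**2, violates the midpoint inequality.
h = lambda x: -(x ** 2)

lhs = h(0.5 * (-1) + 0.5 * 1)   # h(0) = 0
rhs = 0.5 * h(-1) + 0.5 * h(1)  # -1
assert lhs > rhs                # convexity would require lhs <= rhs
```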

Exercise 2

Exercise

Let \(\alpha_1,\alpha_2,\ldots,\alpha_n>0\) be scalars such that \(\sum_{i=1}^n\alpha_i=1\).

Show that for all \((x_1,\dots,x_n)\in \mathbb R^n\) with \(x_i\geq 0\) we have the following inequality:

\[x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_n^{\alpha_n}\leq \alpha_1 x_1+\alpha_2 x_2+\ldots+\alpha_n x_n\,.\]

Hint: Use \(\log\).

If \(x_i=0\) for some \(i\), then the left-hand side vanishes and the inequality is trivial. Assume therefore that \(x_i> 0\) for all \(i\).

Recall that \(\log\) is concave, therefore:

\[\begin{split}\begin{align} \log\left(\sum_{i=1}^n\alpha_i x_i\right)&\geq \sum_{i=1}^n\alpha_i \log(x_i)\\ &=\sum_{i=1}^n\log(x_i^{\alpha_i})\\ &=\log\left(\prod_{i=1}^n x_i^{\alpha_i}\right) \end{align}\end{split}\]

Since \(\exp\) is monotonically increasing, we can apply it on both sides of the inequality to obtain the desired result.
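The weighted AM-GM inequality just proved can also be checked numerically on random weights and nonnegative points (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)

# prod(x_i ** a_i) <= sum(a_i * x_i) for x_i >= 0 and weights a_i > 0
# summing to one.
for _ in range(1000):
    a = rng.uniform(size=5)
    a /= a.sum()                    # normalize weights to sum to 1
    x = rng.uniform(0, 10, size=5)  # nonnegative points
    assert np.prod(x ** a) <= np.dot(a, x) + 1e-9
```

Equality holds exactly when all the \(x_i\) coincide, which is why random draws sit strictly below the bound.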

Exercise 3

Exercise

Let \(C\subset\mathbb{R}^n\) be closed and convex, and let \(\left\|\cdot\right\|\) be the Euclidean norm.

In Homework 1.3a) you proved that for every \(x\in \mathbb{R}^n\) there exists a unique point \(z\in C\) that minimizes the distance \(\left\|x-z\right\|\). Let us define

\[\mathcal P_C\colon \mathbb R^n\to C,\qquad\mathcal P_C(x)=\operatorname{argmin}_{z\in C}\left\|z-x\right\|.\]

That is, \(\mathcal P_C\) is the projection onto \(C\), giving the closest approximation of \(x\) by any element in \(C\).

Exercise 3a)

Exercise

Show that for \(x\in\mathbb{R}^n\) a point \(z\in C\) satisfies \(z = \mathcal P_C(x)\) if and only if

\[(y-z)^T(x-z)\leq 0\,,\quad \forall y\in C\,;\]

Notice that the function \(f(z)=\frac12\left\|z-x\right\|^2\), \(z\in C\), is convex. Then, from the necessary and sufficient optimality conditions for convex optimization, a point \(z\in C\) is a minimizer of \(f\) if and only if

\[\nabla f(z)^T(y-z)\geq 0\,,\quad \forall y\in C\,.\]

Since \(\nabla f(z)=(z-x)\), and \(f\) clearly has the same minimizer as \(z\mapsto \|z-x\|\), we obtain the result.

Exercise 3b)

Exercise

Show that \(\mathcal P_C\) is continuous and non-expansive. That is, show that the following inequality holds:

\[\left\|\mathcal P_C(x)-\mathcal P_C(y)\right\|\leq \left\|x-y\right\|\,,\quad \forall x,y\in\mathbb{R}^n.\]

By the Cauchy-Schwarz inequality it is enough to show that

\[\|\mathcal P_C(x)-\mathcal P_C(y)\|^2 \leq \left\langle\mathcal P_C(x)-\mathcal P_C(y),\,x-y \right\rangle\]

This is because the Cauchy-Schwarz inequality tells us that

\[\left\langle\mathcal P_C(x)-\mathcal P_C(y),\,x-y \right\rangle\leq \left\|\mathcal P_C(x)-\mathcal P_C(y)\right\|\left\|x-y\right\|.\]

Using the optimality condition of 3a) with the point \(x\), \(y=\mathcal P_C(y)\in C\) and \(z=\mathcal P_C(x)\), we obtain

\[\left\langle\mathcal P_C(y)-\mathcal P_C(x),\,x-\mathcal P_C(x)\right\rangle\leq 0.\]

Reversing the roles of \(x\) and \(y\) we also get

\[\left\langle\mathcal P_C(x)-\mathcal P_C(y),\,y-\mathcal P_C(y) \right\rangle\leq 0.\]

Adding these two inequalities we get

\[\left\langle\mathcal P_C(x)-\mathcal P_C(y),\,-(x-\mathcal P_C(x))+y-\mathcal P_C(y)\right\rangle\leq 0.\]

We can rearrange the terms of this inequality to obtain:

\[\|\mathcal P_C(x)-\mathcal P_C(y)\|^2-\left\langle\mathcal P_C(x)-\mathcal P_C(y),\,x-y\right\rangle\leq 0.\]

By the remark at the start of the proof this gives the desired result.
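To illustrate non-expansiveness concretely, consider the closed unit ball, whose projection has a closed form (radial rescaling). The following sketch, with the hypothetical helper `project_ball`, checks the inequality on random pairs of points:

```python
import numpy as np

rng = np.random.default_rng(3)

def project_ball(x, radius=1.0):
    """Euclidean projection onto the closed ball of the given radius
    centred at the origin (a closed convex set with an explicit
    projection formula)."""
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else (radius / nrm) * x

# Non-expansiveness: ||P(x) - P(y)|| <= ||x - y||.
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    px, py = project_ball(x), project_ball(y)
    assert np.linalg.norm(px - py) <= np.linalg.norm(x - y) + 1e-12
```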

Exercise 4

Exercise

Consider the function

\[f\colon S_+^n\to \mathbb R,\qquad f(X) = \log \det (X),\]

where \(S_+^n\) denotes the set of \(n\times n\) symmetric positive definite matrices. We want to show that \(f\) is concave by showing that for each \(X,Y\in S_+^n\) the following function is concave:

\[g\colon \{t\in\mathbb R : X+tY\in S_+^n\}\to \mathbb R,\qquad g(t) = \log \det (X+tY)\]

Exercise 4a)

Exercise

If \(X\) has eigenvalues \(\lambda_1,\dots,\lambda_n >0\), then show that

\[\log\det X = \sum_i\log \lambda_i\]

Let \(X = QDQ^{-1}\) be the eigenvalue decomposition of \(X\). We then have that

\[\begin{split}\begin{align} \log\det X &= \log \det QDQ^{-1} = \log \det(D)\\ &=\log\prod_i\lambda_i=\sum_i\log\lambda_i \end{align}\end{split}\]
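This identity is easy to verify numerically for a random symmetric positive definite matrix (a sketch assuming NumPy; `slogdet` computes the log-determinant in a numerically stable way):

```python
import numpy as np

rng = np.random.default_rng(4)

A = rng.normal(size=(4, 4))
X = A @ A.T + 4 * np.eye(4)          # symmetric positive definite

lam = np.linalg.eigvalsh(X)          # eigenvalues of the symmetric matrix X
sign, logdet = np.linalg.slogdet(X)  # stable computation of log|det X|
assert sign == 1.0                   # det X > 0
assert np.isclose(logdet, np.log(lam).sum())
```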

Exercise 4b)

Exercise

Show that

\[\log \det (X+tY) = \log\det(X)+\log\det(I+tZ),\]

for some \(Z\in S_+^n\).

Hint: Conjugate \((X+tY)\) by \(I = X^{1/2}X^{-1/2}\), where \(X=X^{1/2}X^{1/2}\).

We have

\[\begin{split}\begin{align} \log \det (X+tY) &= \log\det(X+(X^{1/2}X^{-1/2})tY(X^{-1/2}X^{1/2}))\\ &= \log\det (X^{1/2}(I +t X^{-1/2}YX^{-1/2})X^{1/2})\\ &= \log\left(\det (X^{1/2})\det(I +t X^{-1/2}YX^{-1/2})\det(X^{1/2})\right)\\ &= \log \det(X) + \log\det(I +t X^{-1/2}YX^{-1/2}) \end{align}\end{split}\]

Note that if \(X=QDQ^T\) then \(X^{-1/2} = QD^{-1/2}Q^T\), so \(X^{-1/2}\) is symmetric. We thus have that for any nonzero \(x\in \mathbb R^n\):

\[x^T(X^{-1/2}YX^{-1/2})x = (X^{-1/2}x)^TY(X^{-1/2}x) > 0\]

Moreover since \(X^{-1/2}\) is symmetric, \(Z^T = Z\), and we conclude that \(Z= X^{-1/2}YX^{-1/2}\in S_+^n\) is symmetric positive definite.

Note that in general, if \(X\) is symmetric and invertible and \(A\in S_+^n\), then \(XAX\in S_+^n\). If \(A,B\in S_+^n\), then \(AB\in S_+^n\) if and only if \(AB=BA\), so we couldn't just take \(Z=YX^{-1}\)!
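A small numerical experiment (assuming NumPy; the matrices are random illustrative choices) confirms that \(Z=X^{-1/2}YX^{-1/2}\) is symmetric positive definite, while \(YX^{-1}\) is in general not even symmetric:

```python
import numpy as np

rng = np.random.default_rng(5)

# Two random symmetric positive definite matrices X, Y.
A, B = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
X = A @ A.T + 3 * np.eye(3)
Y = B @ B.T + 3 * np.eye(3)

# X^{-1/2} = Q D^{-1/2} Q^T from the eigendecomposition X = Q D Q^T.
lam, Q = np.linalg.eigh(X)
Xinvhalf = Q @ np.diag(lam ** -0.5) @ Q.T

Z = Xinvhalf @ Y @ Xinvhalf
assert np.allclose(Z, Z.T)                # Z is symmetric ...
assert np.all(np.linalg.eigvalsh(Z) > 0)  # ... with positive eigenvalues

W = Y @ np.linalg.inv(X)
assert not np.allclose(W, W.T)            # Y X^{-1} is not symmetric here
```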

Exercise 4c)

Exercise

Let \(\mu_1,\dots,\mu_n\) be the eigenvalues of \(Z\). Show that the eigenvalues of \(I+tZ\) are given by \(1+t\mu_1,\dots,1+t\mu_n\).

If \(Z = QDQ^{-1}\) is the eigenvalue decomposition of \(Z\), then

\[I+tZ = Q(I+tD)Q^{-1}\]

Hence the eigenvalues of \(I+tZ\) are \(1+t\mu_1,\dots,1+t\mu_n\).

The crucial property here is that \(I\) and \(Z\) commute. In general, if \(A,B\) are symmetric and commute, i.e. \(AB=BA\), then we can simultaneously diagonalize them: \(A = UD_1U^T, \, B=UD_2 U^T\) with \(U\) orthogonal and \(D_i\) diagonal, so that \(A+B = U(D_1+D_2)U^T\). This means that if two symmetric matrices commute, then the eigenvalues of \(A+B\) are the sums of the respective eigenvalues.

Exercise 4d)

Exercise

Show that \(g''(t) < 0\) for all \(t\) in the domain of \(g\) and conclude that \(f=\log\det\) is concave.

We have that

\[g(t) = \sum_i\log(1+t\mu_i)+\text{Constant}\]

Hence:

\[g'(t) = \sum_i \frac{\mu_i}{1+t\mu_i},\]

and consequently,

\[g''(t) =\sum_i \frac{-\mu_i^2}{(1+t\mu_i)^2} < 0\]

Hence \(g\) is concave. Since this holds for all \(X,Y\in S_+^n\), we conclude that \(f=\log\det\) is concave.
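As a final sanity check (not required by the exercise), the concavity of \(t\mapsto\log\det(X+tY)\) can be tested numerically on a segment where \(X+tY\) stays positive definite:

```python
import numpy as np

rng = np.random.default_rng(6)

# Random symmetric positive definite X and Y.
A, B = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
X = A @ A.T + 4 * np.eye(4)
Y = B @ B.T + 4 * np.eye(4)

def g(t):
    """g(t) = log det(X + t Y), via the stable slogdet."""
    return np.linalg.slogdet(X + t * Y)[1]

# Concavity: g(l s + (1-l) t) >= l g(s) + (1-l) g(t) for s, t >= 0,
# where X + t Y is certainly positive definite.
for _ in range(1000):
    s, t = rng.uniform(0, 5, size=2)
    l = rng.uniform()
    assert g(l * s + (1 - l) * t) >= l * g(s) + (1 - l) * g(t) - 1e-9
```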