# Week 8

## Exercise 1

Exercise

Derive the Lagrange dual of the following problem:

$\begin{split}\begin{array}{l l} \text{minimize} & \sum_i^n\phi(x_i)\\ \text{subject to} & Ax=b \end{array}\end{split}$

with $$x\in \mathbb R^n$$, $$A\in \mathbb R^{m \times n}$$, $$b\in \mathbb R^m$$, and $$\phi$$ is defined by a parameter $$c>0$$ as follows

$\phi\colon(-c,c)\to \mathbb R_+,\ x\mapsto \frac{|x|}{c-|x|}.$

Hint: The Lagrangian should decouple into $$n$$ separate terms of the form ($$u,y\in\mathbb R$$)

$f(u,y) = \frac{|u|}{c-|u|}+yu$

To compute $$\inf_uf(u,y)$$ consider the cases where the infimum is achieved for $$u<0$$, $$u=0$$ and $$u>0$$ separately.

We plot the function $$\phi$$ for $$c=1$$ below. Near $$x=0$$ it looks like $$|x|$$, and near $$|x|=c$$ it has a vertical asymptote. This function can therefore be used as a penalty which promotes sparsity or robustness (like the usual $$|x|$$ penalty), and at the same time it forces the constraint $$|x|<c$$.

[1]:

```python
import numpy as np
import matplotlib.pyplot as plt

def phi(u, c):
    return np.abs(u) / (c - np.abs(u))

c = 1
epsilon = 1e-8
x = np.linspace(-c + epsilon, c - epsilon, 100)
y = phi(x, c)

plt.figure(figsize=(8, 4))
plt.ylim(0, 5)
plt.title(r"Plot of $\phi(x)$ for $c=1$")
plt.axvline(-c, color="k", ls="--")
plt.axvline(c, color="k", ls="--")
plt.plot(x, y);
```

We can write the equality constraint as

$(Ax-b)_j = \sum_{i=1}^n A_{ji}x_i -b_j = 0$

Introducing Lagrange multipliers $$\nu = (\nu_1,\dots,\nu_m)$$, the Lagrangian is then given by

\begin{split}\begin{align} L(x,\nu) &= \sum_{i=1}^n\phi(x_i) + \sum_{j=1}^m\nu_j\left(\sum_{i=1}^n A_{ji}x_i -b_j\right)\\ &= \sum_{i=1}^n\left(\phi(x_i) + x_i\left(a_i^\top \nu\right)\right) -b^\top \nu \end{align}\end{split}

where $$a_i = (A_{1i},\dots,A_{mi})^\top$$ is the $$i$$th column of $$A$$. The dual function is then given by

\begin{align} g(\nu)&=\inf_x L(x,\nu) = -b^\top \nu + \sum_{i=1}^n \inf_{x_i} \left(\phi(x_i)+x_i(a_i^\top \nu)\right) \end{align}

To simplify this expression we study the function

$h(y) =\inf_u \frac{|u|}{c-|u|}+yu.$

If $$u> 0$$ the derivative of $$\phi(u)+yu$$ is given by

$\frac{\mathrm d}{\mathrm du}\frac{u}{c-u}+yu = \frac{c}{(c-u)^2} +y$

which is zero if $$(c-u)^2=-c/y$$, so we need to assume $$y<0$$. Then $$u=c-\sqrt{c/|y|}$$, and since $$u> 0$$ we also need $$\sqrt{c/|y|}< c$$ or, equivalently, $$y< -1/c$$. Then $$h(y)$$ has value

\begin{split}\begin{align} &\frac{c-\sqrt{c/|y|}}{c-\left(c-\sqrt{c/|y|}\right)}-|y|c+|y|\sqrt{c/|y|}\\ &=\frac{c}{\sqrt{c/|y|}} -1 - |y|c+|y|\sqrt{c/|y|}\\ &=-1 - |y|c+2\sqrt{c|y|} = -\left(1-\sqrt{c|y|}\right)^2 \end{align}\end{split}

If $$u<0$$ we similarly derive that $$u=-(c-\sqrt{c/|y|})$$ and $$y> 1/c$$. In this case $$h(y)$$ has the same value as above.

Finally, there is the possibility that $$|y|\leq 1/c$$. In this case the derivative is nonpositive for $$u<0$$ and nonnegative for $$u>0$$. By continuity, the infimum must be achieved at $$u=0$$, where the value is $$0$$. In summary we have

$\begin{split}h(y) = \inf_u \frac{|u|}{c-|u|}+yu = \left\{\begin{array}{ll} -(1-\sqrt{c|y|})^2,&|y|>1/c\\ 0 & |y|\leq 1/c \end{array} \right.\end{split}$

The dual problem is therefore

$\begin{array}{ll} \text{maximize} & -b^\top \nu + \sum_{i=1}^n h(a_i^\top \nu) \end{array}$
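As a quick numerical sanity check (our own addition, not part of the exercise), we can compare the closed form for $$h$$ with a brute-force minimization of $$\phi(u)+yu$$ over a fine grid of $$u$$; the grid and the sample values of $$y$$ are arbitrary choices.

```python
import numpy as np

c = 1.0

def f(u, y):
    # phi(u) + y*u on the open interval (-c, c)
    return np.abs(u) / (c - np.abs(u)) + y * u

def h_closed(y):
    # closed form for the infimum derived above
    return -(1 - np.sqrt(c * np.abs(y))) ** 2 if np.abs(y) > 1 / c else 0.0

u = np.linspace(-c + 1e-6, c - 1e-6, 200001)
for y in (-3.0, -1.5, -0.5, 0.0, 0.5, 1.5, 3.0):
    assert abs(f(u, y).min() - h_closed(y)) < 1e-3, y
print("closed form for h matches brute-force minimization")
```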

## Exercise 2

Exercise

We want to estimate the covariance matrix $$\Sigma\in S_{++}^n$$ of a multivariate normal distribution with mean $$\mu\in \mathbb R^n$$ from data. Given data $$\mathbf x=(x_1,\dots,x_N)\in \mathbb R^{n\times N}$$, this means we want to maximize the likelihood of $$\mathbf x$$ as a function of $$\Sigma$$. That is, we want to solve the following maximization problem:

$\begin{split}\begin{array}{l l} \text{maximize} & \prod_{i=1}^N (2\pi)^{-n/2}(\det \Sigma)^{-1/2} \exp \left( -\frac12 (x_i-\mu)^\top \Sigma^{-1} (x_i-\mu)\right)\\ \text{subject to} & \Sigma\succ 0 \end{array}\end{split}$

### Exercise 2a)

Exercise

Show that the problem above is equivalent to

$\begin{split}\begin{array}{l l} \text{minimize} & -\frac{N}{2} \log\det \Sigma^{-1} +\frac12 \langle A,\,\Sigma^{-1}\rangle \\ \text{subject to} & \Sigma\succ 0 \end{array}\end{split}$

where $$A\in\mathbb S^n$$ is a symmetric matrix. Recall that we proved in week 4 that $$-\log\det$$ is a convex function, so this is a convex problem in the variable $$\Sigma^{-1}$$.

We take a logarithm and ignore the constant term to obtain the equivalent problem

$\begin{split}\begin{array}{l l} \text{maximize} & -\frac{N}{2} \log\det \Sigma -\frac12 \sum_{i=1}^N (x_i-\mu)^\top \Sigma^{-1}(x_i-\mu) \\ \text{subject to} & \Sigma\succ 0 \end{array}\end{split}$

If we define $$A = \sum_{i=1}^N (x_i-\mu)(x_i-\mu)^\top \in \mathbb S^n$$, so that $$\sum_{i=1}^N(x_i-\mu)^\top\Sigma^{-1}(x_i-\mu) = \langle A,\,\Sigma^{-1}\rangle$$, and use $$\log\det \Sigma=-\log\det\Sigma^{-1}$$, then we obtain

$\begin{split}\begin{array}{l l} \text{maximize} & \frac{N}{2} \log\det \Sigma^{-1} -\frac12 \langle A,\,\Sigma^{-1}\rangle \\ \text{subject to} & \Sigma\succ 0 \end{array}\end{split}$

Multiplying the objective by $$-1$$ and minimizing instead of maximizing gives the required answer.
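The identity $$\sum_{i}(x_i-\mu)^\top\Sigma^{-1}(x_i-\mu)=\langle A,\,\Sigma^{-1}\rangle$$ underlying this rewriting is easy to verify numerically; the data, seed and $$\Sigma$$ below are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 3, 50
mu = rng.standard_normal(n)
X = rng.standard_normal((n, N)) + mu[:, None]   # columns are the samples x_i

# A = sum_i (x_i - mu)(x_i - mu)^T
D = X - mu[:, None]
A = D @ D.T

# an arbitrary positive definite Sigma
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)
Sigma_inv = np.linalg.inv(Sigma)

lhs = sum(d @ Sigma_inv @ d for d in D.T)       # sum of quadratic forms
rhs = np.trace(A @ Sigma_inv)                    # <A, Sigma^{-1}>
assert abs(lhs - rhs) < 1e-8 * abs(rhs)
print("identity verified")
```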

### Exercise 2b)

Exercise

Recall from exercise 4 of week 4 that for $$X\succ 0$$ and $$Y$$ symmetric, we have

$\log \det (X+tY) = \log\det(X) + \log\det(I+tX^{-1/2}YX^{-1/2}).$

Using the Taylor series of $$\log(1+x)$$ and the fact that $$\det(I+tA) = 1+\operatorname{tr} (tA) + O(t^2)$$, show that the directional derivative of $$\log \det$$ satisfies

$\left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0}\log \det (X+tY) = \operatorname{tr}(X^{-1}Y)$

We have

\begin{split}\begin{align} \left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0}\log \det (X+tY) &= \left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0}\log\det(I+tX^{-1/2}YX^{-1/2})\\ &= \left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0}\log\left(1+\operatorname{tr} (tX^{-1/2}YX^{-1/2})+O(t^2)\right)\\ &= \left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0}\left(\operatorname{tr} (tX^{-1/2}YX^{-1/2})+O(t^2)\right)\\ &= \operatorname{tr} (X^{-1/2}YX^{-1/2})\\ &= \operatorname{tr} (X^{-1}Y) \end{align}\end{split}

Here we used the fact that the trace is linear, and that it is cyclic: $$\operatorname{tr}(ABC)=\operatorname{tr}(CAB)$$.
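We can sanity-check the formula with a central finite difference; the matrices, seed and step size below are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)           # X > 0
Y = rng.standard_normal((n, n))
Y = (Y + Y.T) / 2                     # symmetric direction

def logdet(S):
    # numerically stable log det for positive definite S
    return np.linalg.slogdet(S)[1]

t = 1e-6
fd = (logdet(X + t * Y) - logdet(X - t * Y)) / (2 * t)   # central difference
exact = np.trace(np.linalg.inv(X) @ Y)
assert abs(fd - exact) < 1e-5 * (1 + abs(exact))
print("directional derivative matches tr(X^{-1}Y)")
```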

### Exercise 2c)

Exercise

Show that $$(X+tY)^{-1} = X^{-1}(I+tYX^{-1})^{-1}$$. Then use the power series $$(1-x)^{-1} = \sum_{n=0}^\infty x^n$$ to show that

$\left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0}\operatorname{tr} (A(X+tY)^{-1}) = -\operatorname{tr}(X^{-1}AX^{-1}Y).$

We have that

$I = (X+tY)(X+tY)^{-1} = (I+tYX^{-1})X(X+tY)^{-1}$

Hence $$(I+tYX^{-1})^{-1} = X(X+tY)^{-1}$$; multiplying both sides on the left by $$X^{-1}$$ gives the result. We therefore have

$(X+tY)^{-1} = X^{-1}(I+tYX^{-1})^{-1} = X^{-1}-tX^{-1}YX^{-1} +O(t^2)$

plugging this into the directional derivative, we obtain

\begin{split}\begin{align} \left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0}\operatorname{tr} (A(X+tY)^{-1})&=\left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0} \operatorname{tr}(AX^{-1}-tAX^{-1}YX^{-1} + O(t^2))\\ &= -\operatorname{tr}(AX^{-1}YX^{-1}) \\ &= -\operatorname{tr}(X^{-1}AX^{-1}Y) \end{align}\end{split}

Again we used the fact that the trace is linear and cyclic.
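Again, a central finite difference confirms the formula numerically (matrices, seed and step size are our own arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)           # X > 0
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                     # symmetric A
Y = rng.standard_normal((n, n))
Y = (Y + Y.T) / 2                     # symmetric direction

def g(S):
    return np.trace(A @ np.linalg.inv(S))

t = 1e-6
fd = (g(X + t * Y) - g(X - t * Y)) / (2 * t)   # central difference
Xi = np.linalg.inv(X)
exact = -np.trace(Xi @ A @ Xi @ Y)
assert abs(fd - exact) < 1e-5 * (1 + abs(exact))
print("matches -tr(X^{-1}AX^{-1}Y)")
```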

### Exercise 2d)

Exercise

Assume $$A\succ0$$. Solve the minimization problem

$\begin{split}\begin{array}{l l} \text{minimize} & \frac{N}{2} \log\det \Sigma +\frac12 \langle A,\,\Sigma^{-1}\rangle \\ \text{subject to} & \Sigma\succ 0 \end{array}\end{split}$

First solve the unconstrained problem by computing the directional derivatives and setting them to zero. Show that the resulting solution $$\Sigma^*$$ satisfies $$\Sigma^*\succ 0$$.

Denote the objective by $$f(\Sigma)$$. Using b) and c):

$\left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0} f(\Sigma+tY) = \frac{N}{2}\operatorname{tr}(\Sigma^{-1}Y)-\frac12 \operatorname{tr}(\Sigma^{-1}A\Sigma^{-1}Y)$

If $$\Sigma$$ is optimal, then this directional derivative is zero for all $$Y$$. This happens if and only if

$\frac{N}{2}\Sigma^{-1}-\frac12 \Sigma^{-1}A\Sigma^{-1}=0$

Factoring out the $$\Sigma^{-1}$$ on the left, we obtain

$\frac12\Sigma^{-1}\left(NI- A\Sigma^{-1}\right)=0$

Since $$\Sigma \succ 0$$, the matrix $$\Sigma^{-1}$$ is invertible. Hence $$NI- A\Sigma^{-1} = 0$$, which gives the solution $$\Sigma^* = \frac1N A\succ 0$$.
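We can check numerically (on arbitrary data of our own) that $$\Sigma^*=A/N$$ is indeed a minimizer: small feasible perturbations never decrease the objective.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 3, 40
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # A > 0

def f(S):
    # objective N/2 * log det S + 1/2 * <A, S^{-1}>
    return N / 2 * np.linalg.slogdet(S)[1] + 0.5 * np.trace(A @ np.linalg.inv(S))

S_star = A / N
for _ in range(20):
    P = rng.standard_normal((n, n))
    P = (P + P.T) / 2
    P /= np.abs(np.linalg.eigvalsh(P)).max()     # normalize the direction
    S = S_star + 0.01 * P                        # small perturbation
    assert np.all(np.linalg.eigvalsh(S) > 0)     # still positive definite
    assert f(S) >= f(S_star) - 1e-9
print("A/N is a minimizer among nearby feasible points")
```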

### Exercise 2e)

Exercise

Assume $$A\succ 0$$. Write down the Lagrangian of

$\begin{split}\begin{array}{l l} \text{minimize} & \frac{N}{2} \log\det \Sigma +\frac12 \langle A,\,\Sigma^{-1}\rangle \\ \text{subject to} & \Sigma\succ 0 \end{array}\end{split}$

Show that the critical points of the Lagrangian can be obtained by solving a quadratic matrix equation:

$X^2+aX+Y = 0$

Hint: $$\Sigma$$ is a critical point of $$L(\Sigma,\Lambda)$$ if

$\left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0} L(\Sigma+tY,\Lambda) = 0,\qquad \forall Y\in\mathbb S^n$

This gives a matrix equation for the minimum. You can transform it to the required form by multiplying on the left and on the right by the symmetric square root of $$A$$, that is, $$A^{\frac12}$$.

We can incorporate the constraint $$\Sigma\succeq 0$$ (written as $$-\Sigma\preceq 0$$) as a term in the Lagrangian, with a Lagrange multiplier $$\Lambda\in\mathbb S^n$$, $$\Lambda\succeq 0$$. That is, the Lagrangian is

$L(\Sigma,\Lambda) = \frac{N}{2} \log\det \Sigma +\frac12 \langle A,\,\Sigma^{-1}\rangle-\operatorname{tr}(\Lambda \Sigma)$

To compute the dual function $$g(\Lambda)$$ we need to minimize the Lagrangian. Using b) and c), we have

$\left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0} L(\Sigma+tY,\Lambda) = \frac{N}{2}\operatorname{tr}(\Sigma^{-1}Y)-\frac12 \operatorname{tr}(\Sigma^{-1}A\Sigma^{-1}Y)-\operatorname{tr}(\Lambda Y)$

This is zero for all $$Y$$ if and only if

$\frac{N}{2}\Sigma^{-1}-\frac12 \Sigma^{-1}A\Sigma^{-1}=\Lambda$

Multiplying on the left and right by $$A^{\frac12}$$, and using $$A^{\frac12}\Sigma^{-1}A\Sigma^{-1}A^{\frac12}=\left(A^{\frac12}\Sigma^{-1}A^{\frac12}\right)^2$$, we obtain

$\frac{N}{2}\left(A^{\frac12}\Sigma^{-1}A^{\frac12}\right)-\frac12\left(A^{\frac12} \Sigma^{-1}A^{\frac12}\right)^2-A^{\frac12}\Lambda A^{\frac12}=0$

Multiplying by $$-2$$ gives

$\left(A^{\frac12} \Sigma^{-1}A^{\frac12}\right)^2-N\left(A^{\frac12}\Sigma^{-1}A^{\frac12}\right)+2 A^{\frac12}\Lambda A^{\frac12}=0$

This is a quadratic matrix equation $$X^2+aX+Y=0$$ with

$a=-N,\quad X = A^{\frac12}\Sigma^{-1}A^{\frac12},\quad Y = 2 A^{\frac12}\Lambda A^{\frac12}$

### Exercise 2f)

Exercise

Let $$X,Y$$ be symmetric matrices, and consider the matrix equation $$X^2+aX+Y=0$$.

Like in the proof of the quadratic formula, we can complete the square. Find coefficients $$\alpha,\beta\in\mathbb R$$ such that

$(X+\alpha I)^2 = \beta I - Y$

Find the condition on $$Y$$ such that the quadratic matrix equation has a solution and give a formula for the solution.

We have

$\left(X+\frac a2 I\right)^2 = \frac {a^2}4I-Y$

Assuming $$\frac{a^2}{4}I-Y\succeq 0$$, we can take the positive semidefinite matrix square root on both sides to obtain a solution satisfying

$X+\frac a2 I = \left(\frac {a^2}4I-Y\right)^{\frac12}$

Then we obtain

$X = -\frac a2 I+\left(\frac {a^2}4I-Y\right)^{\frac12}$

We could now in principle compute the dual problem for the problem of 2d), but unfortunately the result is not very pretty.
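The completed-square solution of 2f) can be checked numerically; the values of $$a$$ and $$Y$$ and the eigendecomposition-based square root helper below are our own arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
a = -5.0
B = rng.standard_normal((n, n))
Y = (B + B.T) / 2
# rescale Y so that a^2/4 * I - Y is positive definite
Y *= (a ** 2 / 4 - 0.1) / np.abs(np.linalg.eigvalsh(Y)).max()

def sym_sqrt(S):
    # positive semidefinite square root via eigendecomposition
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(w)) @ V.T

X = -a / 2 * np.eye(n) + sym_sqrt(a ** 2 / 4 * np.eye(n) - Y)
residual = X @ X + a * X + Y
assert np.abs(residual).max() < 1e-10
print("completed-square formula solves X^2 + aX + Y = 0")
```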

## Exercise 3

Exercise

Consider the following convex problem

$\begin{split}\begin{array}{ll} \text{minimize} & e^{-x}\\ \text{subject to} & x^2 / y \leq 0 \end{array}\end{split}$

We consider this problem on the half-space domain $$\mathcal D = \{ (x,y)\in \mathbb R^2\mid y>0\}$$.

### Exercise 3a)

Exercise

Find the feasible set and find the optimal value for the problem above.

Since $$y>0$$, we can only have $$x^2/y\leq 0$$ if $$x=0$$. Hence the feasible set is $$\{(0,y) \mid y>0\}$$. On the feasible set the objective is constant, equal to $$e^0=1$$, so the optimal value is $$1$$.

### Exercise 3b)

Exercise

Show that the problem does not satisfy the (weak) Slater condition.

In the entire feasible set we have $$x^2/y=0$$, and not $$x^2/y < 0$$, hence the problem is not strictly feasible and does not satisfy the Slater condition.

### Exercise 3c)

Exercise

Show that the optimal value of the dual of the problem is $$0$$.

The Lagrangian is given by

$L(x,y,\lambda) = e^{-x} + \lambda x^2/y$

The dual function is

$g(\lambda) = \inf_{x\in\mathbb R,y>0}\left(e^{-x}+\lambda x^2/y\right)$

If $$\lambda\geq 0$$, then $$L(x,y,\lambda)>0$$ for all $$x\in\mathbb R$$ and $$y>0$$, and it is easy to see that $$\lim_{t\to\infty} L(t,t^3,\lambda)=0$$, so $$g(\lambda)=0$$. If $$\lambda <0$$, then $$\lim_{t\to\infty}L(t,1,\lambda)=-\infty$$. Therefore

$\begin{split}g(\lambda) = \left\{\begin{array}{ll}0 & \lambda\geq 0 \\ -\infty & \lambda < 0\end{array}\right.\end{split}$

The dual problem is therefore

$\begin{split}\begin{array}{ll} \text{maximize} & 0 \\ \text{subject to} & \lambda\geq 0 \end{array}\end{split}$

The optimal value of the dual problem is therefore 0.
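A small numerical illustration of this duality gap (the curve $$(t,t^3)$$ comes from the argument above; the value of $$\lambda$$ is an arbitrary choice of ours):

```python
import numpy as np

def L(x, y, lam):
    # Lagrangian e^{-x} + lam * x^2 / y, defined for y > 0
    return np.exp(-x) + lam * x ** 2 / y

lam = 2.0  # any lam >= 0 gives g(lam) = 0

# along the curve (t, t^3) the Lagrangian is e^{-t} + lam/t -> 0 ...
vals = [L(t, t ** 3, lam) for t in (1.0, 10.0, 100.0, 1000.0)]
assert all(v > 0 for v in vals) and vals[-1] < 1e-2

# ... while every feasible point (0, y) has objective e^0 = 1,
# so primal optimal value 1 vs dual optimal value 0
print("Lagrangian along (t, t^3):", vals)
```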

## Exercise 4

### Exercise 4a)

Exercise

Let $$A\in \mathbb R^{m\times n}$$, $$x\in\mathbb R^n$$, $$b\in\mathbb R^m$$. Find the dual of the following problem

$\begin{split}\begin{array}{ll} \text{minimize} & \frac12 \|x\|^2\\ \text{subject to} & Ax=b \end{array}\end{split}$

The Lagrangian of the problem is given by

$L(x,\nu) = \frac 12 \|x\|^2 + \nu^\top(Ax-b)$

To find the infimum, we compute the gradient with respect to $$x$$:

$\nabla_x L(x,\nu) = x+A^\top \nu$

The gradient is zero if $$x=-A^\top \nu$$. Substituting this into the Lagrangian, the dual function is therefore

$g(\nu) = -\nu^\top b-\frac12\nu^\top AA^\top \nu$

The dual problem is hence the unconstrained problem

$\text{maximize}\quad -\nu^\top b-\frac12\nu^\top AA^\top \nu$
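For this problem the primal and dual optima have closed forms when $$AA^\top$$ is invertible: the least-norm solution is $$x^*=A^\top(AA^\top)^{-1}b$$ and the dual maximizer is $$\nu^*=-(AA^\top)^{-1}b$$. A quick check on random data (our own addition) that the two optimal values agree:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 6
A = rng.standard_normal((m, n))   # full row rank with probability 1
b = rng.standard_normal(m)

# primal: least-norm solution of Ax = b
x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = 0.5 * x_star @ x_star

# dual: maximizer of -nu^T b - 1/2 nu^T A A^T nu
nu_star = -np.linalg.solve(A @ A.T, b)
d_star = -nu_star @ b - 0.5 * nu_star @ (A @ A.T) @ nu_star

assert np.allclose(A @ x_star, b)         # primal feasible
assert abs(p_star - d_star) < 1e-10       # strong duality
print("p* = d* =", p_star)
```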

### Exercise 4b)

Exercise

The problem of a) is equivalent to

$\begin{split}\begin{array}{ll} \text{minimize} & \|x\|\\ \text{subject to} & Ax=b \end{array}\end{split}$

Find the dual of this problem.

Hint: first try to minimize the Lagrangian for a fixed value of $$\|x\|$$. Use this to write the dual function as an infimum over a scalar parameter $$t\geq 0$$.

The Lagrangian is

$L(x,\nu)=\|x\| + \nu^\top (Ax-b)$

For a given value of $$\|x\|$$, this function is smallest for $$x\propto -A^\top \nu$$, so we may set $$x = -tA^\top \nu$$ with $$t\geq 0$$. The dual function is hence

$g(\nu) = \inf_{t\geq 0} \,t\left(\|A^\top \nu\| - \|A^\top \nu\|^2\right)-\nu^\top b$

If $$\|A^\top\nu\|\leq 1$$, then $$\|A^\top \nu\|-\|A^\top\nu\|^2 \geq 0$$, and hence the infimum is attained at $$t=0$$. Otherwise the expression is unbounded below as $$t\to\infty$$. Therefore

$\begin{split}g(\nu) = \left\{\begin{array}{ll}-\nu^\top b & \|A^\top \nu\|\leq 1 \\ -\infty & \|A^\top \nu\|>1\end{array}\right.\end{split}$

The dual problem is then

$\begin{split}\begin{array}{ll} \text{maximize} & -b^\top\nu\\ \text{subject to} & \|A^\top\nu\|\leq 1 \end{array}\end{split}$
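As a closing check (our own, on random data): rescaling the $$\nu^*$$ from part a) so that $$\|A^\top\nu\|=1$$ gives a dual feasible point whose value matches the primal optimum $$\|x^*\|$$, so there is no duality gap here either.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 3, 6
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# primal optimum: norm of the least-norm solution of Ax = b
x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = np.linalg.norm(x_star)

# dual candidate: nu from part a), rescaled onto the boundary ||A^T nu|| = 1
nu = -np.linalg.solve(A @ A.T, b)
nu /= np.linalg.norm(A.T @ nu)

assert np.linalg.norm(A.T @ nu) <= 1 + 1e-12   # dual feasible
d_val = -b @ nu
assert abs(d_val - p_star) < 1e-10             # attains the primal value
print("dual value equals primal optimum:", p_star)
```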