
Week 8

Exercise 1

Exercise

Derive the Lagrange dual of the following problem:

\[\begin{split}\begin{array}{l l} \text{minimize} & \sum_{i=1}^n\phi(x_i)\\ \text{subject to} & Ax=b \end{array}\end{split}\]

with \(x\in \mathbb R^n\), \(A\in \mathbb R^{m \times n}\), \(b\in \mathbb R^m\), and \(\phi\) is defined by a parameter \(c>0\) as follows

\[\phi\colon(-c,c)\to \mathbb R_+,\ x\mapsto \frac{|x|}{c-|x|}.\]

Hint: The Lagrangian should decouple into \(n\) separate terms of the form (\(u,y\in\mathbb R\))

\[f(u,y) = \frac{|u|}{c-|u|}+yu\]

To compute \(\inf_u f(u,y)\), consider the cases where the infimum is achieved for \(u<0\), \(u=0\) and \(u>0\) separately.

We plot the function \(\phi\) for \(c=1\) below. Near \(x=0\) it looks like \(|x|\), and near \(|x|=c\) it has a vertical asymptote. This function can therefore be used as a penalty which promotes sparsity or robustness (like the usual \(|x|\) penalty), and at the same time it forces the constraint \(|x|<c\).

[1]:
import numpy as np
import matplotlib.pyplot as plt

def phi(u, c):
    return np.abs(u) / (c - np.abs(u))


c = 1
epsilon = 1e-8
x = np.linspace(-c + epsilon, c - epsilon, 100)
y = phi(x, c)

plt.figure(figsize=(8, 4))
plt.ylim(0, 5)
plt.title("Plot of $\phi(x)$ for $c=1$")
plt.axvline(-c, c="k", ls="--")
plt.axvline(c, c="k", ls="--")
plt.plot(x, y);
[Output: plot of \(\phi(x)\) for \(c=1\), with vertical asymptotes at \(x=\pm c\)]

We can write the equality constraint as

\[(Ax-b)_j = \sum_{i=1}^n A_{ji}x_i -b_j = 0\]

Introducing Lagrange multipliers \(\nu = (\nu_1,\dots,\nu_m)\), the Lagrangian is then given by

\[\begin{split}\begin{align} L(x,\nu) &= \sum_{i=1}^n\phi(x_i) + \sum_{j=1}^m\nu_j\left(\sum_{i=1}^n A_{ji}x_i -b_j\right)\\ &= \sum_{i=1}^n\left(\phi(x_i) + x_i\left(a_i^\top \nu\right)\right) -b^\top \nu \end{align}\end{split}\]

where \(a_i = (A_{1i},\dots,A_{mi})^\top\) is the \(i\)th column of \(A\). The dual function is then given by

\[\begin{align} g(\nu)&=\inf_x L(x,\nu) = -b^\top \nu + \sum_{i=1}^n \inf_{x_i}\left( \phi(x_i)+x_i(a_i^\top \nu)\right) \end{align}\]

To simplify this expression we study the function

\[h(y) =\inf_u\left( \frac{|u|}{c-|u|}+yu\right).\]

If \(u> 0\) the derivative of \(\phi(u)+yu\) is given by

\[\frac{\mathrm d}{\mathrm du}\left(\frac{u}{c-u}+yu\right) = \frac{c}{(c-u)^2} +y\]

which is zero if \((c-u)^2=-c/y\), so we need to assume \(y<0\). Then \(u=c-\sqrt{c/|y|}\), and since \(u> 0\) we also need \(\sqrt{c/|y|}< c\), i.e. \(y< -1/c\). Then \(h(y)\) has value

\[\begin{split}\begin{align} &\frac{c-\sqrt{c/|y|}}{c-\left(c-\sqrt{c/|y|}\right)}-|y|c+|y|\sqrt{c/|y|}\\ &=\frac{c}{\sqrt{c/|y|}} -1 - |y|c+|y|\sqrt{c/|y|}\\ &=-1 - |y|c+2\sqrt{c|y|} = -\left(1-\sqrt{c|y|}\right)^2 \end{align}\end{split}\]

If \(u<0\) we similarly derive that \(u=-(c-\sqrt{c/|y|})\) and \(y> 1/c\). In this case \(h(y)\) has the same value as above.

Finally, there is the possibility that \(|y|\leq 1/c\). In this case the derivative is always negative for \(u<0\), and always positive for \(u>0\). By continuity, the infimum must be achieved at \(u=0\), where \(h(y)=0\). In summary we have

\[\begin{split}h(y) = \inf_u\left( \frac{|u|}{c-|u|}+yu\right) = \left\{\begin{array}{ll} -(1-\sqrt{c|y|})^2,&|y|>1/c\\ 0, & |y|\leq 1/c \end{array} \right.\end{split}\]

The dual problem is therefore

\[\begin{array}{ll} \text{maximize} & -b^\top \nu + \sum_{i=1}^n h(a_i^\top \nu) \end{array}\]
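As a quick numerical sanity check (a sketch, not part of the derivation), we can compare the closed form of \(h\) against a brute-force minimization of \(f(u,y)=\frac{|u|}{c-|u|}+yu\) over a grid; the helper names below are ours.

import numpy as np


def h_closed_form(y, c):
    # h(y) = -(1 - sqrt(c|y|))^2 if |y| > 1/c, else 0
    return np.where(np.abs(y) > 1 / c, -(1 - np.sqrt(c * np.abs(y))) ** 2, 0.0)


def h_numeric(y, c, num=200_001):
    # brute-force minimization of f(u, y) over a fine grid of (-c, c)
    u = np.linspace(-c + 1e-9, c - 1e-9, num)
    return np.min(np.abs(u) / (c - np.abs(u)) + y * u)


c = 1.0
for y in [-5.0, -2.0, -0.5, 0.0, 0.5, 2.0, 5.0]:
    print(y, h_closed_form(y, c), h_numeric(y, c))  # the two columns should agree closely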

Exercise 2

Exercise

We want to estimate the covariance matrix \(\Sigma\in S_{++}^n\) of a multivariate normal distribution with mean \(\mu\in \mathbb R^n\) from data. Given data \(\mathbf x=(x_1,\dots,x_N)\in \mathbb R^{n\times N}\), this means we want to maximize the likelihood of \(\mathbf x\) as a function of \(\Sigma\). That is, we want to solve the following maximization problem:

\[\begin{split}\begin{array}{l l} \text{maximize} & \prod_{i=1}^N (2\pi)^{-n/2}(\det \Sigma)^{-1/2} \exp \left( -\frac12 (x_i-\mu)^\top \Sigma^{-1} (x_i-\mu)\right)\\ \text{subject to} & \Sigma\succ 0 \end{array}\end{split}\]

Exercise 2a)

Exercise

Show that the problem above is equivalent to

\[\begin{split}\begin{array}{l l} \text{minimize} & -\frac{N}{2} \log\det \Sigma^{-1} +\frac12 \langle A,\,\Sigma^{-1}\rangle \\ \text{subject to} & \Sigma\succ 0 \end{array}\end{split}\]

where \(A\in\mathbb S^n\) is a symmetric matrix. Recall that we proved in week 4 that \(-\log\det\) is a convex function, so this is a convex problem in the variable \(\Sigma^{-1}\).

We take a logarithm and ignore the constant term to obtain the equivalent problem

\[\begin{split}\begin{array}{l l} \text{maximize} & -\frac{N}{2} \log\det \Sigma -\frac12 \sum_{i=1}^N (x_i-\mu)^\top \Sigma^{-1}(x_i-\mu) \\ \text{subject to} & \Sigma\succ 0 \end{array}\end{split}\]

If we define \(A = \sum_{i=1}^N (x_i-\mu)(x_i-\mu)^\top \in \mathbb S^n\), then \(\sum_i (x_i-\mu)^\top \Sigma^{-1}(x_i-\mu) = \sum_i \operatorname{tr}\left(\Sigma^{-1}(x_i-\mu)(x_i-\mu)^\top\right) = \langle A,\,\Sigma^{-1}\rangle\). Using also \(\log\det \Sigma=-\log\det\Sigma^{-1}\), we obtain

\[\begin{split}\begin{array}{l l} \text{maximize} & -\frac{N}{2} \log\det \Sigma^{-1} -\frac12 \langle A,\,\Sigma^{-1}\rangle \\ \text{subject to} & \Sigma\succ 0 \end{array}\end{split}\]

Multiplying the objective by \(-1\) and minimizing instead of maximizing gives the required answer.

Exercise 2b)

Exercise

Recall from exercise 4 of week 4 that for \(X\succ 0\) and \(Y\) symmetric, we have

\[\log \det (X+tY) = \log\det(X) + \log\det(I+tX^{-1/2}YX^{-1/2}).\]

Using the Taylor series of \(\log(1+x)\) and the fact that \(\det(I+tA) = 1+\operatorname{tr} (tA) + O(t^2)\), show that the directional derivative of \(\log \det\) satisfies

\[\left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0}\log \det (X+tY) = \operatorname{tr}(X^{-1}Y)\]

We have

\[\begin{split}\begin{align} \left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0}\log \det (X+tY) &= \left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0}\log\det(I+tX^{-1/2}YX^{-1/2})\\ &= \left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0} \log\left(1+\operatorname{tr} (tX^{-1/2}YX^{-1/2})+O(t^2)\right)\\ &= \left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0}\left( t\operatorname{tr} (X^{-1/2}YX^{-1/2})+O(t^2)\right)\\ &= \operatorname{tr} (X^{-1/2}YX^{-1/2})\\ &= \operatorname{tr} (X^{-1}Y) \end{align}\end{split}\]

Here we used the Taylor expansion \(\log(1+x)=x+O(x^2)\), and the fact that the trace is linear and cyclic: \(\operatorname{tr}(ABC)=\operatorname{tr}(CAB)\).
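As a quick check of this formula (a sketch using random matrices), we can compare a finite-difference approximation of the directional derivative with \(\operatorname{tr}(X^{-1}Y)\):

import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)        # X is positive definite
Y = rng.standard_normal((n, n))
Y = (Y + Y.T) / 2                  # symmetric direction

t = 1e-6
logdet = lambda Z: np.linalg.slogdet(Z)[1]
fd = (logdet(X + t * Y) - logdet(X - t * Y)) / (2 * t)   # central difference
print(fd, np.trace(np.linalg.solve(X, Y)))               # should agree closely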

Exercise 2c)

Exercise

Show that \((X+tY)^{-1} = X^{-1}(I+tYX^{-1})^{-1}\). Then use the power series \((1-x)^{-1} = \sum_{n=0}^\infty x^n\) to show that

\[\left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0}\operatorname{tr} (A(X+tY)^{-1}) = -\operatorname{tr}(X^{-1}AX^{-1}Y).\]

We have that

\[I = (X+tY)(X+tY)^{-1} = (I+tYX^{-1})X(X+tY)^{-1}\]

Hence \((I+tYX^{-1})^{-1} = X(X+tY)^{-1}\), and multiplying both sides on the left by \(X^{-1}\) gives the result. We therefore have

\[(X+tY)^{-1} = X^{-1}(I+tYX^{-1})^{-1} = X^{-1}-tX^{-1}YX^{-1} +O(t^2)\]

Plugging this into the directional derivative, we obtain

\[\begin{split}\begin{align} \left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0}\operatorname{tr} (A(X+tY)^{-1})&=\left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0} \operatorname{tr}(AX^{-1}-tAX^{-1}YX^{-1} + O(t^2))\\ &= -\operatorname{tr}(AX^{-1}YX^{-1}) \\ &= -\operatorname{tr}(X^{-1}AX^{-1}Y) \end{align}\end{split}\]

Again we used the fact that the trace is linear and cyclic.
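The analogous numerical check for this identity (again a sketch with random matrices):

import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)                    # X is positive definite
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
Y = rng.standard_normal((n, n)); Y = (Y + Y.T) / 2

t = 1e-6
f = lambda s: np.trace(A @ np.linalg.inv(X + s * Y))
fd = (f(t) - f(-t)) / (2 * t)                  # central difference
Xinv = np.linalg.inv(X)
print(fd, -np.trace(Xinv @ A @ Xinv @ Y))      # should agree closely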

Exercise 2d)

Exercise

Assume \(A\succ0\). Solve the minimization problem

\[\begin{split}\begin{array}{l l} \text{minimize} & \frac{N}{2} \log\det \Sigma +\frac12 \langle A,\,\Sigma^{-1}\rangle \\ \text{subject to} & \Sigma\succ 0 \end{array}\end{split}\]

First solve the unconstrained problem by computing the directional derivatives and setting them to zero. Show that the resulting solution \(\Sigma^*\) satisfies \(\Sigma^*\succ 0\).

Denote the objective by \(f(\Sigma)\). Using the directional derivatives from 2b) and 2c):

\[\left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0} f(\Sigma+tY) = \frac{N}{2}\operatorname{tr}(\Sigma^{-1}Y)-\frac12 \operatorname{tr}(\Sigma^{-1}A\Sigma^{-1}Y)\]

If \(\Sigma\) is optimal, then this directional derivative is zero for all \(Y\). This happens if and only if

\[\frac{N}{2}\Sigma^{-1}-\frac12 \Sigma^{-1}A\Sigma^{-1}=0\]

Factoring out the \(\Sigma^{-1}\) on the left, we obtain

\[\frac12\Sigma^{-1}\left(NI- A\Sigma^{-1}\right)=0\]

Since \(\Sigma \succ 0\), the matrix \(\Sigma^{-1}\) is invertible. Hence \(NI- A\Sigma^{-1} = 0\), with solution \(\Sigma^* = \frac1N A\succ 0\), the (biased) sample covariance matrix.
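A quick check against NumPy (a sketch; here we estimate \(\mu\) by the sample mean, whereas the exercise treats \(\mu\) as given):

import numpy as np

rng = np.random.default_rng(2)
n, N = 3, 1000
data = rng.standard_normal((n, N))             # columns are the samples x_1, ..., x_N
mu = data.mean(axis=1, keepdims=True)          # sample mean
A = (data - mu) @ (data - mu).T                # A = sum_i (x_i - mu)(x_i - mu)^T
print(np.allclose(A / N, np.cov(data, bias=True)))   # True: Sigma* = A/N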

Exercise 2e)

Exercise

Assume \(A\succ 0\). Write down the Lagrangian of

\[\begin{split}\begin{array}{l l} \text{minimize} & \frac{N}{2} \log\det \Sigma +\frac12 \langle A,\,\Sigma^{-1}\rangle \\ \text{subject to} & \Sigma\succ 0 \end{array}\end{split}\]

Show that the critical points of the Lagrangian can be obtained by solving a quadratic matrix equation:

\[X^2+aX+Y = 0\]

Hint: \(\Sigma\) is a critical point of \(L(\Sigma,\Lambda)\) if

\[\left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0} L(\Sigma+tY,\Lambda) = 0,\qquad \forall Y\in\mathbb S^n\]

This gives a matrix equation for the minimum. You can transform it to the required form by multiplying on the left and on the right by the symmetric square root of \(A\), that is, \(A^{\frac12}\).

We can incorporate the constraint \(\Sigma\succeq 0\) into the Lagrangian via a term \(-\operatorname{tr}(\Lambda \Sigma)\), with \(\Lambda\in\mathbb S^n\), \(\Lambda\succeq 0\), a Lagrange multiplier. That is, the Lagrangian is

\[L(\Sigma,\Lambda) = \frac{N}{2} \log\det \Sigma +\frac12 \langle A,\,\Sigma^{-1}\rangle-\operatorname{tr}(\Lambda \Sigma)\]

To compute the dual function \(g(\Lambda)\) we need to minimize the Lagrangian. We have

\[\left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0} L(\Sigma+tY,\Lambda) = \frac{N}{2}\operatorname{tr}(\Sigma^{-1}Y)-\frac12 \operatorname{tr}(\Sigma^{-1}A\Sigma^{-1}Y)-\operatorname{tr}(\Lambda Y)\]

This is zero for all \(Y\) if and only if

\[\frac{N}{2}\Sigma^{-1}-\frac12 \Sigma^{-1}A\Sigma^{-1}=\Lambda\]

Multiplying left and right by \(A^{\frac12}\) we obtain

\[\frac{N}{2}\left(A^{\frac12}\Sigma^{-1}A^{\frac12}\right)-\frac12\left(A^{\frac12} \Sigma^{-1}A^{\frac12}\right)^2-A^{\frac12}\Lambda A^{\frac12}=0\]

Multiplying by \(-2\) gives

\[\left(A^{\frac12} \Sigma^{-1}A^{\frac12}\right)^2-N\left(A^{\frac12}\Sigma^{-1}A^{\frac12}\right)+2 A^{\frac12}\Lambda A^{\frac12}=0\]

This is a quadratic matrix equation \(X^2+aX+Y=0\) with

\[a=-N,\quad X = A^{\frac12}\Sigma^{-1}A^{\frac12},\quad Y = 2 A^{\frac12}\Lambda A^{\frac12}\]

Exercise 2f)

Exercise

Let \(X,Y\) be symmetric matrices, and consider the matrix equation \(X^2+aX+Y=0\).

Like in the proof of the quadratic formula, we can complete the square. Find coefficients \(\alpha,\beta\in\mathbb R\) such that

\[(X+\alpha I)^2 = \beta I - Y\]

Find the condition on \(Y\) such that the quadratic matrix equation has a solution and give a formula for the solution.

We have

\[\left(X+\frac a2 I\right)^2 = \frac {a^2}4I-Y\]

Assuming \(\frac{a^2}{4}I-Y\succeq 0\) we can take the positive matrix square root on both sides to obtain

\[X+\frac a2 I = \left(\frac {a^2}4I-Y\right)^{\frac12}\]

Then we obtain

\[X = -\frac a2 I+\left(\frac {a^2}4I-Y\right)^{\frac12}\]
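As an illustration (a sketch, assuming \(\frac{a^2}{4}I-Y\succeq 0\); the helper names are ours), we can solve the quadratic matrix equation numerically by completing the square and taking the PSD square root via an eigendecomposition:

import numpy as np


def sqrtm_psd(S):
    # symmetric PSD square root via an eigendecomposition
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T


def solve_quadratic(a, Y):
    # solves X^2 + aX + Y = 0 via X = -(a/2) I + sqrt((a^2/4) I - Y)
    S = (a**2 / 4) * np.eye(Y.shape[0]) - Y
    assert np.all(np.linalg.eigvalsh(S) >= -1e-12), "a^2/4 I - Y must be PSD"
    return -(a / 2) * np.eye(Y.shape[0]) + sqrtm_psd(S)


rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
Y = (B + B.T) / 10                              # small symmetric Y
a = -6.0
X = solve_quadratic(a, Y)
print(np.allclose(X @ X + a * X + Y, 0))        # True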

We could now in principle compute the dual of the problem in 2d), but unfortunately the result is not very pretty.

Exercise 3

Exercise

Consider the following convex problem

\[\begin{split}\begin{array}{ll} \text{minimize} & e^{-x}\\ \text{subject to} & x^2 / y \leq 0 \end{array}\end{split}\]

We consider this problem on the half-space domain \(\mathcal D = \{ (x,y)\in \mathbb R^2\mid y>0\}\).

Exercise 3a)

Exercise

Find the feasible set and find the optimal value for the problem above.

Since \(y>0\), we can only have \(x^2/y\leq 0\) if \(x=0\). Hence the feasible set is \(\{(0,y) \mid y>0\}\). On this set the objective is constant, equal to \(e^0=1\), so the optimal value is \(1\).

Exercise 3b)

Exercise

Show that the problem does not satisfy the (weak) Slater condition.

On the entire feasible set we have \(x^2/y=0\) rather than \(x^2/y < 0\); hence the problem is not strictly feasible and does not satisfy the Slater condition.

Exercise 3c)

Exercise

Show that the optimal value of the dual of the problem is \(0\).

The Lagrangian is given by

\[L(x,y,\lambda) = e^{-x} + \lambda x^2/y\]

The dual function is

\[g(\lambda) = \inf_{x\in\mathbb R,y>0}\left(e^{-x}+\lambda x^2/y\right)\]

If \(\lambda\geq 0\) we have \(g(\lambda)=0\): clearly \(L(x,y,\lambda)>0\) for all \(x\in\mathbb R\), \(y>0\), and it is easy to see that \(\lim_{t\to\infty} L(t,t^3,\lambda)=\lim_{t\to\infty}\left(e^{-t}+\lambda/t\right)=0\). If \(\lambda <0\), then \(\lim_{t\to\infty}L(t,1,\lambda)=\lim_{t\to\infty}\left(e^{-t}+\lambda t^2\right)=-\infty\). Therefore

\[\begin{split}g(\lambda) = \left\{\begin{array}{ll}0 & \lambda\geq 0 \\ -\infty & \lambda < 0\end{array}\right.\end{split}\]

The dual problem is therefore

\[\begin{split}\begin{array}{ll} \text{maximize} & 0 \\ \text{subject to} & \lambda\geq 0 \end{array}\end{split}\]

The optimal value of the dual problem is therefore \(0\), while the primal optimal value is \(1\): there is a duality gap, consistent with the failure of the Slater condition.
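A small numerical illustration (a sketch): along the curve \((x,y)=(t,t^3)\) the Lagrangian stays positive but tends to \(0\), which is how \(g(\lambda)=0\) arises for \(\lambda\geq 0\).

import numpy as np

lam = 1.0
for t in [1, 10, 100, 1000]:
    # L(t, t^3, lam) = exp(-t) + lam * t^2 / t^3 -> 0 as t grows
    print(t, np.exp(-t) + lam * t**2 / t**3)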

Exercise 4

Exercise 4a)

Exercise

Let \(A\in \mathbb R^{m\times n}\), \(x\in\mathbb R^n\), \(b\in\mathbb R^m\). Find the dual of the following problem

\[\begin{split}\begin{array}{ll} \text{minimize} & \frac12 \|x\|^2\\ \text{subject to} & Ax=b \end{array}\end{split}\]

The Lagrangian of the problem is given by

\[L(x,\nu) = \frac 12 \|x\|^2 + \nu^\top(Ax-b)\]

To find the infimum, we compute the gradient with respect to \(x\):

\[\nabla_x L(x,\nu) = x+A^\top \nu\]

The gradient is zero if \(x=-A^\top \nu\). The dual function is therefore

\[g(\nu) = -\nu^\top b-\frac12\nu^\top AA^\top \nu\]

The dual problem is hence the unconstrained problem

\[\text{maximize}\quad -\nu^\top b-\frac12\nu^\top AA^\top \nu\]
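Since the dual is a concave quadratic, its maximizer solves \(AA^\top\nu = -b\). A quick numerical check of strong duality (a sketch, assuming \(A\) has full row rank so that \(AA^\top\) is invertible):

import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 6
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

nu = np.linalg.solve(A @ A.T, -b)              # maximizer of g
x = -A.T @ nu                                  # primal solution x = -A^T nu
print(np.allclose(A @ x, b))                   # x is feasible: True
print(0.5 * np.dot(x, x), -b @ nu - 0.5 * nu @ (A @ A.T) @ nu)  # equal optimal values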

Exercise 4b)

Exercise

The problem of a) is equivalent to

\[\begin{split}\begin{array}{ll} \text{minimize} & \|x\|\\ \text{subject to} & Ax=b \end{array}\end{split}\]

Find the dual of this problem.

Hint: first try to minimize the Lagrangian for a fixed value of \(\|x\|\). Use this to write the dual function as an infimum over a scalar parameter \(t\geq 0\).

The Lagrangian is

\[L(x,\nu)=\|x\| + \nu^\top (Ax-b)\]

For a given value of \(\|x\|\), this function is smallest for \(x\propto -A^\top \nu\), so let \(x = -tA^\top \nu\) with \(t\geq 0\). The dual function is hence

\[g(\nu) = -\nu^\top b+\inf_{t\geq 0} \,t\left(\|A^\top \nu\| - \|A^\top \nu\|^2\right)\]

If \(\|A^\top\nu\|\leq 1\), then \(\|A^\top \nu\|-\|A^\top\nu\|^2 \geq 0\), and the infimum is attained at \(t=0\). Otherwise the function is unbounded below. Therefore

\[\begin{split}g(\nu) = \left\{\begin{array}{ll}-\nu^\top b & \|A^\top \nu\|\leq 1 \\ -\infty & \|A^\top \nu\|>1\end{array}\right.\end{split}\]

The dual problem is then

\[\begin{split}\begin{array}{ll} \text{maximize} & -b^\top\nu\\ \text{subject to} & \|A^\top\nu\|\leq 1 \end{array}\end{split}\]
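As a consistency check with a) (a sketch): the candidate \(\nu = \nu_a/\|A^\top\nu_a\|\), where \(\nu_a\) is the dual solution from a), is feasible for this dual and attains \(-b^\top\nu = \|x^*\|\), the optimal value of the primal.

import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 6
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

nu_a = np.linalg.solve(A @ A.T, -b)            # dual solution from a)
x = -A.T @ nu_a                                # minimum-norm primal solution
nu = nu_a / np.linalg.norm(A.T @ nu_a)         # rescaled to be feasible here
print(np.linalg.norm(A.T @ nu) <= 1 + 1e-12)   # feasibility: True
print(np.linalg.norm(x), -b @ nu)              # equal optimal values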