Week 8¶
Exercise 1¶
Exercise
Derive the Lagrange dual of the following problem:

\[\begin{split}\begin{array}{ll} \text{Minimize} & \sum_{i=1}^n \phi(x_i)\\ \text{Subject to} & Ax = b, \end{array}\end{split}\]
with \(x\in \mathbb R^n\), \(A\in \mathbb R^{m \times n}\), \(b\in \mathbb R^m\), and where \(\phi\) is defined in terms of a parameter \(c>0\) as follows:

\[\phi(x) = \frac{|x|}{c-|x|}, \qquad |x| < c.\]
Hint: The Lagrangian should decouple into \(n\) separate terms of the form (\(u,y\in\mathbb R\))

\[f(u,y) = \phi(u) + yu.\]
To compute \(\inf_u f(u,y)\), consider the cases where the infimum is achieved for \(u<0\), \(u=0\), and \(u>0\) separately.
We plot the function \(\phi\) for \(c=1\) below. Near \(x=0\) it looks like \(|x|\), and it has vertical asymptotes at \(x=\pm c\). This function can therefore be used as a penalty which promotes sparsity or robustness (like the usual \(|x|\) penalty) while at the same time enforcing the constraint \(|x|<c\).
[1]:
import numpy as np
import matplotlib.pyplot as plt

def phi(u, c):
    # phi(u) = |u| / (c - |u|), defined for |u| < c
    return np.abs(u) / (c - np.abs(u))

c = 1
epsilon = 1e-8
x = np.linspace(-c + epsilon, c - epsilon, 100)  # avoid the poles at x = +-c
y = phi(x, c)

plt.figure(figsize=(8, 4))
plt.ylim(0, 5)
plt.title(r"Plot of $\phi(x)$ for $c=1$")  # raw string so \phi reaches mathtext intact
plt.axvline(-c, c="k", ls="--")
plt.axvline(c, c="k", ls="--")
plt.plot(x, y);

We can write the equality constraint as

\[Ax - b = 0.\]
Introducing Lagrange multipliers \(\nu = (\nu_1,\dots,\nu_m)\), the Lagrangian is then given by

\[L(x,\nu) = \sum_{i=1}^n \phi(x_i) + \nu^\top(Ax-b) = -b^\top\nu + \sum_{i=1}^n \left(\phi(x_i) + (a_i^\top\nu)\, x_i\right),\]
where \(a_i = (A_{1i},\dots,A_{mi})^\top\) is the \(i\)th column of \(A\). The dual function is then given by

\[g(\nu) = \inf_x L(x,\nu) = -b^\top\nu + \sum_{i=1}^n \inf_{x_i}\left(\phi(x_i) + (a_i^\top\nu)\, x_i\right).\]
To simplify this expression we study the function

\[h(y) = \inf_u\left(\phi(u) + yu\right),\]

so that \(g(\nu) = -b^\top\nu + \sum_{i=1}^n h(a_i^\top\nu)\).
If \(u>0\) the derivative of \(\phi(u)+yu\) is given by

\[\frac{d}{du}\left(\frac{u}{c-u} + yu\right) = \frac{c}{(c-u)^2} + y,\]
which is zero if \((c-u)^2=-c/y\), so we need to assume \(y<0\). Then \(u=c-\sqrt{c/|y|}\), and since \(u> 0\) we also need \(\sqrt{c/|y|}< c\) or, \(y< -1/c\). Then \(h(y)\) has value

\[h(y) = \phi(u) + yu = \left(\sqrt{c|y|}-1\right) + \left(\sqrt{c|y|} - c|y|\right) = -\left(\sqrt{c|y|}-1\right)^2.\]
If \(u<0\) we similarly derive that \(u=-(c-\sqrt{c/|y|})\) and \(y> 1/c\). In this case \(h(y)\) has the same value as above.
Finally, there is the possibility that \(|y|\leq 1/c\). In this case the derivative of \(\phi(u)+yu\) is always negative for \(u<0\), and always positive for \(u>0\). By continuity, the infimum must be achieved at \(u=0\), where \(h(y)=0\). In summary we have

\[h(y) = \begin{cases} 0 & \text{if } |y|\leq 1/c,\\ -\left(\sqrt{c|y|}-1\right)^2 & \text{if } |y|>1/c.\end{cases}\]
Then finally the dual problem is given by

\[\begin{split}\begin{array}{ll} \text{Maximize} & -b^\top\nu + \sum_{i=1}^n h(a_i^\top\nu), \end{array}\end{split}\]

with \(h\) as above.
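As a sanity check, we can compare the closed form for \(h\) with a direct numerical minimization of \(\phi(u)+yu\) over a grid on \((-c,c)\). This is a minimal sketch; the grid minimum is only an approximation of the true infimum.

[2]:
import numpy as np

def phi(u, c):
    return np.abs(u) / (c - np.abs(u))

def h_closed_form(y, c):
    # h(y) = 0 if |y| <= 1/c, and -(sqrt(c|y|) - 1)^2 otherwise
    return np.where(np.abs(y) <= 1 / c, 0.0, -(np.sqrt(c * np.abs(y)) - 1) ** 2)

c = 1.0
u = np.linspace(-c + 1e-6, c - 1e-6, 200001)  # fine grid on (-c, c)
for y in [-3.0, -0.5, 0.5, 2.0]:
    h_grid = np.min(phi(u, c) + y * u)        # approximates inf_u (phi(u) + y u)
    print(y, h_grid, h_closed_form(y, c))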
Exercise 2¶
Exercise
We want to estimate the covariance matrix \(\Sigma\in S_{++}^n\) of a multivariate normal distribution with mean \(\mu\in \mathbb R^n\) from data. Given data \(\mathbf x=(x_1,\dots,x_N)\in \mathbb R^{n\times N}\), this means we want to maximize the likelihood of \(\mathbf x\) as a function of \(\Sigma\). That is, we want to solve the following maximization problem:

\[\begin{split}\begin{array}{ll} \text{Maximize} & \prod_{i=1}^N \frac{1}{\sqrt{(2\pi)^n\det\Sigma}}\exp\left(-\frac12(x_i-\mu)^\top\Sigma^{-1}(x_i-\mu)\right) \end{array}\end{split}\]
Exercise 2a)¶
Exercise
Show that the problem above is equivalent to

\[\begin{split}\begin{array}{ll} \text{Minimize} & \frac12\operatorname{tr}(\Sigma^{-1}A) - \frac N2\log\det\Sigma^{-1}, \end{array}\end{split}\]

where \(A\in\mathbb S^n\) is a symmetric matrix. Recall that we proved in week 4 that \(-\log\det\) is a convex function, so this is a convex problem in the variable \(\Sigma^{-1}\).
We take a logarithm and ignore the constant term \(-\frac{Nn}2\log(2\pi)\) to obtain the equivalent problem

\[\begin{split}\begin{array}{ll} \text{Maximize} & -\frac N2\log\det\Sigma - \frac12\sum_{i=1}^N(x_i-\mu)^\top\Sigma^{-1}(x_i-\mu). \end{array}\end{split}\]
If we define \(A = \sum_{i=1}^N (x_i-\mu)(x_i-\mu)^\top \in \mathbb S^n\), then \(\sum_i(x_i-\mu)^\top\Sigma^{-1}(x_i-\mu) = \operatorname{tr}(\Sigma^{-1}A)\) by linearity and cyclicity of the trace. Using also \(\log\det \Sigma=-\log\det\Sigma^{-1}\), we obtain

\[\begin{split}\begin{array}{ll} \text{Maximize} & \frac N2\log\det\Sigma^{-1} - \frac12\operatorname{tr}(\Sigma^{-1}A). \end{array}\end{split}\]
Multiplying the objective by \(-1\) and minimizing instead of maximizing gives the required answer.
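The identity \(\sum_i (x_i-\mu)^\top\Sigma^{-1}(x_i-\mu) = \operatorname{tr}(\Sigma^{-1}A)\) used above is easy to check numerically on random data; a quick sketch:

[3]:
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 50
X = rng.standard_normal((n, N))       # columns are the samples x_i
mu = rng.standard_normal(n)
S = rng.standard_normal((n, n))
Sigma = S @ S.T + n * np.eye(n)       # some positive definite Sigma

D = X - mu[:, None]                   # centered samples
A = D @ D.T                           # A = sum_i (x_i - mu)(x_i - mu)^T
Sigma_inv = np.linalg.inv(Sigma)
quad = sum(d @ Sigma_inv @ d for d in D.T)
print(quad, np.trace(Sigma_inv @ A))  # the two numbers should agree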
Exercise 2b)¶
Exercise
Recall from exercise 4 of week 4 that for \(X\succ 0\) and \(Y\) symmetric, we have

\[\det(X+tY) = \det(X)\det\left(I + tX^{-\frac12}YX^{-\frac12}\right).\]
Using the Taylor series of \(\log(1+x)\) and the fact that \(\det(I+tA) = 1+\operatorname{tr} (tA) + O(t^2)\), show that the directional derivative of \(\log \det\) satisfies

\[\lim_{t\to0}\frac{\log\det(X+tY) - \log\det X}{t} = \operatorname{tr}(X^{-1}Y).\]
We have

\[\begin{split}\log\det(X+tY) &= \log\det X + \log\det\left(I + tX^{-\frac12}YX^{-\frac12}\right)\\ &= \log\det X + \log\left(1 + t\operatorname{tr}\left(X^{-\frac12}YX^{-\frac12}\right) + O(t^2)\right)\\ &= \log\det X + t\operatorname{tr}\left(X^{-1}Y\right) + O(t^2);\end{split}\]

here we used the fact that the trace is linear, and that it is cyclic: \(\operatorname{tr}(ABC)=\operatorname{tr}(CAB)\), so that \(\operatorname{tr}(X^{-\frac12}YX^{-\frac12}) = \operatorname{tr}(X^{-1}Y)\). Dividing by \(t\) and letting \(t\to0\) gives the claimed directional derivative.
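A finite-difference check of this formula on random matrices (a sketch; np.linalg.slogdet is used for a numerically stable \(\log\det\)):

[4]:
import numpy as np

rng = np.random.default_rng(1)
n = 5
S = rng.standard_normal((n, n))
X = S @ S.T + n * np.eye(n)                  # X positive definite
Y = rng.standard_normal((n, n))
Y = (Y + Y.T) / 2                            # a symmetric direction

logdet = lambda M: np.linalg.slogdet(M)[1]
t = 1e-6
fd = (logdet(X + t * Y) - logdet(X)) / t     # finite-difference quotient
print(fd, np.trace(np.linalg.solve(X, Y)))   # should match tr(X^{-1} Y)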
Exercise 2c)¶
Exercise
Show that \((X+tY)^{-1} = X^{-1}(I+tYX^{-1})^{-1}\). Then use the power series \((1-x)^{-1} = \sum_{n=0}^\infty x^n\) to show that

\[(X+tY)^{-1} = X^{-1} - tX^{-1}YX^{-1} + O(t^2).\]
We have that

\[X+tY = \left(I+tYX^{-1}\right)X.\]
Hence \((I+tYX^{-1})^{-1} = X(X+tY)^{-1}\); multiplying both sides on the left by \(X^{-1}\) gives the result. We therefore have

\[(X+tY)^{-1} = X^{-1}\left(I - tYX^{-1} + O(t^2)\right) = X^{-1} - tX^{-1}YX^{-1} + O(t^2).\]
Plugging this into the directional derivative of \(X\mapsto\operatorname{tr}(X^{-1}A)\), we obtain

\[\lim_{t\to0}\frac{\operatorname{tr}\left((X+tY)^{-1}A\right)-\operatorname{tr}\left(X^{-1}A\right)}{t} = -\operatorname{tr}\left(X^{-1}YX^{-1}A\right);\]

again we used the fact that the trace is linear and cyclic.
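The same kind of finite-difference check works for this directional derivative; a sketch:

[5]:
import numpy as np

rng = np.random.default_rng(2)
n = 5

def random_spd(rng, n):
    S = rng.standard_normal((n, n))
    return S @ S.T + n * np.eye(n)

X, A = random_spd(rng, n), random_spd(rng, n)
Y = rng.standard_normal((n, n))
Y = (Y + Y.T) / 2                            # a symmetric direction

Xinv = np.linalg.inv(X)
t = 1e-6
fd = (np.trace(np.linalg.inv(X + t * Y) @ A) - np.trace(Xinv @ A)) / t
print(fd, -np.trace(Xinv @ Y @ Xinv @ A))    # should match -tr(X^{-1} Y X^{-1} A)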
Exercise 2d)¶
Exercise
Assume \(A\succ0\). Solve the minimization problem

\[\begin{split}\begin{array}{ll} \text{Minimize} & \frac N2\log\det\Sigma + \frac12\operatorname{tr}(\Sigma^{-1}A) \end{array}\end{split}\]

over \(\Sigma\in S_{++}^n\).
First solve the unconstrained problem by computing the directional derivatives and setting them to zero. Show that the resulting solution \(\Sigma^*\) satisfies \(\Sigma^*\succ 0\).
Denote the objective by \(f(\Sigma)\). Using the directional derivatives computed in b) and c):

\[\lim_{t\to0}\frac{f(\Sigma+tY)-f(\Sigma)}{t} = \frac N2\operatorname{tr}(\Sigma^{-1}Y) - \frac12\operatorname{tr}(\Sigma^{-1}Y\Sigma^{-1}A).\]
If \(\Sigma\) is optimal, then this directional derivative is zero for all \(Y\). This happens if and only if

\[\frac N2\Sigma^{-1} - \frac12\Sigma^{-1}A\Sigma^{-1} = 0.\]
Factoring out the \(\Sigma^{-1}\) on the left, we obtain

\[\frac12\Sigma^{-1}\left(NI - A\Sigma^{-1}\right) = 0.\]
Since \(\Sigma \succ 0\), the matrix \(\Sigma^{-1}\) is invertible, so we must have \(NI- A\Sigma^{-1} = 0\). The solution is therefore \(\Sigma^* = \frac1N A\succ 0\).
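Numerically, \(\Sigma^* = A/N\) should beat any other positive definite candidate; a small sketch (using the empirical mean as \(\mu\) for simplicity):

[6]:
import numpy as np

rng = np.random.default_rng(3)
n, N = 3, 100
X = rng.standard_normal((n, N))
mu = X.mean(axis=1)                   # empirical mean, for simplicity
D = X - mu[:, None]
A = D @ D.T

def f(Sigma):
    # objective (N/2) log det Sigma + (1/2) tr(Sigma^{-1} A)
    return N / 2 * np.linalg.slogdet(Sigma)[1] + 0.5 * np.trace(np.linalg.solve(Sigma, A))

print(f(A / N))                       # value at the candidate minimizer
for _ in range(3):                    # random SPD competitors give larger values
    S = rng.standard_normal((n, n))
    print(f(S @ S.T + 0.1 * np.eye(n)))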
Exercise 2e)¶
Exercise
Assume \(A\succ 0\). Write down the Lagrangian of

\[\begin{split}\begin{array}{ll} \text{Minimize} & \frac N2\log\det\Sigma + \frac12\operatorname{tr}(\Sigma^{-1}A)\\ \text{Subject to} & \Sigma\succeq 0. \end{array}\end{split}\]
Show that the critical points of the Lagrangian can be obtained by solving a quadratic matrix equation:

\[X^2 + aX + Y = 0.\]
Hint: \(\Sigma\) is a critical point of \(L(\Sigma,\Lambda)\) if

\[\lim_{t\to0}\frac{L(\Sigma+tY,\Lambda)-L(\Sigma,\Lambda)}{t} = 0 \quad\text{for all } Y\in\mathbb S^n.\]
This gives a matrix equation for the minimum. You can transform it to the required form by multiplying on the left and on the right by the symmetric square root of \(A\), that is, \(A^{\frac12}\).
We can account for the constraint \(\Sigma\succeq 0\) with a term \(-\operatorname{tr}(\Lambda \Sigma)\) in the Lagrangian, where \(\Lambda\in\mathbb S^n\), \(\Lambda\succeq 0\), is a Lagrange multiplier. That is, the Lagrangian is

\[L(\Sigma,\Lambda) = \frac N2\log\det\Sigma + \frac12\operatorname{tr}(\Sigma^{-1}A) - \operatorname{tr}(\Lambda\Sigma).\]
To compute the dual function \(g(\Lambda)\) we need to minimize the Lagrangian. We have

\[\lim_{t\to0}\frac{L(\Sigma+tY,\Lambda)-L(\Sigma,\Lambda)}{t} = \frac N2\operatorname{tr}(\Sigma^{-1}Y) - \frac12\operatorname{tr}(\Sigma^{-1}Y\Sigma^{-1}A) - \operatorname{tr}(\Lambda Y).\]
This is zero for all \(Y\) if and only if

\[\frac N2\Sigma^{-1} - \frac12\Sigma^{-1}A\Sigma^{-1} - \Lambda = 0.\]
Multiplying left and right by \(A^{\frac12}\) we obtain

\[\frac N2 A^{\frac12}\Sigma^{-1}A^{\frac12} - \frac12\left(A^{\frac12}\Sigma^{-1}A^{\frac12}\right)^2 - A^{\frac12}\Lambda A^{\frac12} = 0.\]
Multiplying by \(-2\) and writing \(X = A^{\frac12}\Sigma^{-1}A^{\frac12}\) gives

\[X^2 - NX + 2A^{\frac12}\Lambda A^{\frac12} = 0.\]
This is a quadratic matrix equation \(X^2+aX+Y=0\) with

\[a = -N, \qquad Y = 2A^{\frac12}\Lambda A^{\frac12}.\]
Exercise 2f)¶
Exercise
Let \(X,Y\) be symmetric matrices, and consider the matrix equation \(X^2+aX+Y=0\).
Like in the proof of the quadratic formula, we can complete the square. Find coefficients \(\alpha,\beta\in\mathbb R\) such that

\[X^2+aX+Y = \left(X+\alpha I\right)^2 + \beta I + Y.\]
Find the condition on \(Y\) such that the quadratic matrix equation has a solution and give a formula for the solution.
We have

\[X^2+aX+Y = \left(X+\frac a2 I\right)^2 - \frac{a^2}4 I + Y,\]

so \(\alpha = \frac a2\) and \(\beta = -\frac{a^2}4\), and the equation becomes

\[\left(X+\frac a2 I\right)^2 = \frac{a^2}4 I - Y.\]
Assuming \(\frac{a^2}{4}I-Y\succeq 0\) we can take the positive matrix square root on both sides to obtain

\[X+\frac a2 I = \left(\frac{a^2}4 I - Y\right)^{\frac12}.\]
Then we obtain

\[X = -\frac a2 I + \left(\frac{a^2}4 I - Y\right)^{\frac12}.\]
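A numerical sketch of this recipe, with the positive square root computed from an eigendecomposition (the values of \(a\) and \(Y\) below are arbitrary, with \(Y\) rescaled so that \(\frac{a^2}4 I - Y\succeq 0\)):

[7]:
import numpy as np

rng = np.random.default_rng(4)
n = 4
a = -6.0
Q = rng.standard_normal((n, n))
Y = Q @ Q.T
Y = Y * (a ** 2 / 4) / (np.linalg.eigvalsh(Y).max() + 1.0)  # ensure a^2/4 I - Y >= 0

def sqrtm_psd(M):
    # positive square root of a PSD matrix via its eigendecomposition
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.T

X = -a / 2 * np.eye(n) + sqrtm_psd(a ** 2 / 4 * np.eye(n) - Y)
print(np.abs(X @ X + a * X + Y).max())       # residual should be ~0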
We could now in principle compute the dual problem for the problem of 2d), but unfortunately the problem will not be very pretty.
Exercise 3¶
Exercise
\[\begin{split}\begin{array}{ll} \text{Minimize} & e^{-x}\\ \text{Subject to} & x^2 / y \leq 0 \end{array}\end{split}\]
We consider this problem on the half-space domain \(\mathcal D = \{ (x,y)\in \mathbb R^2\mid y>0\}\).
Exercise 3a)¶
Exercise
Find the feasible set and find the optimal value for the problem above.
Since \(y>0\), we can only have \(x^2/y\leq 0\) if \(x=0\). Hence the feasible set is given by \(\{(0,y) \mid y>0\}\). On this set the objective is constant and equal to \(e^0=1\), so the optimal value is \(1\).
Exercise 3b)¶
Exercise
Show that the problem does not satisfy the (weak) Slater condition.
On the entire feasible set we have \(x^2/y=0\) rather than \(x^2/y < 0\), so the problem is not strictly feasible and does not satisfy the Slater condition.
Exercise 3c)¶
Exercise
Show that the optimal value of the dual of the problem is \(0\).
The Lagrangian is given by

\[L(x,y,\lambda) = e^{-x} + \lambda\frac{x^2}{y}.\]
The dual function is

\[g(\lambda) = \inf_{(x,y)\in\mathcal D} L(x,y,\lambda).\]
If \(\lambda\geq 0\) we have \(g(\lambda)=0\): clearly \(L(x,y,\lambda)>0\) for all \((x,y)\in\mathcal D\), and it is easy to see that \(\lim_{t\to\infty} L(t,t^3,\lambda)=\lim_{t\to\infty}\left(e^{-t}+\lambda/t\right)=0\). If \(\lambda <0\), then \(\lim_{t\to\infty}L(t,1,\lambda)=\lim_{t\to\infty}\left(e^{-t}+\lambda t^2\right)=-\infty\). Therefore

\[g(\lambda) = \begin{cases} 0 & \text{if } \lambda\geq 0,\\ -\infty & \text{if } \lambda<0.\end{cases}\]
The dual problem is therefore

\[\begin{split}\begin{array}{ll} \text{Maximize} & 0\\ \text{Subject to} & \lambda\geq 0. \end{array}\end{split}\]
The optimal value of the dual problem is therefore \(0\). In particular there is a duality gap of \(1\), consistent with the failure of the Slater condition in b).
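The gap is easy to see numerically: along the curve \((t,t^3)\) the Lagrangian decays to \(0\), while the primal optimal value is \(1\). A small sketch with an arbitrary \(\lambda\geq 0\):

[8]:
import numpy as np

lam = 2.0                                    # any lambda >= 0 will do
L = lambda x, y: np.exp(-x) + lam * x ** 2 / y
for t in [1.0, 10.0, 100.0, 1000.0]:
    print(t, L(t, t ** 3))                   # e^{-t} + lam / t -> 0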
Exercise 4¶
Exercise 4a)¶
Exercise
Let \(A\in \mathbb R^{m\times n}\), \(x\in\mathbb R^n\), \(b\in\mathbb R^m\). Find the dual of the following problem:

\[\begin{split}\begin{array}{ll} \text{Minimize} & \frac12\|x\|^2\\ \text{Subject to} & Ax = b. \end{array}\end{split}\]
The Lagrangian of the problem is given by

\[L(x,\nu) = \frac12\|x\|^2 + \nu^\top(Ax-b).\]
To find the infimum, we compute the gradient with respect to \(x\):

\[\nabla_x L(x,\nu) = x + A^\top\nu.\]
The gradient is zero if \(x=-A^\top \nu\). The dual function is therefore

\[g(\nu) = \frac12\|A^\top\nu\|^2 - \nu^\top AA^\top\nu - b^\top\nu = -\frac12\|A^\top\nu\|^2 - b^\top\nu.\]
The dual problem is hence the unconstrained problem

\[\begin{split}\begin{array}{ll} \text{Maximize} & -\frac12\|A^\top\nu\|^2 - b^\top\nu. \end{array}\end{split}\]
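Since the dual is an unconstrained concave quadratic, it can be maximized in closed form (\(\nu^* = -(AA^\top)^{-1}b\), assuming \(A\) has full row rank) and compared with the primal optimum at the minimum-norm solution; a quick sketch:

[9]:
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 6
A = rng.standard_normal((m, n))              # full row rank with probability 1
b = rng.standard_normal(m)

x_star = A.T @ np.linalg.solve(A @ A.T, b)   # minimum-norm solution of Ax = b
p_star = 0.5 * x_star @ x_star               # primal optimal value

nu_star = -np.linalg.solve(A @ A.T, b)       # maximizer of the dual
d_star = -0.5 * np.sum((A.T @ nu_star) ** 2) - b @ nu_star
print(p_star, d_star)                        # equal: strong duality holds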
Exercise 4b)¶
Exercise
The problem of a) is equivalent to

\[\begin{split}\begin{array}{ll} \text{Minimize} & \|x\|\\ \text{Subject to} & Ax = b. \end{array}\end{split}\]
Find the dual of this problem.
Hint: first try to minimize the Lagrangian for a fixed value of \(\|x\|\). Use this to write the dual function as an infimum over a scalar parameter \(t\geq 0\).
The Lagrangian is

\[L(x,\nu) = \|x\| + \nu^\top(Ax-b).\]
For a given value of \(\|x\|\), this function is smallest when \(x\propto -A^\top \nu\), so let \(x = -tA^\top \nu\) with \(t\geq 0\). The dual function is hence

\[g(\nu) = \inf_{t\geq 0}\left(t\|A^\top\nu\| - t\|A^\top\nu\|^2\right) - b^\top\nu.\]
If \(\|A^\top\nu\|\leq 1\), then \(\|A^\top \nu\|-\|A^\top\nu\|^2 \geq 0\), and hence the infimum is attained at \(t=0\). Otherwise the expression is unbounded below. Therefore

\[g(\nu) = \begin{cases} -b^\top\nu & \text{if } \|A^\top\nu\|\leq 1,\\ -\infty & \text{otherwise.}\end{cases}\]
The dual problem is then

\[\begin{split}\begin{array}{ll} \text{Maximize} & -b^\top\nu\\ \text{Subject to} & \|A^\top\nu\|\leq 1. \end{array}\end{split}\]
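Here a dual-feasible point attaining the primal value can be written down directly: rescaling the \(\nu^*\) from a) so that \(\|A^\top\nu\|=1\) gives \(-b^\top\nu = \|x^*\|\). A sketch:

[10]:
import numpy as np

rng = np.random.default_rng(6)
m, n = 3, 6
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x_star = A.T @ np.linalg.solve(A @ A.T, b)   # minimum-norm solution, p* = ||x*||
nu = -np.linalg.solve(A @ A.T, b)
nu = nu / np.linalg.norm(A.T @ nu)           # rescale onto the dual feasible set
print(np.linalg.norm(x_star), -b @ nu)       # equal: no duality gap here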