Exam project
This is the Python programming homework that counts for the exam of this course. It requires a bit more independence than the other homework: you will need to write more code entirely by yourself. Furthermore, there are no automatic tests, so you need to verify yourself that your solution is correct.
As usual, you can do this homework either on Google Colab or on your own machine (after installing conda).
We begin by running the cell below. If you're running this notebook on a local Python installation, make sure that this cell does not throw any errors. If it does, you will need to install additional packages; you can do this using pip from the command line.
[1]:
from importlib_metadata import version
import cvxpy as cp
import matplotlib
import matplotlib.pyplot as plt
matplotlib.rcParams["figure.figsize"] = (9, 6)
import numpy as np
import pandas as pd
import scipy
import warnings
warnings.filterwarnings("ignore") # don't display irrelevant deprecation warnings
Part 1: Alternating projection algorithm
Most solvers for convex problems \(\min_{x\in\Omega}f(x)\) start by first finding a feasible point \(x\in \Omega\). For some constraint sets this step can be very difficult, but if \(\Omega\) is the intersection of several simple constraints, then we can use a relatively simple method to find a feasible point.
Suppose that \(\Omega = C_1\cap\dots\cap C_k\subset \mathbb R^n\), with each \(C_i\subset \mathbb R^n\) closed and convex, and such that we have a projection map \(P_{C_i}\colon \mathbb R^n\to C_i\). Recall that \(P_{C_i}(x)\) returns the point \(y \in C_i\) that is closest to \(x\) in Euclidean norm. Then starting with some \(x_0\) we could naively set \(x = P_{C_k}\dots P_{C_2}P_{C_1}x_0\) and expect that \(x\in \Omega\). This is unfortunately not the case, but we can iterate this procedure, that is, we can set
\(x_{i+1} = P_{C_k}\dots P_{C_2}P_{C_1}\,x_i.\)
This iterative procedure is known as the Alternating projection algorithm, also known as projection onto convex sets (POCS).
One can prove that, as long as \(\Omega\neq\emptyset\), the iterates \((x_i)\) produced by the alternating projection algorithm converge to a point \(x\in\Omega\) (and under mild regularity assumptions the convergence is even linear). Note that this point \(x\) is not necessarily the projection of \(x_0\) onto \(\Omega\), nor does the algorithm necessarily give a point in \(\Omega\) in a finite number of steps (hence I think that “projection onto convex sets” is a bit of a misnomer). Nevertheless, for the purpose of finding a feasible point for a convex solver this is good enough: as long as \(x\) satisfies the constraints up to some small error on the order of the machine epsilon, we can consider \(x\) to lie in \(\Omega\) from a numerical point of view.
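To fix ideas, here is a minimal generic sketch of this iteration (it is not the reference solution for the exercises below); the argument `projections` is a hypothetical list of the projection maps \(P_{C_1},\dots,P_{C_k}\) given as Python callables, and `np` is NumPy as imported above.
def alternating_projection(projections, x0, max_its=100, tol=1e-12):
    """One sweep applies P_{C_1}, ..., P_{C_k} in turn; stop when a sweep barely moves x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_its):
        x_prev = x.copy()
        for P in projections:  # apply each projection map in turn
            x = P(x)
        if np.linalg.norm(x - x_prev) <= tol:  # heuristic convergence test
            break
    return x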
Exercise 1a): Finding a point in a polyhedron
Recall that a convex polyhedron is given by a set of inequalities of the form \(C_i=\{x\in\mathbb R^n\mid a_i^\top x\leq b_i\}\) for vectors \(a_i\in\mathbb R^n\) and scalars \(b_i\in \mathbb R\), or encoded in matrix form as \(\Omega = \{x\in\mathbb R^n\mid Ax\leq b\}\) for some matrix \(A\) and vector \(b\). Each convex set \(C_i\) is a halfspace, and we can project onto it by setting
\(P_{C_i}(x) = x - \frac{\max(a_i^\top x - b_i,\,0)}{a_i^\top a_i}\,a_i.\)
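For a single halfspace this projection is a one-liner in NumPy; a small illustrative helper (the function name is mine, not part of the exercise template):
def project_halfspace(x, a, b):
    """Project x onto the halfspace {z : a @ z <= b}."""
    violation = a @ x - b
    if violation <= 0:  # x is already in the halfspace
        return x
    return x - violation * a / (a @ a)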
Implement the alternating projection algorithm for finding a point in a polyhedron. As input you should accept any \(m\times n\) matrix `A` and length-\(m\) vector `b`. Do at most `max_its` iterations. Sometimes the algorithm will return a point inside the polyhedron in a finite number of steps, and sometimes it will converge to a point very close to the polyhedron (a distance on the order of \(10^{-16}\)). Try to detect this convergence, and if so make the algorithm stop early (i.e. in fewer than `max_its` iterations).
To generate test problems, we need to generate random polytopes in arbitrary dimensions. The simplest way to do this is to compute the convex hull of a random set of points. For this we can use the scipy routine scipy.spatial.ConvexHull, which is a Python interface to the Qhull library. Computing the convex hull in high dimensions can be very expensive, so only run random_polytope for values of n_dims up to about 10.
[2]:
def random_polytope(n_dims, n_points=None):
"""Generate a random polytope by taking convex hull of random points"""
from scipy.spatial import ConvexHull
if n_points is None: # Number of points in the convex hull
n_points = n_dims * 2 + 10
points = np.random.normal(size=(n_points, n_dims))
shift = np.random.uniform(
-100, 100, size=n_dims
) # Shift the points so 0 isn't a feasible point
points += shift
# Find the convex hull, and extract the hyperplane constraints
hull = ConvexHull(points)
A = hull.equations[:, :-1]
b = -hull.equations[:, -1]
return A, b, hull
def plot_hull_2D(hull):
"""Plot a 2d convex hull"""
points = hull.points
for n1, n2 in hull.simplices:
v1 = points[n1]
v2 = points[n2]
plt.plot([v1[0], v2[0]], [v1[1], v2[1]], c="k")
plt.plot(points[hull.vertices, 0], points[hull.vertices, 1], "or")
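As a quick sanity check on the sign convention of hull.equations used in random_polytope above (each row is [normal, offset] with normal·x + offset ≤ 0 inside the hull), one could verify that all generating points satisfy the extracted inequalities. A minimal sketch; the variable names A_test, b_test, hull_test are mine:
A_test, b_test, hull_test = random_polytope(3)
# every point used to build the hull must satisfy A @ x <= b (up to numerical tolerance)
print(np.all(hull_test.points @ A_test.T <= b_test + 1e-9))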
[3]:
def alternating_projection_polyhedron(A, b, max_its=100):
"""
Compute a feasible point in the polyhedron A@x<=b using alternating
projection algorithm.
    Use a maximum of `max_its` iterations. Stop early when convergence is reached.
"""
# BEGIN SOLUTION
x = np.zeros(A.shape[1])
def proj_once(i, x):
"project `x` onto `i`th hyperplane"
a = A[i]
bi = b[i]
violation = np.dot(a, x) - bi
if violation <= 0:
return x, 0
else:
return x - violation * a / np.dot(a, a), violation
for _ in range(max_its):
max_violation = 0
for i in range(len(A)):
x, violation = proj_once(i, x)
max_violation = max(max_violation, violation)
if max_violation <= 1e-12:
break
return x
# END SOLUTION
# Generate random polytope, and plot if 2D
# You should set `dim` to other values, and see if it still works
# Don't put `dim` higher than ~10, since then the convex hull
# takes long to compute.
dim = 2
A, b, hull = random_polytope(dim)
if dim == 2:
plot_hull_2D(hull)
x = alternating_projection_polyhedron(A, b)
plt.plot([x[0]], [x[1]], "ob", markersize=10);

Exercise 1b): Feasible point for SDP constraints
Recall that SDP constraints on a real symmetric matrix \(X\in\mathbb S_n\) are given by
\(\langle A_i,X\rangle = b_i \quad (i=1,\dots,k), \qquad X\succeq 0,\)
with \(A_i\in \mathbb S_n\) and \(b_i\in\mathbb R\). Projecting onto the hyperplane \(C_i=\{X\in \mathbb S_n\mid \langle A_i,X\rangle = b_i\}\) is easy: similar to the linear case, we have
\(P_{C_i}(X) = X - \frac{\langle A_i,X\rangle - b_i}{\langle A_i,A_i\rangle}\,A_i.\)
Projecting onto the positive semidefinite cone \(\mathbb S^n_+=\{X\succeq 0\}\) can be achieved through an eigenvalue decomposition. If \(X = U\Lambda U^\top\), with \(\Lambda = \mathrm{diag}(\lambda_1,\dots,\lambda_n)\), then we can update \(X\) by setting all the negative eigenvalues to zero:
\(P_{\mathbb S^n_+}(X) = U\,\mathrm{diag}(\max(\lambda_1,0),\dots,\max(\lambda_n,0))\,U^\top.\)
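A minimal NumPy sketch of this eigenvalue-based projection (the helper name project_psd is mine; the solution cell below does the same thing inline):
def project_psd(X):
    """Project a symmetric matrix X onto the positive semidefinite cone."""
    eigvals, U = np.linalg.eigh(X)       # X = U diag(eigvals) U^T for symmetric X
    eigvals = np.clip(eigvals, 0, None)  # set negative eigenvalues to zero
    return (U * eigvals) @ U.T           # rebuild U diag(max(lambda, 0)) U^T

print(project_psd(-np.eye(3)))  # the projection of -I is the zero matrix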
Implement the alternating projection algorithm for SDP constraints. As input you need to accept a \(k\times n\times n\) array `A` and a length-\(k\) vector `b`. Here, `A[i,:,:]` represents \(A_i\). Do a maximum of `max_its` iterations.
You also need to study the convergence speed. To study the error you can, for example, look at the residual vector \(r \in \mathbb{R}^k\) defined as \(r_i = \langle A_i,X\rangle-b_i\), and then take a suitable norm. Make a plot showing how this error changes with each iteration. The vertical axis should be logarithmic.
Based on this analysis you can also implement a stopping criterion, that is, a criterion to stop the algorithm before `max_its` iterations have been performed.
Tips:
To compute the eigendecomposition of a symmetric matrix, you can use np.linalg.eigh. This routine is specific to symmetric matrices, which allows it to use a faster algorithm.
If you first normalize \(A\) so that \(\langle A_i,A_i\rangle=1\), then you don’t need to divide by \(\langle A_i,A_i\rangle\) at every step.
You can make matplotlib use a logarithmic \(y\)-axis by calling `plt.yscale('log')`.
For the experts: to compute an inner product of two matrices \(\langle A,B\rangle=\operatorname{tr}(A^\top B)\) efficiently, you can use np.tensordot. With the default argument `np.tensordot(A, B, axes=2)`, this contracts the last two axes of `A` with the first two axes of `B`. In effect this means that if `A` has shape `(k,n,n)` and `X` has shape `(n,n)`, then `np.tensordot(A, X)` has shape `(k,)` and contains the \(k\) values \(\langle A_i,X\rangle\).
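The shape behaviour described in the last tip can be checked with a tiny example (the array names A_demo and X_demo are mine):
A_demo = np.random.normal(size=(5, 4, 4))
X_demo = np.random.normal(size=(4, 4))
vals = np.tensordot(A_demo, X_demo)  # default axes=2: contract the last two axes of A_demo with X_demo
print(vals.shape)  # (5,)
print(np.allclose(vals, [np.trace(Ai.T @ X_demo) for Ai in A_demo]))  # same as the k traces <A_i, X>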
[4]:
def alternating_projection_SDP(A, b, max_its=200):
"""
Compute a feasible point for SDP constraints using alternating projection algorithm.
Return an (almost) feasible point, together with residuals for the constraints.
"""
# BEGIN SOLUTION
k, n, _ = A.shape
    A_norm = 1 / np.linalg.norm(A, axis=(1, 2)) ** 2  # inverse squared norms <A_i, A_i>, computed only once
X = np.eye(n)
residuals = []
for _ in range(max_its):
# tensordot is more efficient than for loop containing matrix product + trace
residual = np.tensordot(A, X) - b
# einsum is not more efficient, but easier
X = X - np.einsum("i,ijk,i->jk", residual, A, A_norm)
# compute eigendecomposition. Set negative eigenvalues to zero
v, U = np.linalg.eigh(X)
v = np.clip(v, 0, None)
X = U @ np.diag(v) @ U.T
residual_norm = np.linalg.norm(residual)
residuals.append(residual_norm)
if residual_norm < 1e-10:
break
return X, residuals
# END SOLUTION
n = 10
k = 5
# Create array of random symmetric matrices
A = np.random.normal(size=(k, n, n))
A = A + np.transpose(A, (0, 2, 1))
b = np.random.normal(size=k)
X, residuals = alternating_projection_SDP(A, b)
# Plot how the residuals change with each iteration
# BEGIN SOLUTION
# here we plot norm of residual error.
plt.title("Norm of constraint deviations")
plt.plot(residuals, "o-")
plt.yscale("log")
# END SOLUTION

Part 2: Frank-Wolfe method and LASSO
After finding a feasible point, there are many algorithms to solve a convex problem. The algorithm we will consider is the Frank-Wolfe method. The idea is that optimizing a linear function over a convex domain is relatively simple, so at each step of the algorithm we linearize our objective and take a step towards the optimum of the linearized function. That is, suppose we have a convex function \(f\colon\Omega \to \mathbb R\) and we start with some point \(\beta_0\in \Omega\). Then, in an iterative procedure, we define \(\beta_{k+1}\in\Omega\) given \(\beta_k\in\Omega\) at step \(k\) by first solving the linear problem
\(\alpha_k = \operatorname{argmin}_{\alpha\in\Omega}\,\alpha^\top \nabla f(\beta_k).\)
Then we update \(\beta_{k}\) by setting
\(\beta_{k+1} = \beta_k + t_k(\alpha_k-\beta_k),\)
with \(t_k\) a suitably chosen stepsize. Because of the optimality conditions, \(\alpha_k-\beta_k\) is a descent direction, so the function is guaranteed to decrease along this line. Note that for \(t=1\) we obtain the point \(\alpha_k\), and therefore by convexity any value \(t_k\in[0,1]\) is a valid stepsize (i.e., one such that \(\beta_{k+1}\in\Omega\)). For many problems we can compute the optimal value for \(t_k\), but if not, we can use a constant stepsize or a line search method such as Armijo backtracking.
The main advantage of the Frank-Wolfe method is that it is versatile and simple to implement: we can use it as long as we can compute the gradient of the objective cheaply, using a linear programming solver as a black box. Its main disadvantage is its sublinear convergence rate; for most convex problems there are more efficient specialized solvers.
The Frank-Wolfe method is similar to the projected gradient descent method, but there we instead take an unconstrained step in the direction of the negative gradient and then project back onto the constraint set. For many constraint sets there is no efficient way to project onto the set, in which case the Frank-Wolfe method can be cheaper.
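To make the update rule concrete before specializing to LASSO, here is a minimal generic sketch of the method (not the implementation you are asked to write below); grad_f and linear_oracle are hypothetical callables, and the stepsize \(2/(k+2)\) is one common default when no exact linesearch is available.
def frank_wolfe(grad_f, linear_oracle, beta0, max_its=100):
    """Generic Frank-Wolfe sketch: linear_oracle(g) should return argmin over Omega of g @ alpha."""
    beta = beta0
    for k in range(max_its):
        g = grad_f(beta)                  # gradient of the objective at the current iterate
        alpha = linear_oracle(g)          # solve the linearized problem over Omega
        t = 2 / (k + 2)                   # open-loop stepsize; exact linesearch also works
        beta = beta + t * (alpha - beta)  # convex combination, so beta stays in Omega
    return beta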
We will apply the Frank-Wolfe method to the LASSO problem. This is least-squares regression with an \(\ell_1\)-norm constraint on the parameters. That is, given \(N\) (mean-centered) data points \(X=(x_1,\dots,x_N)^\top\in \mathbb R^{N\times n}\) and \(y=(y_1,\dots,y_N)^\top\in \mathbb R^N\), we consider the following problem for fixed \(s\in\mathbb R\):
\(\min_{\|\beta\|_1\leq s}\;\|y-X\beta\|_2^2.\)
This problem is convex, and we will solve it using the Frank-Wolfe method.
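Once your implementation below works, one way to check it is to solve the same constrained problem directly with CVXPY and compare the two solutions. A minimal sketch, assuming mean-centered X, y and a value of s as generated in the cell below (the function name lasso_cvxpy is mine):
def lasso_cvxpy(X, y, s):
    """Solve min ||y - X @ beta||^2 subject to ||beta||_1 <= s directly with CVXPY."""
    beta = cp.Variable(X.shape[1])
    problem = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ beta)),
                         [cp.norm(beta, 1) <= s])
    problem.solve()
    return beta.value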
Implement the Frank-Wolfe method to solve the Lasso problem.
Tips:
Finding a feasible point for this problem is very easy.
Find an explicit expression for the gradient \(\nabla f(\beta_k)\) at a point \(\beta_k\). Use this to formulate the linear problem \(\alpha_k = \operatorname{argmin}_{\|\alpha\|_1\leq s}\alpha^\top \nabla f(\beta_k)\), and solve it using CVXPY.
For this problem we can do an exact linesearch; that is, there is a simple formula for the optimal stepsize \(t_k\). Find it by solving the problem \(t_k=\operatorname{argmin}_{t\in[0,1]}\|y-X(\beta_k+t(\alpha_k-\beta_k))\|\) (a short derivation is sketched just after these tips).
You can use the stepsize as a stopping criterion: if the stepsize \(t_k\) is very small, then we have reached the boundary of the domain, and the search direction is close to orthogonal to this boundary.
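For reference, the computation behind the exact linesearch is short (a sketch using the notation from the tips above): write \(d_k=\alpha_k-\beta_k\) and \(\varphi(t)=\|y-X(\beta_k+t\,d_k)\|^2\). Expanding gives \(\varphi(t)=\|y-X\beta_k\|^2-2t\,(y-X\beta_k)^\top X d_k+t^2\|Xd_k\|^2\), so \(\varphi'(t)=0\) at \(t^\ast=\frac{(y-X\beta_k)^\top X d_k}{\|Xd_k\|^2}\), and the admissible stepsize is this value clipped to the interval \([0,1]\).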
[7]:
N = 100 # number of data points
n = 10 # dimension
# Generate random ground-truth data
X = np.random.normal(size=(N, n))
beta_true = np.random.normal(size=n)
# Make the true beta lie inside the constraint set. If you comment out this line
# then the estimated beta is probably no longer equal to `beta_true`.
# You should experiment with this. Also experiment with the value of `s`.
beta_true /= 2 * np.linalg.norm(beta_true, ord=1)
X = X - np.mean(X, axis=0)
y = X @ beta_true
def FW_LASSO(X, y, s=1, max_its=100):
"""Solve the LASSO problem using Frank-Wolfe.
Input is mean-centered data X,y, the value of `s`,
and the maximum number of iterations."""
# BEGIN SOLUTION
N, n = X.shape
beta = np.zeros(n)
beta_norms = []
stepsizes=[]
for _ in range(max_its):
grad = LASSO_grad(X, y, beta)
alpha = min_linear_problem(grad, s)
t = opt_linesearch(X, y, beta, alpha - beta)
beta_norms.append(t*np.linalg.norm(alpha-beta))
stepsizes.append(t)
beta = beta + t * (alpha - beta)
if t < 1e-8: # step size is good stopping criterion
break
return beta, beta_norms, stepsizes
# END SOLUTION
def LASSO_grad(X, y, beta):
"""Return the gradient of the LASSO problem for data X,y at point `beta`"""
# BEGIN SOLUTION
return 2 * (X.T @ X) @ beta - 2 * X.T @ y
# END SOLUTION
def min_linear_problem(grad, s):
"""Find the minimum of the linear problem.
Depends on the LASSO gradient and the value of `s` determining
the constraint set."""
# BEGIN SOLUTION
alpha = cp.Variable(len(grad))
objective = cp.Minimize(grad @ alpha)
constraints = [cp.norm(alpha, p=1) <= s]
problem = cp.Problem(objective, constraints)
problem.solve()
return alpha.value
# END SOLUTION
def opt_linesearch(X, y, beta, direction):
"""Return the result of exact linesearch.
That is, the best possible choice of stepsize `t`. Make sure
that this doesn't escape the constraint set."""
# BEGIN SOLUTION
numerator = np.dot(y, X @ direction) - np.dot(X @ beta, X @ direction)
denominator = np.linalg.norm(X @ direction) ** 2
if denominator == 0:
return 1
else:
return min(numerator / denominator, 1)
# END SOLUTION
# Compute beta from data and compare to the ground truth
beta, beta_norms, stepsizes = FW_LASSO(X, y, s=2)
print("beta - beta_true =", beta - beta_true)
plt.plot(beta_norms, label=r"$\|\alpha_k - \beta_k\|$")
plt.plot(stepsizes, label="stepsize $t_k$")
plt.yscale('log')
plt.legend();
plt.title("Norm of search direction and stepsize for Frank-Wolfe Lasso")
[7]:
Text(0.5, 1.0, 'Norm of search direction and stepsize for Frank-Wolfe Lasso')

[ ]: