Dimensionality Reduction

Singular Value Decomposition

Every matrix can be factorised in the form $X=U \Sigma V^T$, where the columns of $U$ are the left singular vectors, the rows of $V^T$ are the right singular vectors, and $\Sigma$ is the diagonal matrix of singular values. This decomposition can be used to generate low-rank approximations of matrices, preserving as much information as possible whilst reducing the storage required.

The following function takes a matrix M and a desired rank k, and returns the best approximation for M by a rank k matrix.
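A minimal sketch of such a low_rank function, using numpy's SVD (by the Eckart-Young theorem, truncating the SVD gives the best rank-$k$ approximation in the Frobenius norm):

```python
import numpy as np

def low_rank(M, k):
    """Return the best rank-k approximation of M in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```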

We test on a 512x512 image of astronaut Eileen Collins from skimage.data.

We now separate the image into the three colour channels (RGB), making sure that the values are scaled to be between 0 and 1 (instead of 0 to 255).
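A sketch of these two steps, assuming the image is skimage's astronaut picture:

```python
import numpy as np
from skimage import data

# Load the 512x512 astronaut image (Eileen Collins) and scale it to [0, 1].
image = data.astronaut() / 255.0

# Split the image into its three colour channels.
image_R = image[:, :, 0]
image_G = image[:, :, 1]
image_B = image[:, :, 2]
```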

We plot the singular values for each separate channel (regular and log-log).
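A possible sketch of this plotting step, reusing the channel matrices image_R, image_G, image_B from above:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for channel, name in [(image_R, "R"), (image_G, "G"), (image_B, "B")]:
    s = np.linalg.svd(channel, compute_uv=False)  # singular values only
    axes[0].plot(s, label=name)                   # regular scale
    axes[1].loglog(s, label=name)                 # log-log scale
axes[0].set_title("Singular values")
axes[1].set_title("Singular values (log-log)")
axes[0].legend()
axes[1].legend()
plt.show()
```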

The Frobenius norm of a matrix $M$ is $$ \|M\|_F = \sqrt{\sum_{i,j} M_{ij}^2} = \sqrt{\sum_i \sigma_i^2},$$ where $\sigma_i$ are the singular values of $M$.

Our goal is to compress the image by finding a lower rank approximation. In order to assess the quality of the compressed image we can use the ratio of the Frobenius norm between the compressed image and the original one. Let $M_k$ be the approximation of $M$ by a rank-$k$ matrix.

Define $$ \rho_k(M) = \frac{\| M_k\|_F}{\|M\|_F}.$$

We compute the values of $\rho_k(M)$ for $M=$ image_R, image_G, image_B, with $k=1,\dots,512$.
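Since $\|M_k\|_F^2$ is the sum of the $k$ largest squared singular values, a sketch of this computation could be:

```python
import numpy as np

def rho(M):
    """Return the vector (rho_1(M), ..., rho_n(M))."""
    s = np.linalg.svd(M, compute_uv=False)
    # ||M_k||_F^2 is the cumulative sum of the k largest squared singular values.
    return np.sqrt(np.cumsum(s**2) / np.sum(s**2))

rho_R, rho_G, rho_B = rho(image_R), rho(image_G), rho(image_B)
```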

We want to compress the image while controlling the loss of quality,
i.e. finding the smallest rank $k$ required (per channel) to achieve $\rho_k \ge \text{QUALITY}$.
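A sketch of this search, assuming a hypothetical threshold QUALITY and the rho_R, rho_G, rho_B vectors from above:

```python
import numpy as np

QUALITY = 0.995  # hypothetical quality threshold

# np.argmax returns the first index at which the condition holds;
# adding 1 converts the 0-based index into a rank.
k_R = int(np.argmax(rho_R >= QUALITY)) + 1
k_G = int(np.argmax(rho_G >= QUALITY)) + 1
k_B = int(np.argmax(rho_B >= QUALITY)) + 1
```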

We compress each of the channels separately, using the low_rank function defined above with k_R, k_G, k_B as the target ranks. We clip the results to ensure every value is between 0 and 1.
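A sketch of this per-channel compression step:

```python
import numpy as np

# Compress each channel at its own rank, then clip to the valid [0, 1] range.
comp_R = np.clip(low_rank(image_R, k_R), 0, 1)
comp_G = np.clip(low_rank(image_G, k_G), 0, 1)
comp_B = np.clip(low_rank(image_B, k_B), 0, 1)

# Recombine the channels into a single RGB image.
compressed = np.stack([comp_R, comp_G, comp_B], axis=2)
```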

Here are our results.

We implement a function compression_rate that computes the compression rate achieved when an image is stored via its low-rank approximation.

The output of this function is the ratio $N_{\text{original}}/N_{\text{compressed}}$, where $N_{\text{original}}$ is the total number of values needed to store the original image (flat, no SVD), and $N_{\text{compressed}}$ is the total number of values needed to store our low-rank representation.
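Since the input list is not shown here, the sketch below assumes the inputs are the image shape and the list of per-channel ranks. A rank-$k$ factorisation of an $m\times n$ matrix needs $k(m+n+1)$ values ($k$ left vectors, $k$ right vectors and $k$ singular values):

```python
def compression_rate(shape, ks):
    """Ratio N_original / N_compressed for an image of the given shape,
    where channel c is stored as a rank-ks[c] factorisation."""
    m, n, channels = shape
    n_original = m * n * channels
    # Each rank-k factorisation of an m x n matrix needs k*(m + n + 1) values.
    n_compressed = sum(k * (m + n + 1) for k in ks)
    return n_original / n_compressed

# Example call (hypothetical): compression_rate(image.shape, [k_R, k_G, k_B])
```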

Below we present an assortment of low-rank versions of our original image, for each colour channel and then combined, with the compression rate shown above the full colour image.

Principal Component Analysis

We want to demonstrate PCA on a 13-dimensional dataset, by loading the wine dataset from sklearn.
This dataset contains the chemical analysis of N=178 different wines produced by three different cultivators.
The analysis contains the following measurements:

  • Alcohol

  • Malic acid

  • Ash

  • Alcalinity of ash

  • Magnesium

  • Total phenols

  • Flavanoids

  • Nonflavanoid phenols

  • Proanthocyanins

  • Colour intensity

  • Hue

  • OD280/OD315 of diluted wines

  • Proline


Overall we have N=178 data points, lying in $\mathbb{R}^{D}$, with D=13. We stack all points together into a matrix X_wine $\in \mathbb{R}^{D\times N}$.
We have a label 0, 1, or 2 for each wine (its cultivator). The true labels are given in L_wine.
We want to see whether PCA can be helpful in the unsupervised task of clustering the 178 wines.

We start by loading the dataset and printing the first 5 data points, just to get a general impression.
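A sketch of this loading step, using sklearn.datasets.load_wine:

```python
from sklearn.datasets import load_wine

wine = load_wine()

# Stack the points as columns: X_wine has shape (D, N) = (13, 178).
X_wine = wine.data.T
L_wine = wine.target          # labels 0, 1, 2 (the cultivator)

print(X_wine.shape)           # (13, 178)
print(wine.data[:5])          # first 5 data points
```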

We implement a function called pc_transform. Its inputs are the data matrix X and a list PCS of principal-component indices.

The function computes the principal components listed in PCS, and then returns the projection of X on these principal components.
For example, pc_transform(X, [0,4]) should return the projection of X on the first and fifth principal components.
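One possible implementation sketch, assuming the principal components are taken as the left-singular vectors of the centred data matrix:

```python
import numpy as np

def pc_transform(X, PCS):
    """Project the columns of X (shape D x N) onto the principal
    components whose indices are listed in PCS."""
    # Centre the data: subtract the mean of each row (feature).
    Xc = X - X.mean(axis=1, keepdims=True)
    # The principal components are the left-singular vectors of the centred data.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Project the (centred) data onto the selected components.
    return U[:, PCS].T @ Xc
```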

We now compute the projection of X_wine along the first 2 principal components, placing the result in a matrix Y $\in \mathbb{R}^{2\times N}$.
We then plot the projected points, and colour them according to the original labels.
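A sketch of the projection and the scatter plot, coloured by the labels in L_wine:

```python
import matplotlib.pyplot as plt

Y = pc_transform(X_wine, [0, 1])       # shape (2, N)

plt.figure(figsize=(6, 5))
plt.scatter(Y[0], Y[1], c=L_wine)      # colour by cultivator label
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
```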

We will now try to improve the result by normalisation, taking each row of X_wine, and standardising it to have mean 0 and variance 1. We do this using the StandardScaler from sklearn.preprocessing.
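A sketch of this normalisation step; note that StandardScaler standardises columns, so the data matrix is transposed before and after:

```python
from sklearn.preprocessing import StandardScaler

# Transpose to (N, D), standardise each feature, then transpose back:
# each of the D rows ends up with mean 0 and variance 1.
X_wine_std = StandardScaler().fit_transform(X_wine.T).T

Y_std = pc_transform(X_wine_std, [0, 1])
```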

The normalisation has indeed helped: the three cultivators are now better separated in the projected plot.

PCA for Eigenfaces

We now work with a dataset consisting of numerous faces and create a basis of so-called eigenfaces. The images are 192x168, and are stored as vectors of length 32,256.

This utility function will just let us convert an image from its vector representation to a matrix, so it can be shown on screen.
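A minimal sketch of such a utility (the row-major reshape order is an assumption about how the vectors were flattened):

```python
import numpy as np

def vec_to_image(v, shape=(192, 168)):
    """Reshape a length-32,256 vector back into a 192 x 168 image.
    (Row-major order is assumed here.)"""
    return np.asarray(v).reshape(shape)
```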

We load the faces database.
FACES - a list where each entry is a collection of images of a single person
NPEOPLE - number of people in the list (should be 20)
NFACES - number of images per person (should be 64)

First, we will examine the photos of a single person.
PI - the index of the person we examine.
X_person - the data matrix, containing all images of person PI, as columns.

We start by presenting all the photos.

Next, we want to find the "eigenfaces", i.e., the principal components of this collection of images.
When using PCA, we should do all the processing on the centred data, i.e., with mean 0. Therefore, we take the following steps:

  1. Define M_person to be the mean of all images in X_person (should be a 32,256-dimensional vector).
  2. Subtract M_person from all images. Place the result in XZ_person.
  3. Find the left-singular vectors of XZ_person (i.e., the matrix U in the SVD). Place the result in U_person.

At the end of this process, the columns of U_person should represent the eigenfaces.
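A sketch of these three steps, using the X_person matrix defined above:

```python
import numpy as np

# 1. Mean face: the average of all images of person PI.
M_person = X_person.mean(axis=1, keepdims=True)

# 2. Centre the data by subtracting the mean face from every column.
XZ_person = X_person - M_person

# 3. The left-singular vectors of the centred data are the eigenfaces.
U_person, S_person, Vt_person = np.linalg.svd(XZ_person, full_matrices=False)
```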

We present the first 15 eigenfaces, along with the "average" face.

We can now try to get a feeling for what the eigenfaces represent. We reconstruct XZ_person based on the first 2 eigenfaces.
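A sketch of this rank-2 reconstruction, assuming U_person and M_person from above:

```python
# Project the centred images onto the first two eigenfaces and map back.
U2 = U_person[:, :2]
XZ_approx = U2 @ (U2.T @ XZ_person)

# Add the mean face back to obtain viewable images.
X_approx = XZ_approx + M_person
```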

We do the same for the 3rd and 4th eigenfaces.

It is easy to see that the eigenfaces represent different lighting aspects of the images.

We now see the effect of taking not just one person, but all the people in the dataset and repeating this procedure.
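A sketch of this step, assuming each entry of FACES is a matrix whose columns are one person's images:

```python
import numpy as np

# Stack the images of all people into one big data matrix.
X_all = np.hstack(FACES)

# Repeat the same procedure: centre, then take the left-singular vectors.
M_all = X_all.mean(axis=1, keepdims=True)
XZ_all = X_all - M_all
U_all, S_all, Vt_all = np.linalg.svd(XZ_all, full_matrices=False)
```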

We now try to test the effect of the eigenfaces in a different way.
For each eigenface $u_i$ we show the image $M + t\cdot u_i$, where $t$ is in $\big[-\frac{\sigma_i}{20},\frac{\sigma_i}{20}\big]$.
The list I indicates which eigenfaces to examine.
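A sketch of this visualisation; the list I and the number of steps below are hypothetical choices, and vec_to_image is the utility sketched earlier:

```python
import numpy as np
import matplotlib.pyplot as plt

I = [0, 1, 2]   # hypothetical choice of eigenfaces to examine
n_steps = 7     # number of values of t to show per eigenface

fig, axes = plt.subplots(len(I), n_steps, figsize=(2 * n_steps, 2 * len(I)))
for row, i in enumerate(I):
    # Sweep t over [-sigma_i / 20, sigma_i / 20].
    ts = np.linspace(-S_person[i] / 20, S_person[i] / 20, n_steps)
    for col, t in enumerate(ts):
        face = M_person[:, 0] + t * U_person[:, i]
        axes[row, col].imshow(vec_to_image(face), cmap="gray")
        axes[row, col].axis("off")
plt.show()
```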

Finally, we would like to see whether the eigenfaces can be useful for clustering the face images. We will only focus on binary clustering here.

The list PIS contains exactly two indices, for the two people we want to test clustering on.
The list PCS contains exactly two indices, for the principal components to be used.

We create a data matrix X_bin that contains all images from these two people.
Next, we use the pc_transform function to project on the two principal components listed in PCS,
and the result is placed in the matrix Y_bin.

These principal components will be plotted, coloured according to the person.
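A sketch of this binary-clustering experiment; PIS and PCS below are hypothetical choices, and each FACES[p] is again assumed to be a matrix with images as columns:

```python
import numpy as np
import matplotlib.pyplot as plt

PIS = [0, 1]     # hypothetical pair of people
PCS = [0, 1]     # hypothetical pair of principal components

# All images of the two chosen people, as columns of one matrix,
# together with a label per column saying which person it is.
X_bin = np.hstack([FACES[p] for p in PIS])
labels = np.concatenate([np.full(FACES[p].shape[1], p) for p in PIS])

# Project onto the chosen principal components and plot.
Y_bin = pc_transform(X_bin, PCS)
plt.scatter(Y_bin[0], Y_bin[1], c=labels)
plt.xlabel(f"PC {PCS[0] + 1}")
plt.ylabel(f"PC {PCS[1] + 1}")
plt.show()
```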

Some principal components are much more effective at clustering than others, as shown above.