Part 1 (8 points)

You will write a script called PCAEx5 which will work on a small dataset of 50 random points, Gaussian distributed, in 2D. These are contained in the file pcadata.mat, available on the Moodle page. Your script will:

1. Load the datapoints contained in the file pcadata.mat. Let us call X this initial set of datapoints. X has size 50x2 as there are 50 points in 2 dimensions.

2. Create Figure 1. In this figure, plot the points as blue circles in 2D, with the axes set to the range xmin = 0, xmax = 7, ymin = 2, ymax = 8.

3. Call a function called subtractMean that you will also write and include in the submission. The function receives a dataset (i.e. a matrix) as the only input argument, and returns two arguments: a dataset obtained from the input dataset by subtracting its mean; and the mean of the input dataset.

Your script will run this function on your dataset X, and obtain a new dataset Xmu and the mean of X, which we shall call mu.

4. Call a function called myPCA that you will also write and include in the submission. This function receives a dataset (i.e. a matrix) as the only input argument, and returns two arguments: the first is a matrix in which the columns are the principal components (i.e. the eigenvectors of the covariance matrix of the dataset) and the second is the set of corresponding eigenvalues. The eigenvectors will need to be ordered according to the size of their corresponding eigenvalues, in decreasing order; that is, the first column will be the eigenvector corresponding to the largest eigenvalue, the second column will be the eigenvector corresponding to the second largest eigenvalue, and so on.

Your script will run this function on your dataset Xmu, and obtain a matrix U of principal components and a vector S with their corresponding eigenvalues.

[HINT: the principal components are the eigenvectors of the covariance matrix of the data. So to implement this step you will need the function cov for calculating the covariance of the data, and the function eig, for calculating eigenvalues and eigenvectors.]

5. Add to Figure 1 the plot of the 2 principal components, in which the first component (corresponding to the largest eigenvalue) is red, and the second one is green. Your Figure 1 should look similar to Figure A below. Also print the coordinates of the top eigenvector in the command window.

[HINT: you can use the function line to draw each eigenvector. Do not forget that the eigenvectors were calculated from data from which the mean had been subtracted. So you will need to add the mean to the eigenvectors in order to make the plot.]

6. Call a function called projectData that you will also write and include in the submission. The function receives 3 arguments: a dataset (i.e. a matrix), a set of eigenvectors and a positive integer k. It provides as the only output argument a dataset obtained by projecting the input dataset onto the first k eigenvectors. Note that the first k eigenvectors are the k eigenvectors corresponding to the k largest eigenvalues.

Your script will run this function on your dataset Xmu, using the eigenvectors in U and with k equal to 1, and obtain a matrix Z of projected data.

[HINT: here the projection of a datapoint onto an eigenvector can be obtained by calculating the dot product between the point and the vector, since the eigenvectors you obtain from Matlab have unit norm.]

7. Print in the command window the projection of the first 3 points in your dataset, i.e. Z(1:3, :).

8. Call a function recoverData provided on the Moodle page. The function receives 4 arguments: a dataset (i.e. a matrix), a set of eigenvectors, a positive integer k and a vector mu. It provides as the only output argument a dataset obtained by projecting your points back onto the original space. Using the variable names described above, the call in your script would be:

Xrec = recoverData(Z, U, K, mu)

Note that this function will work correctly only if the eigenvectors in U are ordered according to the size of their corresponding eigenvalues, in decreasing order, as explained in point 4.

Your script will run the above line, obtaining the recovered datapoints in Xrec.

9. Create Figure 2. In this figure, plot the points as blue circles in 2D, with the axes set to the range xmin = 0, xmax = 7, ymin = 2, ymax = 8. Then add the recovered points as red stars. (Using the variable names described above, your recovered points will be contained in the variable Xrec.) Your Figure 2 should look similar to Figure B below.
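The three functions described in steps 3, 4 and 6 could be sketched along the following lines. This is a minimal illustration under stated assumptions, not a reference solution: the data is a synthetic stand-in for pcadata.mat (which is not reproduced here), and the functions are written as local functions at the end of a script, which assumes MATLAB R2016b or later. In a submission each function would normally live in its own .m file.

```matlab
% Synthetic stand-in for pcadata.mat: 50 Gaussian points in 2D (assumption).
rng(0);                                        % fixed seed for repeatability
X = randn(50, 2) * [1.0 0.6; 0 0.4] + repmat([4 5], 50, 1);

[Xmu, mu] = subtractMean(X);                   % step 3
[U, S]    = myPCA(Xmu);                        % step 4
Z         = projectData(Xmu, U, 1);            % step 6, with k = 1

disp(U(:, 1)');                                % coordinates of the top eigenvector
disp(Z(1:3, :));                               % projections of the first 3 points

% --- Local functions -------------------------------------------------------

function [Xmu, mu] = subtractMean(X)
    % Column-wise mean: one value per dimension (1 x d row vector).
    mu  = mean(X, 1);
    % bsxfun subtracts the row vector mu from every row of X.
    Xmu = bsxfun(@minus, X, mu);
end

function [U, S] = myPCA(X)
    % Eigendecomposition of the d x d covariance matrix of the data.
    [V, D] = eig(cov(X));
    % eig does not promise a particular order, so sort explicitly,
    % largest eigenvalue first, and reorder the eigenvectors to match.
    [S, idx] = sort(diag(D), 'descend');
    U = V(:, idx);
end

function Z = projectData(X, U, k)
    % Each row of Z holds the dot products of one point with the first k
    % unit-norm eigenvectors.
    Z = X * U(:, 1:k);
end
```

Sorting inside myPCA, rather than relying on the order in which eig happens to return its results, is what makes the "first k columns" in projectData correspond to the k largest eigenvalues.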

Figure A

Figure B
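The two Part 1 figures could be produced roughly as follows. Several things here are assumptions made for illustration: the data is synthetic rather than loaded from pcadata.mat, the principal components are computed inline so the block runs on its own, each component is drawn with a length equal to its eigenvalue (the brief does not specify a length), and the reconstruction line is a stand-in for the provided recoverData, whose actual implementation may differ.

```matlab
% Synthetic stand-in for pcadata.mat (assumption); the real data will differ.
rng(1);
X   = randn(50, 2) * [1.0 0.6; 0 0.4] + repmat([4 5], 50, 1);
mu  = mean(X, 1);
Xmu = X - repmat(mu, size(X, 1), 1);

% Principal components, sorted by decreasing eigenvalue.
[V, D]   = eig(cov(Xmu));
[S, idx] = sort(diag(D), 'descend');
U        = V(:, idx);

% Figure 1: data plus the two principal components.
figure(1); clf;
plot(X(:, 1), X(:, 2), 'bo');
axis([0 7 2 8]);
hold on;
% The eigenvectors describe directions in mean-centred coordinates, so each
% segment starts at mu. The sign of an eigenvector is arbitrary, so a segment
% may point either way along its axis.
p1 = mu + S(1) * U(:, 1)';
p2 = mu + S(2) * U(:, 2)';
line([mu(1) p1(1)], [mu(2) p1(2)], 'Color', 'r', 'LineWidth', 2);
line([mu(1) p2(1)], [mu(2) p2(2)], 'Color', 'g', 'LineWidth', 2);
hold off;

% Figure 2: original points and their rank-1 reconstruction.
k    = 1;
Z    = Xmu * U(:, 1:k);
% Assumed stand-in for recoverData: map back and add the mean again.
Xrec = Z * U(:, 1:k)' + repmat(mu, size(Z, 1), 1);

figure(2); clf;
plot(X(:, 1), X(:, 2), 'bo');
axis([0 7 2 8]);
hold on;
plot(Xrec(:, 1), Xrec(:, 2), 'r*');
hold off;
```

With k equal to 1, all the recovered points lie on a single straight line through the mean, in the direction of the first principal component.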

Part 2 (7 points)

In this part of the exercise you will repeat on a larger real-world dataset exactly the same analysis that you have already performed on the small toy dataset. Your script will use the functions you have written for the small toy dataset. Your script will work on a dataset of 5000 images of faces of famous people, taken from a public repository. Each image is a 32x32 matrix of pixels which has been linearized into a vector of size 1024. These images are contained in the file pcafaces.mat, available on the Moodle page.

1. Load the datapoints contained in the file pcafaces.mat. Let us call X this initial set of datapoints. X has size 5000x1024, as there are 5000 faces, each represented by a vector of pixels of size 1024. (This is a large dataset so it might take a few seconds, depending on your machine.)

2. Create Figure 3. Use the function displayData provided on the Moodle page to display the first 100 images in the dataset. Using the variable names described above, the call in your script would be:

displayData(X(1:100, :))

Your Figure 3 should look similar to Figure C below.

3. Subtract the mean from X using your function subtractMean.

4. Project the data onto the first 200 principal components using your function projectData.

5. Project your images back onto the original space using the function recoverData (which is provided on the Moodle page).

6. Create Figure 4. This figure will contain 2 subplots. The first subplot will display the first 100 images of the original data (that is, it will be the same as Figure 3). The second subplot will display the first 100 images of the reconstructed data – in this way you will be able to compare how good your reconstruction is! Your Figure 4 should look similar to Figure D below.
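The Part 2 pipeline could look roughly like the sketch below. It rests on several assumptions: random data of the same shape stands in for pcafaces.mat, the principal components are computed between steps 3 and 4 (the list above does not spell that step out, but projectData needs the eigenvectors), the centring, PCA and projection are written inline where the assignment would call subtractMean, myPCA and projectData, the reconstruction line is a stand-in for the provided recoverData, and a single image per subplot drawn with imagesc replaces displayData, whose behaviour inside a subplot is not described here.

```matlab
% Random stand-in with the shape of the faces dataset (assumption);
% in the assignment X is loaded from pcafaces.mat.
rng(2);
X = rand(5000, 1024);
k = 200;

% Step 3: subtract the mean (what subtractMean would return).
mu  = mean(X, 1);
Xmu = bsxfun(@minus, X, mu);

% Principal components, sorted by decreasing eigenvalue (what myPCA would return).
[V, D]   = eig(cov(Xmu));
[~, idx] = sort(diag(D), 'descend');
U        = V(:, idx);

% Step 4: project onto the first k components (what projectData would return).
Z = Xmu * U(:, 1:k);                           % 5000 x 200

% Step 5: assumed stand-in for recoverData, mapping back to 1024 pixels.
Xrec = bsxfun(@plus, Z * U(:, 1:k)', mu);      % 5000 x 1024

% Step 6: two subplots side by side. One image per subplot is shown here;
% the assignment displays the first 100 with displayData instead.
figure(4); clf;
subplot(1, 2, 1);
imagesc(reshape(X(1, :), 32, 32));
axis image off;
title('Original');
subplot(1, 2, 2);
imagesc(reshape(Xrec(1, :), 32, 32));
axis image off;
title('Reconstructed');
colormap(gray);
```

The covariance matrix here is 1024x1024, so the eigendecomposition is the most expensive step, but it is computed only once regardless of how many components are kept afterwards.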

For fun, you can experiment and check the quality of the reconstructed faces for different numbers of principal components…
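One way to put numbers on such an experiment is to track, for each k, the fraction of total variance carried by the first k eigenvalues alongside the reconstruction error. The sketch below is only an illustration: it uses a small random dataset instead of the faces (an assumption made to keep it fast and self-contained), and the list of k values is arbitrary.

```matlab
% Small random stand-in dataset (assumption): 500 points in 64 dimensions.
rng(3);
X   = rand(500, 64);
Xmu = bsxfun(@minus, X, mean(X, 1));

% Principal components, sorted by decreasing eigenvalue.
[V, D]   = eig(cov(Xmu));
[S, idx] = sort(diag(D), 'descend');
U        = V(:, idx);

% Fraction of the total variance captured by the first k components.
retained = cumsum(S) / sum(S);

for k = [1 8 16 32 64]
    % Project onto the first k components and map straight back.
    Xhat = Xmu * U(:, 1:k) * U(:, 1:k)';
    % Average squared distance between each point and its reconstruction.
    err = mean(sum((Xmu - Xhat).^2, 2));
    fprintf('k = %2d: retained variance %.3f, mean squared error %.4f\n', ...
            k, retained(k), err);
end
```

As k grows the retained fraction rises towards 1 and the error falls towards 0, reaching both limits (up to rounding) when every component is kept.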
