The solution for HW2 – Q3 was missing from the solutions. Here it is:

Answer 1
td = 0.2500 0.5300 0.7500

To find the first term of the SVD, we find the eigenvectors of td * td'.

0.9059 0.6200
<Eigen vector calculation here>

Eigenvectors of td * td':
0.6869

The eigenvalues are 1.4918 and 0.2499.
[Note: the eigenvectors might also come out negated; that is a valid solution too.]

Therefore, the first term of the SVD is
-0.7268
 0.6869

The second term of the SVD is the square root of the eigenvalues found above:
1.2214  0
0       0.4999

To find the third term of the SVD, we find the eigenvectors of td' * td.

td' * td = <eigen computation>

Eigenvectors are
Second eigenvector:

Therefore, the third and last term of the SVD is
<But we did not find the right signs for the matrix here, so we need to find this matrix in a different way (see the recent mail sent by Dr. Rao)>

Answer 2
After removing the lesser of the two eigenvalues, your S matrix becomes:
1.2214  0
0       0

Then, to recreate the matrix, you multiply u * s * v':
0.4965 0.5296 0.5110

Does it look close to the original? In my opinion, NO, it does not. We eliminated a significant part of the variance. (Also accept a YES answer if it argues that, scaled properly, you would end up roughly where the original documents were.)

Answer 3
The query vector is q = [1, 0]. In the factor space, it becomes q * tf:
qf = -0.7268 -0.6869

The documents in the factor space are:
D1: -0.6831 0.3589

The similarities are:

Answer 4
Before the transformation:
After the transformation:

Clearly, after the transformation, the documents are easily separable using only one axis (the y-axis), so we can drop the x-axis and still differentiate between them.

Answer 5
The new keyword added is a linear combination of the previous keywords (0.5 * k1 + 0.5 * k2). It has absolutely no impact on the calculations at all; it tells us that SVD manages to find the true dimensionality of the data.

Thanks and Regards,
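As a sanity check on the hand computation above, here is a small NumPy sketch of the same steps: taking the SVD of a term-document matrix, verifying that the singular values are the square roots of the eigenvalues of td * td', truncating to rank 1, and projecting the query q = [1, 0] into the factor space. The td values used here are purely illustrative (the original matrix did not survive intact in this mail), so the numbers will not match the answers above.

```python
import numpy as np

# Hypothetical 2x2 term-document matrix -- illustrative values only,
# not the matrix from the homework.
td = np.array([[0.25, 0.53],
               [0.75, 0.81]])

# Full SVD: td = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(td)

# Singular values squared equal the eigenvalues of td @ td.T
eigvals = np.linalg.eigvalsh(td @ td.T)
assert np.allclose(np.sort(s**2), np.sort(eigvals))

# Answer 2: drop the lesser singular value, then recreate u * s * v'
s_trunc = np.array([s[0], 0.0])
td_rank1 = U @ np.diag(s_trunc) @ Vt
print("rank-1 reconstruction:\n", td_rank1)

# Answer 3: project the query into the factor space (q times the
# term-factor matrix, here taken to be U)
q = np.array([1.0, 0.0])
qf = q @ U
print("query in factor space:", qf)
```

Note that the signs of the columns of U (and the matching rows of Vt) are only determined up to a simultaneous flip, which is the sign ambiguity mentioned in Answer 1.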