
Homework 2 - Solution for Question 3



The solution for HW2 – Q3 was missing from the solutions. Here it is:

Answer 1

td =

0.2500 0.5300 0.7500
0.7300 0.5000 0.2300

To find the first term of the SVD, we find the eigenvectors of td * td'.
td * td' =

0.9059 0.6200
0.6200 0.8358

<Eigenvector calculation here>

Eigenvectors of td * td':

-0.7268
-0.6869

and

0.6869
-0.7268

The eigenvalues are 1.4918 and 0.2499.

[Note: the eigenvectors may come out with the opposite sign; that is a valid solution too.]
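The eigen computation is not spelled out above, so here is a minimal sketch of one way to do it, using the closed form for a symmetric 2x2 matrix. The helper names `gram` and `eig_sym_2x2` are made up for this illustration, and the eigenvectors it returns may differ in sign from the ones listed above.

```python
import math

td = [[0.25, 0.53, 0.75],
      [0.73, 0.50, 0.23]]

def gram(m):
    """Return m * m' for a matrix given as a list of rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]

def eig_sym_2x2(g):
    """Eigenvalues (largest first) with unit eigenvectors of a symmetric 2x2 matrix."""
    a, b, d = g[0][0], g[0][1], g[1][1]
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        # (b, lam - a) solves (g - lam * I) v = 0 whenever b != 0
        x, y = b, lam - a
        norm = math.hypot(x, y)
        pairs.append((lam, (x / norm, y / norm)))
    return pairs

pairs = eig_sym_2x2(gram(td))
for lam, vec in pairs:
    print(round(lam, 4), [round(c, 4) for c in vec])
```

The closed form only works for the 2x2 case; for td' * td (3x3) you would normally use a numerical routine instead.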

Therefore, the first term of the SVD is:

-0.7268 0.6869
-0.6869 -0.7268

The second term of the SVD is a diagonal matrix holding the square roots of the eigenvalues found above:

1.2214 0 0
0 0.4999 0

To find the third term of the SVD, we find the eigenvectors of td' * td.

td' * td =
0.5954 0.4975 0.3554
0.4975 0.5309 0.5125
0.3554 0.5125 0.6154

<eigen computation>

Eigenvectors are

First eigenvector:
0.5593
0.5965
0.5756

Second eigenvector:
-0.7179
0.0013
0.6962

Third eigenvector:
-0.4146
0.8026
-0.4290

Therefore, the third and last term in the SVD is (one eigenvector per column, up to sign):
-0.5593 0.7179 -0.4146
-0.5965 -0.0013 0.8026
-0.5756 -0.6962 -0.4290

<But this procedure does not give us the right signs for the matrix, so we need to find this matrix in a different way (see the recent mail sent by Dr. Rao).>
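As an illustration of the sign issue (this is one standard approach, not necessarily the one from the mail mentioned above): once u and the diagonal values are fixed, each column of the third matrix can be computed as td' * u_k / s_k, which forces its sign to agree with u. The sketch below uses the rounded values from this answer; `right_vector` is a hypothetical helper name.

```python
td = [[0.25, 0.53, 0.75],
      [0.73, 0.50, 0.23]]

# Rounded values from Answer 1: the columns of u and the diagonal of s.
u_cols = [(-0.7268, -0.6869), (0.6869, -0.7268)]
sigmas = [1.2214, 0.4999]

def right_vector(matrix, u, sigma):
    """Return matrix' * u / sigma, the column of v that pairs with u."""
    n_cols = len(matrix[0])
    return [sum(matrix[i][j] * u[i] for i in range(len(u))) / sigma
            for j in range(n_cols)]

v_cols = [right_vector(td, u, s) for u, s in zip(u_cols, sigmas)]

# Rebuild td as the sum of s_k * u_k * v_k' to confirm that the signs agree.
rebuilt = [[sum(s * u[i] * v[j] for u, s, v in zip(u_cols, sigmas, v_cols))
            for j in range(3)] for i in range(2)]

for row in rebuilt:
    print([round(x, 3) for x in row])
```

If the rebuilt matrix matches td up to rounding, the signs of u and v are consistent with each other.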

 Answer 2

After removing the lesser of the two eigenvalues (and with it the corresponding entry of s), your s matrix becomes:

1.2214 0 0
0      0 0

Then, to recreate the matrix, you multiply u * s * v':

0.4965 0.5296 0.5110
0.4692 0.5005 0.4829

Does it look close to the original? In my opinion, NO, it does not. We did eliminate a significant part of the variance.

(Also accept a YES answer if it argues that, with proper scaling, the result ends up roughly where the original documents were.)
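To put a number on how much is lost, here is a small sketch that rebuilds the rank-1 matrix from the rounded values in Answer 1 and measures the share of the squared diagonal values of s that the first factor keeps. Treating that share as "variance retained" is an assumption of this example, not something the question defines.

```python
# Rounded values from Answer 1.
u1 = (-0.7268, -0.6869)
v1 = (-0.5593, -0.5965, -0.5756)
sigmas = [1.2214, 0.4999]

# Rank-1 approximation: s_1 * u1 * v1'
approx = [[sigmas[0] * ui * vj for vj in v1] for ui in u1]

# Share of the total squared diagonal values kept by the first factor.
retained = sigmas[0] ** 2 / sum(s ** 2 for s in sigmas)

for row in approx:
    print([round(x, 4) for x in row])
print(round(retained, 3))
```

With these inputs the first factor keeps roughly 86 percent, so about 14 percent is thrown away. The last digit of some entries can differ from the matrix above because the inputs here are already rounded.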

Answer 3

The query vector is q = [1, 0].

In the factor space, it would be q * tf:

qf = -0.7268 -0.6869

The documents in the factor space are:

D1: -0.6831 0.3589
D2: -0.7286 -0.0006
D3: -0.7031 -0.3481

The similarities are:

sim(q,D1) = 0.3239
sim(q,D2) = 0.7274
sim(q,D3) = 0.9561
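The similarities can be checked with a short sketch, assuming sim is the cosine of the angle between the two vectors in the factor space (the `cosine` helper is written just for this example):

```python
import math

qf = (-0.7268, -0.6869)
docs = {"D1": (-0.6831, 0.3589),
        "D2": (-0.7286, -0.0006),
        "D3": (-0.7031, -0.3481)}

def cosine(a, b):
    """Cosine of the angle between two 2-d vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

sims = {name: cosine(qf, d) for name, d in docs.items()}
for name, value in sims.items():
    print(name, round(value, 3))
```

Because the inputs are rounded to four places, the fourth decimal of a result may differ slightly from the values listed above.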

Answer 4

Before the transformation:

[Figure 1: the documents before the transformation]

After the transformation:

[Figure 2: the documents after the transformation]

Clearly, after the transformation, the documents are easily separable using only one axis (the y-axis), so we can get rid of the x-axis and still differentiate between them.
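A quick way to see this without the plot is to compare how far apart the documents are along each axis, using the factor-space coordinates from Answer 3. This is only a sketch; `spread` is a hypothetical helper that returns the range of a list of values.

```python
# Factor-space coordinates of D1, D2, D3 from Answer 3.
docs = [(-0.6831, 0.3589), (-0.7286, -0.0006), (-0.7031, -0.3481)]

def spread(values):
    """Distance between the largest and smallest value."""
    return max(values) - min(values)

x_spread = spread([d[0] for d in docs])
y_spread = spread([d[1] for d in docs])

print(round(x_spread, 4), round(y_spread, 4))
```

The documents differ by less than 0.05 along the x-axis but by about 0.7 along the y-axis, which is why the y-axis alone is enough to tell them apart.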

Answer 5

The newly added keyword is a linear combination of the previous keywords (0.5 * k1 + 0.5 * k2).

It will have no real impact on the calculations: the matrix still has rank 2, so only two factors carry any weight.

It tells us that SVD manages to find the real dimensionality of the data.
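The linear dependence can be confirmed numerically. The sketch below assumes the new keyword is added as a third row of td; the resulting 3x3 matrix has a determinant of zero (up to floating-point error), so its rank is still 2. The `det3` helper is written just for this example.

```python
td = [[0.25, 0.53, 0.75],
      [0.73, 0.50, 0.23]]

# Assumed third keyword: 0.5 * k1 + 0.5 * k2 for each document.
k3 = [0.5 * a + 0.5 * b for a, b in zip(td[0], td[1])]
td3 = td + [k3]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

determinant = det3(td3)
print(abs(determinant) < 1e-9)
```

A zero determinant means the three rows do not span a third dimension, so the SVD of the extended matrix has only two non-zero values on the diagonal of s.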

 

Thanks and Regards,
Sushovan De