The solution for HW2 – Q3 was missing from the solutions. Here it is:

Answer 1
td = 0.2500 0.5300 0.7500

To find the first term of the SVD, we find the eigenvectors of td * td'.

0.9059 0.6200
<Eigen vector calculation here>

Eigenvectors of td * td':
0.6869

The eigenvalues are 1.4918 and 0.2499.
[Note: the eigenvectors might also come out negated; that is a valid solution too.]

Therefore, the first term of the SVD is
-0.7268
 0.6869

The second term of the SVD is the square root of the eigenvalues found above:
1.2214  0
0       0.4999

To find the third term of the SVD, we find the eigenvectors of td' * td.

td' * td = <eigen computation>

Eigenvectors are
Second eigenvector:

Therefore, the third and last term of the SVD is
<But we did not find the right signs for the matrix here, so we need to find this matrix in a different way (see the recent mail sent by Dr. Rao)>

Answer 2
After removing the lesser of the two eigenvalues, your S matrix becomes:
1.2214  0
0       0

Then, to recreate the matrix, you multiply u * s * v':
0.4965 0.5296 0.5110

Does it look close to the original? In my opinion, NO, it does not. We eliminated a significant part of the variance. (Also accept a YES answer if it argues that, scaled properly, you would end up roughly where the original documents were.)

Answer 3
The query vector is q = [1, 0]. In the factor space, it becomes q * tf:
qf = -0.7268 -0.6869

The documents in the factor space are:
D1: -0.6831 0.3589

The similarities are:

Answer 4
Before the transformation:
After the transformation:

Clearly, after the transformation, the documents are easily separable using only one axis (the y-axis), so we can drop the x-axis and still differentiate between them.

Answer 5
The new keyword added is a linear combination of the previous keywords (0.5 * k1 + 0.5 * k2). It has absolutely no impact on the calculations at all; it tells us that SVD manages to find the true dimensionality of the data.

Thanks and Regards,
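As a sanity check on the hand computation above, here is a small NumPy sketch of the same steps: taking the SVD of a term-document matrix, verifying that the singular values are the square roots of the eigenvalues of td * td', truncating to rank 1, and projecting the query q = [1, 0] into the factor space. The td values used here are purely illustrative (the original matrix did not survive intact in this mail), so the numbers will not match the answers above.

```python
import numpy as np

# Hypothetical 2x2 term-document matrix -- illustrative values only,
# not the matrix from the homework.
td = np.array([[0.25, 0.53],
               [0.75, 0.81]])

# Full SVD: td = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(td)

# Singular values squared equal the eigenvalues of td @ td.T
eigvals = np.linalg.eigvalsh(td @ td.T)
assert np.allclose(np.sort(s**2), np.sort(eigvals))

# Answer 2: drop the lesser singular value, then recreate u * s * v'
s_trunc = np.array([s[0], 0.0])
td_rank1 = U @ np.diag(s_trunc) @ Vt
print("rank-1 reconstruction:\n", td_rank1)

# Answer 3: project the query into the factor space (q times the
# term-factor matrix, here taken to be U)
q = np.array([1.0, 0.0])
qf = q @ U
print("query in factor space:", qf)
```

Note that the signs of the columns of U (and the matching rows of Vt) are only determined up to a simultaneous flip, which is the sign ambiguity mentioned in Answer 1.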