A number of you seem to have questions about the authorities and hubs
computation in Project part 2. In short, this is what you have to do:
Step 1: Find the top-k results from TF/IDF. Call that your “root set”.
At this stage you have k documents, say 10.
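A minimal sketch of Step 1 in Python. It assumes you already have a TF/IDF similarity score for each document, represented here as a plain dict from a hypothetical document id to its score. In your project these scores come from your own TF/IDF code.

```python
import heapq

def root_set(tfidf_scores, k=10):
    """Return the ids of the k highest-scoring documents, best first.

    tfidf_scores: dict mapping a (hypothetical) document id to its
    TF/IDF similarity with the query.
    """
    # nlargest picks the k items with the largest score (item[1])
    top = heapq.nlargest(k, tfidf_scores.items(), key=lambda item: item[1])
    return [doc for doc, _ in top]

scores = {"d1": 0.42, "d2": 0.91, "d3": 0.10, "d4": 0.77}
print(root_set(scores, k=2))  # ['d2', 'd4']
```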
Step 2: Find all the documents that the root set points to and all the
documents that point to the root set. Together with the root set itself, call
that your “base set”.
At this stage you will have, say, 80 documents.
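Step 2 might look like the following sketch. It assumes two hypothetical helper functions, links_from(doc) and links_to(doc), that wrap whichever LinkAnalysis calls give you a page's out-links and in-links (check LinkAnalysis for the actual method names). The toy graph below is made up for illustration.

```python
def base_set(root, links_from, links_to):
    """Expand the root set into the base set.

    links_from(doc) -> ids of documents that doc points to
    links_to(doc)   -> ids of documents that point to doc
    Both are hypothetical stand-ins for calls into LinkAnalysis.
    """
    base = set(root)  # the base set includes the root set itself
    for doc in root:
        base.update(links_from(doc))  # pages the root set points to
        base.update(links_to(doc))    # pages that point to the root set
    return sorted(base)  # fixed order, so each document gets a stable index

# Toy link graph: out_links[src] lists the pages src points to.
out_links = {"d1": ["d5"], "d2": ["d1", "d6"], "d5": [], "d6": ["d2"], "d7": ["d2"]}
in_links = {}
for src, dsts in out_links.items():
    for dst in dsts:
        in_links.setdefault(dst, []).append(src)

print(base_set(["d1", "d2"],
               lambda d: out_links.get(d, []),
               lambda d: in_links.get(d, [])))
```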
Step 3: Create the adjacency matrix for these 80 documents.
The adjacency matrix will be 80 x 80, with entry (i, j) set to 1 if document i
links to document j and 0 otherwise. You will need to make more calls to
LinkAnalysis to populate this matrix.
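A sketch of Step 3, using a hypothetical links_from helper (as in Step 2) and the convention that entry (i, j) is 1 when document i links to document j. Documents are mapped to row/column indices in the order of the base-set list; links that leave the base set are dropped because they have no row or column.

```python
def adjacency_matrix(docs, links_from):
    """Build A where A[i][j] = 1 if docs[i] links to docs[j], else 0.

    docs: base-set ids in a fixed order.
    links_from: hypothetical stand-in for the LinkAnalysis call that
    returns a page's out-links.
    """
    index = {doc: i for i, doc in enumerate(docs)}
    n = len(docs)
    A = [[0] * n for _ in range(n)]
    for doc in docs:
        for target in links_from(doc):
            if target in index:  # ignore links leaving the base set
                A[index[doc]][index[target]] = 1
    return A

out_links = {"a": ["b", "c"], "b": ["c"], "c": ["a", "z"]}
for row in adjacency_matrix(["a", "b", "c"], lambda d: out_links.get(d, [])):
    print(row)
```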
Step 4: Create an initial authorities vector and an initial hubs vector.
Then use the techniques from the slides to iteratively compute the next
authorities and hubs values. Remember to normalize after every iteration. Repeat
this until the values converge.
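For reference, one common way to write the Step 4 updates, assuming A is the adjacency matrix from Step 3 with A_ij = 1 when document i links to document j. The slides may order the updates or the normalization slightly differently; since each vector is rescaled to unit length anyway, the normalized results come out the same.

```latex
a^{(t+1)} = A^{\top} h^{(t)}, \qquad h^{(t+1)} = A\, a^{(t+1)}, \qquad
a^{(t+1)} \leftarrow \frac{a^{(t+1)}}{\lVert a^{(t+1)} \rVert_2}, \qquad
h^{(t+1)} \leftarrow \frac{h^{(t+1)}}{\lVert h^{(t+1)} \rVert_2}
```

Component-wise, a document's authority score is the sum of the hub scores of the documents that link to it, and its hub score is the sum of the authority scores of the documents it links to.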
You can test convergence by checking whether the sum of the squares of
the differences between the current values and the previous values is less than
some threshold you choose.
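Putting Step 4 and the convergence test together, a minimal pure-Python sketch. The all-ones starting vectors, the threshold of 1e-8 and the max_iter safety cap are arbitrary choices for illustration, not project requirements.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length (left as-is if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v

def hits(A, threshold=1e-8, max_iter=1000):
    """Iterate authority/hub updates on adjacency matrix A until converged.

    Converged means: the sum of squared differences between successive
    authority and hub vectors drops below threshold.
    """
    n = len(A)
    a = normalize([1.0] * n)  # initial authorities (all-ones assumption)
    h = normalize([1.0] * n)  # initial hubs
    for _ in range(max_iter):
        # authority of j = sum of hub scores of pages linking to j (A^T h)
        new_a = normalize([sum(A[i][j] * h[i] for i in range(n)) for j in range(n)])
        # hub of i = sum of authority scores of pages i links to (A a)
        new_h = normalize([sum(A[i][j] * new_a[j] for j in range(n)) for i in range(n)])
        diff = (sum((x - y) ** 2 for x, y in zip(new_a, a))
                + sum((x - y) ** 2 for x, y in zip(new_h, h)))
        a, h = new_a, new_h
        if diff < threshold:
            break
    return a, h

authorities, hubs = hits([[0, 1, 1], [0, 0, 1], [0, 0, 0]])
print(authorities, hubs)
```

In this toy graph document 0 links to both others and document 2 is linked to by both others, so document 2 should end up as the top authority and document 0 as the top hub.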
Step 5: Print out the top-N authorities and top-N hubs.
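Step 5 is just a sort. A minimal sketch, assuming docs lists the base-set ids in the same order as the rows of the adjacency matrix, so that scores[i] belongs to docs[i]. The scores below are made-up values.

```python
def top_n(docs, scores, n=10):
    """Return the n (doc, score) pairs with the highest scores, best first."""
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]

docs = ["d1", "d2", "d5"]
authorities = [0.0, 0.53, 0.85]
hubs = [0.85, 0.53, 0.0]
for doc, score in top_n(docs, authorities, n=2):
    print("authority", doc, score)
for doc, score in top_n(docs, hubs, n=2):
    print("hub", doc, score)
```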
I hope this clears up some of the confusion.
Thanks and regards,
Sushovan De