
CSE 571: Grading and notes for LDA project



Hi all:

The LDA project is worth 25 points in total (5+10+10). The optional
question is worth 10 points.

The statistics on students' points:

Max: 25
Min: 22
Mean: 24.04
Stdev: 1.042
Median: 24

Most people did very well on Q1, Q2 and Q3. Here are my comments on Q3:

The length of a document definitely influences the topic
distribution and topic quality you get. In most cases, a shorter
document has a smaller, sparser vocabulary than a longer document,
say a newspaper article. Because of this sparsity, a shorter document
may show more variability in its estimated topic distribution. Tweets
in particular have an extremely sparse vocabulary. Besides, the
majority of words in tweets are just stop words. Therefore, the
performance of LDA on Twitter is not that good. Having said that, the
performance also depends on the data you have. For example, you can
have a long, sparse document on which LDA may not perform well either.
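
To make the point about variability concrete, here is a rough sketch in
Python. It is not LDA inference. It assumes a hypothetical two-topic
document whose true share of topic A is 0.7, and it pretends that the
topic of every word is observed directly, so the only source of error is
the number of words. The function names, the 0.7 share, and the document
lengths (10 vs. 1000 words) are all made up for illustration.

```python
import random
import statistics


def estimated_topic_share(doc_length, true_share, rng):
    """Draw one topic label per word and return the observed share of topic A.

    Assumption: word topics are observed directly (no inference step).
    """
    hits = sum(1 for _ in range(doc_length) if rng.random() < true_share)
    return hits / doc_length


def spread_of_estimates(doc_length, true_share=0.7, trials=500, seed=0):
    """Standard deviation of the estimated share over many simulated documents."""
    rng = random.Random(seed)
    estimates = [
        estimated_topic_share(doc_length, true_share, rng)
        for _ in range(trials)
    ]
    return statistics.pstdev(estimates)


# Hypothetical lengths: a tweet-sized document vs. an article-sized one.
short_spread = spread_of_estimates(doc_length=10)
long_spread = spread_of_estimates(doc_length=1000)

print(f"spread for 10-word documents:   {short_spread:.3f}")
print(f"spread for 1000-word documents: {long_spread:.3f}")
```

Under these assumptions the estimates from the short documents should
scatter much more widely around 0.7 than those from the long ones. Real
LDA adds further uncertainty because the word topics themselves have to
be inferred.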

Thanks.

--
Best regards,
Yuheng Hu