CSE 571: Grading and notes for LDA project
- To: undisclosed-recipients:;
- Subject: CSE 571: Grading and notes for LDA project
- From: Yuheng Hu <wonderfulhoo@gmail.com>
- Date: Mon, 17 Dec 2012 12:40:20 -0700
Hi all:
The LDA project is worth 25 points in total (5+10+10). The optional
question is worth 10 points.
Statistics on students' scores:
Max: 25
Min: 22
Mean: 24.04
Stdev: 1.042
Median: 24
Most people did very well on Q1, Q2 and Q3. Here are my comments on Q3:
The length of a document definitely influences both its topic
distribution and the quality of the topics. In most cases, a short
document tends to have a small, sparse vocabulary compared with a
longer document such as a newspaper article. Because of this sparsity,
the estimated topic distribution of a short document can vary more.
Twitter, in particular, has an extremely sparse vocabulary. In
addition, most of the words in tweets are stop words. As a result, LDA
does not perform very well on Twitter. That said, performance also
depends on the data you have: a document can be long yet sparse, and
LDA may not perform well on it either.
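To make the point about variability concrete, here is a minimal sketch (not part of the assignment, and a simplification). In LDA, if the topic assignments of a document's tokens are treated as known, the posterior over that document's topic proportions is Dirichlet(alpha + n), where n[k] counts tokens assigned to topic k. The sketch holds the topic split fixed and varies only the document length, showing that the posterior spread shrinks as the document gets longer. The symmetric prior alpha = 0.1, the 50/30/20 topic split, and the helper names `dirichlet_std` and `posterior_theta_std` are all hypothetical choices for illustration; the sketch ignores uncertainty in the assignments themselves and in the topic-word distributions.

```python
import math

def dirichlet_std(params):
    """Standard deviation of each component of a Dirichlet(params) distribution.

    Uses Var(theta_k) = a_k * (a0 - a_k) / (a0^2 * (a0 + 1)), with a0 = sum(params).
    """
    a0 = sum(params)
    return [math.sqrt(a * (a0 - a) / (a0 * a0 * (a0 + 1))) for a in params]

def posterior_theta_std(length, proportions, alpha=0.1):
    """Posterior std of theta for a document of `length` tokens.

    Assumption (hypothetical): tokens are split across topics according to
    `proportions`, and a symmetric Dirichlet prior `alpha` is used.
    """
    counts = [length * p for p in proportions]
    return dirichlet_std([alpha + c for c in counts])

if __name__ == "__main__":
    props = [0.5, 0.3, 0.2]  # hypothetical topic split, identical for every length
    for length in (10, 100, 1000):  # roughly tweet-sized up to article-sized
        stds = posterior_theta_std(length, props)
        print(length, [round(s, 3) for s in stds])
```

With the same topic split, each tenfold increase in length cuts the posterior standard deviation by roughly a factor of sqrt(10), which is one way to see why short texts such as tweets give noisier topic estimates.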
Thanks.
--
Best regards,
Yuheng Hu