Hello,
A number of you have emailed me asking why a term sometimes occurs
twice in the index.
In addition to indexing the contents of a document, Lucene also
indexes metadata fields such as the title and URL. A term that appears in one
of these fields is indexed separately for that field, which is why some terms
occur twice. I suggest you either take only the first occurrence of the term,
or check the value returned by term.field() and consider only those terms for
which it returns “contents”.
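
To make the second option concrete, here is a minimal sketch of the filtering step. It deliberately does not depend on Lucene: IndexedTerm is a hypothetical stand-in that mimics the field() and text() accessors of Lucene's Term class, and contentsTerms is a helper name invented for this example. In your own code you would apply the same check to the terms you read from the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContentsTermFilter {

    /** Hypothetical stand-in for a Lucene Term: a (field, text) pair. */
    static final class IndexedTerm {
        private final String field;
        private final String text;

        IndexedTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        String field() { return field; }
        String text() { return text; }
    }

    /** Keeps only the text of terms that were indexed under the "contents" field. */
    static List<String> contentsTerms(List<IndexedTerm> terms) {
        List<String> kept = new ArrayList<>();
        for (IndexedTerm term : terms) {
            // Terms from metadata fields (title, url, ...) are skipped.
            if ("contents".equals(term.field())) {
                kept.add(term.text());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "lucene" appears twice: once from the body text, once from the title.
        List<IndexedTerm> terms = Arrays.asList(
                new IndexedTerm("contents", "lucene"),
                new IndexedTerm("title", "lucene"),
                new IndexedTerm("contents", "index"));

        System.out.println(contentsTerms(terms)); // prints [lucene, index]
    }
}
```

With this check in place, each term is counted once, from the document body only.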
Thanks and regards,
Sushovan De