I didn't realize that the standard heavy-tailed distribution is the Pareto distribution. Its tail follows a power law, and it is commonly used to describe the distribution of wealth.
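For reference, here is the usual parameterization, with scale $x_m > 0$ and shape $\alpha > 0$. The survival function decays like $x^{-\alpha}$, which is the power-law tail. Compare that with the exponential decay of a geometric or normal tail. The mean is finite only when $\alpha > 1$.

```latex
f(x) = \frac{\alpha\, x_m^{\alpha}}{x^{\alpha+1}}, \qquad
P(X > x) = \left(\frac{x_m}{x}\right)^{\alpha}, \qquad x \ge x_m,
\qquad
\mathbb{E}[X] = \frac{\alpha\, x_m}{\alpha - 1} \quad (\alpha > 1).
```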
Update (10/25/10): Another distribution I didn't know about is the negative multinomial distribution. You'd think I would have come across it, given all of my work in text classification. Then again, document length never seemed that useful for classification (especially compared to the words themselves). And I bet even the single-failure negative multinomial has too light a tail to model real document lengths accurately.
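To make the "too light a tail" hunch concrete, here is a rough sketch. With a single failure, the total word count of a negative multinomial is geometric: each token ends the document with probability `p_stop`. Its tail therefore decays exponentially, while a Pareto tail decays polynomially. The parameters below are hypothetical and not fitted to any corpus. They are chosen only so that both models have a mean length of 500 words.

```python
def geometric_tail(n: int, p_stop: float) -> float:
    """P(N >= n) for N ~ Geometric on {0, 1, 2, ...}.

    This is the total word count of a single-failure negative multinomial,
    where each token ends the document with probability p_stop.
    """
    return (1.0 - p_stop) ** n


def pareto_tail(x: float, x_m: float, alpha: float) -> float:
    """P(X > x) for a Pareto distribution with scale x_m and shape alpha."""
    if x < x_m:
        return 1.0
    return (x_m / x) ** alpha


# Hypothetical parameters: both models get a mean document length of 500 words.
mean_len = 500.0
p_stop = 1.0 / (mean_len + 1.0)           # geometric mean (1 - p) / p = 500
alpha = 2.0                               # shape > 1 so the Pareto mean is finite
x_m = mean_len * (alpha - 1.0) / alpha    # Pareto mean alpha * x_m / (alpha - 1) = 500

for n in (1_000, 5_000, 20_000):
    g = geometric_tail(n, p_stop)
    p = pareto_tail(n, x_m, alpha)
    print(f"n={n:>6}: geometric {g:.3e}   pareto {p:.3e}")
```

At 1,000 words the geometric tail is actually the heavier of the two. Well out in the tail, though, it collapses far faster. Around 20,000 words it is several orders of magnitude below the Pareto tail. That pattern is what "too light a tail" means for very long documents.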
 

 