Zipf’s law

This paper presents an analysis of the frequency distribution of hashtag popularity in Twitter conversations.

In particular, we study how closely the frequency distribution of hashtag popularity follows Zipf’s law, an empirical law stating that many types of data in the social sciences can be approximated by a Zipfian distribution.

We also analyze Benford’s law, a special case of Zipf’s law that describes a common pattern in the frequency distribution of leading digits.

To compute the frequency distribution of hashtag popularity correctly, we need to correct the many spelling errors that Twitter users introduce.
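As an illustration only, the sketch below shows one possible way to fold misspelled hashtags into their most frequent spelling before counting. It is not the correction method used in this paper: the function name `merge_misspellings`, the similarity cutoff of 0.85, and the use of Python's `difflib` string similarity are all assumptions made for the example.

```python
from collections import Counter
from difflib import get_close_matches


def merge_misspellings(counts, cutoff=0.85):
    """Fold each hashtag into the most frequent similar spelling seen so far.

    `counts` maps hashtag -> frequency. `cutoff` is a hypothetical similarity
    threshold (0..1); it would need tuning on real data.
    """
    merged = Counter()
    canonical = []  # spellings accepted as "correct" so far
    # Visit the most frequent hashtags first so they become the canonical forms.
    for tag, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        match = get_close_matches(tag, canonical, n=1, cutoff=cutoff)
        if match:
            merged[match[0]] += freq  # treat as a misspelling of an earlier tag
        else:
            canonical.append(tag)
            merged[tag] += freq
    return merged


raw = Counter({"#worldcup": 10, "#wolrdcup": 2, "#election": 5})
cleaned = merge_misspellings(raw)
```

A purely string-based heuristic like this can wrongly merge distinct hashtags that happen to look alike, so the cutoff trades missed corrections against false merges.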

The experimental results, obtained on datasets of Twitter streams generated under controlled conditions, show that Benford’s law and Zipf’s law can be used to model the hashtag frequency distribution.

Twitter is a microblogging social network, launched in 2006, with 310 million monthly active users and 340 million tweets generated daily.

However, to the best of our knowledge, there are no studies of the frequency distribution of hashtag popularity in Twitter conversations.

In this work, our goal is to analyze Twitter datasets in order to discover whether the frequency distribution of hashtag popularity follows any of the distribution laws that are common in many types of social science data.

Some authors have applied Benford’s law to forensic accounting, where an anomalous distribution of first significant digits can help to detect fraud.
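For reference, Benford’s law in its standard form gives the expected probability of a first significant digit d as log10(1 + 1/d). The short sketch below computes these expected values; the function name `benford_expected` is chosen for this example only.

```python
import math


def benford_expected(digit):
    """Expected probability that the first significant digit equals `digit`."""
    if not 1 <= digit <= 9:
        raise ValueError("digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Expected share of each leading digit, from 1 (most common) to 9 (least common).
expected = {d: benford_expected(d) for d in range(1, 10)}
```

Under this law a leading 1 is expected in roughly 30% of the values and a leading 9 in fewer than 5%, which is why a flat or otherwise unusual digit distribution stands out.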

Therefore, in order to test Zipf’s law on each dataset, we rank the hashtags from most to least popular.
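The ranking step can be sketched as follows. This is a minimal illustration, not the procedure used in the paper: it assumes that hashtags are compared case-insensitively and that agreement with Zipf’s law is judged by the least-squares slope of log(frequency) against log(rank), which is close to -1 for an ideal Zipfian distribution. The names `rank_frequency` and `zipf_slope` are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(hashtags):
    """Return (rank, hashtag, frequency) triples, most frequent first."""
    counts = Counter(tag.lower() for tag in hashtags)  # assumption: ignore case
    return [(rank, tag, freq)
            for rank, (tag, freq) in enumerate(counts.most_common(), start=1)]


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) versus log(rank).

    `frequencies` must be positive and sorted in decreasing order.
    """
    n = len(frequencies)
    if n < 2:
        raise ValueError("need at least two frequencies")
    xs = [math.log(rank) for rank in range(1, n + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


ranked = rank_frequency(["#a", "#b", "#A", "#c", "#a"])
slope = zipf_slope([1200, 600, 400, 300, 240])  # frequencies of the form 1200 / rank
```

A simple regression on log-log data is sensitive to the long tail of rare hashtags, so it is best read as a rough indicator rather than a formal test.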

Once the frequency of every hashtag has been computed, in Section 4 we analyze the distribution of these frequencies in order to test whether Zipf’s and Benford’s laws are satisfied.
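One way to check the hashtag frequencies against Benford’s law is to count their first digits and compare the counts with the expected ones. The sketch below uses a Pearson chi-square statistic for the comparison. That choice of statistic is an assumption made for the example, not necessarily the test applied in Section 4, and the function names are hypothetical.

```python
import math
from collections import Counter


def first_digit_counts(frequencies):
    """Count how often each digit 1..9 is the first digit of a frequency."""
    digits = Counter(int(str(f)[0]) for f in frequencies if f > 0)
    return [digits.get(d, 0) for d in range(1, 10)]


def benford_chi_square(frequencies):
    """Pearson chi-square statistic of the observed first digits against Benford's law."""
    observed = first_digit_counts(frequencies)
    total = sum(observed)
    if total == 0:
        raise ValueError("no positive frequencies to test")
    statistic = 0.0
    for digit, obs in zip(range(1, 10), observed):
        exp = total * math.log10(1 + 1 / digit)  # Benford's expected count
        statistic += (obs - exp) ** 2 / exp
    return statistic


# Hypothetical hashtag frequencies (positive integers).
sample = [120, 15, 1, 23, 9, 340]
observed = first_digit_counts(sample)
statistic = benford_chi_square(sample)
```

A smaller statistic means closer agreement with Benford’s law. With eight degrees of freedom, the usual 5% critical value is about 15.5, although a sample as small as the one above is too small for the test to be meaningful.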

https://www.aclweb.org/anthology/E17-4009.pdf