Use Your Words

Zipf’s law states that, in a large sample of text, the frequency of any word is inversely proportional to its rank in the frequency table: the word of rank n has a frequency proportional to 1/n.

So the most frequent word occurs about twice as often as the second most frequent word, three times as often as the third most frequent word, and so on down to the least frequent word [1][2]. The law is named after the American linguist George Kingsley Zipf, who first tried to explain it in 1935.
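A minimal sketch of how the 1/n relationship could be checked on a piece of text. The function names (`rank_frequencies`, `zipf_expected`), the simple regex tokenizer, and the toy sentence are all assumptions made for illustration; a real test needs a large corpus, since a sample this small only shows the mechanics.

```python
import re
from collections import Counter


def rank_frequencies(text):
    """Return (rank, word, count) tuples, most frequent word first.

    Tokenization is a deliberately naive assumption: lowercase the text
    and keep runs of ASCII letters and apostrophes.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


def zipf_expected(top_count, rank):
    """Count predicted for `rank` if frequency is proportional to 1/rank."""
    return top_count / rank


# Toy sample (hypothetical); far too small to show the law itself.
sample = "the cat and the dog and the bird saw the cat"
table = rank_frequencies(sample)
top_count = table[0][2]

# Print observed counts next to the counts the 1/n rule would predict.
for rank, word, count in table:
    print(rank, word, count, round(zipf_expected(top_count, rank), 2))
```

On a large corpus, the observed and predicted columns should roughly track each other for the top-ranked words if the text follows Zipf’s law.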

I came across a paper by Wang Dahui et al. [3] that compares English, traditional Chinese, and modern Chinese, which gives me a baseline for training my chat-bot in terms of word selection and memory. I wonder whether the same applies to Vietnamese, but that is another topic for my ‘to do’ list.


[1] Zipf’s Law


[2] Short and long words context


[3] https://www.sciencedirect.com/science/article/pii/S0378437105004085