Date

Jan 1, 0001 12:00 AM

title: “Type-token distributions beyond the Zipf law: a simple model with choice”

location: “XXVII Sitges Conference on Statistical Mechanics, Spain”

summary: “A presentation discussing a statistical model for describing the distributions of movies by popularityn”

`#`

.date: “2023-05-29T00:00:00Z”

all_day: true

publishDate: “2023-04-26T00:00:00Z”

authors:

- Mikhail Tamm
- Paul Krapivsky
- Vejune Zemaityte
- Ksenia Mukhina
- Maximilian Schich

tags:

- film popularity
- type-token distributions

featured: true

image: caption: focal_point: Right

url_code: "" url_pdf: "" url_slides: "" url_video: ""

`slides = "example-slides"`

references `content/slides/example-slides.md`

.`slides = ""`

.`projects = ["internal-project"]`

references `content/project/deep-learning/index.md`

.`projects = []`

.projects:

- CUDAN

Zipf law is one of the very few regularities believed to permeate quantitative social sciences. It describes distribution of tokens (objects) between a given set of types (classes), and states that if types are order in descending order of popularity (number of tokens in them), then the size of a type is proportional to its rank in some negative power. This law is well established in wealth distribution (Pareto law),^{1} city science (Zipf law for city sizes),^{2} linguistics (distribution of word frequencies).^{3} The same law is believed to hold for the distribution of demand for cultural objects like movies, books, singles, etc., which is supposed to lead to important consequences for marketing.^{4}

Using historical data for film industry we show here that it is not always the case. In fact, the distributions of movies by popularity is (apart from a small number of super-hits) well described by exponential rank-size distribution, i.e., size (popularity) decays exponentially with rank.

In order to describe this behavior we construct a following simple statistical model. Consider a set of $m$ initially empty classes (types) and sequentially add $N$ tokens to these types according to the following procedure: first, choose a subset of a types at random, then determine which of the classes in this subset currently has the largest number of tokens (if there is a draw between two or more leaders, then choose one of them at random) and add an additional token to this class.

In the limit of large number of classes $m → ∞$ and finite number of tokens per class $t = N/m$ the model is described by the infinite set of differential equations
$$dC_{k}/dt = C_{k−1}^a − C_k^a,$$
where $C_k$ is the fraction of types with at most $k$ tokens in them. We show that for large $k$ and $t$ the distribution $C_k(t)$ converges to a scaling form $C(x)$ where $x = k/t$, and $C(x)$ satisfies
$$(aC^{a−1} − x)C′ = 0$$
with boundary conditions $C(0) = 0, C(∞) = 1$. This equation has a solution
$$C(x) = \begin{cases} (x/a)^{1/(a−1)} & \text{for }0 < X < a, \

1 & \text{for } x > a\end{cases}$$
This means that the type-token distribution has a form of a travelling wave with growing average number of tokens per class, the leading class has approximately $at$ tokens in it. Converting this limiting behavior into the form of rank-size distribution we show that for $a ≫ 1$ it converges to the exponential distribution akin to one observed in the data.

Conference website: https://sites.google.com/fmc.ub.edu/sitges-conference

W.J. Reed, Physica A, 319, 469 (2003). ↩︎

M. Barthelemy, The Structure and Dynamics of Cities: Urban Data Analysis and Theoretical Modeling, Cambridge University Press (2016). ↩︎

M. Gerlach, E.G. Altmann, New J. of Physics, 16, 113010 (2014). ↩︎

C. Anderson, The long tail: why the future of business Is selling less of more, Hyperion, New York, 2006. ↩︎