Date

Jan 1, 0001 12:00 AM

draft: true

title: “Type-token distributions beyond the Zipf law: a simple model with choice”

event:

event_url:

location: “XXVII Sitges Conference on Statistical Mechanics, Spain”

address:

street: Piazza Marina, 61

city: Palermo

region: PA

postcode: ‘90133’

country: Italy

summary: “A presentation discussing a statistical model for describing the distributions of movies by popularityn”

Talk start and end times.

End time can optionally be hidden by prefixing the line with `#`.

date: “2023-05-29T00:00:00Z”

date_end: “2022-04-27T11:00:00Z”

all_day: true

Schedule page publish date (NOT talk date).

publishDate: “2023-04-26T00:00:00Z”

authors:

Mikhail Tamm
Paul Krapivsky
Vejune Zemaityte
Ksenia Mukhina
Maximilian Schich

tags:

film popularity
type-token distributions

Is this a featured talk? (true/false)

featured: true

image: caption: focal_point: Right

- icon: twitter

icon_pack: fab

name: Follow

url: https://twitter.com/georgecushen

url_code: "" url_pdf: "" url_slides: "" url_video: ""

Markdown Slides (optional).

Associate this talk with Markdown slides.

Simply enter your slide deck’s filename without extension.

E.g. `slides = "example-slides"` references `content/slides/example-slides.md`.

Otherwise, set `slides = ""`.

slides:

Projects (optional).

Associate this post with one or more of your projects.

Simply enter your project’s folder or file name without extension.

E.g. `projects = ["internal-project"]` references `content/project/deep-learning/index.md`.

Otherwise, set `projects = []`.

projects:

CUDAN

Abstract

Zipf law is one of the very few regularities believed to permeate quantitative social sciences. It describes distribution of tokens (objects) between a given set of types (classes), and states that if types are order in descending order of popularity (number of tokens in them), then the size of a type is proportional to its rank in some negative power. This law is well established in wealth distribution (Pareto law),¹ city science (Zipf law for city sizes),² linguistics (distribution of word frequencies).³ The same law is believed to hold for the distribution of demand for cultural objects like movies, books, singles, etc., which is supposed to lead to important consequences for marketing.⁴

Using historical data for film industry we show here that it is not always the case. In fact, the distributions of movies by popularity is (apart from a small number of super-hits) well described by exponential rank-size distribution, i.e., size (popularity) decays exponentially with rank.

In order to describe this behavior we construct a following simple statistical model. Consider a set of $m$ initially empty classes (types) and sequentially add $N$ tokens to these types according to the following procedure: first, choose a subset of a types at random, then determine which of the classes in this subset currently has the largest number of tokens (if there is a draw between two or more leaders, then choose one of them at random) and add an additional token to this class.

In the limit of large number of classes $m → ∞$ and finite number of tokens per class $t = N/m$ the model is described by the infinite set of differential equations $$dC_{k}/dt = C_{k−1}^a − C_k^a,$$ where $C_k$ is the fraction of types with at most $k$ tokens in them. We show that for large $k$ and $t$ the distribution $C_k(t)$ converges to a scaling form $C(x)$ where $x = k/t$, and $C(x)$ satisfies $$(aC^{a−1} − x)C′ = 0$$ with boundary conditions $C(0) = 0, C(∞) = 1$. This equation has a solution $$C(x) = \begin{cases} (x/a)^{1/(a−1)} & \text{for }0 < X < a, \
1 & \text{for } x > a\end{cases}$$ This means that the type-token distribution has a form of a travelling wave with growing average number of tokens per class, the leading class has approximately $at$ tokens in it. Converting this limiting behavior into the form of rank-size distribution we show that for $a ≫ 1$ it converges to the exponential distribution akin to one observed in the data.

Conference website: https://sites.google.com/fmc.ub.edu/sitges-conference

W.J. Reed, Physica A, 319, 469 (2003). ↩︎
M. Barthelemy, The Structure and Dynamics of Cities: Urban Data Analysis and Theoretical Modeling, Cambridge University Press (2016). ↩︎
M. Gerlach, E.G. Altmann, New J. of Physics, 16, 113010 (2014). ↩︎
C. Anderson, The long tail: why the future of business Is selling less of more, Hyperion, New York, 2006. ↩︎