From Page Views to Economy

I probably should have started with this kind of high-level analysis a long time ago. Let’s have a look at the hourly activity on Wikipedia. Regardless of which pages are viewed, I will count only the total number of page views for a given language.

Of course there is a natural pattern here. Activity is low at night, rises in the morning, is often very high in the evening and then falls rapidly at night:

Typical page view pattern on Wikipedia

This chart looks similar for other languages (of course it’s shifted according to time zone). Have a look at the top 20 languages here:

Page view patterns in different languages on Wikipedia

But I also found something interesting. In some languages the morning peak is higher than the evening one (German, French), whereas in other versions the evening tends to be the most active part of the day. Compare the German and Polish usage:

Page view patterns on German and Polish Wikipedia

I wondered if this information is somehow related to the economy (and innovation). The idea is that the more developed the economy, the more people have computers at work, and the more likely they are to use Wikipedia (at least occasionally) during working hours.

So, let’s define a ‘development index’: the number of page views in the morning/afternoon divided by the number of page views in the evening. If it’s greater than 1, it means that more people are reading Wikipedia at work (and more people have access to a computer at work). Values less than 1 mean that the evening peak is higher than the mid-day activity.
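
Here is a minimal Python sketch of that ratio for a single day, just to make the definition concrete. The hour windows (`WORK_HOURS`, `EVENING_HOURS`) and the choice to compare per-hour means are my assumptions for illustration; the numbers in the toy day are made up.

```python
from typing import Sequence

# ASSUMPTION: the post does not say which hours count as "work" or "evening".
# These local-time windows are placeholders chosen for illustration.
WORK_HOURS = range(9, 17)      # 09:00 to 16:59
EVENING_HOURS = range(18, 23)  # 18:00 to 22:59


def mean_views(hourly_views: Sequence[float], hours: range) -> float:
    """Average page views per hour over the given hours of one day."""
    return sum(hourly_views[h] for h in hours) / len(hours)


def development_index(hourly_views: Sequence[float]) -> float:
    """Work-hours activity divided by evening activity for a single day.

    hourly_views holds 24 page view counts, indexed by local hour (0-23).
    ASSUMPTION: per-hour means are compared so that windows of different
    length stay comparable; sums or peaks would be equally valid readings.
    """
    if len(hourly_views) != 24:
        raise ValueError("expected 24 hourly counts")
    evening = mean_views(hourly_views, EVENING_HOURS)
    if evening == 0:
        raise ValueError("no evening page views, index is undefined")
    return mean_views(hourly_views, WORK_HOURS) / evening


# Toy day (made-up numbers): 200 views per hour at work, 100 otherwise.
toy_day = [200 if h in WORK_HOURS else 100 for h in range(24)]
print(development_index(toy_day))  # 2.0
```

The hourly counts have to be in local time for the language's main country, otherwise the windows will not line up with the actual working day.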

Development index idea (page views during work hours compared to evening hours)

Here are a couple of histograms of the daily values of that index. As you can see, there are some differences between languages:

Development Index (page view pattern) in different languages

When we compute the median of that index for working days in October 2014, we get the following results:

    language median_wo_ev
1     Danish    1.4644229
2    Swedish    1.1960350
3     German    1.1201231
4      Dutch    1.0835982
5    Finnish    1.0086898
6  Bulgarian    0.9856515
7   Estonian    0.9601560
8   Croatian    0.9589306
9     Hebrew    0.9440924
10 Slovenian    0.9340174
11    Slovak    0.8924316
12     Greek    0.8602185
13     Czech    0.8599650
14   Italian    0.8527528
15   Russian    0.8389076
16   Turkish    0.8227695
17    Polish    0.7927831
18  Romanian    0.7377419
19 Ukrainian    0.7046982

Generally, the results meet my expectations. Rich countries from Western Europe (old EU) tend to have higher values of the index, while new EU member states and other Eastern European countries have lower values. Of course, the method is limited to languages that are used mainly in one country.
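
The per-language median over working days could be computed roughly like this. It is only a sketch: I assume "working days" means Monday to Friday, the function name `working_day_median` is hypothetical, and the daily values below are invented rather than taken from the real data.

```python
from datetime import date
from statistics import median
from typing import Mapping


def working_day_median(daily_index: Mapping[date, float]) -> float:
    """Median of the daily index over Monday-Friday only.

    ASSUMPTION: "working days" means weekdays; public holidays are not
    excluded here because that would need a per-country calendar.
    """
    weekday_values = [v for d, v in daily_index.items() if d.weekday() < 5]
    if not weekday_values:
        raise ValueError("no working days in the input")
    return median(weekday_values)


# Made-up daily values for one language.
# 4 and 5 October 2014 fall on a weekend and are ignored.
toy_index = {
    date(2014, 10, 1): 1.1,
    date(2014, 10, 2): 0.9,
    date(2014, 10, 3): 1.0,
    date(2014, 10, 4): 0.5,
    date(2014, 10, 5): 0.4,
}
print(working_day_median(toy_index))  # 1.0
```

Using the median rather than the mean keeps a single unusual day (a news spike, an outage) from dominating the monthly value.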

Let’s now compare this index with GDP per capita. For those two measures the correlation coefficient is quite high:


And here is the index value plotted against GDP per capita (from the CIA World Factbook: GDP – per capita (PPP)):

Page views pattern vs. GDP

With some exceptions, you can see that generally the higher the GDP per capita, the higher the number of page views on Wikipedia in the morning compared to the evening.
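
For anyone who wants to check the relationship themselves, a plain Pearson correlation is probably the simplest measure. I am assuming that Pearson is the right choice here, and the helper name `pearson` is mine. The sketch below uses placeholder numbers only; the real inputs would be the median index per language and the matching GDP per capita figures.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# PLACEHOLDER values, not real data: three index values and three
# invented GDP per capita figures that happen to be perfectly linear.
toy_index_values = [0.8, 1.0, 1.2]
toy_gdp_values = [20000.0, 30000.0, 40000.0]
print(round(pearson(toy_index_values, toy_gdp_values), 6))  # 1.0
```

With fewer than twenty languages, a single outlier can move the coefficient a lot, so it is worth looking at the scatter plot as well as the number.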

I think that’s interesting, because we are able to estimate some economic figures just by using page view patterns.