I probably should have started with this kind of high-level analysis a long time ago. Let’s have a look at the hourly activity on Wikipedia. Regardless of which pages are viewed, I will count only the total number of page views for a given language.
Of course there is a natural daily pattern. Activity is low at night, rises in the morning, is often very high in the evening, and then falls rapidly at night:
This chart looks similar for other languages (of course it’s shifted according to time zone). Have a look at the top 20 languages here:
But I also found something interesting. In some languages (German, French) the morning peak is higher than the evening one, whereas in other versions the evening tends to be the most active part of the day. Compare the German and Polish usage:
I wondered if this pattern is somehow related to the economy (and innovation). The idea is that the more developed the economy, the more people have computers at work, and the more likely they are to use Wikipedia (at least occasionally) during working hours.
So, let’s define a ‘development index’: the number of page views in the morning/afternoon divided by the number of page views in the evening. If it’s greater than 1, it suggests that more people read Wikipedia during working hours than in the evening (and, presumably, that more people have access to a computer at work). Values less than 1 mean that the evening peak is higher than the mid-day activity.
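To make the definition concrete, here is a minimal Python sketch of the index for a single day. The hour windows (`WORK_HOURS`, `EVENING_HOURS`) and the choice to compare mean views per hour rather than raw totals are my assumptions for illustration; the exact hours are not specified above.

```python
from statistics import mean
from typing import Iterable, Mapping

# Hypothetical windows in local time; the post does not state the exact hours.
WORK_HOURS = range(9, 17)      # 09:00 to 16:59
EVENING_HOURS = range(18, 23)  # 18:00 to 22:59


def development_index(
    hourly_views: Mapping[int, float],
    work_hours: Iterable[int] = WORK_HOURS,
    evening_hours: Iterable[int] = EVENING_HOURS,
) -> float:
    """Return daytime views divided by evening views for one day.

    hourly_views maps an hour of the day (0-23) to the page view count.
    Means per hour are used (an assumption) so that windows of
    different lengths can be compared fairly.
    """
    work = mean(hourly_views[h] for h in work_hours)
    evening = mean(hourly_views[h] for h in evening_hours)
    return work / evening


# Toy day (synthetic numbers): twice as many views per hour during
# working hours as in the evening.
toy_day = {h: 50.0 for h in range(24)}
toy_day.update({h: 200.0 for h in WORK_HOURS})
toy_day.update({h: 100.0 for h in EVENING_HOURS})
toy_index = development_index(toy_day)
```

A value above 1 for `toy_index` corresponds to the "more reading at work" case, a value below 1 to the "evening peak" case.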
Here are a couple of histograms of the daily values of that index. As you can see, there are some differences between languages:
When we compute the median of that index over the working days in October 2014, we get the following results:
   language   median_wo_ev
1  Danish     1.4644229
2  Swedish    1.1960350
3  German     1.1201231
4  Dutch      1.0835982
5  Finnish    1.0086898
6  Bulgarian  0.9856515
7  Estonian   0.9601560
8  Croatian   0.9589306
9  Hebrew     0.9440924
10 Slovenian  0.9340174
11 Slovak     0.8924316
12 Greek      0.8602185
13 Czech      0.8599650
14 Italian    0.8527528
15 Russian    0.8389076
16 Turkish    0.8227695
17 Polish     0.7927831
18 Romanian   0.7377419
19 Ukrainian  0.7046982
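The per-language median over working days could be computed roughly as follows. This is only a sketch: the function name `median_workday_index` is hypothetical, "working day" is taken to mean Monday to Friday, and public holidays are not excluded, which may differ from how the numbers above were produced.

```python
from datetime import date
from statistics import median
from typing import Mapping


def median_workday_index(daily_index: Mapping[date, float]) -> float:
    """Median of the daily index over Monday-Friday dates only."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    workday_values = [
        value for day, value in daily_index.items() if day.weekday() < 5
    ]
    return median(workday_values)


# Synthetic daily values for the first days of October 2014
# (1 October 2014 was a Wednesday; the 4th and 5th were a weekend).
toy_october = {
    date(2014, 10, 1): 1.0,
    date(2014, 10, 2): 1.2,
    date(2014, 10, 3): 0.8,
    date(2014, 10, 4): 5.0,  # Saturday, ignored
    date(2014, 10, 5): 5.0,  # Sunday, ignored
}
toy_median = median_workday_index(toy_october)
```

Using the median rather than the mean keeps a single unusual day (for example a national holiday that falls on a weekday) from distorting the result too much.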
Generally, this meets my expectations. Rich countries from Western Europe (the old EU) tend to have higher values of the index, while the new EU member states and other Eastern European countries have lower values. Of course, the method is limited to languages that are used mainly in a single country.
Let’s now compare this index with GDP per capita. For those two measures the correlation coefficient is quite high:
And here is the index value plotted against GDP per capita (from the CIA World Factbook: GDP – per capita (PPP)):
With some exceptions, you can see that, in general, the higher the GDP per capita, the higher the number of page views on Wikipedia in the morning compared to the evening.
I think that’s interesting, because we are able to estimate some economic figures just from page view patterns.
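For reference, a correlation coefficient between the index and GDP per capita can be computed with a few lines of Python. I am assuming the Pearson coefficient here, and the helper name `pearson_r` is my own. The numbers in the example are synthetic and only demonstrate the call; they are not real index or GDP values.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Synthetic, perfectly linear toy data (not real measurements).
toy_index_values = [1.0, 2.0, 3.0, 4.0]
toy_gdp_values = [2.0, 4.0, 6.0, 8.0]
toy_r = pearson_r(toy_index_values, toy_gdp_values)
```

In practice the two input lists would hold one index value and one GDP figure per language, aligned in the same order.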