From Page Views to Economy

I probably should have started with this kind of high-level analysis a long time ago. Let’s have a look at the hourly activity on Wikipedia. Regardless of which pages are viewed, I will count only the total number of page views for a given language.
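A minimal sketch of this kind of aggregation, assuming the hourly data is available as (language, page title, views) records. The record layout and the name `total_views_by_language` are assumptions made for illustration, not the actual format of the logs used here.

```python
from collections import Counter

def total_views_by_language(records):
    """Sum page views per language, ignoring which page was viewed."""
    totals = Counter()
    for language, _page, views in records:
        totals[language] += views
    return totals

# Made-up records for a single hour (illustration only)
hour_records = [
    ("de", "Berlin", 120),
    ("de", "Wikipedia", 80),
    ("pl", "Warszawa", 50),
]
totals = total_views_by_language(hour_records)
```

Repeating this for every hour of the day gives one activity curve per language.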

Of course there is a natural pattern here. Activity is low at night, rises in the morning, is often very high in the evening and then falls rapidly at night:

Typical page view pattern on Wikipedia

This chart looks similar for other languages (of course it’s shifted according to time zone). Have a look at the top 20 languages here:

Page view patterns in different languages on Wikipedia

But I also found something interesting. In some languages (German, French) the morning peak is higher than the evening one, whereas in other versions the evening tends to be the most active part of the day. Compare the German and Polish usage:

Page view patterns on German and Polish Wikipedia

I wondered if this information is somehow related to the economy (and innovation). The idea is that the more developed the economy, the more people have computers at work, and the more likely they are to use Wikipedia (at least occasionally) during working hours.

So, let’s define a ‘development index’: the number of page views in the morning/afternoon divided by the number of page views in the evening. If it’s greater than 1, it means that more people are reading Wikipedia at work (and more people have access to a computer at work). Values less than 1 mean that the evening peak is higher than the mid-day activity.

Development index idea (page views during work hours compared to evening hours)
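The index could be computed along these lines. This is only a sketch: the exact hour windows are not stated in this post, so the defaults below (9:00 to 17:00 for work, 18:00 to 23:00 for the evening) are assumptions, and so is the choice to compare average hourly views rather than totals or peaks, which keeps windows of different lengths comparable.

```python
def development_index(hourly_views, work_hours=range(9, 17), evening_hours=range(18, 23)):
    """Ratio of mean hourly views during work hours to mean hourly views in the evening.

    hourly_views: list of 24 counts indexed by local hour (0-23).
    The default hour windows are assumptions, not taken from the post.
    """
    work = sum(hourly_views[h] for h in work_hours) / len(work_hours)
    evening = sum(hourly_views[h] for h in evening_hours) / len(evening_hours)
    return work / evening

# Made-up day: 100 views/hour at night, 300 during work hours, 200 in the evening
day = [100] * 24
for h in range(9, 17):
    day[h] = 300
for h in range(18, 23):
    day[h] = 200
index = development_index(day)  # 300 / 200 = 1.5
```

A value above 1, as in this made-up day, would indicate that daytime activity dominates.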

Here are a couple of histograms of the daily values of that index. As you can see, there are some differences between languages:

Development Index (page view pattern) in different languages

When we compute the median of that index for working days in October 2014
we get the following results:

    language median_wo_ev
1     Danish    1.4644229
2    Swedish    1.1960350
3     German    1.1201231
4      Dutch    1.0835982
5    Finnish    1.0086898
6  Bulgarian    0.9856515
7   Estonian    0.9601560
8   Croatian    0.9589306
9     Hebrew    0.9440924
10 Slovenian    0.9340174
11    Slovak    0.8924316
12     Greek    0.8602185
13     Czech    0.8599650
14   Italian    0.8527528
15   Russian    0.8389076
16   Turkish    0.8227695
17    Polish    0.7927831
18  Romanian    0.7377419
19 Ukrainian    0.7046982
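A median restricted to working days can be sketched as follows. It assumes the daily index values are stored in a dictionary keyed by date, and it treats Monday to Friday as working days without excluding public holidays; both are assumptions of this example.

```python
from datetime import date
from statistics import median

def median_workday_index(daily_index):
    """Median of the daily index over Monday-Friday only.

    daily_index: dict mapping datetime.date -> index value for that day.
    Public holidays are not excluded here (an assumption of this sketch).
    """
    workday_values = [v for d, v in daily_index.items() if d.weekday() < 5]
    return median(workday_values)

# Made-up values; 2014-10-04 and 2014-10-05 fall on a weekend
sample = {
    date(2014, 10, 1): 1.10,
    date(2014, 10, 2): 1.20,
    date(2014, 10, 3): 1.00,
    date(2014, 10, 4): 0.60,
    date(2014, 10, 5): 0.50,
}
result = median_workday_index(sample)
```

Using the median rather than the mean makes the result less sensitive to a single unusual day.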

Generally, the ranking meets my expectations. Rich countries from Western Europe (old EU) tend to have higher values of the index, while new EU member states and other Eastern European countries have lower values. Of course, the method is limited to languages that are used locally in one country.

Let’s now compare this index with GDP per capita. For those two measures the correlation coefficient is quite high:

0.7290358
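For reference, a correlation coefficient of this kind can be computed as below. The sketch assumes the Pearson coefficient (the post does not say which one was used), and the sample numbers are made up rather than the real index and GDP values.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up numbers, not the real index or GDP figures
index_values = [1.4, 1.1, 0.9, 0.8]
gdp_values = [40, 35, 22, 20]
r = pearson(index_values, gdp_values)
```

A value close to 1 means the two measures rise together. It says nothing about causation.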

And here is the index value plotted against GDP per capita (from the CIA World Factbook: GDP – per capita (PPP)):

Page views pattern vs. GDP

With some exceptions, you can see that the higher the GDP per capita, the higher the number of page views on Wikipedia in the morning compared to the evening.

I think that’s interesting, because we are able to estimate some economic figures just by using page view patterns.

Cocktails

This time I will check the popularity of various cocktails on Wikipedia. I will analyse only English content and try to discover which drinks become popular at a particular time of the year.

What do people drink?

There is a Wikipedia page that lists IBA Official Cocktails, and that was the starting point. I focused only on the articles linked from that list.

When we look at the total number of page views from last year we will see that the top 3 cocktails are:

  • Martini
  • Old fashioned
  • Mojito

For me, Old fashioned in second position was a surprise. Here is the top-cocktails chart:

Top cocktails on Wikipedia

When do people drink?

Or at least: when do people think about drinks? One surprise is that in 2013 the page views were much higher than in 2014. This can easily be noticed in almost every analysed article.

When we look at the popularity of individual cocktails during the last two years we can see that some drinks have their own popularity patterns. Mojito, for example, seems to be more popular in June/July whereas Martini and Old fashioned in the winter:

Top 3 cocktails on Wikipedia

Which cocktails are becoming more popular this year?

Although the number of page views is generally decreasing this year, there are some exceptions. For example, Moscow Mule (along with Negroni and Sidecar) has a slightly higher number of readers than it had at the same time a year ago. Look at the values for October:

Cocktails that become more popular this year (Wikipedia page views)
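The year-over-year comparison behind a chart like this can be sketched as follows. The cocktail names and numbers are placeholders, not real page view counts, and `yoy_change` is a hypothetical helper name.

```python
def yoy_change(views_last_year, views_this_year):
    """Relative change in page views versus the same month a year earlier."""
    return (views_this_year - views_last_year) / views_last_year

# Made-up October page views as (2013, 2014) pairs, for illustration only
october_views = {
    "Cocktail A": (10000, 11000),
    "Cocktail B": (20000, 15000),
}
growing = [
    name
    for name, (prev, cur) in october_views.items()
    if yoy_change(prev, cur) > 0
]
```

Only articles with a positive change end up in `growing`, which here is just the first placeholder.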

Some cocktails become very popular in a short period of time because of certain events. Caipirinha, a drink based on Brazil’s most common liquor, is a good example. We can see a similar number of page views throughout the year, with a significant boost in June/July, probably connected with the 2014 FIFA World Cup (which took place in Brazil):

Caipirinha popularity during FIFA World Cup (Wikipedia page views)

New Year’s Eve

This pattern is not a surprise. New Year’s Eve means a higher number of page views, followed by a decrease at the beginning of January. However, there are some exceptions to this rule. Moscow Mule, for example, had its peak a few days before Xmas. On the other hand, Grasshopper gained some attention in January. Also, notice the Bloody Mary popularity on January 1st. It’s slightly less popular than on December 31, but still outstanding. Probably people considered that cocktail a cure for a hangover.

Cocktail popularity during Christmas / New Year (Wikipedia page views)

Ebola

In the last few months the Ebola outbreak in Africa has attracted the attention of the whole world. In this blog post I will check which countries were most influenced by this news. I will also try to identify any particular events that were most important for people living around the globe. Everything is based on Wikipedia page views.

Outbreak

According to Wikipedia, Ebola cases and deaths occurred mainly in Liberia, Sierra Leone and Guinea. Starting with a few hundred cases by the end of March 2014, the total number of cases diagnosed so far has now reached 13000.

Number of Ebola cases in 2014 (source Wikipedia)

In October more than 1500 Ebola-related deaths were reported. The figures have been growing rapidly in the last few months:

Ebola deaths per month in 2014 (source Wikipedia)

Reaction

Trends in different languages share some similarities. We can see two moments of increased activity on Wikipedia: the first at the beginning of August and the second in October. It seems that each was triggered by specific events.

Page views of Ebola virus disease article on Wikipedia

On August 2 Dr. Kent Brantly, an American aid worker, was evacuated from Liberia after being diagnosed with Ebola virus disease. A few days later a second American was evacuated.

On October 8, the first victim of Ebola virus disease in the U.S. died, and on October 10 in Spain a dog called Excalibur was euthanized by court order because it was suspected that the animal might have been a reservoir of the Ebola virus. I would assume that this event could have caused more page views in Europe (Spanish version).

Here we have compared different languages, ordered by number of page views:

Page views of Ebola virus disease on Wikipedia (different languages)

Some observations

For some languages the first peak was much more significant than the second (French, Russian, Hebrew and Portuguese). On the other hand, there were a couple of languages that had a much higher number of page views in October, among others Arabic, Persian, Indonesian and Romanian.

Looking at the page view numbers can be misleading, as some languages are more widely used than others. In the following chart I used article popularity, which is the number of page views of the article divided by the total number of page views (for that language). This approach, however, is also far from perfect, because the preprocessed monthly logs that I use don’t include articles with fewer than 5 views in a month. This can influence the total number of page views, and it’s hard to estimate that effect.
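In code, the popularity measure is a single division. The sketch below uses made-up numbers, and `article_popularity` is a hypothetical helper name.

```python
def article_popularity(article_views, total_views):
    """Share of a language edition's page views that went to one article."""
    if total_views <= 0:
        raise ValueError("total_views must be positive")
    return article_views / total_views

# Made-up daily numbers (illustration only)
share = article_popularity(2_000, 1_000_000)  # 0.002, i.e. one view in 500
```

Because articles with fewer than 5 views in a month are missing from the logs, `total_views` is underestimated, so the computed share is somewhat too high, more so for editions with many rarely viewed articles.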

Here are the same results, but showing article popularity instead of absolute values.

Popularity of Ebola virus disease on Wikipedia (different languages)

The article related to Ebola was every 50th article viewed on Thai and Greek Wikipedia at the peak time. Those two are relatively small wikis: Thai has about 1.5 million PV/day and Greek 0.7 million PV/day. Were those two countries the most preoccupied by Ebola virus disease?

Another strange observation. Let’s compare PV for the Czech, Greek and Polish versions of that article:

Page views of Ebola virus disease on Wikipedia (Czech, Greek and Polish)

According to Wikipedia there are:

  • around 44 million people speaking Polish,
  • 11.5 million people speaking Czech,
  • 15 million people speaking Greek

So it seems strange that those three languages have a similar number of page views despite huge differences in population size.
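To make the difference concrete, here is a small sketch that normalises page views by the number of speakers. The speaker counts are the ones listed above. The page view number is a placeholder that is identical for all three languages, standing in for “similar number of page views”.

```python
# Speaker counts in millions, as listed above
speakers_millions = {"Polish": 44, "Czech": 11.5, "Greek": 15}

def views_per_million_speakers(views, speakers):
    """Page views per million speakers of a language."""
    return views / speakers

# Placeholder: the same (made-up) view count for every language
same_views = 10_000
per_speaker = {
    lang: views_per_million_speakers(same_views, n)
    for lang, n in speakers_millions.items()
}
ratio_czech_to_polish = per_speaker["Czech"] / per_speaker["Polish"]  # 44 / 11.5
```

With equal page views, a Czech speaker would be about 3.8 times more likely to open the article than a Polish speaker.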

Why were people in the Czech Republic so concerned about Ebola virus disease?