Edit wars

If you are interested in conflicts or edit wars on Wikipedia, there is a must-read article that I found recently.

The authors analysed edits made on Wikipedia and tried to identify the articles that are likely to create “conflicts” among editors. For example, when an article describes a controversial issue, editors will probably remove or rewrite some of its sections. Finding such heavily edited articles can reveal “controversial” issues in a given language or country without actually analysing the text. The fact that this method is language independent makes it well suited to comparing social attitudes on certain subjects. I wish there were more examples attached to this work.
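To make the idea a bit more concrete, here is a minimal sketch of one language-independent signal: counting reverts in a page history. This is my own simplification and not the method from the article. It assumes you already have a content hash for every revision, in chronological order; the function names and the sample data are made up for illustration.

```python
def count_reverts(revision_hashes):
    """Count revisions that restore an earlier state of the page.

    revision_hashes: content hashes in chronological order (assumed input).
    A revision counts as a revert when its hash was seen before and it
    differs from the immediately preceding revision.
    """
    seen = set()
    reverts = 0
    previous = None
    for content_hash in revision_hashes:
        if content_hash in seen and content_hash != previous:
            reverts += 1
        seen.add(content_hash)
        previous = content_hash
    return reverts


def rank_by_reverts(histories):
    """Return article titles sorted from most to fewest reverts."""
    scores = {title: count_reverts(hashes) for title, hashes in histories.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up histories: "Hot" flips between two versions, "Calm" only grows.
histories = {
    "Calm": ["a", "b", "c"],
    "Hot": ["a", "b", "a", "b"],
}
ranking = rank_by_reverts(histories)
```

Nothing here looks at the text itself, only at whether a page keeps returning to earlier states, so the same code would work for any language edition.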

Again, very inspiring work. I have something similar in mind.

Congress Edits

I read an article mentioning that about 8.5% of Twitter accounts are automated.

One particularly interesting example is @congressedits (https://twitter.com/congressedits), which tweets whenever someone edits Wikipedia from an IP address belonging to the US Congress.
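The core check behind such a bot can be sketched in a few lines. This is only an illustration of the idea, not the actual @congressedits code. The networks below are placeholder documentation ranges (RFC 5737), not real Congress addresses, and the function name is my own.

```python
import ipaddress

# Placeholder networks taken from the RFC 5737 documentation ranges.
# These are NOT the real address blocks of any institution.
WATCHED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_watched(editor):
    """Return True if an anonymous editor's IP falls in a watched network."""
    try:
        address = ipaddress.ip_address(editor)
    except ValueError:
        # Logged-in edits carry a user name instead of an IP address.
        return False
    return any(address in network for network in WATCHED_NETWORKS)
```

A bot would run this check on the editor field of every anonymous edit it sees and post a message when it returns True.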

This is a great way to monitor the media and to detect potential manipulation. It seems to me that such initiatives will become more popular and more important in the near future.

Flu Season

It seems that the 2014-2015 flu season started some time ago. A few of my friends have already suffered from fever, cough or headache. Let’s check what flu-related activity looks like on Wikipedia.

The last influenza season (2013-2014) started in September, but January, February and March generally had the most page views. Here is the last flu season on Polish Wikipedia:

Flu season on Wikipedia (monthly page views for 'Influenza' article on Polish Wikipedia)

It looks similar for other languages. The ‘common cold’ article usually has fewer page views, but on German and Italian Wikipedia it is even more popular than the flu article:

Flu vs. Common cold page views on Wikipedia

Of course, page views are not constant throughout the month. On public holidays people are less preoccupied with their health. Christmas and New Year had fewer page views than comparable days around that time:

Flu and Common cold page views during Christmas/New year

Intuitively, we feel that people won’t read much about their sickness on weekends. In fact, Saturday is the day when people seem least preoccupied with their health problems, whereas page views reach their highest values on Monday, Tuesday and Wednesday. The pattern is similar for other flu-related articles, and not only on German Wikipedia.

Influenza page views by day of week (German Wikipedia)
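If you want to reproduce this kind of day-of-week breakdown, a minimal sketch could look like the one below. It assumes you already have daily page view counts as a mapping from date to views; the function name and the sample numbers are made up and are not the data behind the chart.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def mean_views_by_weekday(daily_views):
    """Average page views per weekday from a {date: views} mapping."""
    totals = [0] * 7
    counts = [0] * 7
    for day, views in daily_views.items():
        index = day.weekday()  # Monday is 0, Sunday is 6
        totals[index] += views
        counts[index] += 1
    # Weekdays without any data are left out of the result.
    return {WEEKDAYS[i]: totals[i] / counts[i] for i in range(7) if counts[i]}


# Made-up sample: two Mondays and one Saturday in January 2014.
sample = {
    date(2014, 1, 6): 100,
    date(2014, 1, 13): 200,
    date(2014, 1, 11): 60,
}
averages = mean_views_by_weekday(sample)
```

Averaging rather than summing matters here, because a given period rarely contains the same number of each weekday.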


Flu and common cold trends on Wikipedia are connected to painkillers such as paracetamol and ibuprofen. In most languages people seem to read more frequently about paracetamol, but there are some exceptions. German Wikipedia, for example, has a consistently higher number of page views for ibuprofen. The same holds for Portuguese and Chinese Wikipedia, but the gap is much smaller. On Polish Wikipedia the picture is not clear: there are a lot of high spikes for ibuprofen, but apart from those paracetamol attracted more readers.

Paracetamol vs. Ibuprofen page views on Wikipedia

Previous season

Finally, it is worth mentioning that the absolute number of page views varies a lot from year to year. For example, in the 2012-2013 season page views were generally much higher than in the following year:

Influenza page views on Wikipedia (2013 - 2014)

I tried to compare these results with data from the Centers for Disease Control and Prevention (Weekly U.S. Influenza Surveillance Report), and it seems that in the 2012-2013 season the mortality rate for pneumonia and influenza was much higher (see charts in the linked document). Maybe those fatal cases got more media coverage, and as a result more people were reading about flu on Wikipedia?

Hadoop memory settings

Sometimes, when you come across “out of memory” errors in Hadoop, the following resources may be helpful in understanding memory-related parameters: