Bellingcat – investigative journalists

Can you discover something important about the world just by analysing content from YouTube, Facebook or other publicly available sources? Of course you can, and Bellingcat is a great example.

Last week I read about the founder of Bellingcat, Eliot Higgins, and I was amazed both by his story and by the idea of creating an alternative analytical team or news agency. At a time when traditional media coverage can often leave you bored or unsatisfied, this initiative gives you the chance to read semi-amateur, yet very detailed, news and investigations based on facts found on the net.

I hope it becomes more popular and attracts even more contributors, so that it grows into a well-established news agency.

Edit wars

If you are interested in conflicts or edit wars on Wikipedia, there is a must-read article that I found recently.

The authors analysed edits made on Wikipedia and tried to identify the articles that are likely to cause "conflicts" among editors. For example, when an article describes a controversial issue, editors will probably remove or rewrite some of its sections. Finding such heavily edited articles can reveal the "controversial" issues in a given language or country without actually analysing the text. Because the method is language independent, it is well suited to comparing social attitudes towards certain subjects. I wish the work included more examples.
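Out of curiosity, here is a minimal sketch of how edit conflicts might be spotted without reading any text. It is not the authors' method, only a simplified proxy under stated assumptions: each article's history is a chronological list of (editor, content hash) pairs; a revision whose hash matches an earlier, non-adjacent revision is treated as a revert of the previous editor; and the number of editor pairs who reverted each other serves as a rough controversy score. All function names are hypothetical.

```python
def find_reverts(revisions):
    """Return (reverter, reverted) editor pairs for one article.

    revisions: chronological list of (editor, content_sha1) tuples.
    Assumption: a revision is a revert when its content hash equals that of
    an earlier revision other than its immediate predecessor; the author of
    the immediately preceding revision is taken as the reverted editor.
    """
    last_seen = {}  # content hash -> index of its latest occurrence
    reverts = []
    for i, (editor, sha1) in enumerate(revisions):
        j = last_seen.get(sha1)
        if j is not None and j < i - 1:
            reverted = revisions[i - 1][0]
            if reverted != editor:  # ignore self-reverts
                reverts.append((editor, reverted))
        last_seen[sha1] = i
    return reverts


def controversy_score(revisions):
    """Count distinct editor pairs who reverted each other (mutual reverts)."""
    pairs = set(find_reverts(revisions))
    mutual = {frozenset(p) for p in pairs if (p[1], p[0]) in pairs}
    return len(mutual)


def rank_articles(articles):
    """Sort article titles (dict: title -> revisions) by descending score."""
    return sorted(articles, key=lambda t: controversy_score(articles[t]), reverse=True)


# Usage: A and B keep restoring their own versions of the text.
history = [("A", "h1"), ("B", "h2"), ("A", "h1"), ("B", "h2")]
print(find_reverts(history))       # [('A', 'B'), ('B', 'A')]
print(controversy_score(history))  # 1
```

Since only editor names and content hashes are used, the same code works for any language edition, which is the property that makes this kind of approach attractive for cross-language comparison.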

Again, very inspiring work. I have something similar in mind.

Hadoop memory settings

If you come across "out of memory" errors in Hadoop, the following resources may help you understand the memory-related parameters:
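The parameters that usually matter are the per-task container sizes and the JVM heap options. Below is a small sketch, assuming Hadoop 2 (YARN) property names (Hadoop 1 used different ones, such as mapred.child.java.opts), that derives the heap option from the container size. The 0.8 heap fraction is a common rule of thumb, not an official default, and the helper names are hypothetical.

```python
def mapreduce_memory_settings(map_container_mb, reduce_container_mb, heap_fraction=0.8):
    """Build MapReduce (Hadoop 2 / YARN) memory properties.

    Each task runs in a YARN container of *.memory.mb megabytes; the JVM heap
    (-Xmx in *.java.opts) should be smaller than the container so that
    non-heap JVM memory still fits. heap_fraction is an assumed rule of thumb.
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")

    def heap_opt(container_mb):
        # Round down so the heap never exceeds the chosen fraction.
        return f"-Xmx{int(container_mb * heap_fraction)}m"

    return {
        "mapreduce.map.memory.mb": str(map_container_mb),
        "mapreduce.map.java.opts": heap_opt(map_container_mb),
        "mapreduce.reduce.memory.mb": str(reduce_container_mb),
        "mapreduce.reduce.java.opts": heap_opt(reduce_container_mb),
    }


def as_generic_options(props):
    """Render properties as -D options (assumes the job driver uses ToolRunner)."""
    return [f"-D{name}={value}" for name, value in props.items()]


settings = mapreduce_memory_settings(2048, 4096)
print(" ".join(as_generic_options(settings)))
```

The container sizes themselves must also fit within the cluster limits set by yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb, so raising the task memory alone does not always help.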

Wikipedia Stats on the Net

If you are interested in Wikipedia statistics, there are several pages worth visiting.

In a previous post I mentioned the dumps page, which provides the source data for all of these analyses (how often a given page was requested). There are basically two kinds of logs: the original ones, and a compact form maintained by Erik Zachte. The latter makes analysis faster: the logs are aggregated into daily totals, while the hourly values are kept in the last column.
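To illustrate the idea of the compact form, here is a sketch that aggregates one day of hourly logs into one line per page with a daily total and a per-hour column. It assumes each raw line has four space-separated fields (project, page title, request count, bytes); the "hour:count" encoding of the last column is invented for this example and is not the format of Erik Zachte's files.

```python
from collections import defaultdict


def aggregate_daily(hourly_logs):
    """Aggregate one day of hourly page-view logs.

    hourly_logs maps an hour (0-23) to the lines of that hour's file, each
    assumed to look like 'project page_title requests bytes'.
    Returns {(project, title): (daily_total, hourly_column)}.
    """
    counts = defaultdict(lambda: defaultdict(int))  # (project, title) -> hour -> requests
    for hour, lines in hourly_logs.items():
        for line in lines:
            project, title, requests, _bytes = line.split()
            counts[(project, title)][hour] += int(requests)

    daily = {}
    for key, by_hour in counts.items():
        total = sum(by_hour.values())
        # Hypothetical encoding of the last column: comma-separated "hour:count" pairs.
        column = ",".join(f"{h}:{c}" for h, c in sorted(by_hour.items()))
        daily[key] = (total, column)
    return daily


def to_compact_lines(daily):
    """Render one line per page: project, title, daily total, hourly column."""
    return [f"{p} {t} {total} {col}" for (p, t), (total, col) in sorted(daily.items())]


logs = {0: ["en Main_Page 10 1000", "de Berlin 3 300"], 1: ["en Main_Page 5 500"]}
for line in to_compact_lines(aggregate_daily(logs)):
    print(line)
# de Berlin 3 0:3
# en Main_Page 15 0:10,1:5
```

Reading one aggregated line per page instead of up to 24 separate hourly entries is where the speed-up comes from, while the last column still lets you recover the hourly profile when needed.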

Erik Zachte (Data Analyst at Wikipedia) presents a handful of Wikipedia visualisations and metrics. His profile page and blog are a good starting point, with plenty of useful links showing what other people have already done in this area.

There is also http://stats.wikimedia.org, a page worth browsing if you are looking for anything related to Wikipedia statistics. One of my favourite examples from that page is http://www.wikipediatrends.com.