My own Hadoop cluster

I spend most of my time at work developing and maintaining processing on Hadoop, and I’ve become interested in learning how different configurations impact performance. Some tests, like those related to compression and data format, can easily be performed on production systems, but in many other cases you need full control over the cluster.

As a summer hobby project I decided to set up my own little Hadoop cluster. I hope it will be useful for testing various configurations and, of course, a lot of fun.

After some quick market research I found some old but still good workstations:


Node 1

Each has two dual-core Intel Xeon processors:

$ cat /proc/cpuinfo
model name      : Intel(R) Xeon(R) CPU            5140  @ 2.33GHz
cache size      : 4096 KB

For now it has only 4GB of RAM installed, but it can be upgraded to 16GB. Unfortunately, all four slots are already occupied (4x1GB).

$ free -h
             total       used       free     shared    buffers     cached
Mem:          3.9G       1.5G       2.3G       648K        18M       322M
-/+ buffers/cache:       1.2G       2.7G
Swap:          14G         0B        14G

I chose 2TB hard drives. Initially I will put one drive in each node, but the four SATA ports leave room to extend the storage later.

$ sudo lshw -class disk
        product: ST2000DM001-1CH1
        vendor: Seagate
        size: 1863GiB (2TB)

Of course, the number of nodes won’t be impressive either. For now, I will start with just 4 nodes.
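To put these numbers in perspective, here is a rough back-of-the-envelope sketch of the cluster totals. The per-node figures are the ones above (4 cores, 4GB RAM, one 2TB drive). The replication factor of 3 is my assumption based on the HDFS default, and the `cluster_totals` helper is just a made-up name for illustration. The result ignores space taken by the OS and temporary data.

```shell
#!/bin/sh
# Rough cluster totals from per-node figures.
# Arguments: nodes, cores per node, RAM (GB) per node,
#            disk (GB) per node, HDFS replication factor.
# The replication factor of 3 used below is an assumption (HDFS default).
cluster_totals() {
    nodes=$1
    cores_per_node=$2
    ram_gb_per_node=$3
    disk_gb_per_node=$4
    replication=$5

    echo "cores=$((nodes * cores_per_node))"
    echo "ram_gb=$((nodes * ram_gb_per_node))"
    echo "raw_disk_gb=$((nodes * disk_gb_per_node))"
    # Integer division: every block is stored 'replication' times.
    echo "usable_disk_gb=$((nodes * disk_gb_per_node / replication))"
}

# 4 nodes, 2 x dual-core CPUs, 4GB RAM, one 2TB (2000GB) drive each
cluster_totals 4 4 4 2000 3
```

So with four nodes I should end up with 16 cores, 16GB of RAM and 8TB of raw disk, of which roughly a third would hold unique data if the replication factor stays at 3.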
