Hadoop user name

Some time ago I was looking for this option:

Environmental variable HADOOP_USER_NAME lets you specify the username that will be used when connecting to Hadoop, for example to create new HDFS files or accessing existing data.

Let’s have a look at the short example:

[root@sandbox ~]# echo "New file" | hdfs dfs -put - /tmp/file_as_root
[root@sandbox ~]# export HADOOP_USER_NAME=hdfs
[root@sandbox ~]# echo "New file" | hdfs dfs -put - /tmp/file_as_hdfs
[root@sandbox ~]# hdfs dfs -ls /tmp/file_*
-rw-r--r--   3 hdfs hdfs        154 2016-05-21 08:20 /tmp/file_as_hdfs
-rw-r--r--   3 root hdfs        154 2016-05-21 08:19 /tmp/file_as_root

So the second (file_as_hdfs) is owned by hdfs user because that was the value of HADOOP_USER_NAME variable.

Of course it works only on Hadoop cluster without Kerberos, but still it’s very useful on test environment or on VM. You can act as many users without executing sudo commands all the time.