Month: July 2016

How to copy multiple files from localhost to vagrant node

I have a 3-node Hadoop cluster installed via Vagrant on my laptop. Yesterday I wanted to copy a set of files from my local host to one of the Vagrant nodes; here are the steps I followed.

First, get the Vagrant node's SSH configuration. Since the cluster runs on my local machine, I have not set up a password or SSH keys of my own. You will find many more access restrictions on a professional setup.

Start by checking the options in vagrant help:

>> vagrant help

You can see an option “ssh-config” along with a brief description:

ssh-config outputs OpenSSH valid configuration to connect to the machine

Now list the SSH configuration for the node:

>> vagrant ssh-config node1

Host node1
 HostName 127.0.0.1
 User vagrant
 Port 2222
 UserKnownHostsFile /dev/null
 StrictHostKeyChecking no
 PasswordAuthentication no
 IdentityFile /opt/dev/hdp/hadoop-ops-course-master/.vagrant/machines/node1/virtualbox/private_key
 IdentitiesOnly yes
 LogLevel FATAL
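If you want to reuse these values in a script instead of copying them by hand, the output is easy to parse. The following is a minimal sketch, not part of Vagrant itself: the function name parse_ssh_config is hypothetical, and it assumes each non-blank line has the form "Option value" as in the output above.

```python
def parse_ssh_config(text):
    """Turn `vagrant ssh-config` style output into a dict of option -> value."""
    config = {}
    for line in text.splitlines():
        # Split on the first run of whitespace: option name, then its value.
        parts = line.split(None, 1)
        if len(parts) == 2:
            config[parts[0]] = parts[1].strip()
    return config


# Sample text copied from the output shown above (shortened).
sample = """\
Host node1
 HostName 127.0.0.1
 User vagrant
 Port 2222
 IdentityFile /opt/dev/hdp/hadoop-ops-course-master/.vagrant/machines/node1/virtualbox/private_key
"""

cfg = parse_ssh_config(sample)
print(cfg["HostName"], cfg["Port"], cfg["User"])
```

The values of interest for scp are HostName, Port, User and IdentityFile.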

Once we have the configuration, use the scp command below to copy the files. Note that you have to change the port, the key path and the source path to match your environment.

>> scp -P 2222 -i /opt/dev/hdp/hadoop-ops-course-master/.vagrant/machines/node1/virtualbox/private_key /home/dmadavil/Downloads/names/*.txt vagrant@127.0.0.1:~

This copies the files from my local machine to the vagrant user's home directory on node1.
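The same copy can be driven from a script. Here is a rough sketch using Python's glob and subprocess modules; the helper name build_scp_command is made up for illustration, and the paths are simply the ones from my machine. Because the command is passed as a list and no shell is involved, the *.txt wildcard has to be expanded with glob first.

```python
import glob


def build_scp_command(port, key_file, sources, user, host, dest="~"):
    """Build the scp argument list: options, then source files, then target."""
    return ["scp", "-P", str(port), "-i", key_file, *sources, f"{user}@{host}:{dest}"]


# Values taken from the ssh-config output; change them for your environment.
key = "/opt/dev/hdp/hadoop-ops-course-master/.vagrant/machines/node1/virtualbox/private_key"
files = glob.glob("/home/dmadavil/Downloads/names/*.txt")  # expands the wildcard

cmd = build_scp_command(2222, key, files, "vagrant", "127.0.0.1")
print(cmd)

# To actually run the copy (only when files is not empty):
# import subprocess
# subprocess.run(cmd, check=True)
```

With check=True, subprocess.run raises an exception if scp exits with a non-zero status, so a failed copy does not go unnoticed.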

You may then need to run hadoop fs -put or hadoop fs -copyFromLocal on the node to copy the files into HDFS.


Hadoop home directory access issue: mkdir: Permission denied: user=vagrant, access=WRITE, inode="/user/vagrant":hdfs:hdfs:drwxr-xr-x

When you install Hadoop for the first time on your sandbox or virtual machine, you might face some access issues. Here, I had created a 3-node cluster on my laptop using Vagrant and VirtualBox.

First, I connected to node1 using ssh:

>> vagrant ssh node1

>> pwd
/home/vagrant

>>hadoop version

Hadoop 2.7.1.2.3.6.0-3796
Subversion git@github.com:hortonworks/hadoop.git -r d712e2d662051975eea2dc014c1d07a8f0ac8057
Compiled by jenkins on 2016-06-23T15:40Z
Compiled with protoc 2.5.0
From source with checksum dd27ba8f8a26d4f721313528724faf
This command was run using /usr/hdp/2.3.6.0-3796/hadoop/hadoop-common-2.7.1.2.3.6.0-3796.jar

Now I want to check the contents of my home directory in HDFS:

>>hadoop fs -ls
ls: `.': No such file or directory

My first guess was that there was no /user/vagrant directory in HDFS yet (this is the default home directory for the user vagrant), so I tried to create it:

>>hadoop fs -mkdir -p /user/vagrant
mkdir: Permission denied: user=vagrant, access=WRITE, inode="/user/vagrant":hdfs:hdfs:drwxr-xr-x

This is because the path is owned by hdfs and the vagrant user has no write access to it. We have to create the directory as the superuser hdfs and then hand its ownership to vagrant:

>>sudo -u hdfs hadoop fs -mkdir -p /user/vagrant
>>sudo -u hdfs hadoop fs -chown vagrant /user/vagrant
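To see why the first mkdir was refused and why the chown helps, here is a simplified sketch of how an owner/group/other mode string such as drwxr-xr-x is evaluated. It is only an illustration: the function can_write is hypothetical, and it ignores the HDFS superuser and ACLs.

```python
def can_write(mode, owner, group, user, user_groups=()):
    """Return True if `user` gets write access from an ls-style mode string."""
    # mode looks like "drwxr-xr-x": one type character, then three rwx triplets.
    owner_bits, group_bits, other_bits = mode[1:4], mode[4:7], mode[7:10]
    if user == owner:
        bits = owner_bits        # the owner triplet applies
    elif group in user_groups:
        bits = group_bits        # otherwise the group triplet, if the user is a member
    else:
        bits = other_bits        # everyone else
    return bits[1] == "w"


# Values from the error message: owner hdfs, group hdfs, mode drwxr-xr-x.
print(can_write("drwxr-xr-x", "hdfs", "hdfs", "vagrant"))     # False
# After the chown, vagrant is the owner and the owner triplet is rwx.
print(can_write("drwxr-xr-x", "vagrant", "hdfs", "vagrant"))  # True
```

The mode itself never changes; only the owner does, which is enough to move vagrant from the "other" triplet to the "owner" triplet.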

Now you can see that the ownership of /user/vagrant has changed:

>>hadoop fs -ls /user
Found 2 items
drwxrwx--- - ambari-qa hdfs 0 2016-07-05 05:34 /user/ambari-qa
drwxr-xr-x - vagrant hdfs 0 2016-07-14 03:44 /user/vagrant

Let's create a couple of empty files in the vagrant user's directory:

>>hadoop fs -touchz ./my_data.txt
>>hadoop fs -touchz ./my_data2.txt

Now we can list the files in the vagrant user's directory:

>>hadoop fs -ls .
Found 2 items
-rw-r--r-- 3 vagrant hdfs 0 2016-07-14 04:27 my_data.txt
-rw-r--r-- 3 vagrant hdfs 0 2016-07-14 04:27 my_data2.txt
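If you need this listing inside a script, each line can be split into its eight columns. The sketch below is an assumption-laden example: the names HdfsEntry and parse_ls_line are made up, and it assumes the column layout shown above (mode, replication, owner, group, size, date, time, path).

```python
from collections import namedtuple

HdfsEntry = namedtuple("HdfsEntry", "mode replication owner group size date time path")


def parse_ls_line(line):
    """Parse one line of `hadoop fs -ls` output; return None for other lines."""
    # maxsplit=7 keeps a path that contains spaces in one piece.
    fields = line.split(None, 7)
    if len(fields) != 8:
        return None  # e.g. the "Found 2 items" header
    mode, replication, owner, group, size, date, time, path = fields
    return HdfsEntry(mode, replication, owner, group, int(size), date, time, path)


# Sample text copied from the listing above.
listing = """\
Found 2 items
-rw-r--r-- 3 vagrant hdfs 0 2016-07-14 04:27 my_data.txt
-rw-r--r-- 3 vagrant hdfs 0 2016-07-14 04:27 my_data2.txt
"""

entries = [e for e in map(parse_ls_line, listing.splitlines()) if e]
for entry in entries:
    print(entry.owner, entry.size, entry.path)
```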

Running system commands from python

I’m sure there are multiple ways of doing it, but the easiest way is to use the system function from the os module.

>>> import os
>>> os.system('ls -lrt')
total 120
drwxrw-r--   2 ****** impvs      4096 Mar 19 2015   wallet
drwxrw-r--   3 ****** impvs      4096 Mar 19 2015   oradiag_*******
-rwxrw-r--   1 ****** impvs      6364 Mar 27 2015   reset_wallet.ksh
-rwxrwxrwx   1 ******* impvs      484 Oct 21 2015   sqlnet.ora
-rwxrwxrwx   1 ****** impvs     58923 Apr 12 15:02  tnsnames.ora
drwxrw-r--   2 ****** impvs      4096 Jun 23 16:08  python
-rw-r--r--   1 ****** infdevlpr   836 Jun 30 15:00  cdc_test_log.lst
0

The zero on the last line is the value returned by os.system, which reflects the exit status of the command: zero means successful execution, and any other value indicates an error.
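One of the other ways is the subprocess module, which also lets you capture the command's output instead of only printing it. This is a small sketch assuming Python 3.7 or later (for capture_output); it runs the Python interpreter itself as the child command purely so the example works on any platform, where you would normally pass something like ["ls", "-lrt"].

```python
import subprocess
import sys

# The command is a list of arguments, so no shell is involved.
result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True,  # collect stdout and stderr instead of printing them
    text=True,            # decode the output to str
)
print(result.returncode)  # exit status of the child process
print(result.stdout)      # whatever the command wrote to stdout

# A command that fails on purpose, to show a non-zero exit status.
failed = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])
print(failed.returncode)
```

As with os.system, a return code of zero means success and anything else signals an error.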