Category: Hadoop

How to resolve error “sandbox.hortonworks.com’s server DNS address could not be found” while loading Oozie Web UI

Today we will see how to resolve the error “sandbox.hortonworks.com’s server DNS address could not be found” while loading the Oozie Web UI. I got this weird error after reinstalling the HDP sandbox on Windows 10.

This error happens because we haven’t updated the hosts file on Windows 10. You can find the hosts file in the directory listed below.

C:\Windows\System32\drivers\etc

Copy the hosts file to the desktop, edit it, and add the following line at the end:

127.0.0.1       sandbox.hortonworks.com

Now copy the hosts file back to C:\Windows\System32\drivers\etc. You may need admin rights if you are not a power user.
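Once the file is back in place, a quick way to confirm the change took effect is to check that the name now resolves to the loopback address, for example from a command prompt:

ping sandbox.hortonworks.com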

Now you will be able to access the Oozie UI at the link below.

http://sandbox.hortonworks.com:11000
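If the page still does not come up, it is worth confirming from a shell that the Oozie server itself is running. Assuming the default /oozie context path, the REST status endpoint is a handy check:

curl http://sandbox.hortonworks.com:11000/oozie/v1/admin/status

A healthy server should respond with something like {"systemMode":"NORMAL"}.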

 

Disable Hadoop log messages in the console

If you are new to the Hadoop/MapReduce ecosystem, you must have seen the messages that are printed to the console when you run commands. They can be useful for a beginner and sometimes help you understand what is going on, but these log outputs become annoying once you are familiar with how the system works or you are working on a production system.

These messages can be suppressed using Ambari. I will just list the steps you have to go through to disable them.

1. Log in to the Ambari console and go to the MapReduce2 service in the left-hand pane.

ambari_1

2. On the MapReduce2 page, click on the Configs tab and then go to Advanced.

ambari_2

3. There you can see the “Advanced mapred-site” drop-down section; click on it.

ambari_3

4. Scroll down and change the value of the two parameters shown below to ‘OFF’.

ambari_5

ambari_7

5. Once done, save the changes and restart the required components.

Now, if you log in to the Hive/Pig console, you won’t see all those INFO messages while running queries.
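If you only want to quiet a single session rather than change the cluster-wide configuration, the Hive CLI also accepts a log-level override at start-up; for example:

hive --hiveconf hive.root.logger=WARN,console

This is a per-session alternative and does not touch the Ambari-managed settings above.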

Import data from MySQL to Hadoop using Sqoop

As part of our job we import/move a lot of data from relational databases (mainly Oracle and MySQL) to Hadoop. Most of our data stores are in Oracle, with a few internal data stores running on MySQL.

Sqoop (SQL-to-Hadoop) is an Apache tool for importing data from relational databases (there are separate drivers for each database) into Hadoop. In this post we will import data from a MySQL table into the Hadoop file system.

Here, I have a MySQL instance running on the same local machine where my Hadoop cluster is also running. You will have to download the JDBC driver and place it in the appropriate directory for Sqoop to connect to your database. The driver is already present on my machine, as Sqoop offers very extensive support for MySQL.

The link below gives you a list of available drivers and their locations if you are using a different database.

https://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1773570
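If you do need to install a driver manually (for Oracle, say), it is usually just a matter of dropping the vendor's JDBC jar into Sqoop's lib directory. The paths below are only illustrative and depend on your distribution:

# example: Oracle JDBC driver on an HDP-style layout (adjust paths for your install)
sudo cp ojdbc6.jar /usr/hdp/current/sqoop-client/lib/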

First, let me log in to the primary node of my 3-node cluster (virtual machines created with Vagrant and VirtualBox).

vagrant ssh node1

Let us check the connection and data in the MySQL database.

mysql -u root -h localhost -p
Enter password: ********
MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| my_test            |
| mysql              |
| performance_schema |
| test               |
+--------------------+
5 rows in set (0.04 sec)

use my_test;

MariaDB [my_test]> show tables;
+-------------------+
| Tables_in_my_test |
+-------------------+
| name_data         |
| name_data2        |
+-------------------+
2 rows in set (0.02 sec)
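Incidentally, the same sanity check can be done through Sqoop itself, which also confirms that the JDBC connection is wired up correctly. A sketch using the same connection details (-P prompts for the password):

sqoop list-tables --connect jdbc:mysql://localhost/my_test --username root -P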

Now, let's check the data.

MariaDB [my_test]> select count(*) from name_data;
+----------+
| count(*) |
+----------+
|  1858689 |
+----------+

MariaDB [my_test]> select * from name_data limit 3;
+------+--------+-------+
| Name | Gender | count |
+------+--------+-------+
| Mary | F      |  7065 |
| Anna | F      |  2604 |
| Emma | F      |  2003 |
+------+--------+-------+

Now that we are sure we have data in the MySQL table, let's check our Hadoop home directory.

[vagrant@node1 ~]$ hadoop fs -ls /user/vagrant
Found 5 items
drwx------ - vagrant hdfs 0 2016-12-20 02:56 /user/vagrant/.Trash
drwxr-xr-x - vagrant hdfs 0 2016-10-26 04:46 /user/vagrant/.hiveJars
drwx------ - vagrant hdfs 0 2016-11-13 23:44 /user/vagrant/.staging
drwxr-xr-x - vagrant hdfs 0 2016-12-06 04:13 /user/vagrant/test_files

Now we want to move the data from MySQL to the /user/vagrant/my_data directory. Below is the Sqoop command to import the data.

[vagrant@node1 ~]$ sqoop import --connect jdbc:mysql://localhost/my_test --username root --password ******* --table name_data --m 1 --target-dir /user/vagrant/my_data

Once this command is completed, data will be present in /user/vagrant/my_data.

[vagrant@node1 ~]$ hadoop fs -ls /user/vagrant/my_data
Found 2 items
-rw-r--r-- 3 vagrant hdfs 0 2016-12-20 03:20 /user/vagrant/my_data/_SUCCESS
-rw-r--r-- 3 vagrant hdfs 22125615 2016-12-20 03:20 /user/vagrant/my_data/part-m-00000

[vagrant@node1 ~]$ hadoop fs -cat /user/vagrant/my_data/part-m-00000| wc -l
1858689
[vagrant@node1 ~]$

[vagrant@node1 ~]$ hadoop fs -cat /user/vagrant/my_data/part-m-00000| head -3
Mary,F,7065
Anna,F,2604
Emma,F,2003

We can also create an options file and store the command arguments in it for reusability.

[vagrant@node1 ~]$ cat sqoop_test_config.cnf
import
--connect
jdbc:mysql://localhost/my_test
--username
root

[vagrant@node1 ~]$ sqoop --options-file ./sqoop_test_config.cnf --password ***** --m 1 --table name_data --target-dir /user/vagrant/my_data

This does the same job, but now we have the flexibility to save, edit, and reuse the commands.
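One small improvement on the command above: instead of passing --password on the command line (where it lingers in the shell history), Sqoop can prompt for it interactively with -P. A sketch reusing the same options file:

[vagrant@node1 ~]$ sqoop --options-file ./sqoop_test_config.cnf -P --m 1 --table name_data --target-dir /user/vagrant/my_data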

How To Get/Set Current Database Name in Hive

It is very easy to find/display the current database in the Hive command line interface (CLI). Just run the command below and the prompt will show which Hive database you are connected to.

hive> set hiveconf:hive.cli.print.current.db=true;
hive (my_db)>

This setting is lost if you restart the Hive CLI or open a new one. We can make it permanent by adding the same line to the ~/.hiverc file.
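The file only needs to contain that same set statement; Hive executes everything in ~/.hiverc when the CLI starts (create the file if it does not already exist):

cat ~/.hiverc
set hiveconf:hive.cli.print.current.db=true;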

Restart the Hive CLI after updating the file, and the prompt will show which database you are connected to.

hive (default)> use my_db;
OK
Time taken: 3.189 seconds
hive (my_db)>

VirtualBox – Vagrant issue

I have a 3-node Hadoop cluster installed on my laptop using VirtualBox and Vagrant. I got the following error when I tried to start the cluster yesterday.

[root@ora-c7 hadoop-ops-course-master]# vagrant up node1 node2 node3
 The provider 'virtualbox' that was requested to back the machine
 'node1' is reporting that it isn't usable on this system. The
 reason is shown below:

VirtualBox is complaining that the kernel module is not loaded. Please
 run `VBoxManage --version` or open the VirtualBox GUI to see the error
 message which should contain instructions on how to fix this error.

Initially I thought this was an issue with Vagrant and considered reinstalling the whole cluster, but when I tried to start VirtualBox from the terminal I got a different error message asking me to recompile the VirtualBox kernel module.

[root@ora-c7 hadoop-ops-course-master]# virtualbox --version
WARNING: The vboxdrv kernel module is not loaded. Either there is no module
 available for the current kernel (3.10.0-327.36.1.el7.x86_64) or it failed to
 load. Please recompile the kernel module and install it by

sudo /sbin/rcvboxdrv setup

You will not be able to start VMs until this problem is fixed.
START /usr/bin/firefox "http://download.virtualbox.org/virtualbox/5.0.28/VirtualBox-5.0-5.0.28_111378_el7-1.x86_64.rpm"

I recompiled the kernel modules and everything started working fine again.

[root@ora-c7 Downloads]# /sbin/rcvboxdrv setup

Stopping VirtualBox kernel modules [ OK ]
Recompiling VirtualBox kernel modules [ OK ]
Starting VirtualBox kernel modules [ OK ]
[root@ora-c7 Downloads]#
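A quick way to confirm the fix is to check that the module is loaded and that VBoxManage answers again, for example:

lsmod | grep vboxdrv
VBoxManage --version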

How to copy multiple files from localhost to vagrant node

I have a 3-node Hadoop cluster installed via Vagrant on my laptop. Yesterday I wanted to move a set of files from my local host to one of the Vagrant nodes; here are the steps I followed.

First, get the Vagrant node's SSH configuration. Since I have everything installed on my local machine, I don't have passwords/SSH keys set up; you will find much tighter access restrictions on a professional setup.

Start by checking the options in vagrant help:

>> vagrant help

You can see an option “ssh-config” along with a brief description:

ssh-config outputs OpenSSH valid configuration to connect to the machine

Now, list the SSH configuration for node1:

>> vagrant ssh-config node1

Host node1
 HostName 127.0.0.1
 User vagrant
 Port 2222
 UserKnownHostsFile /dev/null
 StrictHostKeyChecking no
 PasswordAuthentication no
 IdentityFile /opt/dev/hdp/hadoop-ops-course-master/.vagrant/machines/node1/virtualbox/private_key
 IdentitiesOnly yes
 LogLevel FATAL

Once we have the configuration, use the scp command below to move the files. Please note that you have to change the parameters (port, identity file, paths) based on your environment.

>> scp -P 2222 -i /opt/dev/hdp/hadoop-ops-course-master/.vagrant/machines/node1/virtualbox/private_key /home/dmadavil/Downloads/names/*.txt vagrant@127.0.0.1:~

This moves the files from the given path on my local machine to the vagrant user's home directory on node1.
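A small convenience worth noting: instead of copying the port and key path by hand, you can dump the generated SSH configuration to a file and let scp read it with -F; the host name then matches the Vagrant machine name. The temporary file path below is just an example:

>> vagrant ssh-config node1 > /tmp/vagrant-ssh.cfg
>> scp -F /tmp/vagrant-ssh.cfg /home/dmadavil/Downloads/names/*.txt node1:~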

You may then need to run hadoop fs -put or hadoop fs -copyFromLocal to move the files into HDFS.
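For example (the target directory is just illustrative), once the files are on node1:

hadoop fs -mkdir -p /user/vagrant/names
hadoop fs -put ~/*.txt /user/vagrant/names/
hadoop fs -ls /user/vagrant/names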