Microsoft BI access Hive Tables on Hortonworks (Hadoop) Sandbox

Want to learn and play with some Hadoop systems on your local machine without installing Linux? You can use Oracle Virtual Box to install free Hadoop distribution sandbox, such Hortonworks HDP, Cloudera CDH and MapR. In this tutorial, I will present how to install Hortonworks HDP Sandbox on Windows and use Microsoft BI software to access Hive tables hosting on the Sandbox.

Pre-requisites

The following software is required for this tutorial.

After you have installed Power BI Desktop, Office Excel and Oracle Virtual Box and downloaded the HDP Sandbox, you can go to the next step: install HDP Sandbox on Virtual Box.

Hortonwork HDP SandBox on Virtual Box

Use “Import Appliance” on virtual box to follow the prompt windows to import the HDP SandBox ova file.3

virtualbox_next_windows

The detailed instruction of importing Hortworks HDP Sandbox into virtual box can be found @http://hortonworks.com/hadoop-tutorial/hortonworks-sandbox-guide/#section_2

After the import of sandbox succeeded, you should see something showing on your VirtualBox Manager screen like following screen shot.

0

Double click “Hortonworks Docker Sandbox” to start the virtual machine. After it was booted successfully, you should see something like the following screen shot.

1

If you would like, you can click Alt+F5 to login into the CentOS Sandbox by using username of “root” and password of “hadoop”, like the following screen shots.

2

4

For the first time user of HDP Sandbox 2.5, you need to reset the ambari admin password so that you can login admbari via a web browser. The screen shot below shows the commends. You can follow the detailed instructions on Hortonworks here.

5

After ambari password was re-set and the service was started successfully, you can type “http://127.0.0.1:8080” on your web browser to launch ambari login page as following screen shot.

14

Enter the username of “admin” and the password you just reset with the commend in the CentOS terminal, you can login the dashboard panel of ambari, like following screen shot.

15

You can open “Hive View” by clicking the menu on the top right-hand side shown in the following screen shot.

16

When the Hive View is open, you can play it around. The tab menu shown on the top (highlighted with red color underline) will be useful for working on the Hive.

17

So far, your HDP Sandbox is setup and running. You can load some data file (such as csv) into Hive if you like.

ODBC Setup on Windows

Basically, Microsoft BI tools need ODBC pipeline to access Hive tables. So, you need to download and install Hortonwork ODBC driver for Hive. The current version is v1.5.1 with 32-bit and b4-bit editions.

Caution: The ODBC driver needs to match your Excel or Power BI 32-bit or 64-bit version. Otherwise, you can not setup data connection properly.

You can find the ODBC driver download page @ http://hortonworks.com/downloads/ like following screen shot.

00

After downloading and installing ODBC driver on you windows, you can set your ODBC DSN as following screen shots.

9

For 64-bit ODBC driver, the screen shot will be like screen shot below.

10-1

For 32-bit ODBC driver, the screen shot will be like screen shot below.

10

The Advanced Options screen shot:

11

The Temporary Table Configuration screen shot:

12

And the Logging Options screen shot:

13

By now, you have finished HDP Sandbox and ODBC setup. Next, you can access your Hive tables by using Excel or Power BI.

Microsoft Excel BI

On Excel, setup data connection to Hive table by click “New Query” shown in the following screen shot.

1

Then choose “From ODBC” shown in the following screen shot.

2

And then enter the server ip “127.0.0.1”.

3

And choose the DSN for Hortonwork Hive tables that you just setup.

6

Choose the Hive tables you want to load like following screen shot.

7

And edit (if you need to) and load the table data into Excel. Now, you can create BI visualization based on the load Hive data then.

8

Microsoft Power BI

The steps for Power BI to access Hive tables are quite similar to the Excel ones.

Click “Get Data” on robin and choose “Other” and then “ODBC” like the following screen shot.

1

Enter the server ip “127.0.0.1”

2

And choose ODBC DSN you created.

5

Choose the Hive tables you want to load and then edit & load

6

The Hive data should be loaded into Power BI and is ready for you to report on. The screen shot below shows an example of Hive table data and its visualization.

7

One of tutorial on Power BI accessing Hortonwork Sandbox can be found @ https://community.hortonworks.com/articles/61185/visualizing-hive-data-using-microsoft-power-bi.html

Leave a comment