Month: October 2016

How To Get/Set Current Database Name in Hive

It is very easy to find/display current database information in Hive Command Line Interface (CLI). Just run the below command and terminal will display the current Hive database you are connected to.

hive> set hiveconf:hive.cli.print.current.db=true;
 hive (my_db)>

This value will be overwritten if you restart the Hive CLI or you open a new one. We can set it my permanently by editing the

~/.hiverc file.

restart the hive CLI after updating the file and it will show which DB you are connected to.

hive (default)> use my_db ;
 OK
 Time taken: 3.189 seconds
 hive (my_db)>

Oracle INMEMORY option for heap table

Oracle inmemory options allows users to store their data in memory. This is how we can enable inmemory option which provides faster access to table data.

SQL> show parameter inmemory_size

 NAME                                 TYPE        VALUE
 ------------------------------------ ----------- ------------------------------
 inmemory_size                        big integer 0

Here in my database inmemory  size is set to zero. We can change it by editing this parameter, restart the database after changing this parameter.

SQL> alter system set inmemory_size=100M scope=spfile;

System altered.

SQL> shutdown immediate;
 Database closed.
 Database dismounted.
 ORACLE instance shut down.
 SQL> startup open;
 ORACLE instance started.

Total System Global Area 2432696320 bytes
 Fixed Size 2927288 bytes
 Variable Size 637535560 bytes
 Database Buffers 1660944384 bytes
 Redo Buffers 13848576 bytes
 In-Memory Area 117440512 bytes
 Database mounted.
 Database opened.
 SQL> alter pluggable database all open;

Pluggable database altered.

Check the value of inmemory_size again for the database.

SQL> show parameter inmemory_size

NAME                                  TYPE        VALUE
 ------------------------------------ ----------- ------------------------------
 inmemory_size                        big integer 112M
 SQL>

Now we can see the parameter is set to 112MB and this can be used to store our table data. Now inmemory store doesn’t contain any table and this can be verified by the below command.

SQL> select count(*) from v$im_segments;

COUNT(*)
 ----------
 0

Now, pin the table into inmemory store.

SQL> alter table my_tab inmemory;

Table altered.

SQL> select count(*) from v$im_segments;

COUNT(*)
 ----------
 0

Table is stored in INMEMORY store only after we query it first time.

SQL> select count(*) from v$im_segments;

COUNT(*)
 ----------
 0

SQL> select count(*) from my_tab;

COUNT(*)
 ----------
 10000

SQL> select count(*) from v$im_segments;

COUNT(*)
 ----------
 1

Execution plan clearly show if the table is accessed from memory. So, lets get the execution plan for the above count statement.

 SQL> set autotrace traceonly explain;
SQL> select count(*) from my_tab;

Execution Plan
----------------------------------------------------------
Plan hash value: 2076930067

------------------------------------------------------------------------------
| Id  | Operation            | Name   | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |         |       1 |       1   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE         |         |       1 |          |         |
|   2 |   TABLE ACCESS INMEMORY FULL| MY_TAB |       3 |       1   (0)| 00:00:01 |
------------------------------------------------------------------------------

Virtualbox – Vagrant issue

I have  3 nodes Hadoop cluster installed in my laptop using virtualbox and Vagrant. I got the following error when i tried to start my cluster yesterday.

[root@ora-c7 hadoop-ops-course-master]# vagrant up node1 node2 node3
 The provider 'virtualbox' that was requested to back the machine
 'node1' is reporting that it isn't usable on this system. The
 reason is shown below:

VirtualBox is complaining that the kernel module is not loaded. Please
 run `VBoxManage --version` or open the VirtualBox GUI to see the error
 message which should contain instructions on how to fix this error.

Initially i thought this is an issue with vagrant and thought on reinstalling the whole cluster, but when I tried to start the virtualbox from the terminal i got a different error message asking me to recompile the Virtualbox kernel.

[root@ora-c7 hadoop-ops-course-master]# virtualbox --version
WARNING: The vboxdrv kernel module is not loaded. Either there is no module
 available for the current kernel (3.10.0-327.36.1.el7.x86_64) or it failed to
 load. Please recompile the kernel module and install it by

sudo /sbin/rcvboxdrv setup

You will not be able to start VMs until this problem is fixed.
START /usr/bin/firefox "http://download.virtualbox.org/virtualbox/5.0.28/VirtualBox-5.0-5.0.28_111378_el7-1.x86_64.rpm"

I recompiled the kernel and everything started working fine again.

[root@ora-c7 Downloads]# /sbin/rcvboxdrv setup

Stopping VirtualBox kernel modules [ OK ]
Recompiling VirtualBox kernel modules [ OK ]
Starting VirtualBox kernel modules [ OK ]
[root@ora-c7 Downloads]#

Top-N queries in oracle 12c

Oracle 12c has a new feature to select top N queries easily. This can be done by using the FETCH clause and the syntax is as follows.

[ OFFSET offset { ROW | ROWS } ]
[ FETCH { FIRST | NEXT } [ { rowcount | percent PERCENT } ]
    { ROW | ROWS } { ONLY | WITH TIES } ]

Lets create a table with 100 rows and see how this command works. We will insert a sequence of 1-100 into the table from dual using the connect by clause.

SQL> create table my_tab as
  2  select level id from dual
  3  connect by level <= 100;

Table created.


SQL> select count(*) from my_tab;

COUNT(*)
----------
 100

Now lets check how this feature now. We will sort the ids in descending order and pick the top 5/6 values from the list.

SQL> select * from my_tab
  2  order by id desc
  3  fetch first 5 rows only;

    ID
----------
    100
    99
    98
    97
    96

SQL> select * from my_tab
  2  order by id desc
  3  fetch first 6 percent rows only;

    ID
----------
    100
    99
    98
    97
    96
    95

6 rows selected.

SQL> select * from my_tab
  2  order by id desc
  3  offset 5 rows fetch next 5 percent rows only;

    ID
----------
    95
    94
    93
    92
    91

Oracle Identity Columns in 12c

Identity columns are a new feature in Oracle 12c. We need the CREATE SEQUENCE privilege for using this feature. The syntax is given below.

GENERATED
 [ ALWAYS | BY DEFAULT [ ON NULL ] ]
 AS IDENTITY

There are two options to create an identity column, As mentioned ALWAYS clause will always populate the column automatically. Any attempt to insert a value (Even NULL) will throw an error. BY DEFAULT /ON NULL  will populate only if we don’t specify the value or we try to insert a null value. Otherwise it will insert the value we specify in our insert statement.

Lets check how this feature works. At first we will create the table with ALWAYS option.

SQL> create table my_tab
 2 (
 3 name varchar2(50),
 4 id integer GENERATED ALWAYS AS IDENTITY
 5 );

Table created.

We have created the column id as IDENTITY column and which will be always populated by oracle. It wont let us insert a value/NULL in that column.

SQL> insert into my_tab (name) values ('John');

1 row created.

SQL> insert into my_tab values ('Mike', 2);
 insert into my_tab values ('Mike', 2)
 *
 ERROR at line 1:
 ORA-32795: cannot insert into a generated always identity column

SQL> insert into my_tab values ('Steve', NULL);
 insert into my_tab values ('Steve', NULL)
 *
 ERROR at line 1:
 ORA-32795: cannot insert into a generated always identity column

SQL> insert into my_tab (name) values ('Steve');

1 row created.


SQL> select * from my_tab;

NAME                                               ID
-------------------------------------------------- ----------
John                                               1
Steve                                              2

Now lets recreate the table with DEFAULT option.

SQL> create table my_tab
 2 (
 3 name varchar2(20),
 4 id integer GENERATED BY DEFAULT ON NULL AS IDENTITY
 5 );

Table created.

SQL> insert into my_tab values ('Tom', 1);

1 row created.


SQL> insert into my_tab (name) values ('James');

1 row created.

SQL> insert into my_tab values ('Nick', NULL);

1 row created.

SQL> select * from my_tab;

NAME                 ID
-------------------- ----------
Tom                  1
James                1
Nick                 2

DEFAULT clause will insert the value if we don’t specify in the INSERT statement or we try to insert a NULL value.

APPROX_COUNT_DISTINCT function in Oracle 12C

Oracle 12C has a new function APPROX_COUNT_DISTINCT which helps us to calculate the approximate number of distinct values from a table. As the name indicate it returns a approximate value and not the exact count. This is how this new feature works in Oracle 12C.

SQL> create table my_tab
2  (
3  Name varchar2(50),
4  id integer,
5  dept_no integer
6  );

Table created.

SQL> insert into my_tab
2  select ‘James’, level, mod(level, 10)
3  from dual connect by level <= 100000;

100000 rows created.

SQL> exec dbms_stats.gather_table_stats(‘<Schema_name>’,’my_tab’);

PL/SQL procedure successfully completed.

SQL> select count(distinct name), count(distinct id), count(distinct dept_no) from my_tab;

COUNT(DISTINCTNAME) COUNT(DISTINCTID) COUNT(DISTINCTDEPT_NO)
——————-                  —————–              ———————-
1                                            100000                             10

Now lets check what will be results of the APPROX_COUNT_DISTINCT against the same table and data.

SQL> select APPROX_COUNT_DISTINCT(name) Name, APPROX_COUNT_DISTINCT(id) id, APPROX_COUNT_DISTINCT(dept_no) dept_no from my_tab;

NAME             ID             DEPT_NO
———-         ———- ———-
1                       101766     10

The value is close and i don’t know what algorithm Oracle is using for this calculation but response time is very less compared to COUNT (DISTINCT) command.

SQL> select APPROX_COUNT_DISTINCT(id) id from my_tab;

Execution Plan
———————————————————-
Plan hash value: 2076930067

--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 102 (0)| 00:00:01 |
| 1 | SORT AGGREGATE APPROX| | 1 | 5 | | |
| 2 | TABLE ACCESS FULL | MY_TAB | 100K| 488K| 102 (0)| 00:00:01 |
--------------------------------------------------------------------------------

Oracle is not taking this value from the metadata tables and i think it may be using a small sample and derive this vale from that sample.

DDL Logging in Oracle 12C

Oracle 12C logs the DDL statements if enabled. This can be done by enabling one parameter at session/instance level.

SQL> show parameter ddl_logging

NAME TYPE VALUE
———————————— ———– ——————————
enable_ddl_logging boolean FALSE

By default DDL logging is disabled.

SQL> alter session set enable_ddl_logging=TRUE;

Session altered.

Now, Lets run few DDL commands.

SQL> desc test_tbl;
Name Null? Type
—————————————– ——– —————————-
NAME VARCHAR2(100 CHAR)
AGE NUMBER(38)
EMP_CODE (INVISIBLE) NUMBER(38)

SQL> alter table test_tbl add dept_id number;

Table altered.

SQL> truncate table test_tbl;

Table truncated.

SQL> drop table test_tbl;

Table dropped.

SQL> alter session set enable_ddl_logging=FALSE;

Session altered.

Lets check the contents of DDL log, which is created in text as well as xml formats.

[oracle@ora-c7 log]$ pwd
/u01/app/oracle/diag/rdbms/orcl/orcl/log

[oracle@ora-c7 log]$ cat ddl_orcl.log
Thu Oct 20 21:52:32 2016
diag_adl:alter table test_tbl add dept_id number
diag_adl:truncate table test_tbl
diag_adl:drop table test_tbl
[oracle@ora-c7 log]$

This could be really useful if you want to monitor the DDL operations in your oracle database.