Hello All,
This post is about my weekend ramblings: while preparing my next presentation on Kafka + Oracle I hit many roadblocks, and I thought I'd better write them up.
What I am trying to do is:
- Build a Vagrant box to test Oracle and Kafka + HDFS with Hive integration
- Oracle 18c, Hadoop, Hive, and Confluent Kafka installed with an Ansible playbook
- Integrate Oracle -> Kafka Connect and Hive -> HDFS
- So that data/schema changes propagate to a Kafka topic and are stored in HDFS, using Oracle itself as the Hive metastore (which is causing a lot of performance problems)
However, I ran into many issues and was able to resolve most of them successfully. Here they are. Each is worth an individual post, but I will publish a full implementation video post soon.
Issue 1:- JAR Files:- Could not load JAR files
ojdbc8.jar was not in the Kafka Connect and Hive library directories; placing it in the proper directories fixed this (listing and link commands below).
[root@oracledb18c kafka-connect-jdbc]# ls -ltr
total 6140
-rw-r--r-- 1 elastic elastic 5575351 Aug 26 2016 sqlite-jdbc-3.8.11.2.jar
-rw-r--r-- 1 elastic elastic 658466 Aug 26 2016 postgresql-9.4-1206-jdbc41.jar
-rw-r--r-- 1 elastic elastic 46117 Aug 26 2016 kafka-connect-jdbc-3.0.1.jar
lrwxrwxrwx 1 root root 40 Feb 23 19:48 mysql-connector-java.jar -> /usr/share/java/mysql-connector-java.jar
lrwxrwxrwx 1 root root 26 Feb 24 13:26 ojdbc8.jar -> /usr/share/java/ojdbc8.jar
[root@oracledb18c kafka-connect-jdbc]# pwd
/opt/confluent/share/java/kafka-connect-jdbc
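For reference, a minimal sketch of how the driver can be linked in, assuming ojdbc8.jar was already copied to /usr/share/java (the Hive lib path below is an assumption; adjust it to your install):

# link the Oracle JDBC driver into the Kafka Connect JDBC plugin directory
ln -s /usr/share/java/ojdbc8.jar /opt/confluent/share/java/kafka-connect-jdbc/ojdbc8.jar
# link it into the Hive lib directory as well (path assumed)
ln -s /usr/share/java/ojdbc8.jar /opt/hive/lib/ojdbc8.jar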
Issue 2:- Hive Metastore:- org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Make sure the MySQL Hive metastore root/hive user logins work properly, hive-site.xml contains the proper settings, and the metastore database actually exists in MySQL (see the schematool sketch after the snippet below).
[vagrant@oracledb18c hive]$ cat /mnt/etc/hive/hive-site.xml.mysql
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://oracledb18c:3306/metastore</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive@localhost</value>
  </property>
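If the metastore schema itself was never created, this error also appears. A minimal sketch of initializing and checking it with Hive's schematool, assuming the MySQL settings above are already in hive-site.xml:

# create the metastore schema in MySQL
schematool -dbType mysql -initSchema
# verify the schema version afterwards
schematool -dbType mysql -info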
Issue 3:- Kafka Connect JDBC: org.apache.kafka.connect.
Oracle has the NUMBER data type (not NUMERIC), and the JDBC/Avro conversion needs an explicit precision and scale to map the data. A change in the table definition is required: instead of plain NUMBER, declare it as below.
CREATE TABLE users (
id number(9,0) PRIMARY KEY,
name varchar2(100),
email varchar2(200),
department varchar2(200),
modified timestamp default CURRENT_TIMESTAMP NOT NULL
);
Issue 4:- Kafka Connect JDBC:- (org.apache.kafka.connect.runtime.distributed.DistributedHerder:926) java.lang.IllegalArgumentException: Number of groups must be positive.
table.whitelist and tables did not work in my Oracle JDBC properties, so I turned to query mode. Here is my properties file (the standalone load command follows it):
[vagrant@oracledb18c hive]$ cat /mnt/etc/oracle.properties
name=test-oracle-jdbc
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:oracle:thin:c##test/manager@0.0.0.0:1521/ORCLCDB
#table.whitelist=[C##TEST.USERS]
#tables=[C##TEST.USERS]
query= select * from users
mode=timestamp
incrementing.column.name=ID
timestamp.column.name=MODIFIED
topic.prefix=test_jdbc_
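To test it quickly, I load the connector in standalone mode. A minimal sketch, assuming the default Avro worker config that ships with Confluent (paths are from my box; adjust for yours):

# start a standalone Connect worker with the Oracle source connector
/opt/confluent/bin/connect-standalone \
  /opt/confluent/etc/schema-registry/connect-avro-standalone.properties \
  /mnt/etc/oracle.properties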
Issue 5:- Kafka Connect JDBC:- org.apache.kafka.connect.
This is again due to the Oracle NUMBER data type conversion: when we keep mode=timestamp+incrementing it throws the above error. A Jira fix is in progress, so I stayed with mode=timestamp.
Issue 6:- Kafka Connect JDBC:- Invalid SQL type: sqlKind = UNINITIALIZED error is shown
In the oracle.properties file, when using query, do not put quotes around the statement (corrected file and a quick consumer check below).
[vagrant@oracledb18c hive]$ cat /mnt/etc/oracle.properties
name=test-oracle-jdbc
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:oracle:thin:c##test/manager@0.0.0.0:1521/ORCLCDB
#table.whitelist=[C##TEST.USERS]
#tables=[C##TEST.USERS]
query=" select * from users"query=select * from usersmode=timestamp
incrementing.column.name=ID
timestamp.column.name=MODIFIED
topic.prefix=test_jdbc_
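Once the connector runs cleanly, the records can be checked from the command line. A hedged sketch, assuming a newer Confluent CLI that accepts --bootstrap-server (in query mode the records land in the topic named exactly by topic.prefix):

# consume the Avro records written by the JDBC source connector
/opt/confluent/bin/kafka-avro-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic test_jdbc_ \
  --from-beginning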
Issue 7:- Hive: Thrift Timeouts - org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out (Unresolved)
The Hive Thrift URL was timing out due to the issues below; I see 100% CPU for all Hive Thrift connections to the database (top output and a session lookup below).
Tasks: 181 total, 4 running, 177 sleeping, 0 stopped, 0 zombie
%Cpu(s): 51.5 us, 0.8 sy, 0.0 ni, 47.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 6808340 total, 93120 free, 2920696 used, 3794524 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 1736696 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20288 oracle 20 0 2542432 87336 80080 R 100.0 1.3 584:06.49 oracleORCLCDB (LOCAL=NO)
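To see what that spinning server process is actually doing, the OS PID from top can be mapped back to a database session. A minimal sketch using the standard v$ views (20288 is just the PID from the output above):

-- map the OS process id from top to the session and its current SQL
select s.sid, s.serial#, s.username, s.event, s.sql_id
from v$session s, v$process p
where p.addr = s.paddr
and p.spid = 20288;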
Issue 8:- Oracle: 100% CPU, JDBC Connections - 12c Adaptive Plans
I am running 18c, and optimizer_adaptive_plans is set to TRUE by default; I disabled it (sketch below).
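A minimal sketch of how it can be disabled, assuming an spfile is in use so scope=both applies to the running instance as well:

-- disable adaptive plans for the instance and the spfile
alter system set optimizer_adaptive_plans = false scope=both;
-- confirm the new value
show parameter optimizer_adaptive_plans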
Issue 9:- Oracle: 100% CPU, JDBC Connections - running against the all_objects view and PGA Memory Operation waits (Unresolved)
I see a lot of "row cache mutex x" and "PGA memory operation" waits.
From 12.2 onwards, the "latch: row cache objects" event was renamed to "row cache mutex x".
Latches are good and lightweight, but the concurrency of the JDBC connections against a fresh database with no statistics gathered at all caused these issues; collecting schema, fixed object, and dictionary statistics resolved it (sketch below).
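A minimal sketch of the statistics gathering, run as a DBA in SQL*Plus; the schema name C##TEST is an assumption taken from the connector config above:

-- gather statistics for the application schema (name assumed)
exec dbms_stats.gather_schema_stats('C##TEST');
-- gather fixed object and dictionary statistics for the fresh database
exec dbms_stats.gather_fixed_objects_stats;
exec dbms_stats.gather_dictionary_stats;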
For "PGA memory operation", there seems to be no issue when running normal queries with Oracle clients; however, when the same workload runs through API-based JDBC connections, constant PGA memory allocation and deallocation happens.
Also, the all_objects query takes a long time and seems to never finish. Do not grant DBA to the Hive metastore user at all: since the privileges behind the DBA role are vast, it ends up having to scan a huge number of objects (a quick check follows below).
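To check whether the metastore user accidentally holds DBA, a quick hedged sketch (the grantee name C##HIVE is hypothetical; substitute your metastore user):

-- list the roles granted to the Hive metastore user (grantee name assumed)
select grantee, granted_role
from dba_role_privs
where grantee = 'C##HIVE';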
More reading here: https://fritshoogland.wordpress.com/2017/03/01/oracle-12-2-wait-event-pga-memory-operation/
That's it for now.
I will publish a full implementation video soon.
-Thanks
Suresh