DBSource fails in HDP Cluster because the JDBC Driver class is not found when trying to submit the MR Job

Description

Created an HDP cluster and a DBSource -> TPFS batch (MR) pipeline; the pipeline fails with the following issue:

Release Notes

Fixed a problem with Hydrator pipelines using a DBSource not working in an HDP cluster.

Activity

Todd Greenstein
August 2, 2016, 12:22 AM

Can you take a look to see if you can repro?

Russ Savage
August 2, 2016, 12:29 AM

I can repro. Happening on my test cluster with the MySQL jar.

Albert Shau
August 2, 2016, 12:35 AM

Looking at https://github.com/hortonworks/hadoop-release/blob/HDP-2.3.4.7-tag/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java, the code calls createConnection(), which is throwing an exception. Our plugin relies on DataDrivenETLDBInputFormat overriding the getConnection() method to intercept the call and create the connection correctly. Since the underlying DBInputFormat no longer calls getConnection(), the override is bypassed and everything breaks.

One short-term hack would be to override the setConf() method, though there should be a more robust way to do this, perhaps by copying the input format and related classes into the plugin jar.
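
To make the failure mode concrete, here is a minimal sketch of the override problem. It does not use the real Hadoop classes: BaseInputFormat, PluginInputFormat, and PatchedPluginInputFormat are hypothetical stand-ins, connections are modeled as plain strings, and the assumption that the direct createConnection() call happens inside setConf() is inferred from the workaround suggested above rather than confirmed here.

```java
public class ConnectionOverrideSketch {

    /** Hypothetical stand-in for the base input format; not the real Hadoop class. */
    static class BaseInputFormat {
        protected String connection;

        // Assumed shape of the changed code path: setConf() calls
        // createConnection() directly instead of going through getConnection().
        public void setConf() {
            this.connection = createConnection();
        }

        public String getConnection() {
            if (connection == null) {
                connection = createConnection();
            }
            return connection;
        }

        // Models the base class failing to load the JDBC driver.
        public String createConnection() {
            throw new IllegalStateException("driver class not found");
        }
    }

    /** Hypothetical plugin subclass that only intercepts getConnection(). */
    static class PluginInputFormat extends BaseInputFormat {
        @Override
        public String getConnection() {
            if (connection == null) {
                connection = "plugin-connection";
            }
            return connection;
        }
    }

    /** Short-term workaround: also override setConf() so the intercept is used. */
    static class PatchedPluginInputFormat extends PluginInputFormat {
        @Override
        public void setConf() {
            this.connection = getConnection();
        }
    }

    /** Runs setConf() and reports either the connection or the failure. */
    static String trySetConf(BaseInputFormat format) {
        try {
            format.setConf();
            return format.getConnection();
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The getConnection() override is never reached, so setConf() fails.
        System.out.println(trySetConf(new PluginInputFormat()));
        // With setConf() overridden, the plugin's connection is used.
        System.out.println(trySetConf(new PatchedPluginInputFormat()));
    }
}
```

The sketch also shows why this is only a hack: the subclass has to track which base-class methods reach createConnection(), so any further change upstream could bypass it again.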

Shankar Selvam
September 21, 2016, 10:22 PM
Shankar Selvam
September 30, 2016, 12:32 AM
Fixed

Assignee

Shankar Selvam

Reporter

Gokul Gunasekaran

Labels

None

Docs Impact

None

UX Impact

None

Components

Fix versions

Affects versions

Priority

Critical