Add the ability in TableSink to find schema.row.field case-insensitively

Description

Currently, field names in StructuredRecord are case-sensitive. Because of this, we are sometimes at the mercy of external systems. For instance, some JDBC drivers (e.g. org.netezza.Driver for Netezza) return all column names in upper case no matter how users created them. When we create a StructuredRecord from the ResultSetMetadata returned by these drivers, the field names are all upper case, which can cause a mismatch with the declared schema of a StructuredRecord (e.g. in an ETL config JSON).

This causes validation errors with messages that are hard to debug (e.g. [field] not found in [StructuredRecord] even though the [field] is clearly present in the configuration that users supply, albeit with a mismatched case).

Release Notes

None

Activity

Bhooshan Mogal July 29, 2015 at 10:46 PM

Terence Yim July 24, 2015 at 4:24 PM

One extra note about the implementation. We don't actually need to look up the row key field on every record. We only need to do it when the record schema changes, which should be rare (with the current DBSource logic, it won't change at all).
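A minimal sketch of that caching idea follows. It assumes the schema can be represented as a plain list of field names; `CachedRowFieldResolver` and its members are hypothetical names for illustration and are not part of RecordPutTransformer.

```java
import java.util.List;

/**
 * Sketch only: resolves the row key field once per schema instead of once per record.
 * The schema is modeled as a list of field names, which is an assumption for illustration.
 */
final class CachedRowFieldResolver {
  private final String rowField;        // field name as given in the config
  private List<String> lastFieldNames;  // schema seen on the previous record
  private String resolvedName;          // actual field name in that schema, or null
  private int lookups;                  // number of times the scan was performed

  CachedRowFieldResolver(String rowField) {
    this.rowField = rowField;
  }

  /** Returns the schema's own spelling of the row field, or null if it is absent. */
  String resolve(List<String> fieldNames) {
    // Rescan only when the schema differs from the one seen last time.
    if (lastFieldNames == null || !lastFieldNames.equals(fieldNames)) {
      resolvedName = null;
      for (String name : fieldNames) {
        if (name.equalsIgnoreCase(rowField)) {
          resolvedName = name;
          break;
        }
      }
      lastFieldNames = fieldNames;
      lookups++;
    }
    return resolvedName;
  }

  int getLookups() {
    return lookups;
  }
}
```

If every record carries the same schema, as with the current DBSource logic, the scan runs once and later records reuse the cached name.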

Sreevatsan Raman July 24, 2015 at 3:32 PM

Assigning this to 3.1.0. It seems like a small fix and is needed for a customer in the 3.1 timeframe.

NitinM July 24, 2015 at 1:14 PM

+1 on Terence's and Bhooshan's suggestion - makes a lot of sense. Adding another transform is ooq.

Terence Yim July 24, 2015 at 9:11 AM

So, to summarize, here is what is happening now:

1. DBSource reads the DB and generates a StructuredRecord whose schema is built from the (column_name, type) pairs given by the ResultSetMetadata. Different DB and driver combinations can return column names in different cases.
2. We use the RecordPutTransformer as a stage in the TableSink to convert a StructuredRecord to a Table Put.
2.1. During the conversion from StructuredRecord to Put, the transformer uses a field name, provided through config, to extract the value to be used as the Put row key.
2.2. The conversion fails because the key field name provided in the config differs from the one in the StructuredRecord schema.

One solution is to add an extra optional config, say "tableConfig.rowFieldCaseSensitive", with a default value of "false". Then, in RecordPutTransformer:106, if the config value is "false", instead of calling Schema.getField(String), get the list of fields by calling Schema.getFields() and find the row field case-insensitively.
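The lookup described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: `RowFieldLookup` and its nested `Field` are hypothetical stand-ins for the real schema classes, and the `caseSensitive` flag stands for the value of the proposed config.

```java
import java.util.List;

/**
 * Sketch only: Field is a hypothetical stand-in for a schema field,
 * used here so the example is self-contained.
 */
final class RowFieldLookup {

  /** Hypothetical minimal field that carries only a name. */
  static final class Field {
    private final String name;

    Field(String name) {
      this.name = name;
    }

    String getName() {
      return name;
    }
  }

  /**
   * Scans the fields and returns the first one whose name matches rowField,
   * honoring the caseSensitive flag. Returns null when no field matches.
   */
  static Field findField(List<Field> fields, String rowField, boolean caseSensitive) {
    for (Field field : fields) {
      String name = field.getName();
      boolean matches = caseSensitive
          ? name.equals(rowField)
          : name.equalsIgnoreCase(rowField);
      if (matches) {
        return field;
      }
    }
    return null;
  }
}
```

One design point to settle: if a schema contains two fields that differ only in case, this sketch returns the first one in list order when the lookup is case-insensitive.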

Fixed
Details

Created July 23, 2015 at 11:28 PM
Updated July 30, 2015 at 9:18 PM
Resolved July 29, 2015 at 10:46 PM