Add the ability in TableSink to find schema.row.field case-insensitively
Activity

Bhooshan Mogal July 29, 2015 at 10:46 PM
Merged PR: https://github.com/caskdata/cdap/pull/3366

Terence Yim July 24, 2015 at 4:24 PM
One extra note about the implementation: we don't actually need to look up the row key field on every record. We only need to do it when the record schema changes, which should be rare (with the current DBSource logic, it won't change at all).
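A minimal sketch of that caching idea, for illustration only. It assumes the schema can be represented as a plain list of field names; the class RowKeyFieldCache and its methods are hypothetical and are not part of CDAP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: resolves the configured row key field against a schema
// and repeats the lookup only when the schema differs from the previous one.
final class RowKeyFieldCache {
  private final String configuredRowField;
  private List<String> lastFieldNames;  // field names of the last schema seen
  private String resolvedName;          // actual field name in that schema, or null
  private int lookups;                  // number of real lookups performed

  RowKeyFieldCache(String configuredRowField) {
    this.configuredRowField = configuredRowField;
  }

  // Returns the schema's own spelling of the row key field, or null if absent.
  String resolve(List<String> fieldNames) {
    if (!fieldNames.equals(lastFieldNames)) {
      resolvedName = null;
      for (String name : fieldNames) {
        if (name.equalsIgnoreCase(configuredRowField)) {
          resolvedName = name;
          break;
        }
      }
      // Copy defensively so later changes to the caller's list are detected.
      lastFieldNames = new ArrayList<>(fieldNames);
      lookups++;
    }
    return resolvedName;
  }

  int getLookups() {
    return lookups;
  }
}
```

With a source that always emits the same schema, the scan over the fields runs once and every later record reuses the cached name.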

Sreevatsan Raman July 24, 2015 at 3:32 PM
Assigning this to 3.1.0; it seems like a small fix and is needed for a customer in the 3.1 timeframe.

NitinM July 24, 2015 at 1:14 PM
+1 on Terence's and Bhooshan's suggestion - makes a lot of sense. Adding another transform is ooq.

Terence Yim July 24, 2015 at 9:11 AM
So, to summarize, here is what is happening now:
1. DBSource reads the DB and generates a StructuredRecord with a schema based on the (column_name, type) pairs given by the ResultSetMetaData. Different DB and driver combinations can return column names in different cases.
2. We use the RecordPutTransformer as a stage in the TableSink to convert a StructuredRecord to a Table Put.
2.1. During the conversion from StructuredRecord to Put, a field name is provided to the transformer through config so that it can extract the value to be used as the Put row key.
2.2. It fails because the key field name provided in the config differs in case from the one in the StructuredRecord schema.
One solution is to add an extra optional config, say "tableConfig.rowFieldCaseSensitive", with a default of "false". Then in RecordPutTransformer:106, if the config value is "false", instead of calling Schema.getField(String), you get the list of fields by calling Schema.getFields() and find the row field case-insensitively.
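The proposed lookup could look roughly like the sketch below. It is an illustration under assumptions, not the code that was merged: SketchField is a hypothetical stand-in for a schema field, and RowFieldLookup.findRowField is a made-up helper that plays the role of scanning the result of Schema.getFields().

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a schema field; the real CDAP classes are not used here.
final class SketchField {
  private final String name;

  SketchField(String name) {
    this.name = name;
  }

  String getName() {
    return name;
  }
}

final class RowFieldLookup {
  private RowFieldLookup() { }

  // Returns the first field whose name matches rowField, or null if none does.
  // When caseSensitive is false, the comparison ignores case.
  static SketchField findRowField(List<SketchField> fields, String rowField, boolean caseSensitive) {
    for (SketchField field : fields) {
      String name = field.getName();
      boolean matches = caseSensitive ? name.equals(rowField) : name.equalsIgnoreCase(rowField);
      if (matches) {
        return field;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Field names as an upper-casing JDBC driver might report them.
    List<SketchField> fields = Arrays.asList(new SketchField("ID"), new SketchField("NAME"));

    SketchField strict = findRowField(fields, "id", true);
    System.out.println(strict == null ? "not found" : strict.getName());   // prints "not found"

    SketchField relaxed = findRowField(fields, "id", false);
    System.out.println(relaxed == null ? "not found" : relaxed.getName()); // prints "ID"
  }
}
```

One design point the sketch leaves open: if a schema contains two fields that differ only by case, the case-insensitive mode returns whichever comes first, so a real implementation might prefer an exact match before falling back.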
Currently, field names in StructuredRecord are case-sensitive. Because of this, we are sometimes at the mercy of external systems. For instance, some JDBC drivers (e.g. org.netezza.Driver for Netezza) return all column names in upper case no matter how users created them. When we create a StructuredRecord out of the ResultSetMetaData returned by these drivers, the field names are all upper case, which can cause a mismatch with the declared schema of a StructuredRecord (e.g. in an ETL config JSON). This causes validation errors with messages that are hard to debug (e.g. "[field] not found in [StructuredRecord]"), even though the [field] is clearly present in the configuration that users supply, albeit with a mismatched case.