PostgreSQL source plugin throws a mysterious error message when I click Validate

Description

Scenario 1

(I didn’t enter any credentials)

1. I configure a PostgreSQL source with the following Import Query:
SELECT * FROM
arguments
2. I click Get Schema. The plugin correctly shows the schema for the arguments table.
3. I delete the schema because I forgot the WHERE clause. (I used the trash can and not Actions>Clear)
4. I add the WHERE clause to the Import Query:
SELECT * FROM
arguments
WHERE timestamp > ${last_run_time} AND timestamp < ${logicalStartTime()}
5. I click Get Schema. Nothing happens. I don't get the schema and there's no error message.
6. I click Validate and get this message:
Error encountered while configuring the stage: 'Unable to parse schema '""'. Reason: Use JsonReader.setLenient(true) to accept malformed JSON at line 1 column 2'

Scenario 2

(I didn’t enter any credentials.)

If I configure a new PostgreSQL source plugin, add the Import Query below (with the WHERE clause), and click Get Schema, nothing happens: no error message and no output schema.

SELECT * FROM
arguments
WHERE timestamp > ${last_run_time} AND timestamp < ${logicalStartTime()}

I click Validate and it reports that no errors were found, but I still don't get the schema.

Observations
1. I expected Get Schema to either work or throw an error.
2. The error message thrown by Validate doesn't tell me what the problem is, and I can't tell what it means.
3. I think the query with the WHERE clause should have validated.

Note: This also happens with the Database source plugin.

Release Notes

None

Activity

Bhooshan Mogal
April 21, 2021, 11:01 PM

I like the approach in #1. It would also need the columns field, right? I'll open a separate Jira for #1. We'll track #2 in this Jira.
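A rough sketch of what a 'basic' mode config with separate fields (including the columns field) might look like, assuming the usual CDAP PluginConfig style; every field name here is illustrative rather than an agreed design:

import io.cdap.cdap.api.annotation.Description;
import io.cdap.cdap.api.annotation.Macro;
import io.cdap.cdap.api.plugin.PluginConfig;
import javax.annotation.Nullable;

// Illustrative only: separate fields instead of a single importQuery.
public class BasicModeConfig extends PluginConfig {

  @Macro
  @Description("Database to read from.")
  private String database;

  @Macro
  @Description("Table to read from.")
  private String table;

  @Nullable
  @Macro
  @Description("Comma-separated list of columns to read. Defaults to all columns.")
  private String columns;

  @Nullable
  @Macro
  @Description("Optional filter, equivalent to a WHERE clause. May contain macros; "
      + "schema generation does not depend on it.")
  private String filter;

  @Nullable
  @Macro
  @Description("Field used to split the data for parallel reads.")
  private String splitBy;
}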

Albert Shau
April 21, 2021, 10:56 PM
  1. It’s possible to get the raw query, but SQL parsing is tricky, so I would not recommend it. Instead, I think the DB source(s) should really have a ‘basic’ and an ‘advanced’ mode. In ‘basic’ mode, we would not ask for a query and would instead have separate fields for database name, table name, filter, split field, etc., much like the BQ source. This is actually more natural for the connection → pipeline use case, where the user has specified a database and table name but never a query. In ‘advanced’ mode we would have the option to specify a query, but if you happen to give a complex one with macros, then we may not be able to generate a schema.

  2. This should be straightforward to find and fix given the reproduction steps. I think it means the schema sent to the backend is an empty string and not a null, which the backend probably isn’t checking for.
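A minimal sketch of the kind of check that could be added, assuming the stage parses the configured schema with CDAP's Schema.parseJson; the helper and its placement are hypothetical, not the plugin's current code:

import io.cdap.cdap.api.data.schema.Schema;
import java.io.IOException;
import javax.annotation.Nullable;

// Hypothetical helper: treat a null, empty, or quote-only schema string as "no schema"
// instead of handing it straight to the JSON parser, which is what produces the opaque
// "Use JsonReader.setLenient(true)" message.
public class SchemaStringGuard {

  @Nullable
  public static Schema parseSchemaOrNull(@Nullable String schemaJson) throws IOException {
    if (schemaJson == null) {
      return null;
    }
    String trimmed = schemaJson.trim();
    // The UI appears to send "" (a bare pair of quotes) once the schema has been deleted.
    if (trimmed.isEmpty() || "\"\"".equals(trimmed)) {
      return null; // callers can fall back to deriving the schema from the database
    }
    try {
      return Schema.parseJson(trimmed);
    } catch (IOException | RuntimeException e) {
      // Re-throw with a message that names the offending field instead of the raw parser error.
      throw new IOException("Unable to parse the configured output schema as JSON: "
          + e.getMessage(), e);
    }
  }
}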

Bhooshan Mogal
April 21, 2021, 10:30 PM
Edited

There are a few things going on here so I’ll try to summarize the improvements:

  1. The Get Schema/Validate functionality should be smart enough to perform its work (get schema/validate) when the macros in the import query are in the WHERE clause rather than the SELECT clause (one possible approach is sketched below).

  2. The JSON parsing error should be fixed.

Do you know what these would take?
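For #1, a minimal illustration of one possible approach: replace ${...} macros with a neutral literal and run a metadata-only query, which works as long as the macros appear in value positions of the WHERE clause. The regex, the NULL substitution, and the helper name are all hypothetical, not the plugin's existing behaviour:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch: neutralize design-time macros so a schema-only query can run.
public class DesignTimeSchemaFetcher {

  // Matches ${...} macro tokens such as ${last_run_time} or ${logicalStartTime()}.
  private static final Pattern MACRO = Pattern.compile("\\$\\{[^}]*\\}");

  // Returns "name:typeName" pairs for each column the import query would produce.
  static List<String> fetchColumns(Connection connection, String importQuery) throws SQLException {
    // Replace macros with NULL (valid in value positions) and wrap the query so that
    // no rows are fetched; only the result set metadata is needed.
    String schemaQuery = "SELECT * FROM ("
        + MACRO.matcher(importQuery).replaceAll("NULL")
        + ") AS t WHERE 1 = 0";
    List<String> columns = new ArrayList<>();
    try (PreparedStatement stmt = connection.prepareStatement(schemaQuery);
         ResultSet rs = stmt.executeQuery()) {
      ResultSetMetaData meta = rs.getMetaData();
      for (int i = 1; i <= meta.getColumnCount(); i++) {
        columns.add(meta.getColumnName(i) + ":" + meta.getColumnTypeName(i));
      }
    }
    return columns;
  }
}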


Assignee

Unassigned

Reporter

Robin Rielley

Labels