With the current approach we cannot avoid creating temporary tables, because the
AbstractBigQueryInputFormat class that PartitionedBigQueryInputFormat extends does not support filters. We are therefore forced to create temporary tables whenever we apply filters or query partitioned tables.
Starting with Hadoop Connector v2.0.0, a new approach was implemented and a new InputFormat was introduced, named DirectBigQueryInputFormat.
DirectBigQueryInputFormat allows us to query the data with filters and without creating temporary tables. References: https://github.com/GoogleCloudDataproc/hadoop-connectors/releases, https://cloud.google.com/bigquery/docs/reference/storage/.
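To make the filter-based read more concrete, here is a minimal sketch of the job properties we would expect to set for a direct read. The property keys are the ones we understand the connector's BigQueryConfiguration class to define; treat them as assumptions and verify them against the connector version in use. The class name, helper method, and the project, dataset, table, and filter values are all hypothetical placeholders. The sketch only builds the key/value pairs and does not depend on Hadoop itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DirectReadConfigSketch {

    // Assumed property keys (believed to match the connector's
    // BigQueryConfiguration constants; verify for your connector version).
    static final String INPUT_PROJECT_ID = "mapred.bq.input.project.id";
    static final String INPUT_DATASET_ID = "mapred.bq.input.dataset.id";
    static final String INPUT_TABLE_ID = "mapred.bq.input.table.id";
    static final String SQL_FILTER = "mapred.bq.input.sql.filter";

    /**
     * Hypothetical helper: collects the properties for a filtered direct read.
     * The row filter is added only when it is non-empty, so an unfiltered
     * read simply omits the filter key.
     */
    static Map<String, String> directReadProperties(
            String projectId, String datasetId, String tableId, String rowFilter) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(INPUT_PROJECT_ID, projectId);
        props.put(INPUT_DATASET_ID, datasetId);
        props.put(INPUT_TABLE_ID, tableId);
        if (rowFilter != null && !rowFilter.isEmpty()) {
            props.put(SQL_FILTER, rowFilter);
        }
        return props;
    }

    public static void main(String[] args) {
        // Placeholder identifiers and filter, for illustration only.
        Map<String, String> props = directReadProperties(
                "my-project", "my_dataset", "events", "event_date >= '2020-01-01'");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In an actual job, each entry would be copied onto the Hadoop Configuration with conf.set(key, value), and DirectBigQueryInputFormat would be set as the job's input format class. No temporary table or export bucket settings appear in the sketch, which is the point of the new approach.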
We also found that with DirectBigQueryInputFormat the GCS bucket is no longer needed or used. With the current approach, by contrast, whenever we query data from BigQuery the data passes through a temporary Google Cloud Storage bucket.
Reference: https://cloud.google.com/bigquery/docs/reference/storage/.