Document AI Batch Source

Introduction

The Document AI plugin lets users apply Document AI processors to process invoices, parse forms, extract key-value pairs, and more. Users can also use this plugin to make predictions with AutoML custom models that are exposed as Document AI processors.

NOTE: These plugins will incur additional cost.

https://cloud.google.com/document-ai/docs

Use case(s)

  1. As a user, I would like to parse my invoices and form/key-value-pair documents in PDF format to extract entities, using Data Fusion pipelines that orchestrate the end-to-end journey from a data source (GCS) to a data sink (BigQuery).
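
The middle of this journey, turning a processor response into flat records for a sink, can be sketched as follows. This is a minimal Python illustration, not the plugin's implementation. It assumes the response has already been converted to a dict shaped like the Document AI v1 `Document` JSON (an `entities` list with `type`, `mentionText`, and `confidence`). The function name `entities_to_rows`, the `source_uri` column, and the output field names are hypothetical.

```python
from typing import Any, Dict, List


def entities_to_rows(document: Dict[str, Any], source_uri: str) -> List[Dict[str, Any]]:
    """Flatten the entities of one processed document into row-like dicts.

    `document` is assumed to follow the Document AI v1 Document JSON shape.
    The output field names are hypothetical, chosen only for illustration.
    """
    rows = []
    for entity in document.get("entities", []):
        rows.append({
            "source_uri": source_uri,                      # where the PDF came from
            "entity_type": entity.get("type", ""),         # e.g. an invoice field name
            "mention_text": entity.get("mentionText", ""),  # text as it appears in the file
            "confidence": float(entity.get("confidence", 0.0)),
        })
    return rows
```

Each returned dict corresponds to one candidate output record, so a document with several entities yields several rows that share the same `source_uri`.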

User Story(ies)

  • As a data pipeline developer, I should be able to 

Plugin Type

Batch Source
Batch Sink 
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute

Configuration

Invoice API

https://cloud.google.com/document-understanding/alpha/docs/quickstart-invoice

User Facing Name | Type | Description | Default value | Notes

Table Parsing API

https://cloud.google.com/document-ai/docs/process-tables

User Facing Name | Type | Description | Default value | Notes

Form Parsing or KV API

https://cloud.google.com/document-ai/docs/process-forms

User Facing Name | Type | Description | Default value | Notes

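For orientation, key-value pairs in a form-parsing response are typically expressed as offsets into the document text rather than as literal strings. The sketch below shows one way to resolve them in Python. It assumes a dict shaped like the Document AI v1 `Document` JSON (`text`, `pages[].formFields[]` with `fieldName` and `fieldValue` text anchors); other API versions may differ. The helper names `anchor_text` and `form_fields_to_pairs` are hypothetical.

```python
from typing import Any, Dict, List


def anchor_text(full_text: str, layout: Dict[str, Any]) -> str:
    """Resolve a layout's textAnchor segments against the full document text."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    parts = []
    for seg in segments:
        # In the JSON encoding, int64 offsets arrive as strings and a zero
        # startIndex may be omitted, so default to 0 and convert explicitly.
        start = int(seg.get("startIndex", 0))
        end = int(seg.get("endIndex", 0))
        parts.append(full_text[start:end])
    return "".join(parts).strip()


def form_fields_to_pairs(document: Dict[str, Any]) -> List[Dict[str, str]]:
    """Collect key/value strings from every page's form fields."""
    text = document.get("text", "")
    pairs = []
    for page in document.get("pages", []):
        for field in page.get("formFields", []):
            pairs.append({
                "key": anchor_text(text, field.get("fieldName", {})),
                "value": anchor_text(text, field.get("fieldValue", {})),
            })
    return pairs
```

Keeping the offset resolution in one helper means the same function can serve both the field name and the field value.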

Design / Implementation Tips

Design - To be filled in later

Approach(es)

Properties

Security

Limitation(s)

Future Work

Test Case(s) - To be filled in later

  • Test case #1

  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2

References

  • Documentation Links go here

Created in 2020 by Google Inc.