Streaming HTTP handlers
Services can be used for ingest and egress of data. In the current CDAP release (3.2.0), however, there are limitations on what you can do:
- Every method call of a service handler is executed in a transaction. The typical transaction timeout is configured at around 30 seconds. That means that if a handler method needs longer than that to complete, the transaction will fail.
- The content of the HTTP request is always buffered up in memory, hence the handler cannot receive large data. It would be better to stream the content.
- In case of a transaction conflict, the handler has no control over how that error is handled.
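To make the first and last limitations concrete, here is a minimal sketch of what handler-controlled transactions could look like: each record runs in its own short transaction, and a conflict is retried or recorded instead of failing the whole request. `TxRunner`, `ConflictException`, `BatchResult`, and `RecordProcessor` are hypothetical names invented for this illustration; they are not part of the CDAP API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a conflict detected at commit time; not a CDAP class. */
class ConflictException extends Exception {
    ConflictException(String message) {
        super(message);
    }
}

/** Hypothetical hook that runs a unit of work in its own short transaction. */
interface TxRunner {
    void runInTransaction(Runnable work) throws ConflictException;
}

/** Summary that a handler could turn into its HTTP response. */
final class BatchResult {
    final int processed;
    final List<String> conflicted;

    BatchResult(int processed, List<String> conflicted) {
        this.processed = processed;
        this.conflicted = conflicted;
    }
}

final class RecordProcessor {
    private final TxRunner tx;
    private final int maxAttempts;

    RecordProcessor(TxRunner tx, int maxAttempts) {
        this.tx = tx;
        this.maxAttempts = maxAttempts;
    }

    /** Processes records one by one; a conflict on one record does not stop the rest. */
    BatchResult processAll(Iterable<String> records, Consumer<String> store) {
        int processed = 0;
        List<String> conflicted = new ArrayList<>();
        for (String record : records) {
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
                try {
                    // One transaction per record keeps each transaction short.
                    tx.runInTransaction(() -> store.accept(record));
                    done = true;
                } catch (ConflictException e) {
                    // The handler decides what a conflict means: here, try again.
                }
            }
            if (done) {
                processed++;
            } else {
                conflicted.add(record); // give up on this record, keep going
            }
        }
        return new BatchResult(processed, conflicted);
    }
}
```

Retrying a fixed number of times is only one possible policy; the point of the sketch is that the decision sits in handler code rather than in the framework.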
Here are some use cases where these limitations get in the way:
- A service handler to upload partitions to a partitioned file set:
  - With each request, a large file is received.
  - Metadata about the file is received in the HTTP headers.
  - Based on the metadata, the handler determines the partition key for the file.
  - The content of the request is consumed and streamed to a file.
  - The handler validates the file (possibly using a checksum, or by validating its size or number of records).
  - The handler may also parse the content as it is streamed and validate it using lookups in a dataset.
  - The handler registers the file as a new partition.
  - If an error occurs in any of these steps, the file must be deleted or moved to a quarantine area; possibly a record of the error needs to be saved to a dataset.
  - If there is a transaction conflict, the same applies.
  - Also, in case of an error, the handler has control over the HTTP response.
- A service handler to download large files:
  - Similar to the upload case, except that this is simpler because no writes happen (and hence no conflicts).
  - Also, the request is small, but the response may be very large and take a long time to send.
- A handler to receive a sequence of records and process them one by one:
  - Processing a record may mean storing it in a dataset, or looking it up in a dataset.
  - The response may indicate how many records were successfully processed (some may have conflicts).
  - The response may contain a new record for every record received.
  - The processing should continue in case of an error (even a transaction conflict).
  - Possibly each record must be processed in its own transaction.
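The upload use case can be sketched with plain JDK classes to show the intended flow: stream the request body to a staging file in fixed-size chunks, compute a checksum while streaming, then either move the file into the partition location or into quarantine. This is only an illustration under stated assumptions: `PartitionUpload`, its parameters, and the file name `data` are hypothetical, the partition directory is assumed to have been derived from the request metadata, and moving the file stands in for registering a partition. None of it is CDAP API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class PartitionUpload {
    enum Outcome { REGISTERED, QUARANTINED }

    /** Streams the body to disk without buffering it whole, then validates its SHA-256. */
    static Outcome receive(InputStream body, String expectedSha256Hex,
                           Path partitionDir, Path quarantineDir) throws IOException {
        Files.createDirectories(partitionDir);
        Files.createDirectories(quarantineDir);
        Path staging = Files.createTempFile("upload-", ".tmp");
        MessageDigest digest = sha256();
        try (InputStream in = new DigestInputStream(body, digest);
             OutputStream out = Files.newOutputStream(staging)) {
            byte[] buffer = new byte[8192]; // memory use is bounded by this buffer
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            // An error while streaming: delete the partial file and report upward.
            Files.deleteIfExists(staging);
            throw e;
        }
        if (!toHex(digest.digest()).equalsIgnoreCase(expectedSha256Hex)) {
            // Validation failed: keep the file for inspection, but out of the partition.
            Files.move(staging, quarantineDir.resolve(staging.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
            return Outcome.QUARANTINED;
        }
        // Stand-in for "register the file as a new partition".
        Files.move(staging, partitionDir.resolve("data"),
                StandardCopyOption.REPLACE_EXISTING);
        return Outcome.REGISTERED;
    }

    private static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

The returned outcome is what a handler would use to choose the HTTP response, for example a success status for `REGISTERED` and a client error for `QUARANTINED`.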
Created in 2020 by Google Inc.