
Services can be used for ingest and egress of data. In the current CDAP release (3.2.0), however, there are limitations to what you can do:

  • Every method call of a service handler is executed in a transaction. The transaction timeout is typically configured at around 30 seconds. If a handler method needs longer than that to complete, the transaction fails.
  • The content of the HTTP request is always buffered in memory, so the handler cannot receive large payloads. It would be better to stream the content.
  • In case of transaction conflicts, the handler has no control over handling that error. 
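
To illustrate the streaming alternative named in the second limitation, here is a minimal sketch in plain Java. It does not use any CDAP API: the class `StreamingUpload` and the method `streamToFile` are hypothetical names, and the `InputStream` merely stands in for a request body that a handler could consume incrementally. The point is only that memory use stays bounded by the buffer size, and that a checksum can be computed while the data passes through.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamingUpload {

  /**
   * Copies the body to the target file in 8 KB chunks, so memory use does not
   * grow with the size of the body. Returns the SHA-256 of the bytes as hex.
   * Hypothetical helper for illustration; not part of CDAP.
   */
  static String streamToFile(InputStream body, Path target)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    byte[] buffer = new byte[8192];
    try (OutputStream out = Files.newOutputStream(target)) {
      int n;
      while ((n = body.read(buffer)) != -1) {
        out.write(buffer, 0, n);      // stream the chunk to the file
        digest.update(buffer, 0, n);  // fold the same chunk into the checksum
      }
    } catch (IOException e) {
      // The output stream is already closed here; remove the partial file.
      Files.deleteIfExists(target);
      throw e;
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  public static void main(String[] args) throws Exception {
    Path target = Files.createTempFile("upload", ".bin");
    byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
    String checksum = streamToFile(new ByteArrayInputStream(payload), target);
    System.out.println(Files.size(target) + " bytes, sha256=" + checksum);
    Files.delete(target);
  }
}
```

Because nothing here runs inside a transaction, a copy of this kind would not be bound by the 30-second timeout; only a final, short registration step would need one.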

Here are some use cases where these limitations get in the way:

  1. A service handler to upload partitions to a partitioned file set:
    • With each request, a large file is received. 
    • Metadata about the file is received in the HTTP headers.
    • Based on the metadata, the handler determines the partition key for the file.
    • The content of the request is consumed and streamed to a file.
    • The handler validates the file (possibly using a checksum, or by validating its size or number of records).
    • The handler may also parse the content as it is streamed and validate it using lookups in a dataset. 
    • The handler registers the file as a new partition
    • If an error occurs in any of these steps, the file must be deleted, or moved to a quarantine area; possibly a record of the error needs to be saved to a dataset
    • If there is a transaction conflict, the same applies. 
    • Also, in case of an error, the handler needs control over the HTTP response.
  2. A service handler to download large files:
    • Similar to use case 1, but simpler because no writes happen (and therefore no conflicts).
    • Also, the request is small but the response may be very large and take a long time to send.
  3. A service handler to receive a sequence of records and process them one by one:
    • Processing a record may mean storing it in a dataset, or looking it up in a dataset.
    • The response may indicate how many records were successfully processed (some may have conflicts)
    • The response may contain a new record for every record received.
    • The processing should continue in case of an error (even a transaction conflict). 
    • Possibly each record must be processed in its own transaction 
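
The third use case can be sketched as follows. This is a plain-Java illustration under stated assumptions, not CDAP code: `ConflictException`, `RecordTask`, and `Result` are hypothetical stand-ins, and each call to `task.process` represents one record handled in its own short transaction. A conflict on one record is noted and the loop continues, so the response can report how many records succeeded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PerRecordProcessing {

  /** Hypothetical stand-in for a transaction conflict; not a CDAP class. */
  static class ConflictException extends Exception {
    ConflictException(String message) {
      super(message);
    }
  }

  /** Hypothetical unit of work that would run inside one short transaction. */
  interface RecordTask {
    void process(String record) throws ConflictException;
  }

  /** Outcome that the handler could report in the HTTP response. */
  static class Result {
    int succeeded;
    final List<String> failed = new ArrayList<>();
  }

  /** Processes records one by one; a conflict fails only the current record. */
  static Result processAll(List<String> records, RecordTask task) {
    Result result = new Result();
    for (String record : records) {
      try {
        task.process(record);       // one record, one (simulated) transaction
        result.succeeded++;
      } catch (ConflictException e) {
        result.failed.add(record);  // remember the failure and keep going
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Simulate a conflict for every record whose value starts with "dup".
    RecordTask task = record -> {
      if (record.startsWith("dup")) {
        throw new ConflictException("conflict on " + record);
      }
    };
    Result r = processAll(Arrays.asList("a", "dup-b", "c"), task);
    System.out.println(r.succeeded + " succeeded, failed: " + r.failed);
  }
}
```

Whether a failed record is retried, skipped, or reported back is a policy decision; the sketch only shows that the decision can sit with the handler once each record has its own transaction.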
