Page tree
Skip to end of metadata
Go to start of metadata

Introduction

Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications. The SQS plugin in CDAP will enable ETL developers to create streaming pipelines that read events from SQS queues in realtime and process them.

Use case(s)

  • As a user, I would like to create a streaming pipeline that reads events from Amazon SQS, runs some transformations and aggregations on it and joins the data with other sources, so that I can generate real-time enrichments/insights based on telemetry data in SQS.
  • A web beacon is pushing log records to SQS and I want to read these log events in real-time

User Storie(s)

  • I want to specify credentials securely as Access Key and Access ID
  • I want to also specify credentials using IAM
  • I want to specify the queue and region in SQS to read events from

Plugin Type

  • Batch Source
  • Batch Sink 
  • Real-time Source
  • Real-time Sink
  • Action
  • Post-Run Action
  • Aggregate
  • Join
  • Spark Model
  • Spark Compute

Realtime Source

This section defines properties that are configurable for this plugin. 

SectionUser Facing NameTypeDescriptionConstraintsOptional?Default
CredentialsAuthentication methodRadio buttonEither Access Credentials or IAM
NAccess Credentials
Access IDTextboxAWS Access ID. Only shown when Authentication method is Access Credentials
Y
Access KeyPasswordAWS Secret Access Key. Only shown when Authentication method is Access Credentials
Y
SQS propertiesRegionDrop downSelect from a list of available regions where your SQS queue is located

us-west-1
Queue nameTextboxSpecifies the queue name to read from


EndpointTextboxEndpoint of the SQS server to connect to. Omit this field to connect to AWS.
Yes
Delete MessagesDrop DownDelete messages from SQS queue after successfully reading.
Ntrue
Wait TimeNumberSQS Long poll wait time.

Valid values

1-20

Y10
IntervalNumber

The amount of time to wait between each poll in seconds. The plugin will wait for the duration specified
before issuing an receive message call.


Y0
Number of Messages to returnNumberMaximum number of messages to return for each API call.

Valid values

1-10

Y10

Batch Sink

This section defines properties that are configurable for this plugin. 

SectionUser Facing NameTypeDescriptionConstraintsOptional?Default
CredentialsAuthentication methodRadio buttonEither Access Credentials or IAM
NAccess Credentials
Access IDTextboxAWS Access ID. Only shown when Authentication method is Access Credentials
Y
Access KeyPasswordAWS Secret Access Key. Only shown when Authentication method is Access Credentials
Y
SQS propertiesRegionDrop downSelect from a list of available regions where your SQS queue is located

us-west-1
Queue nameTextboxSpecifies the queue name to read from


EndpointTextboxEndpoint of the SQS server to connect to. Omit this field to connect to AWS.
Yes
Message formatSelectEither CSV or JSON. Converts the structured record into a CSV or JSON to be sent to SQS

JSON
Delay secondsNumberThe length of time, in seconds, for which a specific message is delayed. Valid values: 0 to 900. 

Design / Implementation Tips

  • Tip #1
  • Tip #2

Design

Approach(s)

Properties

Security

Limitation(s)

Future Work

  • Some future work – HYDRATOR-99999
  • Another future work – HYDRATOR-99999

Test Case(s)

  • Test case #1
  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2



Table of Contents

Checklist

  • User stories documented 
  • User stories reviewed 
  • Design documented 
  • Design reviewed 
  • Feature merged 
  • Examples and guides 
  • Integration tests 
  • Documentation for feature 
  • Short video demonstrating the feature
  • No labels

9 Comments

  1. Prateek Duble Bhooshan Mogal

    1. Delay in Seconds - What is the use of this parameter?   What is the expected behaviour.  Do we have to wait this time interval for every message thats processed. 
    2. The Plugin type list as Realtime Source i am assuming its Streaming Source plugin type. Please confirm?
      1. That was a parameter that needed to be passed to the SQS client. No extra work in CDAP. It is part of the SendMessageRequest class. In fact, if the client configuration exposes any more parameters, we should expose them as Advanced configuration here too.
      2. Yes that's correct.
  2. Bhooshan Mogal Mikkin Patel


    According to AWS SQS documentation you need SQS URL to Send/Receive messages. https://docs.aws.amazon.com/sdk-for-net/v2/developer-guide/QueueURL.html#sqs-queue-url 


    Based on this instead , for the plugin we can just accecpt 1 parameter Queue URL instead of the 3 listed above i.e Region / Queue Name and Endpoint. 


  3. This for easier user experience. Users will not know what to put and in what format. The url should be constructed behind the scene by the plugin.

  4. Understood, Sure will keep it. The confusion was that additional endpoint but reviewing the API we figured out the need for it.  

  5. Bhooshan Mogal


    SQS API's do not support true streaming, we have to use the long polling mechanism for streaming.  Due to this we have added additional parameter for the Source configuration.  

  6. Bhooshan Mogal


    Forgot to mention on the IAM.  Correct me if i am wrong but i dont think any other Amazon plugins currently support IAM. Also to test this we need to set it up on AWS with a full blow cluster and need an account to do it.   Currently this is a pending item. 

    1. I think the AWS plugins support IAM, when you run on AWS. Typically how the plugins do it is to have an authentication method parameter, that can be either IAM or Access Credentials (default). See the S3 plugins for example. Albert Shau FYI