Overview - Amazon Kinesis Data Streams and separate AWS accounts
Amazon Kinesis Data Streams (hereafter KDS) is a managed service for collecting high-volume data in real time and passing it on to downstream AWS services. It is particularly suited to streaming data where order matters, such as logs, which makes it a common choice for IoT data collection. For example, it can be specified as the data export destination for Amazon Monitron, which enables predictive maintenance of industrial equipment through machine learning.
KDS acts as an intermediary between data producers and data consumers.
When fully leveraging the AWS Cloud, it is not uncommon to operate separate AWS accounts to separate workloads and manage costs. In such cases, you may need to process streaming data collected by KDS in another AWS account.
This blog post introduces the procedure for sharing a data stream with another AWS account and reading it from AWS Lambda (by specifying the data stream as the trigger of the Lambda function), using the resource-based policies of Amazon Kinesis Data Streams (an update as of November 2023). A data stream, in this context, is akin to a pipeline through which data flows.
The Architecture and Setup
The architecture is as follows.
Key Points for Setup
Here are the key points for setup:
- The execution role of the Lambda function on the data processing side (Account B) requires the AWSLambdaKinesisExecutionRole policy.
- For the KDS data stream on the data producer side (Account A), the resource-based policy should specify the IAM role ARN of the Lambda function's execution role on the data processing side as the Principal, and the allowed Actions should include kinesis:DescribeStream, kinesis:DescribeStreamSummary, kinesis:GetRecords, kinesis:GetShardIterator, and kinesis:ListShards.
Regarding the second point in particular, note that kinesis:DescribeStream may be left out when the policy is created through the dialog in the management console. In that case it must be added manually using the JSON editor (as reported in Feb. 2024).
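As a rough illustration of the second point, the resource-based policy could be assembled like this. This is only a sketch: the statement ID is a made-up name, and the two ARNs are the example values that appear in the sample log output later in this article.

```python
import json

# The five actions the stream's resource-based policy should allow.
READ_ACTIONS = [
    "kinesis:DescribeStream",
    "kinesis:DescribeStreamSummary",
    "kinesis:GetRecords",
    "kinesis:GetShardIterator",
    "kinesis:ListShards",
]


def build_stream_policy(stream_arn, role_arn):
    """Return a policy document that lets role_arn read from stream_arn."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CrossAccountLambdaRead",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},  # execution role in Account B
                "Action": READ_ACTIONS,
                "Resource": stream_arn,  # data stream in Account A
            }
        ],
    }


# Example ARNs taken from the sample log output in this article.
policy = build_stream_policy(
    "arn:aws:kinesis:us-east-1:999900009999:stream/kds-sharing-example1",
    "arn:aws:iam::888800008888:role/service-role/kds-reader1-role-q6zcv9kq",
)
print(json.dumps(policy, indent=2))
```

The printed JSON should have the same general shape as what the JSON editor in the console shows, although the console may order or name things differently.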
The following official documents might also be helpful:
- Sharing your data stream with another account
- Sharing access with cross-account AWS Lambda functions
Steps
- The steps alternate between the two accounts: data producer side (Account A) → data processing side (Account B) → data producer side → data processing side. Make sure not to mix up the targets.
- Everything must be in the same region; sharing is not possible across different regions (e.g., if the data stream is in us-west-2 and the Lambda function is in ap-northeast-1). If you wish to send data to a different region, consider the architecture using Amazon EventBridge mentioned in the epilogue.
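The same-region requirement can be checked before starting by comparing the region field of the two ARNs. The following is a small sketch; the Lambda function ARNs are hypothetical examples built from the names used in this article.

```python
def arn_region(arn):
    """Return the region field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[3]


def same_region(stream_arn, function_arn):
    """True when the data stream and the Lambda function are in the same region."""
    return arn_region(stream_arn) == arn_region(function_arn)


stream_arn = "arn:aws:kinesis:us-west-2:999900009999:stream/kds-sharing-example1"
# Hypothetical function ARNs for illustration.
ok_function = "arn:aws:lambda:us-west-2:888800008888:function:kds-reader1"
bad_function = "arn:aws:lambda:ap-northeast-1:888800008888:function:kds-reader1"

print(same_region(stream_arn, ok_function))   # same region, sharing can work
print(same_region(stream_arn, bad_function))  # different regions, sharing is not possible
```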
Step 1) On the data producer side - Account A
1-1. Create a data stream in Amazon Kinesis (e.g., kds-sharing-example1)
See here for the creation method (Step1: Create Data Stream).
For testing or small data volumes, the "Provisioned" capacity mode with "1" provisioned shard is sufficient.
NOTE: While this guide assumes the creation of a new data stream, an existing data stream can also be used.
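If you prefer a script to the console, the same stream could probably be created with boto3 along these lines. This is a sketch under assumptions: boto3 is installed, the credentials belong to Account A, and the helper names are made up for this example.

```python
def stream_params(name="kds-sharing-example1"):
    """Parameters for a "Provisioned" stream with a single shard."""
    return {
        "StreamName": name,
        "ShardCount": 1,
        "StreamModeDetails": {"StreamMode": "PROVISIONED"},
    }


def create_example_stream(name="kds-sharing-example1"):
    """Create the stream and wait until it exists (run with Account A credentials)."""
    import boto3  # assumption: boto3 is installed and configured

    client = boto3.client("kinesis")
    client.create_stream(**stream_params(name))
    client.get_waiter("stream_exists").wait(StreamName=name)


print(stream_params())
```

Only stream_params() runs when the snippet is executed as is; create_example_stream() has to be called explicitly because it creates a billable resource.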
Step 2) On the data processing side - Account B
2-1. Create a Lambda function in AWS Lambda (e.g., kds-reader1).
See here for the creation method (Create a Lambda function with the console).
The source code is as follows (Python 3). It simply emits the event to Amazon CloudWatch Logs, which is sufficient for operational verification. After modifying the source code as below, click "Deploy" to deploy.
def lambda_handler(event, context):
    # Write the raw event to Amazon CloudWatch Logs.
    print(event)
    return True
NOTE: There is a test event template with Amazon Kinesis Data Streams sample data, which can be used to test the function.
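The handler can also be tried locally before deploying. The sketch below builds a minimal Kinesis-shaped event by hand; it is not the console's template, and real events carry more fields than shown here.

```python
import base64
import json


def lambda_handler(event, context):
    # Same handler as in the article: log the raw event and return.
    print(event)
    return True


def sample_kinesis_event(payload):
    """Build a minimal Kinesis-shaped event; the record data is base64-encoded."""
    data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {
        "Records": [
            {
                "kinesis": {"partitionKey": "DUMMY1", "data": data},
                "eventSource": "aws:kinesis",
            }
        ]
    }


# Invoke the handler locally; the context object is not used, so None is enough.
result = lambda_handler(sample_kinesis_event({"this_is": "test record"}), None)
```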
2-2. Attach the AWSLambdaKinesisExecutionRole policy to the execution role of kds-reader1.
After opening the details of kds-reader1, go to "Configuration" > "Permissions" and click the role name assigned as the execution role to view the role's settings (in the figure below, click kds-reader1-role-mp67l1v2).
Go to "Add Permission" > "Attach Policy" under the permission policies to display the list of policies that can be attached.
Select AWSLambdaKinesisExecutionRole from the list of "Other Permission Policies" and then click "Add Permission".
Policy addition is complete.
Check that AWSLambdaKinesisExecutionRole has been added to the list of permission policies, as shown below.
NOTE: We assume that the execution role of the Lambda function is the IAM role that is automatically created and attached when the Lambda function is created. An existing IAM role can also be used.
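For reference, the same attachment could be scripted with boto3 roughly as follows. It is a sketch that assumes boto3 is installed and the credentials belong to Account B; the helper names are made up, and the role ARN is the example value from this article.

```python
# ARN of the AWS managed policy attached in this step.
MANAGED_POLICY_ARN = (
    "arn:aws:iam::aws:policy/service-role/AWSLambdaKinesisExecutionRole"
)


def role_name_from_arn(role_arn):
    """IAM APIs take the role name, i.e. the last path segment of the role ARN."""
    return role_arn.rsplit("/", 1)[-1]


def attach_kinesis_policy(role_arn):
    """Attach the managed policy to the execution role (run with Account B credentials)."""
    import boto3  # assumption: boto3 is installed and configured

    boto3.client("iam").attach_role_policy(
        RoleName=role_name_from_arn(role_arn),
        PolicyArn=MANAGED_POLICY_ARN,
    )


role_arn = "arn:aws:iam::888800008888:role/service-role/kds-reader1-role-q6zcv9kq"
print(role_name_from_arn(role_arn))
```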
2-3. Note the ARN of the IAM role.
This IAM role's ARN (in this step, that of kds-reader1-role-q6zcv9kq) will be used in the next step.
Step 3) Back on the data producer side - Account A
3-1. Configure the resource-based policy for kds-sharing-example1 in Amazon Kinesis
After opening the details of kds-sharing-example1, go to "Data stream sharing" > "Create Policy".
In Policy Details, select the Visual Editor, check "Data stream sharing throughput read Access," enter the ARN of the IAM role you wrote down earlier in "Specify Principal(s)," and click "Create Policy."
NOTE: The principal should be the ARN of the IAM role; specifying an ARN other than the IAM role's ARN (e.g., the ARN of a Lambda function or data stream) will result in an error and the policy cannot be created.
When the resource-based policy appears, click "Edit" to display the JSON editor. Here, add "kinesis:DescribeStream", to the list of Actions as shown below. Finally, click "Save Changes".
NOTE: If kinesis:DescribeStream was already included at the visual editor stage, this edit is not necessary.
Configuration of the resource-based policy is complete.
Check that the ARN of the IAM role is set as the Principal and that the five permissions are listed in the Action, as shown below.
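This final check can also be done mechanically. The sketch below inspects a policy JSON string, for example one copied from the JSON editor, and reports a principal that is not an IAM role ARN or any of the five actions that is missing. The function names are made up, and wildcard actions such as kinesis:* are deliberately not handled.

```python
import json
import re

REQUIRED_ACTIONS = {
    "kinesis:DescribeStream",
    "kinesis:DescribeStreamSummary",
    "kinesis:GetRecords",
    "kinesis:GetShardIterator",
    "kinesis:ListShards",
}
ROLE_ARN_RE = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/.+$")


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def check_policy(policy_json):
    """Return a list of problems; an empty list means the policy looks right."""
    problems = []
    allowed = set()
    for stmt in json.loads(policy_json).get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        allowed.update(_as_list(stmt.get("Action", [])))
        principal = stmt.get("Principal", {})
        aws = principal.get("AWS", []) if isinstance(principal, dict) else principal
        for arn in _as_list(aws):
            if not ROLE_ARN_RE.match(arn):
                problems.append(f"principal is not an IAM role ARN: {arn}")
    for action in sorted(REQUIRED_ACTIONS - allowed):
        problems.append(f"missing action: {action}")
    return problems


# A policy that lacks kinesis:DescribeStream, the case described in this step.
incomplete = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::888800008888:role/service-role/kds-reader1-role-q6zcv9kq"},
        "Action": sorted(REQUIRED_ACTIONS - {"kinesis:DescribeStream"}),
        "Resource": "arn:aws:kinesis:us-east-1:999900009999:stream/kds-sharing-example1",
    }],
})
print(check_policy(incomplete))
```

Instead of copying from the console, the policy string could presumably be fetched with the boto3 Kinesis client's get_resource_policy call; that part is left out here so the snippet runs without AWS access.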
3-2. Note the ARN of kds-sharing-example1.
The ARN of kds-sharing-example1 (the KDS data stream) will be used in the next step.
Step 4) Back on the data processing side - Account B
4-1. Set up a trigger for kds-reader1 in AWS Lambda
After opening the details of kds-reader1, go to "Configuration" > "Triggers" > "Add trigger".
In the Trigger configuration, select Kinesis for "Select a source". Then enter the ARN of kds-sharing-example1 in "Kinesis stream". Leave the other items as they are and click "Add".
NOTE1: When the focus moves to the text box, the message "No item" is displayed. This can be safely ignored.
NOTE2: If you get an API error when clicking "Add", check the following two things: (1) the resource-based policy permissions on the data stream side (Account A), in particular that kinesis:DescribeStream is included; (2) the permissions of the Lambda function's execution role, in particular that the AWSLambdaKinesisExecutionRole policy is attached.
Trigger configuration is complete.
You can see that Kinesis has been added to the trigger (input source) of kds-reader1 as shown below.
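Behind the console, a trigger like this corresponds to a Lambda event source mapping. A scripted equivalent might look like the sketch below. The assumptions: boto3 is installed, the credentials belong to Account B, and the starting position LATEST and batch size 100 are picked to mirror what leaving the console items untouched is assumed to give.

```python
def trigger_params(stream_arn, function_name="kds-reader1"):
    """Parameters for an event source mapping from the shared stream to the function."""
    return {
        "EventSourceArn": stream_arn,    # stream in Account A
        "FunctionName": function_name,   # function in Account B
        "StartingPosition": "LATEST",    # assumption: read only new records
        "BatchSize": 100,                # assumption: default batch size
    }


def add_trigger(stream_arn, function_name="kds-reader1"):
    """Create the mapping and return its UUID (run with Account B credentials)."""
    import boto3  # assumption: boto3 is installed and configured

    response = boto3.client("lambda").create_event_source_mapping(
        **trigger_params(stream_arn, function_name)
    )
    return response["UUID"]


print(trigger_params(
    "arn:aws:kinesis:us-east-1:999900009999:stream/kds-sharing-example1"
))
```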
How to check
To check, send data to the data stream (kds-sharing-example1 in this example) on the data producer side (Account A) and check the Amazon CloudWatch Logs output on the data processing side (Account B).
In AWS CloudShell on the data producer side (Account A), send data to the data stream with the AWS CLI.
aws --cli-binary-format raw-in-base64-out \
kinesis put-record --stream-name kds-sharing-example1 \
--partition-key DUMMY1 \
--data '{"this_is": "test record"}'
If the result of the command execution is as follows, the data transmission has succeeded.
{
"ShardId": "shardId-000000000000",
"SequenceNumber": "49649718468451075013017298672854645152715037125279481858"
}
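The same test record could also be sent from Python instead of the AWS CLI. In this sketch (assuming boto3 is installed and the credentials belong to Account A; the helper names are made up), the payload is passed as raw bytes, which plays the role of the raw-in-base64-out option in the CLI command above.

```python
import json


def put_record_params(payload, stream_name="kds-sharing-example1", partition_key="DUMMY1"):
    """Parameters for sending one JSON record to the data stream."""
    return {
        "StreamName": stream_name,
        "PartitionKey": partition_key,
        "Data": json.dumps(payload).encode("utf-8"),  # raw bytes, not base64
    }


def send_record(payload):
    """Send the record and return (ShardId, SequenceNumber)."""
    import boto3  # assumption: boto3 is installed and configured

    response = boto3.client("kinesis").put_record(**put_record_params(payload))
    return response["ShardId"], response["SequenceNumber"]


print(put_record_params({"this_is": "test record"}))
```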
If the following entry appears in the log of kds-reader1 in CloudWatch Logs on the data processing side (Account B), the setup was successful.
{'Records': [{'kinesis': {'kinesisSchemaVersion': '1.0', 'partitionKey': 'DUMMY1', 'sequenceNumber': '49649683864473953433843496278632352517587667908684677122', 'data': 'eyJ0aGlzX2lzIjogInRlc3QgcmVjb3JkIn0=', 'approximateArrivalTimestamp': 1709100911.667}, 'eventSource': 'aws:kinesis', 'eventVersion': '1.0', 'eventID': 'shardId-000000000000:49649683864473953433843496278632352517587667908684677122', 'eventName': 'aws:kinesis:record', 'invokeIdentityArn': 'arn:aws:iam::888800008888:role/service-role/kds-reader1-role-q6zcv9kq', 'awsRegion': 'us-east-1', 'eventSourceARN': 'arn:aws:kinesis:us-east-1:999900009999:stream/kds-sharing-example1'}]}
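Note that the data field in the log is base64-encoded. To confirm that it really is the record that was sent, it can be decoded as in the sketch below, which assumes every record carries a JSON payload as in this test.

```python
import base64
import json


def decode_records(event):
    """Return the decoded JSON payload of every record in a Kinesis event."""
    payloads = []
    for record in event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))
    return payloads


# The data value below is copied from the log output above.
event = {"Records": [{"kinesis": {"data": "eyJ0aGlzX2lzIjogInRlc3QgcmVjb3JkIn0="}}]}
print(decode_records(event))  # → [{'this_is': 'test record'}]
```

The same loop could replace print(event) in kds-reader1 if you want the decoded payloads in the log rather than the raw event.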
Epilogue - Architecture with Amazon EventBridge
This article introduced how to share data streams using the resource-based policies of Amazon Kinesis Data Streams. This allows, for example, the Amazon Monitron data mentioned at the beginning of this article to be used from other AWS accounts, giving you more flexibility in how you operate your AWS accounts.
Another possible architecture for using KDS data streams from other AWS accounts is to send the data through an Amazon EventBridge event bus.
The advantage is that it can be configured with no code, using managed services only. It can also be used across regions, and although Amazon EventBridge incurs a fee, the architecture is well worth it.
Here are some URLs to help you create this architecture.
- Amazon Kinesis stream as a source (Update information)
- Sending and receiving Amazon EventBridge events between AWS accounts
- Sending and receiving Amazon EventBridge events between AWS Regions
I personally believe that the best part of building blocks is being able to create a configuration that matches your skills and the kind of operation you want to achieve.
Beyond this configuration, it is a good idea to try whichever configuration suits the moment, including new features that may come out in the future!