Architecting a Notification Platform on AWS and How to Implement Personalization of Delivery Channels

Wonder Agudah · Dec 14 '24 · DEV Community

[Architecture diagram of the notification platform]

Design Overview:

This design is a simple, efficient architecture for a standard notification platform. It can be extended over the lifecycle of the system as more specific use-case requirements are introduced.

The notification platform uses several AWS serverless services, which can be deployed either with AWS CloudFormation or with AWS SAM (Serverless Application Model), its extension for serverless applications and resources.

The entry point to the system is Amazon API Gateway, fronted by AWS WAF, which filters incoming requests against defined conditions to protect against common web attacks. API Gateway then authenticates the filtered requests through Amazon Cognito for external services and through IAM for services that access the platform internally.

The notification platform follows an event-driven architecture (EDA): every authenticated request calls an API Gateway endpoint that sets the system in motion, either by invoking a Lambda function for individual requests or by routing batch requests through Amazon SQS. The notification gateway, shown in the attached architecture diagram, is responsible for this initial handling of individual and batch requests. Where real-time processing of notifications is required, Amazon Kinesis can be used instead.
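The gateway's split between individual and batch requests can be sketched in Python as follows. This is a minimal sketch, not the author's implementation. It assumes a Lambda proxy integration behind API Gateway and treats a hypothetical `notifications` array in the request body as the marker of a batch request. It also uses hypothetical environment variables `PROCESSOR_FUNCTION` and `BATCH_QUEUE_URL`. The boto3 clients are passed in so the routing logic can be exercised without AWS.

```python
import json
import os

# Hypothetical resource names, supplied as Lambda environment variables.
PROCESSOR_FUNCTION = os.environ.get("PROCESSOR_FUNCTION", "notification-processor")
BATCH_QUEUE_URL = os.environ.get("BATCH_QUEUE_URL", "https://sqs.example.com/batch-queue")
SQS_BATCH_LIMIT = 10  # SendMessageBatch accepts at most 10 entries per call


def route_request(event, lambda_client, sqs_client):
    """Route one API Gateway proxy event to the processing layer."""
    body = json.loads(event.get("body") or "{}")
    batch = body.get("notifications")  # hypothetical marker for batch requests

    if batch is None:
        # Individual request: invoke the processor asynchronously.
        lambda_client.invoke(
            FunctionName=PROCESSOR_FUNCTION,
            InvocationType="Event",
            Payload=json.dumps(body).encode("utf-8"),
        )
        return {"statusCode": 202, "body": json.dumps({"queued": 1, "failed": 0})}

    # Batch request: enqueue in chunks that respect the SQS batch limit.
    failed = 0
    for start in range(0, len(batch), SQS_BATCH_LIMIT):
        chunk = batch[start:start + SQS_BATCH_LIMIT]
        resp = sqs_client.send_message_batch(
            QueueUrl=BATCH_QUEUE_URL,
            Entries=[
                {"Id": str(start + i), "MessageBody": json.dumps(item)}
                for i, item in enumerate(chunk)
            ],
        )
        failed += len(resp.get("Failed", []))
    result = {"queued": len(batch) - failed, "failed": failed}
    return {"statusCode": 202, "body": json.dumps(result)}


def lambda_handler(event, context):
    import boto3  # imported here so route_request can be tested without AWS
    return route_request(event, boto3.client("lambda"), boto3.client("sqs"))
```

The chunking follows from the 10-entry limit of `SendMessageBatch`. Entries that SQS reports as failed are counted in the response rather than retried in this sketch.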

The notification gateway then routes the individual and batch requests through SNS and SQS to the notification processing and distribution service. There, AWS Lambda functions run custom business logic to validate, format, and schedule notification content. They also interact with a DynamoDB table that stores users' (subscribers') contact information and delivery channel preferences.
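Here is a minimal sketch of the validation and preference-lookup step. It assumes a hypothetical payload contract (`user_id`, `category`, `message`, and an optional ISO 8601 `send_at`). It also assumes a hypothetical subscriber item shape: a `channels` list in preference order and a `contacts` map keyed by channel. In production, `subscribers_table` would be a boto3 DynamoDB `Table` resource. For testing, any object with a compatible `get_item` works.

```python
from datetime import datetime, timezone

REQUIRED_FIELDS = ("user_id", "category", "message")     # hypothetical payload contract
SUPPORTED_CHANNELS = {"email", "sms", "push", "third_party"}


def validate(payload):
    """Return a list of validation errors (empty if the payload is valid)."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if not payload.get(f)]
    send_at = payload.get("send_at")
    if send_at:
        try:
            datetime.fromisoformat(send_at)
        except ValueError:
            errors.append("send_at must be an ISO 8601 timestamp")
    return errors


def process(payload, subscribers_table):
    """Validate a request, look up the subscriber, and build one notification per channel."""
    errors = validate(payload)
    if errors:
        raise ValueError("; ".join(errors))

    # get_item returns an "Item" key only when the subscriber exists.
    item = subscribers_table.get_item(Key={"user_id": payload["user_id"]}).get("Item")
    if item is None:
        raise LookupError(f"unknown subscriber: {payload['user_id']}")

    # Keep only channels the platform can deliver to, in the user's preference order.
    channels = [c for c in item.get("channels", []) if c in SUPPORTED_CHANNELS]
    send_at = payload.get("send_at") or datetime.now(timezone.utc).isoformat()
    return [
        {
            "user_id": payload["user_id"],
            "channel": channel,
            "contact": item.get("contacts", {}).get(channel),
            "template": payload["category"],
            "message": payload["message"].strip(),
            "send_at": send_at,
        }
        for channel in channels
    ]
```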

After the notification distribution service finishes executing, an event bus provisioned with Amazon EventBridge routes the processed notifications. Event rules push these notification events to queues designated for each delivery channel, where they are polled and processed concurrently. Each notification event is tagged with a category for its delivery channel so that it is routed to the corresponding queue. A set of Lambda functions runs the delivery jobs: each polls its assigned queue, fetches notification templates from an S3 bucket, and forwards notifications to the integrated delivery channels configured with the consumers' subscriptions. Amazon SES delivers emails, while Amazon SNS handles both SMS and mobile push notifications. A fourth queue and a customisable delivery job are provided to interact with the API of any third-party channel that needs to be connected in the future. Amazon Pinpoint can replace these services if campaigns and more interactive sessions with the notified users are required.
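The channel tagging and routing could look like the following sketch. It assumes each notification carries a `channel` field and uses a hypothetical bus name and event source. `publish` sends notifications through EventBridge `PutEvents`, which accepts up to 10 entries per request. `create_channel_rules` creates one rule per channel. Each rule's event pattern matches on `detail.channel` and targets that channel's SQS queue. Each queue's access policy must also allow EventBridge to send messages, which is not shown.

```python
import json

EVENT_BUS = "notifications-bus"        # hypothetical bus name
SOURCE = "notification.distribution"   # hypothetical event source


def rule_pattern(channel):
    """Event pattern for the rule that forwards one channel's events to its queue."""
    return {"source": [SOURCE], "detail": {"channel": [channel]}}


def publish(notifications, events_client):
    """Send processed notifications to EventBridge; returns the number of failed entries."""
    entries = [
        {
            "Source": SOURCE,
            "DetailType": "NotificationReady",
            "Detail": json.dumps(n),  # the "channel" field inside is what rules match on
            "EventBusName": EVENT_BUS,
        }
        for n in notifications
    ]
    failed = 0
    for start in range(0, len(entries), 10):  # PutEvents limit: 10 entries per request
        resp = events_client.put_events(Entries=entries[start:start + 10])
        failed += resp.get("FailedEntryCount", 0)
    return failed


def create_channel_rules(events_client, queue_arns):
    """queue_arns maps a channel name to the ARN of its SQS queue."""
    for channel, arn in queue_arns.items():
        name = f"route-{channel}"
        events_client.put_rule(Name=name, EventBusName=EVENT_BUS,
                               EventPattern=json.dumps(rule_pattern(channel)))
        events_client.put_targets(Rule=name, EventBusName=EVENT_BUS,
                                  Targets=[{"Id": f"{channel}-queue", "Arn": arn}])
```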

The status of notification delivery jobs (sent/failed) and of delivery channels (delivered/failed) will be logged in a DynamoDB table.
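To illustrate one delivery job, the sketch below handles the email queue. It assumes each notification carries hypothetical `template`, `message`, `contact`, and `user_id` fields. It also assumes a hypothetical template bucket keyed as `email/<template>.txt` and a hypothetical status table keyed by `notification_id`. The sketch logs only the job status (sent/failed); the channel's own delivery status would have to come from the channel's feedback and is not covered. It returns a Lambda partial batch response, so only failed messages go back to the queue.

```python
import json
from datetime import datetime, timezone
from string import Template

TEMPLATE_BUCKET = "notification-templates"   # hypothetical bucket
SENDER = "no-reply@example.com"              # hypothetical SES-verified sender


def deliver_email_batch(sqs_event, s3, ses, status_table):
    """Handle one batch of SQS records from the email queue.

    Each record body is the EventBridge event; the notification sits in "detail".
    The return value is a partial batch response (requires ReportBatchItemFailures
    on the event source mapping).
    """
    failures = []
    for record in sqs_event["Records"]:
        notification = json.loads(record["body"])["detail"]
        status = "sent"
        try:
            obj = s3.get_object(Bucket=TEMPLATE_BUCKET,
                                Key=f"email/{notification['template']}.txt")
            body = Template(obj["Body"].read().decode("utf-8")).safe_substitute(
                message=notification["message"])
            ses.send_email(
                Source=SENDER,
                Destination={"ToAddresses": [notification["contact"]]},
                Message={
                    "Subject": {"Data": notification.get("subject", "Notification")},
                    "Body": {"Text": {"Data": body}},
                },
            )
        except Exception:  # sketch: any error marks this message as failed
            status = "failed"
            failures.append({"itemIdentifier": record["messageId"]})
        # Log the delivery job status (sent/failed) for this message.
        status_table.put_item(Item={
            "notification_id": record["messageId"],
            "user_id": notification["user_id"],
            "channel": "email",
            "status": status,
            "updated_at": datetime.now(timezone.utc).isoformat(),
        })
    return {"batchItemFailures": failures}
```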

Amazon CloudWatch metrics will be collected from all applicable services and analysed.

The following features of the AWS services used ensure that the notification platform's non-functional requirements are met:

Scalability
● Amazon S3: Provides virtually unlimited storage for logs, tracking data, and notification templates, automatically scaling as data grows.
● Amazon SQS/SNS: Scales to handle millions of messages per second.
● AWS Lambda: Scales out by creating additional execution environments for high-concurrency workloads.
● Amazon DynamoDB: On-demand capacity mode scales automatically for unpredictable workloads.
● Amazon Kinesis: Shard-based horizontal scaling for real-time data streams.

High Availability
● Amazon S3: Designed for 99.99% availability, with data replication across multiple AZs.
● SQS, SNS, DynamoDB: Multi-AZ replication ensures fault tolerance.
● AWS Lambda: Automatically retries failed asynchronous invocations and runs functions across multiple AZs.
● Amazon Pinpoint: Multi-AZ operations for continuous message delivery.

Security
API Gateway's TLS support (API Gateway now supports TLS 1.3) satisfies the requirement for encrypting data in transit. Integration with Amazon Cognito provides authentication and authorization by verifying the services that need to access the platform. This verification uses either their application user directory (user pools) or an identity provider (identity pools). IAM entities, specifically roles, can also be assigned to internal services for authentication. AWS service roles will be granted narrowly scoped policies that allow only the specific operations and API calls each service needs, strengthening the security posture of the platform.
● Encryption in Transit:
    ● All services, including S3, SQS, SNS, API Gateway, and Lambda, support TLS (Transport Layer Security) to encrypt data during transmission.
● Amazon S3:
    ● Bucket Policies and IAM: Restrict access to sensitive data such as logs and templates.
    ● Encryption at Rest: Uses server-side encryption (SSE-S3, SSE-KMS) or client-side encryption.
● IAM Roles/Policies: Fine-grained access control ensures secure access to resources.
● API Gateway Security: Supports Cognito, IAM, and custom Lambda authorisers for authentication.
● AWS WAF: Protects APIs from SQL injection, application-layer (HTTP flood) DDoS attacks, and other common web threats.
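Least-privilege roles can be expressed as narrowly scoped policy documents. The sketch below builds one for a hypothetical email delivery job. The ARNs, the `email/` template prefix, and the choice of actions are assumptions for illustration. The Lambda execution role would also need CloudWatch Logs permissions, which are omitted here.

```python
import json


def email_delivery_policy(queue_arn, status_table_arn, template_bucket, sender_identity_arn):
    """Least-privilege policy for a hypothetical email delivery job's execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Consume messages from its own queue only (Lambda event source mapping).
                "Effect": "Allow",
                "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage",
                           "sqs:GetQueueAttributes"],
                "Resource": queue_arn,
            },
            {   # Read email templates only, not the whole bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{template_bucket}/email/*",
            },
            {   # Send email from a single verified identity.
                "Effect": "Allow",
                "Action": ["ses:SendEmail"],
                "Resource": sender_identity_arn,
            },
            {   # Write delivery status records; no reads, updates, or deletes.
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem"],
                "Resource": status_table_arn,
            },
        ],
    }


def policy_document(policy):
    """Serialise for the PolicyDocument parameter of IAM calls such as put_role_policy."""
    return json.dumps(policy)
```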

Monitoring and Logging
● Amazon CloudWatch:
    ● Metrics: Monitors key metrics such as SQS queue depth, SNS delivery success rates, Lambda execution times, DynamoDB throughput, and API Gateway requests.
    ● Alarms: Sends alerts for anomalies or performance issues (e.g., a high queue backlog or failed deliveries).
    ● ServiceLens: Provides an integrated view of performance and availability across services, combining metrics, logs, and traces for end-to-end observability.
● AWS X-Ray:
    ● Traces requests through the system to identify performance bottlenecks and dependencies.
    ● Provides detailed insights into latencies, errors, and service interactions.
● CloudWatch Logs:
    ● Aggregates runtime logs from Lambda, API Gateway, and other services.
    ● Supports detailed querying for debugging and analysis.
● CloudTrail:
    ● Records API activity for audit and compliance, keeping the system secure and traceable.
● Dead Letter Queues (DLQs):
    ● Captures failed or undeliverable messages from SQS or SNS for analysis and troubleshooting.
● Amazon DynamoDB:
    ● Logs the status of notification delivery jobs (sent/failed) and of delivery channels (delivered/failed).
● Amazon S3:
    ● Stores logs, archived notifications, and tracking data for long-term analysis and compliance reporting.
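The following sketch wires up two of these monitoring pieces: a dead-letter queue and a backlog alarm. `set_queue_attributes` with a `RedrivePolicy` moves a message to the DLQ after `maxReceiveCount` failed receives. `put_metric_alarm` watches the `ApproximateNumberOfMessagesVisible` metric that SQS publishes to CloudWatch. Queue names, ARNs, and thresholds are hypothetical. The clients would be boto3 `sqs` and `cloudwatch` clients.

```python
import json


def attach_dlq(sqs, queue_url, dlq_arn, max_receive_count=5):
    """Send messages to the DLQ after max_receive_count failed receives."""
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": str(max_receive_count)}
        )},
    )


def alarm_on_backlog(cloudwatch, queue_name, topic_arn, threshold=1000):
    """Alarm (notifying an SNS topic) when the visible backlog stays above threshold for 5 minutes."""
    cloudwatch.put_metric_alarm(
        AlarmName=f"{queue_name}-backlog",
        Namespace="AWS/SQS",
        MetricName="ApproximateNumberOfMessagesVisible",
        Dimensions=[{"Name": "QueueName", "Value": queue_name}],
        Statistic="Maximum",
        Period=60,
        EvaluationPeriods=5,
        Threshold=float(threshold),
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[topic_arn],
    )
```

If you call `alarm_on_backlog` on the DLQ itself with `threshold=0`, it alerts as soon as any message lands there.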


The notification system can be enhanced with machine learning features using AWS ML services to optimise operations, improve user experience, and enable personalisation. Here are some ML features to include, along with the AWS services that can support them:

A. Personalization

  • ML Feature: Predict the most effective delivery channel or time for each user.
  • AWS Service: Amazon Personalize
    • Trains custom ML models to recommend delivery preferences based on user behaviour and history (e.g., channel preference, response patterns).

B. Intelligent Scheduling

  • ML Feature: Analyze user behaviour and historical engagement data to send notifications at the best time (e.g., when open rates are highest).
  • AWS Service: Amazon SageMaker
    • Build models to predict engagement and dynamically schedule notifications.
    • Use Amazon Forecast for time-series predictions, such as predicting the best times for notifications.

C. Content Optimization

  • ML Feature: Automatically analyze and improve the effectiveness of notification content (e.g., subject lines, message body) for higher engagement.
  • AWS Service: Amazon Comprehend
    • Perform sentiment analysis and key phrase extraction on notification content.
    • Use these insights to tailor content to user preferences and tone.
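The sketch below shows the Comprehend analysis step. It uses the `DetectSentiment` and `DetectKeyPhrases` APIs through an injected boto3 `comprehend` client. The review rule and its 0.5 threshold are hypothetical. Comprehend only reports sentiment and phrases; deciding how to tailor the content is up to the platform.

```python
def analyze_content(text, comprehend, language="en"):
    """Return sentiment and the top three key phrases for a draft notification."""
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode=language)
    phrases = comprehend.detect_key_phrases(Text=text, LanguageCode=language)
    top = sorted(phrases["KeyPhrases"], key=lambda p: p["Score"], reverse=True)[:3]
    return {
        "sentiment": sentiment["Sentiment"],              # POSITIVE | NEGATIVE | NEUTRAL | MIXED
        "negative_score": sentiment["SentimentScore"]["Negative"],
        "key_phrases": [p["Text"] for p in top],
    }


def needs_review(analysis, max_negative=0.5):
    """Hypothetical rule: flag drafts that read as negative before they are sent."""
    return analysis["sentiment"] == "NEGATIVE" or analysis["negative_score"] > max_negative
```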

D. Fraud Detection

  • ML Feature: Detect and prevent fraudulent activities or malicious notifications (e.g., spam or unusual activity in the system).
  • AWS Service: Amazon Fraud Detector
    • Automatically identifies anomalies in notification delivery patterns or content based on historical data.

E. Audience Segmentation

  • ML Feature: Automatically segment users based on behavior, demographics, or engagement levels for targeted notifications.
  • AWS Service: Amazon SageMaker or Amazon Pinpoint
    • Train ML models for advanced clustering or use Pinpoint's built-in segmentation features for simpler implementations.

F. Tracking and Analytics Insights

  • ML Feature: Use ML models to analyze delivery metrics and provide actionable insights, such as trends in user engagement or areas for improvement.
  • AWS Service: Amazon Lookout for Metrics
    • Automatically detects anomalies in delivery rates, open rates, or click-through rates, enabling faster troubleshooting.

The following section provides guidelines for implementing personalization: sending each notification via the most effective channel for each user, based on their historical preferences and behaviour.

Steps to Implement Personalization

1. Collect Data

You need a dataset to train the recommendation model. Include the following attributes:

  • User ID: Unique identifier for the user.
  • Channel Used: The delivery channel (e.g., email, SMS, push).
  • Engagement Metrics: Open rates, click rates, or read receipts.
  • Timestamp: When the notification was sent.
  • Context Data: Device type, location, time of day, etc.

Data Source:

  • Export data from Amazon Pinpoint, Amazon S3 (logs), or your database.
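The sketch below maps these attributes onto an Amazon Personalize interactions dataset. It assumes, beyond what the original states, that each delivery channel is modelled as an item, that engagement metrics become event types, and that context data such as device and time of day become categorical fields. The field names `DEVICE` and `TIME_OF_DAY` and the event-type values are hypothetical. Location could be added the same way.

```python
import json

# Channels are modelled as the "items" that Personalize ranks for each user.
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},       # delivery channel, e.g. "email"
        {"name": "EVENT_TYPE", "type": "string"},    # e.g. "delivered", "opened", "clicked"
        {"name": "TIMESTAMP", "type": "long"},       # Unix epoch seconds
        {"name": "DEVICE", "type": "string", "categorical": True},       # context data
        {"name": "TIME_OF_DAY", "type": "string", "categorical": True},  # context data
    ],
    "version": "1.0",
}


def schema_json():
    """Serialised form for the `schema` parameter of the Personalize create_schema call."""
    return json.dumps(INTERACTIONS_SCHEMA)
```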

2. Prepare and Store Data

  • Use AWS Glue to clean and transform the raw data.
  • Store the processed data in Amazon S3 as input for ML models.
  • Use AWS Glue Data Catalog to create metadata for structured access to your data.
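The cleaning and transformation can be illustrated with a plain-Python stand-in for an AWS Glue job. This is not Glue itself. The function drops incomplete rows and unknown events, normalises channel names, and removes exact duplicates. It then emits CSV with Personalize-style interactions columns (`USER_ID`, `ITEM_ID`, `EVENT_TYPE`, `TIMESTAMP`, `DEVICE`, `TIME_OF_DAY`). The raw log field names, event values, and time-of-day buckets are assumptions.

```python
import csv
import io
from datetime import datetime, timezone

EVENT_TYPES = {"delivered", "opened", "clicked"}  # hypothetical engagement events
HEADER = ["USER_ID", "ITEM_ID", "EVENT_TYPE", "TIMESTAMP", "DEVICE", "TIME_OF_DAY"]


def time_of_day(epoch_seconds):
    """Bucket a timestamp into a coarse, categorical time-of-day value (UTC)."""
    hour = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).hour
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 22:
        return "evening"
    return "night"


def to_interactions_csv(raw_rows):
    """Clean raw delivery-log dicts into interactions CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(HEADER)
    seen = set()
    for row in raw_rows:
        user = (row.get("user_id") or "").strip()
        channel = (row.get("channel") or "").strip().lower()
        event = row.get("event")
        if not user or not channel or event not in EVENT_TYPES:
            continue  # incomplete row or unknown event
        ts = int(row["timestamp"])
        key = (user, channel, event, ts)
        if key in seen:
            continue  # exact duplicate
        seen.add(key)
        writer.writerow([user, channel, event, ts, row.get("device") or "unknown", time_of_day(ts)])
    return out.getvalue()
```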

3. Train a Recommendation Model

  • Use Amazon Personalize:
    1. Create a Dataset Group: Create a dataset group and import your user-interaction data into it.
    2. Choose a Recipe: Select the Personalized Ranking recipe for this use case.
    3. Train the Model: Create a solution and train a solution version on your data. Personalize manages the training and can optionally run hyperparameter optimization.
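The three steps map onto the Personalize control-plane API roughly as in this sketch, which takes a boto3 `personalize` client as a parameter. The resource names, S3 location, and IAM role are hypothetical; the role must allow Personalize to read the bucket. Every `create_*` call starts an asynchronous job, so the sketch polls the matching `describe_*` call until the resource is `ACTIVE`. Training a solution version can take a long time, so in practice this is better run from a long-running job than from a single short-lived function.

```python
import json
import time

RANKING_RECIPE = "arn:aws:personalize:::recipe/aws-personalized-ranking"


def wait_active(describe, resource_key, poll_seconds=30, **arn_kwarg):
    """Poll a describe_* call until the resource is ACTIVE (or raise on failure)."""
    while True:
        status = describe(**arn_kwarg)[resource_key]["status"]
        if status == "ACTIVE":
            return
        if "FAILED" in status:
            raise RuntimeError(f"{resource_key} ended in status {status}")
        time.sleep(poll_seconds)


def train_channel_ranker(personalize, schema, data_s3_uri, role_arn, prefix="notif-channels"):
    """Create the dataset group, import interactions, and train a ranking solution version."""
    p = personalize
    group = p.create_dataset_group(name=f"{prefix}-group")["datasetGroupArn"]
    wait_active(p.describe_dataset_group, "datasetGroup", datasetGroupArn=group)

    schema_arn = p.create_schema(name=f"{prefix}-schema", schema=json.dumps(schema))["schemaArn"]
    dataset = p.create_dataset(name=f"{prefix}-interactions", datasetGroupArn=group,
                               datasetType="Interactions", schemaArn=schema_arn)["datasetArn"]
    wait_active(p.describe_dataset, "dataset", datasetArn=dataset)

    job = p.create_dataset_import_job(jobName=f"{prefix}-import", datasetArn=dataset,
                                      dataSource={"dataLocation": data_s3_uri},
                                      roleArn=role_arn)["datasetImportJobArn"]
    wait_active(p.describe_dataset_import_job, "datasetImportJob", datasetImportJobArn=job)

    solution = p.create_solution(name=f"{prefix}-ranking", datasetGroupArn=group,
                                 recipeArn=RANKING_RECIPE)["solutionArn"]
    wait_active(p.describe_solution, "solution", solutionArn=solution)

    version = p.create_solution_version(solutionArn=solution)["solutionVersionArn"]
    wait_active(p.describe_solution_version, "solutionVersion", solutionVersionArn=version)
    return version
```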

4. Deploy the Model

  • Deploy the trained model (solution version) as an Amazon Personalize campaign, which serves real-time recommendations.
  • Generate a ranking for each user by providing their User ID, the candidate channels, and optional context data.
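Deployment as a campaign can be sketched with `create_campaign` followed by a status poll. The sketch assumes a boto3 `personalize` client and a hypothetical campaign name. `minProvisionedTPS` sets the minimum transactions per second that Personalize provisions for the campaign.

```python
import time


def deploy_campaign(personalize, solution_version_arn, name="notif-channels-campaign",
                    min_tps=1, poll_seconds=60):
    """Create a campaign for a trained solution version and wait until it is ACTIVE."""
    campaign_arn = personalize.create_campaign(
        name=name,
        solutionVersionArn=solution_version_arn,
        minProvisionedTPS=min_tps,
    )["campaignArn"]
    while True:
        status = personalize.describe_campaign(campaignArn=campaign_arn)["campaign"]["status"]
        if status == "ACTIVE":
            return campaign_arn
        if "FAILED" in status:
            raise RuntimeError(f"campaign ended in status {status}")
        time.sleep(poll_seconds)
```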

5. Integrate with the Notification System

  1. Real-Time Inference:

    • Use the Amazon Personalize Runtime API to query recommendations (e.g., which channel to use).
    • Example Input: User ID + Context (e.g., time of day, location).
    • Example Output: Ranked list of channels (For instance, Email > SMS > Push).
  2. Lambda Function for Decision Making:

    • Trigger AWS Lambda from your notification gateway.
    • The Lambda function calls the Personalize API, selects the top-ranked channel, and routes the notification appropriately.
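The sketch below outlines the decision-making Lambda. It assumes the channels were trained as items named `email`, `sms`, and `push`, a hypothetical `CAMPAIGN_ARN` environment variable, and a hypothetical event shape (`user_id`, optional `context` and `enabled_channels`). It calls `GetPersonalizedRanking` on the Personalize runtime with the candidate channels as `inputList`. It then keeps the top-ranked channel the user has enabled and falls back to a default channel if the campaign cannot be reached. The returned `channel` field is what the EventBridge rules route on.

```python
import os

CHANNELS = ["email", "sms", "push"]   # hypothetical item IDs used when training
DEFAULT_CHANNEL = "email"             # hypothetical fallback


def choose_channel(runtime, campaign_arn, user_id, context=None, allowed=None):
    """Return the highest-ranked channel the user has enabled, or None if none are."""
    candidates = [c for c in CHANNELS if allowed is None or c in allowed]
    if not candidates:
        return None
    request = {"campaignArn": campaign_arn, "userId": user_id, "inputList": candidates}
    if context:
        request["context"] = context  # e.g. {"DEVICE": "mobile", "TIME_OF_DAY": "evening"}
    try:
        resp = runtime.get_personalized_ranking(**request)
        ranked = [item["itemId"] for item in resp["personalizedRanking"]]
    except Exception:  # sketch: fall back to the default if the campaign is unavailable
        ranked = []
    for channel in ranked:
        if channel in candidates:
            return channel
    return DEFAULT_CHANNEL if DEFAULT_CHANNEL in candidates else candidates[0]


def lambda_handler(event, _context):
    import boto3  # imported here so choose_channel can be tested without AWS
    channel = choose_channel(
        boto3.client("personalize-runtime"), os.environ["CAMPAIGN_ARN"],
        event["user_id"], event.get("context"), event.get("enabled_channels"))
    return {**event, "channel": channel}
```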

6. Monitoring and Feedback Loop

  • Use Amazon CloudWatch to track key performance metrics of the recommendation system.
  • Collect feedback: Track if users engage with the suggested channel and store this data back in Amazon S3.
  • Regularly retrain the model in Amazon Personalize with updated interaction data for continuous improvement.
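The feedback half of the loop could be recorded as in the following sketch. Each outcome is written to S3 as JSON under a date partition for later retraining. A 1/0 data point is also sent to a custom CloudWatch metric, so the metric's Average statistic gives the share of notifications engaged with on the recommended channel. The bucket, namespace, metric name, and record fields are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone

FEEDBACK_BUCKET = "notification-feedback"   # hypothetical bucket
METRIC_NAMESPACE = "NotificationPlatform"   # hypothetical custom namespace


def record_feedback(s3, cloudwatch, user_id, recommended, engaged_channel, sent_at):
    """Store one feedback record in S3 and emit a hit (1) or miss (0) metric.

    engaged_channel is None when the user did not engage at all. sent_at is
    assumed to be an ISO 8601 string such as "2024-12-14T09:00:00+00:00".
    """
    hit = engaged_channel == recommended
    record = {
        "user_id": user_id,
        "recommended_channel": recommended,
        "engaged_channel": engaged_channel,
        "sent_at": sent_at,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    # Partition by send date so a retraining job can read a time range.
    key = f"feedback/dt={sent_at[:10]}/{uuid.uuid4().hex}.json"
    s3.put_object(Bucket=FEEDBACK_BUCKET, Key=key, Body=json.dumps(record).encode("utf-8"))
    cloudwatch.put_metric_data(
        Namespace=METRIC_NAMESPACE,
        MetricData=[{
            "MetricName": "RecommendedChannelEngaged",
            "Dimensions": [{"Name": "Channel", "Value": recommended}],
            "Value": 1.0 if hit else 0.0,
            "Unit": "Count",
        }],
    )
    return key
```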

I hope this document is helpful. Please let me know if you have any other questions in the comment section. You can use the link below to optionally view this presentation on YouTube:
