Main author: Firdaws Aboulaye
The File API is essential for managing file uploads and downloads in our architecture. We designed a workflow that includes registration, secure uploads, and file state updates to handle file storage efficiently.
Workflow Overview
- Registration: Store file information (metadata, state, user or session information) in the system before the upload.
- Presigned URL Generation: Provide a short-lived presigned URL for secure, direct uploads to Amazon S3.
- File State Update: After the upload, update the file's state and verify its integrity.
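The three steps above can be sketched as a small state machine. This is a minimal illustration, not the actual implementation: the `FileRegistry` class, the state names, the SHA-256 integrity check, and the `fake_presign` helper are all assumptions made for the example. In a real deployment the `presign` callable would presumably wrap the S3 SDK's presigned URL call (for example boto3's `generate_presigned_url`), and the records would live in a database rather than in memory.

```python
import hashlib
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Tuple


class FileState(Enum):
    REGISTERED = "REGISTERED"  # metadata stored, upload not yet confirmed
    UPLOADED = "UPLOADED"      # upload confirmed and checksum matched
    CORRUPTED = "CORRUPTED"    # upload confirmed but checksum did not match


@dataclass
class FileRecord:
    file_id: str
    owner: str            # user or session that registered the file
    filename: str
    expected_sha256: str  # assumption: the client declares a checksum up front
    state: FileState = FileState.REGISTERED


class FileRegistry:
    """In-memory stand-in for the database that tracks file state."""

    def __init__(self, presign: Callable[[str, int], str]) -> None:
        self._presign = presign
        self._records: Dict[str, FileRecord] = {}

    def register(self, owner: str, filename: str, expected_sha256: str,
                 ttl_seconds: int = 300) -> Tuple[FileRecord, str]:
        # Step 1: store the file information before any bytes are uploaded.
        file_id = str(uuid.uuid4())
        record = FileRecord(file_id, owner, filename, expected_sha256)
        self._records[file_id] = record
        # Step 2: hand back a short-lived URL for a direct upload.
        upload_url = self._presign(f"uploads/{file_id}", ttl_seconds)
        return record, upload_url

    def confirm_upload(self, file_id: str, content: bytes) -> FileRecord:
        # Step 3: verify integrity and update the file's state.
        record = self._records[file_id]
        actual = hashlib.sha256(content).hexdigest()
        if actual == record.expected_sha256:
            record.state = FileState.UPLOADED
        else:
            record.state = FileState.CORRUPTED
        return record


def fake_presign(key: str, ttl_seconds: int) -> str:
    # Hypothetical placeholder; it does not produce a real signed URL.
    return f"https://example.invalid/{key}?expires_in={ttl_seconds}"
```

Passing the presigning function in as a parameter keeps the sketch runnable without AWS credentials and makes the state transitions easy to test in isolation.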
Challenges and Solutions
- Initial Approach: Using S3 Events
  - Method: Each file upload to S3 triggered an S3 Event, which invoked one Lambda function per file to update the database and check for corruption.
  - Issues:
    - High Concurrency: Massive upload bursts exceeded AWS Lambda's concurrency limits.
    - Increased Errors: Overloaded Lambdas resulted in failed executions.
    - Lack of Retries: S3 Events didn't support easy reprocessing of failed events.
- Improved Approach: Leveraging SQS with Batching
  - Method: Replaced S3 Events with messages sent to an SQS queue upon file upload. Configured Lambda functions to process batches of messages from the queue.
  - Benefits:
    - Reduced Executions: Batch processing reduced the number of Lambda invocations.
    - Enhanced Error Handling: With partial batch responses, only the failed messages in a batch are retried.
    - Scalability: SQS standard queues offer nearly unlimited throughput.
This architecture ensures scalability, reliability, and efficient handling of large volumes of uploads.
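To make the partial batch response concrete, here is a minimal sketch of a Lambda handler that processes an SQS batch and reports only the failed messages. The `batchItemFailures` / `itemIdentifier` response shape is the one Lambda expects when `ReportBatchItemFailures` is enabled on the event source. The message body format (a JSON object with a `file_id` field) and the `process_upload` helper are assumptions for illustration, not our actual message schema.

```python
import json
from typing import Any, Dict, List


def process_upload(message: Dict[str, Any]) -> None:
    """Hypothetical per-file step: update the file state and check integrity."""
    if "file_id" not in message:
        raise ValueError("message has no file_id")
    # A real implementation would update the database record here.


def handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, str]]]:
    failures: List[Dict[str, str]] = []
    for record in event.get("Records", []):
        try:
            process_upload(json.loads(record["body"]))
        except Exception:
            # Report only this message; the rest of the batch is not retried.
            failures.append({"itemIdentifier": record["messageId"]})
    # An empty list tells Lambda that the whole batch succeeded.
    return {"batchItemFailures": failures}
```

Catching the exception per record is the important design choice: if the handler raised instead, the entire batch would return to the queue and the successfully processed messages would be handled again.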
Events:
  SQSEvent:
    Type: SQS
    Properties:
      Enabled: true
      Queue: !GetAtt PostUploadHandlingSQSQueue.Arn
      FunctionResponseTypes:
        - ReportBatchItemFailures
      BatchSize: 9
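With this configuration, each invocation receives up to nine messages and reports failures per message, so only the failed messages return to the queue for a retry. This keeps processing stable even during high-traffic periods.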
The BatchSize property in the event source definition, as shown above, sets the maximum number of messages retrieved per batch (e.g., BatchSize: 9).
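As a rough illustration of what the batch size means for invocation counts, the sketch below computes the best-case number of invocations for a given number of messages. The `min_invocations` function is a hypothetical helper, and the result is only a lower bound: Lambda can invoke the function with partially filled batches, so real counts may be higher.

```python
def min_invocations(message_count: int, batch_size: int) -> int:
    """Best-case invocation count when every batch is completely full."""
    if message_count < 0:
        raise ValueError("message_count must not be negative")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Integer ceiling division: a partially filled last batch still costs one invocation.
    return (message_count + batch_size - 1) // batch_size
```

For example, 9,000 uploads with a batch size of 9 need at least `min_invocations(9000, 9)`, that is 1,000 invocations, compared with 9,000 when every file triggers its own invocation.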
Conclusion
The evolution of our File API showcases the importance of adapting architecture to meet real-world demands. By moving from direct S3 Event triggers to an SQS-based batch processing system, we overcame concurrency limits, reduced errors, and improved scalability.
In the next article, we’ll explore another domain API, diving into its unique challenges and the solutions we implemented to address them. Stay tuned for more insights into our journey of building scalable and reliable APIs!