A Case Study: Duplicate Entries in Event Processing with SQS + Lambda

Akhil Ghatiki
Mar 27, 2024

Issue Summary

Duplicate entries spanning several months were discovered in a Redshift database. The duplicates had two primary causes:

  1. The consumer Lambda processing messages from the entries FIFO queue more than once.
  2. Duplicate requests made by the source teams.

Root Cause

The root cause of the duplicate entries lies in how AWS Lambda behaves when processing messages from an SQS queue through an event source mapping. Here’s how the process typically works (a minimal sketch in Python follows the list):

  • AWS Lambda receives a batch of messages from the SQS queue.
  • The Lambda function processes each message in the batch.
  • If the Lambda function completes successfully (no errors or exceptions), the Lambda service automatically deletes the entire batch from the SQS queue.
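
To make the flow concrete, here is a minimal sketch of such a consumer; `insert_into_redshift` is a hypothetical stand-in for the team’s actual write, not their real code:

```python
import json

def insert_into_redshift(payload):
    """Hypothetical stand-in for the team's actual Redshift write."""
    ...

def handler(event, context):
    # The event source mapping delivers up to batchSize messages per
    # invocation; each SQS message arrives as one record in the event.
    for record in event["Records"]:
        payload = json.loads(record["body"])
        insert_into_redshift(payload)
    # Returning without raising signals success, and the Lambda service
    # then deletes the entire batch from the queue.
```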

However, if the Lambda function throws an error or exception during processing, none of the batch’s messages are deleted; they become visible again after the visibility timeout and are retried according to the queue’s configuration. In this case, the duplicates came from partially failed batches. When one message failed to process (e.g., due to a Redshift exception), the invocation failed as a whole, and the messages that had already been processed successfully were not deleted either. Those messages were redelivered to subsequent invocations of the Lambda function and written to Redshift again, producing duplicate entries.

For example, when processing of the 6th message in a batch of 10 failed, the invocation exited without deleting the 5 messages that had already been processed successfully. Those 5 messages became visible in the queue again after the visibility timeout (10 minutes in this case) and were picked up by another Lambda invocation.
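
One manual workaround is to delete each message explicitly as soon as it has been processed, instead of relying on the automatic batch deletion. This is a sketch, not the team’s actual code; the `QUEUE_URL` environment variable and `insert_into_redshift` are assumptions:

```python
import json
import os

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]  # hypothetical: the entries queue URL

def insert_into_redshift(payload):
    """Hypothetical stand-in for the real Redshift write."""
    ...

def handler(event, context):
    for record in event["Records"]:
        insert_into_redshift(json.loads(record["body"]))
        # Delete each message the moment it is processed, so a later
        # failure in the same batch cannot make it visible again.
        sqs.delete_message(
            QueueUrl=QUEUE_URL,
            ReceiptHandle=record["receiptHandle"],
        )
    # If, say, the 6th insert raises, the invocation still fails, but the
    # 5 messages already deleted above can no longer be redelivered.
```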

Conclusion

To prevent duplicate entries in the future, message processing errors need to be handled more robustly. One measure is to attach a dead-letter queue (DLQ) to the SQS queue so that messages that still fail after a configured number of retries are captured instead of cycling forever. Just as important is ensuring that, in a partially failed batch, the successfully processed messages are removed from the queue and only the failed ones are retried; both measures are sketched below.
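
A DLQ is attached through the queue’s redrive policy. Here is a sketch using boto3 with hypothetical queue URLs and ARNs; note that a FIFO queue’s DLQ must itself be a FIFO queue:

```python
import json

import boto3

sqs = boto3.client("sqs")

# Hypothetical resources; substitute the real queue URL and DLQ ARN.
sqs.set_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/entries.fifo",
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:entries-dlq.fifo",
            "maxReceiveCount": "5",
        })
    },
)
```

For deleting only the successfully processed messages, Lambda’s partial batch response is the built-in mechanism: with ReportBatchItemFailures enabled on the event source mapping, the handler returns the IDs of just the failed messages, and only those are retried. A sketch, again with a hypothetical `insert_into_redshift`:

```python
import json

def insert_into_redshift(payload):
    """Hypothetical stand-in for the real Redshift write."""
    ...

def handler(event, context):
    # Requires FunctionResponseTypes=["ReportBatchItemFailures"] on the
    # SQS event source mapping.
    failures = []
    for record in event["Records"]:
        try:
            insert_into_redshift(json.loads(record["body"]))
        except Exception:
            # Only this message is retried; the rest of the batch is
            # deleted from the queue as usual.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Since the entries queue is FIFO, a stricter variant is to stop at the first failure and report it together with all remaining messages, which preserves ordering within a message group.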

