The business context
At impress.ai, we use AWS’s SES (Simple Email Service) to send all the transactional emails from our platform. We use the boto3 API for this, and we’ve got it working rather nicely.
To keep bounces and complaints in check, AWS monitors the reputation of our sending account by tracking how many bounces and complaints our emails generate. If there are too many, they first put the account under review. If fixes aren’t put in place after that, they pause sending. You can read more details here: https://docs.aws.amazon.com/ses/latest/DeveloperGuide/faqs-enforcement.html.
Our account was put under review very early on in my journey at Impress, when it was just me building things. Back then I’d read https://aws.amazon.com/blogs/messaging-and-targeting/handling-bounces-and-complaints/ and set up a system to post a CSV file to our team Slack channel with the email addresses that were bouncing or complaining, so that we could act on them.
However, that setup broke a few months back when we enabled server-side encryption on SQS/SNS. With other priorities in the way, the fix kept getting pushed back until last week, when we were notified that our account was under review. This triggered the need to fix our whole monitoring system. In reviewing things online, as seems to be rather normal with AWS, we found a lot of “getting started” instructions and study materials but no single resource that told us “do this!”. So we decided to share what we set up, in case the next poor soul trying to keep things together in a young startup runs into the same trouble.
AWS services and other tools we’ll be using in this post:
We have a couple of things that we will be discussing in this post, so let me define them for those who don’t know.
- AWS Services:
- Simple Email Service (https://aws.amazon.com/ses/) – it’s a service that lets you send emails from your app
- Simple Queue Service (https://aws.amazon.com/sqs/) – It’s a message queue system that acts as a message broker between services. It holds on to messages until they can be handed over to whichever service consumes them.
- Simple Notification Service (https://aws.amazon.com/sns/) – It’s a service to send notifications out based on triggers. It’s a sort of glue that links different services together. I wonder if Amazon has done some kind of study on whether putting “simple” in the name helps or gets people annoyed…
- Key Management Service (https://aws.amazon.com/kms/) – Create and manage cryptographic keys that help you ensure that your data is encrypted at rest at all times. I’m not entirely sure how unencrypted queues could be exploited, but it seems like a no-brainer today to encrypt whatever we can, both at rest and in transit.
- AWS Lambda (https://aws.amazon.com/lambda/) – These are basically “serverless” compute instances. Think of it as AWS letting you run individual functions in the language that you choose and charging you only for the time it runs and the memory it uses.
- Other tools that we are using:
- Python 3.7: We are using Python in our Lambda functions, but I don’t see any reason why another supported language wouldn’t work.
- Slack: We use Slack for our internal communication, so we use its incoming webhooks to post messages to our channel as needed.
First steps in setting up monitoring on the AWS SES dashboard
We set up three pieces in our monitoring system. The bulk of the post covers the most technically challenging and useful one, but I recommend that you set up the other two as well, since they can prove pretty useful.
- Step 1: Enable email feedback forwarding. AWS explains this over multiple pages, but in short: click “view details” on your domain/email address and, under notifications, choose the email feedback forwarding option. Note that AWS sends the bounce/complaint notification to your from address or your reply-to address. This can be a complication, so make sure you can receive emails on that address, or enable steps 2 and 3. More details here: https://docs.aws.amazon.com/ses/latest/DeveloperGuide/monitor-sending-activity-using-notifications-email.html
- Step 2: Enable getting stats. You can always check your reputation and your current complaint/bounce/reject counts in your SES dashboard. But let’s face it, it’s a pain to log in and you won’t end up monitoring this on a daily basis. So we set up a daily Slack notification with our sending statistics over the last two weeks and the timestamps at which issues occurred. I’ve detailed how to set this up in the section: Getting reputation stats on Slack.
- Step 3: Enable a notification on each bounce/complaint. Since bounces and complaints are few and far between for us, what makes the most sense is a system that notifies us of each one. I outline how we do this in the section: Enabling notifications for bounces through Amazon SNS.
Getting reputation stats on Slack:
Create a Lambda function with the following code: https://github.com/impressai/SESMonitoringTools/blob/master/basic-stats-lambda.py and use an EventBridge rule to call it daily (or at whatever interval you need).
The bulk of the code just prettifies the statistics that come back from SES. The Lambda handler is the function that runs when the Lambda is invoked. It fetches the send statistics through boto3’s get_send_statistics call. The rest of the code then puts this into a nicer format (timezone conversion and sorting) and posts it to a Slack incoming webhook. A webhook gives you a unique URL to which you can make an HTTP POST request, and Slack will post the message to the configured channel in your company’s Slack.
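To make that flow concrete, here is a minimal sketch of such a function. It is not the code from the repository above, just an illustration under a few assumptions: the helper names `format_stats` and `post_to_slack` are made up for this example, the display timezone is arbitrarily set to UTC+8, and the webhook URL is read from a `SLACK_WEBHOOK` environment variable.

```python
import json
import os
import urllib.request
from datetime import timedelta, timezone

# Assumption: show times in UTC+8; change this to whatever suits your team.
DISPLAY_TZ = timezone(timedelta(hours=8))


def format_stats(data_points, tz=DISPLAY_TZ):
    """Turn SES SendDataPoints into a short text summary (hypothetical helper)."""
    totals = {"DeliveryAttempts": 0, "Bounces": 0, "Complaints": 0, "Rejects": 0}
    issues = []
    # Sort by timestamp so the issue list reads chronologically.
    for point in sorted(data_points, key=lambda p: p["Timestamp"]):
        for key in totals:
            totals[key] += point.get(key, 0)
        bad = {k: point.get(k, 0) for k in ("Bounces", "Complaints", "Rejects")}
        if any(bad.values()):
            when = point["Timestamp"].astimezone(tz).strftime("%Y-%m-%d %H:%M")
            detail = ", ".join(f"{k}: {v}" for k, v in bad.items() if v)
            issues.append(f"{when} - {detail}")
    summary = ", ".join(f"{k}: {v}" for k, v in totals.items())
    lines = ["SES send statistics: " + summary]
    lines.extend(issues or ["No bounces, complaints or rejects recorded."])
    return "\n".join(lines)


def post_to_slack(text, webhook_url):
    """POST a plain-text message to a Slack incoming webhook (hypothetical helper)."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def lambda_handler(event, context):
    # boto3 ships with the Lambda Python runtime; importing it here keeps
    # format_stats usable on a machine without boto3 installed.
    import boto3

    stats = boto3.client("ses").get_send_statistics()
    post_to_slack(format_stats(stats["SendDataPoints"]), os.environ["SLACK_WEBHOOK"])
```

Keeping the formatting separate from the AWS and Slack calls means you can try out the message layout locally with a handful of hand-written data points before deploying anything.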
Enabling notifications for bounces through Amazon SNS
This is the more difficult task. It involves multiple pieces, and I had to piece together information from different parts of the Internet to get it done. So here’s a step-by-step process for those who are interested:
- Create a KMS key. Go to Key Management Service in AWS and create a new key. You’ll have to follow a few steps to select who has access to it. Since this key is not used by anyone other than the services, I gave only the minimum access necessary to remove and manage it. Beyond that, you have to add the following statements to the KMS key policy so that SES and SNS can use the key in the steps below:
{
  "Effect": "Allow",
  "Principal": {
    "Service": "ses.amazonaws.com"
  },
  "Action": [
    "kms:GenerateDataKey*",
    "kms:Decrypt"
  ],
  "Resource": "*"
},
{
  "Effect": "Allow",
  "Principal": {
    "Service": "sns.amazonaws.com"
  },
  "Action": [
    "kms:GenerateDataKey*",
    "kms:Decrypt"
  ],
  "Resource": "*"
}
- Next, go to AWS SNS and create two topics, one for bounces and one for complaints. At this point, enable encryption and use the key you created above. Set the rest as you see fit. The default access policy is perfectly fine.
- Next, go to AWS SQS and create a queue for each topic, then subscribe each queue to its SNS topic. Again the defaults worked fine for me; I just enabled encryption using the above key.
- Now go to your SES dashboard and, in the notification settings for your domain/email address, choose your SNS topics for bounces and complaints.
- Now, if you send emails to complaint@simulator.amazonses.com and bounce@simulator.amazonses.com using the AWS SES test email feature, you should be able to see messages build up in the queues.
- As a final step, create an AWS Lambda function with a new role that has permission to read from SQS, write logs, and decrypt using the KMS key. The three policies are attached below for reference. You can get the first two automatically by choosing the SQS polling template policy as a basis when creating your Lambda role.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes",
        "sqs:ReceiveMessage"
      ],
      "Resource": "arn:aws:sqs:*"
    }
  ]
}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "logs:CreateLogGroup",
      "Resource": "arn:aws:logs:us-east-1:776002636787:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:us-east-1:776002636787:log-group:/aws/lambda/sesSlackBounceComplaintNotification:*"
      ]
    }
  ]
}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "kms:Decrypt"
      ],
      "Resource": "arn:aws:kms:us-east-1:776002636787:key/39b9493b-40d4-454a-a179-e2a870bf1cb9",
      "Effect": "Allow"
    }
  ]
}
- Use the following Python code in your Lambda function: https://github.com/impressai/SESMonitoringTools/blob/master/notification-manager.py and configure two triggers, one from each SQS queue. The code is mostly self-explanatory. The lambda_handler is called as soon as a message enters a queue in SQS. The handler reads through each message it receives, works out what kind of notification it is, converts it to a readable text format and then forwards it to an incoming webhook on Slack.
- Please note that you have to set the SLACK_WEBHOOK environment variable to your incoming webhook URL for the above code to work. Also, for good housekeeping, consider creating a tag for all the resources you create for this project.
If the setup is all correct, you should already have received a couple of notifications on Slack from the test bounces and complaints in step 5. If not, send a few more test emails to make sure things are working.
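For readers who want to see the shape of the handler before opening the repository, here is a rough sketch of the idea. It is not the linked notification-manager.py. It assumes that raw message delivery is turned off on the SNS subscriptions, so each SQS message body is an SNS envelope whose "Message" field holds the SES notification as a JSON string. The helper names `parse_record` and `describe` are invented for this example.

```python
import json
import os
import urllib.request


def parse_record(record):
    """Unwrap one SQS record: SQS body -> SNS envelope -> SES notification."""
    envelope = json.loads(record["body"])
    return json.loads(envelope["Message"])


def describe(notification):
    """Build a one-line summary of an SES bounce or complaint notification."""
    kind = notification.get("notificationType", "Unknown")
    source = notification.get("mail", {}).get("source", "unknown sender")
    if kind == "Bounce":
        bounce = notification["bounce"]
        recipients = [r["emailAddress"] for r in bounce["bouncedRecipients"]]
        reason = f"{bounce.get('bounceType')}/{bounce.get('bounceSubType')}"
    elif kind == "Complaint":
        complaint = notification["complaint"]
        recipients = [r["emailAddress"] for r in complaint["complainedRecipients"]]
        reason = complaint.get("complaintFeedbackType", "no feedback type given")
    else:
        return f"Unhandled SES notification type: {kind}"
    return f"{kind} ({reason}) from {source} to {', '.join(recipients)}"


def lambda_handler(event, context):
    # With an SQS trigger, Lambda passes a batch of messages in event["Records"].
    for record in event["Records"]:
        payload = json.dumps({"text": describe(parse_record(record))}).encode("utf-8")
        request = urllib.request.Request(
            os.environ["SLACK_WEBHOOK"],
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request).close()
```

If the handler raises an exception, the batch is not deleted from the queue and will be delivered again, so it is worth deciding deliberately which errors you let propagate.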
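If you would rather script those extra test emails than click through the console, something along these lines should work. This is only a sketch: `send_simulator_emails` is a made-up helper, the subject and body text are placeholders, and the sender must be an address or domain you have already verified in SES.

```python
# The two SES mailbox simulator addresses used earlier in this post.
SIMULATOR_ADDRESSES = (
    "bounce@simulator.amazonses.com",
    "complaint@simulator.amazonses.com",
)


def send_simulator_emails(ses_client, sender):
    """Send one test email to each simulator address; return the SES message IDs."""
    message_ids = {}
    for address in SIMULATOR_ADDRESSES:
        response = ses_client.send_email(
            Source=sender,
            Destination={"ToAddresses": [address]},
            Message={
                "Subject": {"Data": "SES monitoring test"},
                "Body": {"Text": {"Data": "Test message for the bounce/complaint pipeline."}},
            },
        )
        message_ids[address] = response["MessageId"]
    return message_ids


# Example usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   send_simulator_emails(boto3.client("ses"), "you@your-verified-domain.example")
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real account.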
Side notes and gotchas:
- If you want to see the full notification, add a print statement where the message is received from the queue so that the raw notification shows up in the logs. You can then modify the Slack message format to show the information you consider important. For example, there is some cleanup still to be done in the “mail” part of the queue notification.
- Set up CloudWatch monitoring and budget alarms so that you can throttle things if something goes wrong. For example, if the code is misconfigured and crashing, SQS keeps sending the same message to Lambda and Lambda keeps restarting over and over again. That will probably get costly in the long term if you don’t notice it.
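As one possible starting point for the CloudWatch part, the sketch below creates an alarm on the Lambda function’s built-in Errors metric. The threshold, period and alarm name are illustrative choices rather than recommendations, `error_alarm_params` and `create_error_alarm` are hypothetical helpers, and the alarm topic ARN is a placeholder for an SNS topic that you would create separately to receive the alert.

```python
def error_alarm_params(function_name, alarm_topic_arn):
    """Build put_metric_alarm arguments for an alarm on Lambda invocation errors."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,  # evaluate in 5-minute windows (illustrative value)
        "EvaluationPeriods": 1,
        "Threshold": 1,  # alarm on the first error in a window (illustrative value)
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no invocations means no alarm
        "AlarmActions": [alarm_topic_arn],
    }


def create_error_alarm(cloudwatch_client, function_name, alarm_topic_arn):
    """Create or update the alarm using a boto3 CloudWatch client."""
    cloudwatch_client.put_metric_alarm(
        **error_alarm_params(function_name, alarm_topic_arn)
    )


# Example usage (assumes boto3 and credentials; the ARN below is a placeholder):
#   import boto3
#   create_error_alarm(
#       boto3.client("cloudwatch"),
#       "sesSlackBounceComplaintNotification",
#       "arn:aws:sns:us-east-1:123456789012:lambda-error-alerts",
#   )
```

An alarm like this only tells you that the function is failing. It does not stop the retries, so a redrive policy with a dead-letter queue on the SQS side is worth considering as well.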