AWS Lambda Throttling Mechanism Investigation

Throttling is the process of limiting the number of requests that can be submitted to a given operation in a given amount of time.

AWS Lambda Function Scaling

  • The first time you invoke your function, AWS Lambda creates an instance of the function and runs its handler method to process the event. When the function returns a response, it sticks around to process additional events. If you invoke the function again while the first event is being processed, Lambda creates another instance. 
  • As more events come in, Lambda routes them to available instances and creates new instances as needed. This continues until there are enough instances to serve all requests, or a concurrency limit is reached.
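The scaling behaviour described above can be pictured with a small toy model. This is only a sketch for intuition, not how Lambda is implemented: the function name `simulate_scaling` and the `(arrival_time, duration)` event format are assumptions made for this example.

```python
def simulate_scaling(events, concurrency_limit):
    """Toy model of Lambda scaling.

    events: list of (arrival_time, duration) tuples.
    Returns (instances_created, throttled_count).
    """
    busy_until = []  # one entry per instance: the time at which it becomes free
    throttled = 0
    for arrival, duration in sorted(events):
        # Route the event to an idle instance if one exists.
        for i, free_at in enumerate(busy_until):
            if free_at <= arrival:
                busy_until[i] = arrival + duration
                break
        else:
            if len(busy_until) < concurrency_limit:
                # No idle instance: create a new one.
                busy_until.append(arrival + duration)
            else:
                # Concurrency limit reached: the event is throttled.
                throttled += 1
    return len(busy_until), throttled
```

With a limit of 2, three overlapping events lead to two instances and one throttled event, while a later event reuses an instance that has become idle.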

Limit Scaling

To limit scaling and allocate capacity on a per-function basis, you can configure functions with reserved concurrency.

To manage reserved concurrency settings for a function, use the Lambda console.

  • To reserve concurrency for a function:
    1. Open the Lambda console Functions page.
    2. Choose a function.
    3. Under Concurrency, choose Reserve concurrency.
    4. Enter the amount of concurrency to reserve for the function.
    5. Choose Save.

To throttle a function, set the reserved concurrency to zero. This stops any events from being processed until you remove the limit. 
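The same setting can also be changed programmatically. The following is a minimal sketch, assuming `lambda_client` is a boto3 Lambda client (for example `boto3.client("lambda")`); the helper names `set_reserved_concurrency` and `remove_reserved_concurrency` are made up for this example.

```python
def set_reserved_concurrency(lambda_client, function_name, limit):
    """Reserve `limit` concurrent executions for one function.

    A limit of 0 throttles the function completely.
    `lambda_client` is assumed to be a boto3 Lambda client.
    """
    if limit < 0:
        raise ValueError("reserved concurrency cannot be negative")
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )


def remove_reserved_concurrency(lambda_client, function_name):
    """Remove the limit so the function uses unreserved concurrency again."""
    return lambda_client.delete_function_concurrency(FunctionName=function_name)
```

Calling `set_reserved_concurrency(client, "my-function", 0)` would stop event processing for the hypothetical function `my-function` until the limit is removed.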

When requests come in faster than your function can scale, or when your function is at maximum concurrency, additional requests fail with a throttling error (429 status code). When you invoke your function directly, you should treat this as a retryable error.
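One common way to treat throttling as a retryable error is exponential backoff with jitter. The sketch below is an illustration under assumptions: `ThrottledError` is a stand-in exception defined here, and `invoke` is any zero-argument callable that performs the invocation. With boto3, the exception to catch would be the Lambda client's `TooManyRequestsException`.

```python
import random
import time


class ThrottledError(Exception):
    """Stand-in for a 429 throttling error (hypothetical, for this sketch)."""


def invoke_with_retry(invoke, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call `invoke()` and retry with exponential backoff when throttled."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            # Full jitter: wait a random time up to base_delay * 2^attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

The `sleep` parameter exists only so the waiting behaviour can be replaced in tests; in normal use the default `time.sleep` applies.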

Lambda Throttling with Queue

Instead of rejecting incoming requests when the throttling limit is reached, every request can be placed into a queue, which is then drained at the desired rate.

For this purpose we can use Amazon Simple Queue Service (SQS), provided by AWS.
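On the producer side, placing a request into the queue could look like the sketch below. It assumes `sqs_client` is a boto3 SQS client (for example `boto3.client("sqs")`) and that the payload is JSON-serializable; the helper name `enqueue_request` is hypothetical.

```python
import json


def enqueue_request(sqs_client, queue_url, payload):
    """Put one request into the queue instead of invoking the function directly.

    `sqs_client` is assumed to be a boto3 SQS client.
    Returns the message ID assigned by SQS.
    """
    response = sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps(payload),
    )
    return response["MessageId"]
```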

Consuming the Queue

This section describes how AWS Lambda consumes messages from an SQS queue.

AWS Lambda Event Source Mapping

Lambda event source mapping is an AWS Lambda resource that reads from an event source and invokes a Lambda function.

You can use event source mappings to process items from a stream or queue in services that don’t invoke Lambda functions directly.

Lambda provides event source mappings for the following services, which it reads events from:

  • Amazon Kinesis
  • Amazon DynamoDB
  • Amazon Simple Queue Service (SQS)

Lambda reads messages in batches and invokes your function once for each batch. 

When your function successfully processes a batch, Lambda deletes its messages from the queue.

You can configure the size of the batch that the event source mapping reads, up to a maximum that varies by service (for SQS queues, the maximum is 10).

The number of items in the event can be smaller than the batch size if there aren’t enough items available, or if the batch is too large to send in one event and has to be split up.
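To make the batch behaviour concrete, here is a minimal sketch of a handler that receives one SQS batch per invocation. The `Records` list and the `body` field are part of the SQS event that Lambda passes to the function; the `process` helper and the assumption that each body is JSON are made up for this example.

```python
import json


def process(payload):
    """Hypothetical business logic for a single message."""
    return payload


def handler(event, context):
    """Handle one batch of SQS messages delivered by the event source mapping."""
    records = event.get("Records", [])
    for record in records:
        # Each record carries the original message text in "body".
        payload = json.loads(record["body"])
        process(payload)
    # Returning normally marks the whole batch as successfully processed.
    return {"processed": len(records)}
```

If the handler raises an exception, the invocation counts as failed and the messages of that batch are not deleted.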

Accumulating the messages in the queue:

When creating an event source mapping, we have the following parameter:

MaximumBatchingWindowInSeconds: The maximum amount of time to gather records before invoking the function, in seconds.

Valid Range: Minimum value of 0. Maximum value of 300.

So we can’t wait more than 5 minutes (300 seconds) for messages to accumulate in the queue.
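As an illustration, an event source mapping with a batching window could be created along these lines. This is a sketch, assuming `lambda_client` is a boto3 Lambda client; the helper name `map_queue_to_function` and its default values are assumptions for this example.

```python
def map_queue_to_function(lambda_client, queue_arn, function_name,
                          batch_size=10, batching_window_seconds=0):
    """Create an event source mapping from a queue to a function.

    `lambda_client` is assumed to be a boto3 Lambda client.
    """
    # Enforce the documented valid range before calling the API.
    if not 0 <= batching_window_seconds <= 300:
        raise ValueError("MaximumBatchingWindowInSeconds must be between 0 and 300")
    return lambda_client.create_event_source_mapping(
        EventSourceArn=queue_arn,
        FunctionName=function_name,
        BatchSize=batch_size,
        MaximumBatchingWindowInSeconds=batching_window_seconds,
    )
```

Checking the range locally simply gives an earlier and clearer error than waiting for the service to reject the request.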

Using AWS Lambda with Amazon SQS

Lambda polls the queue and invokes your function synchronously with an event that contains queue messages. When you invoke a function synchronously, Lambda runs the function and waits for a response. When the function execution ends, Lambda returns the response from the function’s code.

When messages are available, Lambda reads up to 5 batches and sends them to your function. If messages are still available, Lambda increases the number of processes that are reading batches by up to 60 more instances per minute.

The maximum number of batches that can be processed simultaneously by an event source mapping is 1000. 

When Lambda reads a batch, the messages stay in the queue but become hidden for the length of the queue’s visibility timeout.

If your function successfully processes the batch, Lambda deletes the messages from the queue. If your function is throttled, returns an error, or doesn’t respond, the message becomes visible again.


AWS Lambda offers only concurrency management, not instance management. So, to prevent scaling entirely, we have to set the reserved concurrency to one, which gives us two things:

  • Only one Lambda instance.
  • Only one request processed at a time (additional requests receive a 429 error).

To accumulate incoming requests instead of rejecting them when your function is throttled, you can use Amazon SQS.
