Asynchronous callbacks with AWS Step Functions and Lambda

AWS Step Functions recently added support for callback patterns for long-running tasks.

This addresses one of the weaknesses of AWS Step Functions. Previously, if you wanted a workflow step to invoke a long-running process on an EC2 instance via SSM, the main workaround was to build a polling cycle into the workflow:

  • First Lambda task invokes SSM (or another long-running process)
  • Second Lambda task checks whether the SSM command has completed
  • Loop back to polling task if still in progress
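
For comparison, the polling task in the old approach could be sketched roughly as below. This is only an illustration: the event keys command_id and instance_id are names I made up, and the client is passed in as an argument so the function can be tried without AWS credentials. In a real Lambda you would pass boto3.client('ssm').

```python
# Statuses SSM reports while a command has not yet reached a final state.
IN_PROGRESS = {"Pending", "InProgress", "Delayed", "Cancelling"}


def check_command(event, ssm):
    """Polling task: report whether the SSM command has finished.

    `event["command_id"]` and `event["instance_id"]` are hypothetical keys
    that the first task is assumed to have placed in the workflow state.
    """
    result = ssm.get_command_invocation(
        CommandId=event["command_id"],
        InstanceId=event["instance_id"],
    )
    status = result["Status"]
    # A Choice state in the workflow would branch on "done" and loop back
    # to this task while it is still False.
    return {**event, "status": status, "done": status not in IN_PROGRESS}
```

Every pass through this function is a billed Lambda invocation plus several state transitions, which is exactly the overhead the callback pattern removes.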

There are two problems with this:

  • Clutter in the workflow - harder to understand what’s actually happening
  • Extra charges for Lambda invocations and state transitions that aren’t doing anything useful
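
To get a feel for the second point, here is a back-of-the-envelope count. It assumes each polling loop costs three state transitions (say a wait, a check and a choice), which is my guess at a typical layout rather than anything fixed by Step Functions, and the job length and interval are made-up numbers.

```python
import math


def polling_overhead(job_seconds, poll_interval_seconds, states_per_loop=3):
    """Return (lambda_invocations, state_transitions) spent purely on polling.

    states_per_loop=3 is an assumption: one wait, one check, one choice.
    """
    loops = math.ceil(job_seconds / poll_interval_seconds)
    return loops, loops * states_per_loop


# A one-hour job polled every 30 seconds.
checks, transitions = polling_overhead(job_seconds=3600, poll_interval_seconds=30)
print(checks, transitions)  # 120 360
```

With the callback pattern the same job needs a single Lambda invocation and no polling transitions, however long it runs.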

The new callback workflow is a significant improvement. Instead, the workflow looks something like this:

  • First Lambda invokes SSM, passing it a task token
  • Workflow pauses until the SendTaskSuccess API is called with that task token
  • When the SSM command completes, it calls SendTaskSuccess to resume the workflow
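
If the long-running job is itself written in Python, the last step might be sketched as follows. The helper name report_result is my own, and the failure branch uses SendTaskFailure, which is not covered above but is the companion API for reporting errors. The client is passed in so the sketch can be exercised with a stub; in practice it would be boto3.client('stepfunctions').

```python
import json


def report_result(sfn, token, work):
    """Run `work()` and resume the paused workflow with its outcome.

    `sfn` is assumed to be a Step Functions client; `work` is any callable
    that returns a JSON-serialisable value.
    """
    try:
        output = work()
    except Exception as exc:
        # Fail the waiting task so the workflow does not hang until timeout.
        sfn.send_task_failure(
            taskToken=token,
            error=type(exc).__name__,
            cause=str(exc),
        )
        return False
    # The output must be a JSON string; it becomes the task's result.
    sfn.send_task_success(taskToken=token, output=json.dumps(output))
    return True
```

Reporting failures explicitly matters here because a task waiting on a token otherwise has no way to learn that the job died.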

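Here is an example of what a state machine definition might look like. The .waitForTaskToken suffix on the resource ARN is what makes the task pause, and $$.Task.Token reads the token from the context object. The function name is a placeholder:

{
  "StartAt": "AsyncLambda",
  "States": {
    "AsyncLambda": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
      "Parameters": {
        "FunctionName": "<your-function-name>", "Payload": {"token.$": "$$.Task.Token"}
      },
      "Next": "AsyncDone"
    },
    "AsyncDone": {
      "Type": "Pass",
      "Result": "Async func completed!",
      "End": true
    }
  }
}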

and the corresponding Lambda

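import boto3

def lambda_handler(event, context):
    token = event['token']  # task token passed in by the state machine
    ssm = boto3.client('ssm')
    # <instance-id> and <script> are placeholders; substitute your own values
    ssm.send_command(InstanceIds=['<instance-id>'], DocumentName='AWS-RunShellScript',
                     Parameters={'commands': [f'<script> --token {token}']})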

The shell script on the EC2 instance might look like

# parse the token from "--token <value>"
[ "$1" = "--token" ] && token="$2"

# ... long-running work goes here ...

# signal success on completion
aws stepfunctions send-task-success --task-token "$token" --task-output '{"status": "success"}'

Written on June 20, 2019