AWS Step Functions - A Practical Guide
In this tutorial, we'll explain what AWS Step Functions are, show you how they work, and guide you through some examples to help you understand and use them effectively. Let's get started on making your workflow easier with AWS Step Functions!
What is AWS Step Functions?
AWS Step Functions is a serverless orchestration service that lets you coordinate and automate the execution of multiple AWS services in response to events or triggers. With Step Functions, you build workflows, known as state machines, that define the sequence of steps to execute and the conditions for moving from one step to the next.
Step Functions simplifies the process of managing complex workflows, making it easier to implement and maintain business logic, data processing pipelines, and other workflow-driven applications. It provides features such as error handling, retries, parallel execution, and task branching, enabling you to create robust and scalable applications without managing the underlying infrastructure.
Overall, AWS Step Functions is a powerful tool for building scalable, reliable, and event-driven applications, allowing you to focus on your business logic rather than the underlying infrastructure.
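To make the retry feature a little more concrete, here is a minimal sketch of how the wait between retries grows. It assumes a retry policy described by an initial interval, a maximum number of retries, and a backoff multiplier (the `IntervalSeconds`, `MaxAttempts`, and `BackoffRate` fields of a Step Functions retrier) and ignores optional extras such as jitter. The helper name `retry_delays` is our own; it only mirrors the arithmetic and is not part of any AWS SDK.

```python
from typing import List


def retry_delays(interval_seconds: float, max_attempts: int, backoff_rate: float) -> List[float]:
    """Return the wait (in seconds) before each retry attempt."""
    # The first retry waits interval_seconds; each later retry multiplies
    # the previous wait by backoff_rate.
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


if __name__ == "__main__":
    # Example policy: start at 1 second, retry up to 3 times, double each time
    print(retry_delays(1, 3, 2))
```

With an interval of 1 second, 3 retries, and a backoff rate of 2, the waits are 1, 2, and 4 seconds before the step is reported as failed.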
What we will be building
We will build a simple document processing workflow with a single endpoint for uploading a file. Depending on the file type, the workflow sends the file to a different function for processing.
Here is an illustration showing what we will be building:
Prerequisites
Before diving in, you need to set up your AWS environment. This involves:
- Creating an AWS account and IAM role: If you don't already have an AWS account, sign up for one. Next, create an IAM role.
- Creating three Lambda functions: We will need three Lambda functions, one for detecting the file type, another for processing images, and a third for processing document files. We can name these functions check-image-or-document, process-image-demo, and process-document-demo respectively.
- Creating two S3 buckets where we will store the processed files. You can name them process-image-demo-bck and process-document-demo-bck.
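If you would rather script the bucket setup than click through the console, the following is a rough sketch using boto3. It assumes your AWS credentials are already configured and that the bucket names are still available (S3 bucket names are globally unique, so you may have to pick different ones). `bucket_kwargs` and `create_demo_buckets` are hypothetical helper names, not something the tutorial's functions rely on.

```python
from typing import Any, Dict

DEMO_BUCKETS = ["process-image-demo-bck", "process-document-demo-bck"]


def bucket_kwargs(name: str, region: str) -> Dict[str, Any]:
    """Build the arguments for the S3 create_bucket call in a given region."""
    kwargs: Dict[str, Any] = {"Bucket": name}
    # us-east-1 is the default location and must not be sent as a constraint
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_demo_buckets(region: str = "us-east-1") -> None:
    """Create both demo buckets (needs boto3 and valid AWS credentials)."""
    import boto3  # imported here so bucket_kwargs can be used without boto3

    s3 = boto3.client("s3", region_name=region)
    for name in DEMO_BUCKETS:
        s3.create_bucket(**bucket_kwargs(name, region))
```

Calling `create_demo_buckets()` creates both buckets in us-east-1; pass another region name if your functions live elsewhere.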
We will be using Python as the primary language for this demo.
Now, let's populate our Lambda functions with the code to process our images and documents. To keep things simple, we will resize the image and convert it to grayscale, and for the document, we will just store it directly in the S3 bucket.
Initial Lambda function
This is the first function in the workflow. It detects the file type and passes the result on to the rest of the workflow for processing.
import json

def lambda_handler(event, context):
    # Get the uploaded file's information from the event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    # Extract the file extension from the key
    file_extension = key.split('.')[-1].lower()

    # List of image file extensions
    image_extensions = ['jpg', 'jpeg', 'png', 'gif', 'bmp']
    # List of document file extensions
    document_extensions = ['doc', 'docx', 'pdf', 'txt', 'ppt', 'pptx', 'xls', 'xlsx']

    if file_extension in image_extensions:
        file_type = 'image'
    elif file_extension in document_extensions:
        file_type = 'document'
    else:
        file_type = 'unknown'

    response = {
        'statusCode': 200,
        'type': file_type,
        'fileInfo': {
            'bucket': bucket,
            'key': key
        }
    }
    return response
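Before wiring this into a workflow, it can help to exercise the extension check on its own. The sketch below restates the same check in a standalone helper so it runs without an AWS event; `classify_key` and the sample keys are our own illustration, not part of the deployed function.

```python
IMAGE_EXTENSIONS = {"jpg", "jpeg", "png", "gif", "bmp"}
DOCUMENT_EXTENSIONS = {"doc", "docx", "pdf", "txt", "ppt", "pptx", "xls", "xlsx"}


def classify_key(key: str) -> str:
    """Classify an S3 object key the same way the handler does."""
    # Everything after the last dot, lower-cased, is treated as the extension
    extension = key.split(".")[-1].lower()
    if extension in IMAGE_EXTENSIONS:
        return "image"
    if extension in DOCUMENT_EXTENSIONS:
        return "document"
    return "unknown"


if __name__ == "__main__":
    for key in ["photos/Cat.PNG", "reports/2024.q1.pdf", "backup.tar.gz", "README"]:
        print(key, "->", classify_key(key))
```

One quirk worth knowing: a key with no dot at all is compared whole, so an object literally named `pdf` would be classified as a document.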
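Code for the **process-image-demo** Lambda function
You can populate the Lambda function with the following code. Note that Pillow (PIL) is not included in the Lambda Python runtime by default, so it has to be packaged with the function or added as a layer.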
import boto3
from PIL import Image
from io import BytesIO

def process_image(input_bucket, input_key, output_bucket, output_key, new_size=(300, 300)):
    """
    Processes an image stored in S3 by resizing it and converting it to grayscale.

    Parameters:
        input_bucket (str): The name of the input S3 bucket.
        input_key (str): The key of the input object in the input S3 bucket.
        output_bucket (str): The name of the output S3 bucket.
        output_key (str): The key of the output object in the output S3 bucket.
        new_size (tuple): A tuple representing the new size of the image (width, height).
    """
    # Initialize S3 client
    s3 = boto3.client('s3')

    # Download the input image from S3
    input_image_obj = s3.get_object(Bucket=input_bucket, Key=input_key)
    input_image_bytes = input_image_obj['Body'].read()
    input_image = Image.open(BytesIO(input_image_bytes))

    # Process the image: resize, then convert to grayscale ('L' mode)
    processed_image = input_image.resize(new_size).convert('L')

    # Upload the processed image to S3 (always encoded as JPEG)
    output_image_buffer = BytesIO()
    processed_image.save(output_image_buffer, format='JPEG')
    output_image_buffer.seek(0)
    s3.put_object(Bucket=output_bucket, Key=output_key, Body=output_image_buffer)

def lambda_handler(event, context):
    """
    Lambda function handler.

    Parameters:
        event (dict): The event data passed to the Lambda function.
        context (object): The runtime information of the Lambda function.

    Returns:
        dict: The response indicating the completion of the function.
    """
    # Output S3 bucket where the processed image will be stored
    output_bucket = 'process-image-demo-bck'

    # Extract input S3 bucket and object key from the event
    input_s3_bucket = event['fileInfo']['bucket']
    input_s3_object_key = event['fileInfo']['key']

    # Use the same object key for output
    output_s3_object_key = input_s3_object_key

    # Process the image
    process_image(input_s3_bucket, input_s3_object_key, output_bucket, output_s3_object_key)

    # Return response (Python booleans are capitalized: True, not true)
    response = {"finished": True, "filename": output_s3_object_key}
    return response
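One detail to be aware of: the function always saves JPEG data but reuses the input key, so an upload named `logo.png` ends up under a `.png` key that actually holds a JPEG. If that matters for your use case, one option is to rewrite the extension on the output key. The helper below is a hypothetical sketch and not part of the function above.

```python
import posixpath


def jpeg_output_key(input_key: str) -> str:
    """Return the input key with its extension replaced by .jpg."""
    # posixpath is used because S3 keys always use forward slashes
    root, _extension = posixpath.splitext(input_key)
    return root + ".jpg"


if __name__ == "__main__":
    print(jpeg_output_key("uploads/logo.png"))
```

In the handler, this would replace the line that copies the input key, for example `output_s3_object_key = jpeg_output_key(input_s3_object_key)`.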
Code for the **process-document-demo** Lambda function
You can populate the Lambda function with the following code:
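import boto3

def lambda_handler(event, context):
    # Output S3 bucket where the document will be stored
    output_bucket = 'process-document-demo-bck'

    # Input bucket and key come from the output of the first function
    input_s3_bucket = event['fileInfo']['bucket']
    input_s3_object_key = event['fileInfo']['key']

    # Use the same object key for output
    output_key = input_s3_object_key

    s3 = boto3.client('s3')

    # Read the uploaded document and store it unchanged in the output bucket
    document = s3.get_object(Bucket=input_s3_bucket, Key=input_s3_object_key)
    s3.put_object(Bucket=output_bucket, Key=output_key, Body=document['Body'].read())

    response = {"finished": True, "filename": output_key}
    return response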
Now that we have both functions ready, there is one thing we should clarify before moving forward.
- input_s3_bucket and input_s3_object_key get their values from event['fileInfo'], which is the output of our initial Lambda function.
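Because both processing functions depend on that `fileInfo` object, a missing or misspelled key shows up in the execution history as a bare `KeyError`. A small guard can make the failure easier to read. This is only a sketch, and `extract_file_info` is a hypothetical helper rather than something the functions above already use.

```python
from typing import Tuple


def extract_file_info(event: dict) -> Tuple[str, str]:
    """Return (bucket, key) from the first function's output, or raise a clear error."""
    file_info = event.get("fileInfo")
    if not isinstance(file_info, dict):
        raise ValueError("event is missing the 'fileInfo' object from the first function")
    try:
        return file_info["bucket"], file_info["key"]
    except KeyError as missing:
        # str() of a KeyError is the quoted name of the missing key
        raise ValueError(f"fileInfo is missing {missing}") from None


if __name__ == "__main__":
    sample = {"type": "image", "fileInfo": {"bucket": "uploads", "key": "cat.png"}}
    print(extract_file_info(sample))
```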
Setting up our step function
Now that all of our functions are ready, let's chain them together in a state machine.
First, navigate to Step Functions and create a new state machine. You will get a dialog that looks like this:
We will be using a blank template, so click Select to continue.
Second, drag a Lambda function into the workflow and provide the function name. In this case, we're pointing it to our check-image-or-document Lambda function.
Next, we will add a Choice state, which determines the next function to trigger depending on the output of our check-image-or-document function.
We can configure the rules to check the value returned by the previous Lambda function, check-image-or-document. Since the function returns a type field that is either image, document, or unknown, we can easily check for that value in each rule.
For Image (rule #1)
For Documents (rule #2)
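In plain Python, the two rules amount to the lookup below. This is only an illustration of the branching logic, using the state names `Process image` and `Process document` from the state machine definition in this tutorial. Step Functions evaluates the rules itself, and `next_state` is a hypothetical name.

```python
from typing import Optional

# Value of the "type" field -> name of the state to run next
ROUTES = {"image": "Process image", "document": "Process document"}


def next_state(state_input: dict) -> Optional[str]:
    """Return the state a given input would be routed to, or None if no rule matches."""
    return ROUTES.get(state_input.get("type"))


if __name__ == "__main__":
    print(next_state({"type": "image"}))
    print(next_state({"type": "unknown"}))
```

The `unknown` type matches neither rule, which is why the lookup returns `None` for it.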
Now you can add the respective functions to each branch, and your workflow should look like this:
Make sure each step is connected to its respective Lambda function.
The state machine definition for this workflow looks like this:
{
  "Comment": "A description of my state machine",
  "StartAt": "File upload",
  "States": {
    "File upload": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "OutputPath": "$.Payload",
      "Parameters": {
        "Payload.$": "$",
        "FunctionName": "arn:aws:lambda:us-east-1:276023487603:function:check-image-or-document:$LATEST"
      },
      "Retry": [
        {
          "ErrorEquals": [
            "Lambda.ServiceException",
            "Lambda.AWSLambdaException",
            "Lambda.SdkClientException",
            "Lambda.TooManyRequestsException"
          ],
          "IntervalSeconds": 1,
          "MaxAttempts": 3,
          "BackoffRate": 2
        }
      ],
      "Next": "Choice"
    },
    "Choice": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.type",
          "StringEquals": "image",
          "Next": "Process image"
        },
        {
          "Variable": "$.type",
          "StringEquals": "document",
          "Next": "Process document"
        }
      ]
    },
    "Process image": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "OutputPath": "$.Payload",
      "Parameters": {
        "Payload.$": "$",
        "FunctionName": "arn:aws:lambda:us-east-1:276023487603:function:process-image-demo:$LATEST"
      },
      "Retry": [
        {
          "ErrorEquals": [
            "Lambda.ServiceException",
            "Lambda.AWSLambdaException",
            "Lambda.SdkClientException",
            "Lambda.TooManyRequestsException"
          ],
          "IntervalSeconds": 1,
          "MaxAttempts": 3,
          "BackoffRate": 2
        }
      ],
      "End": true
    },
    "Process document": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "OutputPath": "$.Payload",
      "Parameters": {
        "Payload.$": "$",
        "FunctionName": "arn:aws:lambda:us-east-1:276023487603:function:process-document-demo:$LATEST"
      },
      "Retry": [
        {
          "ErrorEquals": [
            "Lambda.ServiceException",
            "Lambda.AWSLambdaException",
            "Lambda.SdkClientException",
            "Lambda.TooManyRequestsException"
          ],
          "IntervalSeconds": 1,
          "MaxAttempts": 3,
          "BackoffRate": 2
        }
      ],
      "End": true
    }
  }
}
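Note that the definition has no branch for the unknown type. When no rule in a Choice state matches and the state has no `Default`, the execution fails with a `States.NoChoiceMatched` error. If you would rather end with an explicit, readable failure, one option is to add a `Default` that points at a `Fail` state. The sketch below patches a definition that has been loaded as a Python dict; the state name `Unsupported file type`, the error name, and the helper `add_default_branch` are our own choices, not part of the definition above.

```python
import copy
import json


def add_default_branch(definition: dict, choice_name: str = "Choice",
                       fail_name: str = "Unsupported file type") -> dict:
    """Return a copy of the definition whose Choice state falls back to a Fail state."""
    patched = copy.deepcopy(definition)  # leave the caller's dict untouched
    patched["States"][choice_name]["Default"] = fail_name
    patched["States"][fail_name] = {
        "Type": "Fail",
        "Error": "UnsupportedFileType",
        "Cause": "The uploaded file is neither an image nor a document.",
    }
    return patched


if __name__ == "__main__":
    # Trimmed-down stand-in for the full definition, just for demonstration
    minimal = {
        "StartAt": "Choice",
        "States": {
            "Choice": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.type", "StringEquals": "image", "Next": "Process image"}
                ],
            }
        },
    }
    print(json.dumps(add_default_branch(minimal), indent=2))
```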
Finally, click the Create button to complete the state machine setup. A dialog like this will show up:
Congratulations on completing our AWS Step Functions tutorial! You've now gained a solid understanding of how to use this powerful orchestration service to streamline and automate your workflows in the cloud. By mastering AWS Step Functions, you have unlocked the potential to build scalable, reliable, and efficient applications that respond dynamically to changing events and requirements.
You can test your workflow with some test data, and your workflow output should look like this:
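For the test data itself, keep in mind that the first function reads an S3-style event, so the execution input needs the same `Records` structure. Here is a rough sketch of building that input and starting an execution with boto3. The bucket name, object key, and state machine ARN are placeholders you would replace with your own, and `build_execution_input` and `start_test_execution` are hypothetical helpers.

```python
import json


def build_execution_input(bucket: str, key: str) -> str:
    """Build a JSON execution input shaped like the S3 event the first function reads."""
    event = {
        "Records": [
            {"s3": {"bucket": {"name": bucket}, "object": {"key": key}}}
        ]
    }
    return json.dumps(event)


def start_test_execution(state_machine_arn: str, bucket: str, key: str) -> str:
    """Start one execution and return its ARN (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    sfn = boto3.client("stepfunctions")
    response = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=build_execution_input(bucket, key),
    )
    return response["executionArn"]


if __name__ == "__main__":
    # Placeholder values: the object must already exist in this bucket
    print(build_execution_input("my-upload-bucket", "samples/cat.png"))
```

The same JSON string can also be pasted into the input box of the Start execution dialog in the console.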
As you continue your journey with AWS, remember to explore the various features and integrations that Step Functions offers. Experiment with different workflow patterns, incorporate error handling and retries, and leverage the flexibility of state machines to create tailored solutions for your specific use cases.
We hope this tutorial has equipped you with the knowledge and confidence to leverage AWS Step Functions effectively in your projects. Remember, practice makes perfect, so don't hesitate to dive deeper into the documentation, explore additional resources, and continue learning and experimenting with AWS services.
Thank you for joining us on this learning adventure, and best of luck with your future endeavors in cloud computing and serverless application development!
Happy orchestrating!