
AWS Step Functions - A Practical Guide


By abdulmumin yaqeen

on February 21, 2024




In this tutorial, we'll explain what AWS Step Functions are, show you how they work, and guide you through some examples to help you understand and use them effectively. Let's get started on making your workflow easier with AWS Step Functions!

What is AWS Step Functions?

AWS Step Functions is a serverless orchestration service provided by Amazon Web Services (AWS). It allows you to coordinate and automate the execution of multiple AWS services in response to various events or triggers. With Step Functions, you can build workflows, known as state machines, that define the sequence of steps or tasks to be executed, along with the conditions for transitioning between these steps.

Step Functions simplifies the process of managing complex workflows, making it easier to implement and maintain business logic, data processing pipelines, and other workflow-driven applications. It provides features such as error handling, retries, parallel execution, and task branching, enabling you to create robust and scalable applications without managing the underlying infrastructure.
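To make the idea of a state machine concrete, here is a minimal sketch (separate from this tutorial's workflow) that registers a one-step state machine with boto3; the function ARN, role ARN, and state machine name are all placeholders:

```python
import json
import boto3

# A minimal Amazon States Language definition: a single Task state that
# invokes a Lambda function and then ends the execution.
definition = {
    "Comment": "Minimal example state machine",
    "StartAt": "SayHello",
    "States": {
        "SayHello": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:hello-world",  # placeholder
            "End": True
        }
    }
}

sfn = boto3.client('stepfunctions')
sfn.create_state_machine(
    name="minimal-example",                                     # placeholder name
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsRole"  # placeholder IAM role
)
```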

Overall, AWS Step Functions is a powerful tool for building scalable, reliable, and event-driven applications, allowing you to focus on your business logic rather than the underlying infrastructure.

What we will be building

We will build a simple document processing workflow with a single endpoint for uploading a file; the file is then routed to a different processing path depending on its type.

Here is an illustration showing what we will be building:

step function illustration

Prerequisites

Before diving in, you need to set up your AWS environment. This involves:

- An AWS account with permissions for Lambda, S3, and Step Functions.
- Three Lambda functions: check-image-or-document, process-image-demo, and process-document-demo.
- S3 buckets for the uploaded files and for the processed output (process-image-demo-bck and process-document-demo-bck).

We will be using Python as the primary language for this demo.
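For instance, the two output buckets used later in this demo can be created with boto3, as in this minimal sketch (assuming your AWS credentials are already configured; S3 bucket names are globally unique, so you may need to adjust them):

```python
import boto3

s3 = boto3.client('s3')

# Output buckets used by the two processing Lambda functions in this demo.
for bucket_name in ['process-image-demo-bck', 'process-document-demo-bck']:
    s3.create_bucket(Bucket=bucket_name)
```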

Now, let's populate our Lambda functions with the code to process our image and document. To keep things simple, we will resize the image, and for the document, we will just store it directly in the S3 bucket.

Initial Lambda function

This is the first function in the workflow; it detects the file type and passes the result on to the rest of the workflow for processing.

```python
import json

def lambda_handler(event, context):
    # Get the uploaded file's information from the event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    # Extract the file extension from the key
    file_extension = key.split('.')[-1].lower()

    # List of image file extensions
    image_extensions = ['jpg', 'jpeg', 'png', 'gif', 'bmp']

    # List of document file extensions
    document_extensions = ['doc', 'docx', 'pdf', 'txt', 'ppt', 'pptx', 'xls', 'xlsx']

    if file_extension in image_extensions:
        file_type = 'image'
    elif file_extension in document_extensions:
        file_type = 'document'
    else:
        file_type = 'unknown'

    response = {
        'statusCode': 200,
        'type': file_type,
        'fileInfo': {
            'bucket': bucket,
            'key': key
        }
    }

    return response
```
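For a quick local sanity check, you can invoke the handler with an S3-style test event, as in this sketch (the bucket and key are placeholder values):

```python
# Placeholder S3 event for locally testing the handler above.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "uploads-demo-bck"},
                "object": {"key": "photos/cat.png"}
            }
        }
    ]
}

print(lambda_handler(sample_event, None))
# -> {'statusCode': 200, 'type': 'image',
#     'fileInfo': {'bucket': 'uploads-demo-bck', 'key': 'photos/cat.png'}}
```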

Code for the **process-image-demo** Lambda function

You can populate the lambda function with the following code:

```python
import boto3
from PIL import Image
from io import BytesIO

def process_image(input_bucket, input_key, output_bucket, output_key, new_size=(300, 300)):
    """
    Processes an image stored in S3 by resizing it and converting it to grayscale.

    Parameters:
        input_bucket (str): The name of the input S3 bucket.
        input_key (str): The key of the input object in the input S3 bucket.
        output_bucket (str): The name of the output S3 bucket.
        output_key (str): The key of the output object in the output S3 bucket.
        new_size (tuple): A tuple representing the new size of the image (width, height).
    """
    # Initialize S3 client
    s3 = boto3.client('s3')

    # Download the input image from S3
    input_image_obj = s3.get_object(Bucket=input_bucket, Key=input_key)
    input_image_bytes = input_image_obj['Body'].read()
    input_image = Image.open(BytesIO(input_image_bytes))

    # Process the image
    processed_image = input_image.resize(new_size).convert('L')

    # Upload the processed image to S3
    output_image_buffer = BytesIO()
    processed_image.save(output_image_buffer, format='JPEG')
    output_image_buffer.seek(0)
    s3.put_object(Bucket=output_bucket, Key=output_key, Body=output_image_buffer)

def lambda_handler(event, context):
    """
    Lambda function handler.

    Parameters:
        event (dict): The event data passed to the Lambda function.
        context (object): The runtime information of the Lambda function.

    Returns:
        dict: The response indicating the completion of the function.
    """
    # Output S3 bucket where the processed image will be stored
    output_bucket = 'process-image-demo-bck'

    # Extract input S3 bucket and object key from the event
    input_s3_bucket = event['fileInfo']['bucket']
    input_s3_object_key = event['fileInfo']['key']

    # Use the same object key for output
    output_s3_object_key = input_s3_object_key

    # Process the image
    process_image(input_s3_bucket, input_s3_object_key, output_bucket, output_s3_object_key)

    # Return response
    response = {"finished": True, "filename": output_s3_object_key}
    return response
```

Code for the **process-document-demo** Lambda function

You can populate the lambda function with the following code:

```python
import boto3

def lambda_handler(event, context):
    # Output S3 bucket where the document will be stored
    output_bucket = 'process-document-demo-bck'

    # Extract input S3 bucket and object key from the event
    input_s3_bucket = event['fileInfo']['bucket']
    input_s3_object_key = event['fileInfo']['key']
    output_key = input_s3_object_key

    # Copy the document directly into the output bucket
    s3 = boto3.client('s3')
    document_obj = s3.get_object(Bucket=input_s3_bucket, Key=input_s3_object_key)
    s3.put_object(Bucket=output_bucket, Key=output_key, Body=document_obj['Body'].read())

    response = {"finished": True, "filename": output_key}
    return response
```

Now that we have both processing functions ready, there are a few things we should clarify before moving forward.

Setting up our Step Function

Now that all of our functions are ready, let's chain them together in a Step Functions state machine.

First, navigate to Step Functions and create a new state machine. You will get a dialog that looks like this:

aws step function cloudplexo

We will be using a blank template; click Select to continue.

Second, drag a Lambda function into the workflow and provide the function name. In this case, we're pointing it to our check-image-or-document Lambda function.

aws step function cloudplexo

Next, we will add a Choice state, which will determine the next function to trigger depending on the output from check-image-or-document.

aws step function cloudplexo

We can configure our rules to check the value returned from the previous Lambda function, check-image-or-document. Since the function returns a type of either image, document, or unknown, we can easily check for that in the rules.

For Image (rule #1)

For Documents (rule #2)

aws step function cloudplexo
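These two rules translate into conditions on the $.type field returned by check-image-or-document, sketched here as a Python snippet for illustration (the full JSON definition is listed further below):

```python
# The two Choice rules, expressed as the "Choices" array of the Choice state.
choices = [
    {"Variable": "$.type", "StringEquals": "image", "Next": "Process image"},
    {"Variable": "$.type", "StringEquals": "document", "Next": "Process document"},
]
```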

Now, you can add the respective functions to each branch, and your workflow should look like this:

aws step function cloudplexo

Make sure each branch is connected to its respective Lambda function.

The resulting state machine definition looks like this:

{ "Comment": "A description of my state machine", "StartAt": "File upload", "States": { "File upload": { "Type": "Task", "Resource": "arn:aws:states:::lambda:invoke", "OutputPath": "$.Payload", "Parameters": { "Payload.$": "$", "FunctionName": "arn:aws:lambda:us-east-1:276023487603:function:check-image-or-document:$LATEST" }, "Retry": [ { "ErrorEquals": [ "Lambda.ServiceException", "Lambda.AWSLambdaException", "Lambda.SdkClientException", "Lambda.TooManyRequestsException" ], "IntervalSeconds": 1, "MaxAttempts": 3, "BackoffRate": 2 } ], "Next": "Choice" }, "Choice": { "Type": "Choice", "Choices": [ { "Variable": "$.type", "StringEquals": "image", "Next": "Process image" }, { "Variable": "$.type", "StringEquals": "document", "Next": "Process document" } ] }, "Process image": { "Type": "Task", "Resource": "arn:aws:states:::lambda:invoke", "OutputPath": "$.Payload", "Parameters": { "Payload.$": "$", "FunctionName": "arn:aws:lambda:us-east-1:276023487603:function:process-image-demo:$LATEST" }, "Retry": [ { "ErrorEquals": [ "Lambda.ServiceException", "Lambda.AWSLambdaException", "Lambda.SdkClientException", "Lambda.TooManyRequestsException" ], "IntervalSeconds": 1, "MaxAttempts": 3, "BackoffRate": 2 } ], "End": true }, "Process document": { "Type": "Task", "Resource": "arn:aws:states:::lambda:invoke", "OutputPath": "$.Payload", "Parameters": { "Payload.$": "$", "FunctionName": "arn:aws:lambda:us-east-1:276023487603:function:process-document-demo:$LATEST" }, "Retry": [ { "ErrorEquals": [ "Lambda.ServiceException", "Lambda.AWSLambdaException", "Lambda.SdkClientException", "Lambda.TooManyRequestsException" ], "IntervalSeconds": 1, "MaxAttempts": 3, "BackoffRate": 2 } ], "End": true } } }

Finally, you can hit the Create button to complete the state machine setup, and a dialog like this will show up:

aws step function cloudplexo

Congratulations on completing our AWS Step Functions tutorial! You've now gained a solid understanding of how to use this powerful orchestration service to streamline and automate your workflows in the cloud. By mastering AWS Step Functions, you have unlocked the potential to build scalable, reliable, and efficient applications that respond dynamically to changing events and requirements.

You can test your workflow with some test data and inspect the input and output of each state in the execution view.
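If you prefer to trigger the test from code rather than the console, you can start an execution with boto3, as in this sketch (the state machine ARN, bucket, and key are placeholders):

```python
import json
import boto3

sfn = boto3.client('stepfunctions')

# Placeholder ARN of the state machine created above.
state_machine_arn = 'arn:aws:states:us-east-1:123456789012:stateMachine:file-processing-demo'

# Test input shaped like the S3 event that check-image-or-document expects.
test_input = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "uploads-demo-bck"},
                "object": {"key": "reports/summary.pdf"}
            }
        }
    ]
}

response = sfn.start_execution(
    stateMachineArn=state_machine_arn,
    input=json.dumps(test_input)
)
print(response['executionArn'])
```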

As you continue your journey with AWS, remember to explore the various features and integrations that Step Functions offers. Experiment with different workflow patterns, incorporate error handling and retries, and leverage the flexibility of state machines to create tailored solutions for your specific use cases.

We hope this tutorial has equipped you with the knowledge and confidence to leverage AWS Step Functions effectively in your projects. Remember, practice makes perfect, so don't hesitate to dive deeper into the documentation, explore additional resources, and continue learning and experimenting with AWS services.

Thank you for joining us on this learning adventure, and best of luck with your future endeavors in cloud computing and serverless application development!

Happy orchestrating!
