AWS Step Functions are like a big and clunky assembly language

Mark McWiggins
3 min read · May 3, 2020

I was recently introduced to AWS Step Functions, which implement a state machine that orchestrates different steps in multistage procedures, for example ETL pipelines.

First of all, anything a state machine can express can be expressed just as well in any full-featured programming language. I’ve been taking “we’re implementing a state machine” as shorthand for “we don’t know what we’re doing” for 30 years.

Secondly, the syntax AWS designed (the Amazon States Language) is (a) based on JSON and (b) full of clunky control structures that remind me of assembly language:

"ChoiceStateX": {
  "Type": "Choice",
  "Choices": [
    {
      "Not": {
        "Variable": "$.type",
        "StringEquals": "Private"
      },
      "Next": "Public"
    },
    {
      "Variable": "$.value",
      "NumericEquals": 0,
      "Next": "ValueIsZero"
    },
    {
      "And": [
        {
          "Variable": "$.value",
          "NumericGreaterThanEquals": 20
        },
        {
          "Variable": "$.value",
          "NumericLessThan": 30
        }
      ],
      "Next": "ValueInTwenties"
    }
  ],
  "Default": "DefaultState"
},

"Public": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:123456789012:function:Foo",
  "Next": "NextState"
},

"ValueIsZero": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:123456789012:function:Zero",
  "Next": "NextState"
},

"ValueInTwenties": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:123456789012:function:Bar",
  "Next": "NextState"
},

"DefaultState": {
  "Type": "Fail",
  "Cause": "No Matches!"
}

BLEAH!

This seems preposterous to me. What they should have done: (1) write the orchestration engine as a background process, and (2) expose a REST API to it, or … at least expose APIs for several popular languages (Python, Java, Ruby, JavaScript, etc.).
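For comparison, here is the branching logic of the ChoiceStateX example written as a plain Python function. This is a minimal sketch, assuming the state input is a dict with `type` and `value` keys; the function name `choice_state_x` is hypothetical, and it only returns the name of the next state rather than invoking the Lambda tasks.

```python
def choice_state_x(state_input: dict) -> str:
    """Return the next state's name, mirroring the ChoiceStateX rules.

    As in a Choice state, the rules are tried in order and the first match wins.
    Assumes (hypothetically) that state_input has "type" (str) and "value" (number) keys.
    """
    if state_input["type"] != "Private":  # the "Not" + "StringEquals" rule
        return "Public"
    if state_input["value"] == 0:  # "NumericEquals": 0
        return "ValueIsZero"
    if 20 <= state_input["value"] < 30:  # "And" of ">= 20" and "< 30"
        return "ValueInTwenties"
    return "DefaultState"  # the "Default" branch


if __name__ == "__main__":
    print(choice_state_x({"type": "Private", "value": 25}))  # ValueInTwenties
```

Ten lines of ordinary control flow, and a debugger, linter, and unit-test framework come for free.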

But say you are working on a project where AWS Step Functions are required: how do you minimize the pain?

I am working on such a project. One of the people I’m working with asked me, “Does adding another stage to our project require another step function?” I thought so, until I had time to think about it …

The pipeline for this project is a series of SQL scripts that run in a fixed sequence against an Amazon Redshift cluster. By default, each of them would require invoking an AWS Glue job (not AWS Lambda, since some scripts can run far beyond Lambda’s 15-minute limit) running in “Python mode” (a Python shell job, not Spark). But what if we move the state storage from Step Functions to Redshift?

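-- Redshift has no SERIAL type; the ids are assigned by hand anyway.
-- runstart stays NULL until the step has run; 'END' marks the end of the pipeline.
CREATE TABLE sqlsteps (id INTEGER PRIMARY KEY, sqlscript TEXT, runstart TIMESTAMP);

-- Leave gaps between ids so new steps can be slotted in later.
INSERT INTO sqlsteps VALUES (1, 'step1.sql', NULL);
INSERT INTO sqlsteps VALUES (5, 'step2.sql', NULL);
INSERT INTO sqlsteps VALUES (10, 'step3.sql', NULL);
INSERT INTO sqlsteps VALUES (15, 'step4.sql', NULL);
INSERT INTO sqlsteps VALUES (20, 'END', NULL);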

Then you start a job with the trivial step function (syntax is left as an exercise for the masochistic reader):

Task gluestep:
    arn:glue:gluestep 0
choice $.result is 'END': success
default: gluestep
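Stripped of its syntax, that two-state machine is nothing more than a loop. Below is a minimal Python sketch of the same control flow. It assumes a hypothetical `gluestep` callable that runs the next SQL script and returns its name, or `'END'` once the pipeline is finished. The `max_iterations` guard is an added assumption, not part of the design above, there to stop a runaway loop.

```python
from typing import Callable


def run_state_machine(gluestep: Callable[[bool], str], max_iterations: int = 1000) -> str:
    """Call gluestep repeatedly until it reports 'END', like the Task/choice loop.

    gluestep is hypothetical: it receives True on the first call (the "0" start
    argument) and False afterwards, and returns the name of the script it ran.
    """
    starting = True
    for _ in range(max_iterations):
        result = gluestep(starting)  # Task gluestep
        starting = False
        if result == "END":  # choice $.result is 'END': success
            return "success"
        # default: go back to gluestep
    raise RuntimeError("pipeline did not reach END")  # guard against a runaway loop


if __name__ == "__main__":
    steps = iter(["step1.sql", "step2.sql", "END"])
    print(run_state_machine(lambda starting: next(steps)))  # success
```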

In the Python gluestep:

If we are starting, take the timestamp, run the first step, and save the timestamp back to that record.

If not starting, find the last step that ran (ORDER BY runstart, WHERE runstart IS NOT NULL) and run the next one, until END.
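A minimal sketch of that gluestep logic in Python, assuming the `sqlsteps` table above. It uses the standard-library `sqlite3` module as a stand-in for a Redshift connection so it runs locally; on Redshift you would use a DB-API driver instead, and the `?` placeholders are sqlite3’s (psycopg2, for example, uses `%s`). `run_script` is a hypothetical callback that executes one SQL file. One deliberate change: it identifies the last completed step by the highest `id` with a non-NULL `runstart` rather than by sorting timestamps, since the IDs already encode the order and two quick steps could share a timestamp.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable


def gluestep(conn: sqlite3.Connection, run_script: Callable[[str], None], starting: bool) -> str:
    """Run the next pending step in sqlsteps and return its script name ('END' when done).

    run_script is a hypothetical callback that executes one SQL script.
    """
    if starting:
        # A new run: forget the timestamps left over from the previous run.
        conn.execute("UPDATE sqlsteps SET runstart = NULL")
        last_id = None
    else:
        # The last step that ran is the highest id with a timestamp recorded.
        (last_id,) = conn.execute(
            "SELECT MAX(id) FROM sqlsteps WHERE runstart IS NOT NULL"
        ).fetchone()

    if last_id is None:
        row = conn.execute("SELECT id, sqlscript FROM sqlsteps ORDER BY id LIMIT 1").fetchone()
    else:
        row = conn.execute(
            "SELECT id, sqlscript FROM sqlsteps WHERE id > ? ORDER BY id LIMIT 1", (last_id,)
        ).fetchone()
    if row is None:
        raise RuntimeError("no step left to run; is the 'END' row missing?")

    step_id, script = row
    started = datetime.now(timezone.utc).isoformat()  # take the timestamp first
    if script != "END":
        run_script(script)  # if this raises, no timestamp is saved, so a retry reruns the step
    conn.execute("UPDATE sqlsteps SET runstart = ? WHERE id = ?", (started, step_id))
    conn.commit()
    return script


def create_demo_table(conn: sqlite3.Connection) -> None:
    """Create sqlsteps with the same rows as the SQL above (for local testing only)."""
    conn.execute("CREATE TABLE sqlsteps (id INTEGER PRIMARY KEY, sqlscript TEXT, runstart TIMESTAMP)")
    conn.executemany(
        "INSERT INTO sqlsteps VALUES (?, ?, NULL)",
        [(1, "step1.sql"), (5, "step2.sql"), (10, "step3.sql"), (15, "step4.sql"), (20, "END")],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_demo_table(conn)
    result = gluestep(conn, print, starting=True)  # prints step1.sql
    while result != "END":
        result = gluestep(conn, print, starting=False)  # prints step2.sql ... step4.sql
```

Because all the state lives in the table, each Glue invocation can be a fresh process: it reads where the pipeline left off, does one step, and writes that fact back.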

That’s it!

To add a new step, just add a record in the database in the “right place” (which is why you leave gaps between IDs …)
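Picking the “right place” comes down to finding a free ID in the gap after the step the new one should follow. Below is a small sketch with a hypothetical helper, `new_step_id`. It takes the midpoint of the gap, which leaves room for later inserts on either side, and refuses when no gap is left, at which point the steps would need renumbering.

```python
from typing import Iterable


def new_step_id(existing_ids: Iterable[int], after_id: int) -> int:
    """Return an unused id that sorts between after_id and the next existing step."""
    later = [i for i in existing_ids if i > after_id]
    if not later:
        raise ValueError("nothing follows this step; new steps must go before 'END'")
    next_id = min(later)
    if next_id - after_id < 2:
        raise ValueError(f"no gap between {after_id} and {next_id}; renumber the steps first")
    return (after_id + next_id) // 2  # midpoint keeps room on both sides


if __name__ == "__main__":
    ids = [1, 5, 10, 15, 20]
    # 'step2a.sql' is a hypothetical new script that should run after step2.sql (id 5):
    # INSERT INTO sqlsteps VALUES (7, 'step2a.sql', NULL);
    print(new_step_id(ids, 5))  # 7
```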

This supports only one run at a time. To parallelize it, the easiest way seems to be to clone the whole operation … or you could allocate different ID ranges to different users …
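Here is one way to sketch the “different ID spaces” idea. Each user gets a block of IDs, the step list is copied into that block, and that user’s gluestep then looks only at IDs inside it (for example, with an extra `WHERE id >= ? AND id < ?` condition). The block size of 1,000 is an arbitrary assumption, and `id_range` and `rows_for_user` are hypothetical helpers.

```python
from typing import List, Optional, Sequence, Tuple

BLOCK_SIZE = 1000  # assumed size of each user's ID range


def id_range(user_index: int) -> range:
    """IDs reserved for one user: [user_index * BLOCK_SIZE, (user_index + 1) * BLOCK_SIZE)."""
    start = user_index * BLOCK_SIZE
    return range(start, start + BLOCK_SIZE)


def rows_for_user(
    template: Sequence[Tuple[int, str]], user_index: int
) -> List[Tuple[int, str, Optional[str]]]:
    """Copy the template steps into a user's ID range, keeping their order and gaps."""
    ids = id_range(user_index)
    rows = []
    for step_id, script in template:
        if not 0 <= step_id < BLOCK_SIZE:
            raise ValueError(f"template id {step_id} does not fit in a block of {BLOCK_SIZE}")
        rows.append((ids.start + step_id, script, None))  # runstart starts out NULL
    return rows


if __name__ == "__main__":
    template = [(1, "step1.sql"), (5, "step2.sql"), (10, "step3.sql"), (15, "step4.sql"), (20, "END")]
    for row in rows_for_user(template, user_index=2):
        print(row)  # (2001, 'step1.sql', None) through (2020, 'END', None)
```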

I understand that not everybody is as horrified by the syntax as I am, but I think the method outlined above can also provide a significant productivity and flexibility boost to anyone charged with maintaining AWS infrastructure.

Comments are welcome!
