How To Define and Run a Job in AWS Glue

Here we show how to run a simple job in AWS Glue.

The basic procedure, which we’ll walk you through, is to create a Python script, copy it to S3, and then configure and run the job in AWS Glue.

Create Python script

First, we create a simple Python script, counter.py:

# counter.py: print each number in the list
arr = [1, 2, 3, 4, 5]

for n in arr:
    print(n)

Copy to S3

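Then use the AWS CLI to create an S3 bucket and copy the script to a jobs/ folder in that bucket.

# Create the bucket (bucket names are global, so pick your own)
aws s3 mb s3://movieswalker
# The trailing slash stores the file as jobs/counter.py rather than as an object named "jobs"
aws s3 cp counter.py s3://movieswalker/jobs/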
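If you prefer to script this step, the upload can also be done from Python with boto3, whose S3 client has an upload_file(Filename, Bucket, Key) method. This is a minimal sketch rather than part of the walkthrough: the helper name upload_script is hypothetical, and the client is passed in so you can supply boto3.client("s3") in practice or a stub when testing.

```python
import os


def upload_script(s3_client, local_path, bucket, prefix="jobs/"):
    """Upload a local script to s3://<bucket>/<prefix><file name>.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    Returns the S3 URI of the uploaded script.
    """
    # Normalize the prefix so the key always looks like "jobs/counter.py"
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    key = prefix + os.path.basename(local_path)
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

For example, upload_script(boto3.client("s3"), "counter.py", "movieswalker") would store the file as jobs/counter.py and return s3://movieswalker/jobs/counter.py.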

Configure and run job in AWS Glue

Log into the AWS Glue console. Go to the Jobs tab and add a job. Give it a name, then pick an IAM role for Glue to use. The role AWSGlueServiceRole-S3IAMRole should already be there. If it is not, create it in IAM and attach it to the user ID you are logged in with. See the instructions at the end of this article regarding the role.
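The same job definition can also be created through the Glue API. This is a minimal sketch using the create_job call of boto3's Glue client. It assumes a Python shell job (Command name "pythonshell"), which suits a plain script like counter.py; a Spark job would use "glueetl" instead. The helper name and the job name counter-job are hypothetical, and the script location assumes the file sits at s3://movieswalker/jobs/counter.py.

```python
def create_python_shell_job(glue_client, name, role, script_location,
                            default_arguments=None):
    """Define a Glue job that runs a plain Python script.

    glue_client is expected to be boto3.client("glue"); role may be the
    role name or its ARN. Returns the job name reported by Glue.
    """
    params = {
        "Name": name,
        "Role": role,
        "Command": {
            "Name": "pythonshell",  # assumption: Python shell, not Spark
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        # Python shell jobs accept 0.0625 or 1 DPU; use the smaller one
        "MaxCapacity": 0.0625,
    }
    if default_arguments:
        params["DefaultArguments"] = dict(default_arguments)
    response = glue_client.create_job(**params)
    return response["Name"]
```

For example: create_python_shell_job(boto3.client("glue"), "counter-job", "AWSGlueServiceRole-S3IAMRole", "s3://movieswalker/jobs/counter.py").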

The script editor in AWS Glue lets you change the Python code.

This screen shows that you can pass run-time parameters to the job.
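Glue hands job parameters to the script as command-line arguments of the form --key value, alongside some arguments of its own. Inside Glue, scripts usually read them with getResolvedOptions from the awsglue.utils module; as a stand-in that also runs outside Glue, this sketch uses argparse and ignores arguments it does not recognize. The --count parameter is hypothetical: you would add it to the job's parameters with the key --count and a value such as 3.

```python
import argparse


def read_job_args(argv):
    """Parse a hypothetical --count run-time parameter.

    Glue adds arguments of its own (for example --JOB_NAME), so unknown
    arguments are collected and ignored instead of causing an error.
    """
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--count", type=int, default=5)
    args, _unknown = parser.parse_known_args(argv)
    return args


def main(argv):
    args = read_job_args(argv)
    # With the default of 5 this prints the same numbers as counter.py
    for n in range(1, args.count + 1):
        print(n)
```

In the job script you would call main(sys.argv[1:]).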

Run the job. If the run fails, the console directs you to CloudWatch, where you can see the error. The error below is an S3 permissions error:

Here is the job run history.
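Runs can also be started and monitored from code. This is a minimal sketch assuming boto3's Glue client: start_job_run returns a JobRunId, and get_job_run reports the run's JobRunState, which the loop polls until it reaches a final state. The helper name, the 15-second interval, and the exact set of final states are choices made for this example.

```python
import time

# States after which a Glue job run is not expected to change again
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_job_and_wait(glue_client, job_name, arguments=None,
                     poll_seconds=15, sleep=time.sleep):
    """Start a Glue job run and poll until it finishes.

    glue_client is expected to be boto3.client("glue"). Returns the
    (run_id, final_state) pair.
    """
    kwargs = {"JobName": job_name}
    if arguments:
        kwargs["Arguments"] = dict(arguments)  # e.g. {"--count": "5"}
    run_id = glue_client.start_job_run(**kwargs)["JobRunId"]
    while True:
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if state in TERMINAL_STATES:
            return run_id, state
        sleep(poll_seconds)
```

The sleep function is a parameter only so the loop can be exercised without waiting; in normal use you can leave it at its default.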

Here is the log showing that the Python code ran successfully. In this simple example, it just printed the numbers 1, 2, 3, 4, 5. Click the Logs link to see this log.
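The printed output can also be fetched from CloudWatch Logs with get_log_events. This sketch assumes a Python shell job, whose standard output we expect in the /aws-glue/python-jobs/output log group under a stream named after the job run ID; follow the Logs link in the console to confirm where your job writes. It reads only the first page of events, which is plenty for a five-line log, and the helper name is hypothetical.

```python
def fetch_run_output(logs_client, run_id,
                     log_group="/aws-glue/python-jobs/output"):
    """Return the printed lines of a Glue job run from CloudWatch Logs.

    logs_client is expected to be boto3.client("logs"). The log group
    default is an assumption for Python shell jobs; Spark jobs typically
    write to /aws-glue/jobs/output instead.
    """
    response = logs_client.get_log_events(
        logGroupName=log_group,
        logStreamName=run_id,  # assumption: one stream per run ID
        startFromHead=True,    # oldest events first
    )
    return [event["message"].rstrip("\n") for event in response["events"]]
```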

Give Glue user access to S3 bucket

If you have run any of our other tutorials, like running a crawler or joining tables, then you might already have the AWSGlueServiceRole-S3IAMRole. What's important for running a Glue job is that the role has access to the S3 bucket where the Python script is stored.

In this example, I added that access manually, using the JSON editor on the IAM roles screen, and pasted in this policy:

{
    "Version": "2012-10-17",
    "Statement": 
}
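As an illustration of a policy that would let the role read the script (an assumption, not necessarily the exact policy used here), this sketch builds a policy granting s3:GetObject on objects under the jobs/ prefix of the movieswalker bucket, which is the permission S3's HeadObject call requires, and attaches it to the role as an inline policy with IAM's put_role_policy. The policy name glue-script-read and both helper names are hypothetical.

```python
import json


def s3_script_read_policy(bucket, prefix="jobs/"):
    """Build an IAM policy that lets a role read objects under bucket/prefix.

    Illustrative only: it grants s3:GetObject, which HeadObject needs.
    Widen Action/Resource if your job also writes to S3.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            }
        ],
    }


def attach_inline_policy(iam_client, role_name, policy_name, policy):
    """Attach the policy to the role as an inline policy.

    iam_client is expected to be boto3.client("iam").
    """
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy),  # IAM expects a JSON string
    )
```

For example: attach_inline_policy(boto3.client("iam"), "AWSGlueServiceRole-S3IAMRole", "glue-script-read", s3_script_read_policy("movieswalker")).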

If you don't do this, or do it incorrectly, you will get this error:

File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

Here we show that the AWSGlueServiceRole-S3IAMRole role has both the AWSGlueServiceRole policy and the S3 policy we just added. That role, of course, must be attached to your IAM user ID.
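To check the same thing from code, IAM offers list_attached_role_policies for managed policies such as AWSGlueServiceRole and list_role_policies for inline policies, such as one added through the JSON editor. This minimal sketch, with a hypothetical helper name, returns both lists of policy names for a role.

```python
def role_policy_names(iam_client, role_name):
    """Return (managed_names, inline_names) for an IAM role.

    iam_client is expected to be boto3.client("iam"). Only the first page
    of each listing is read, which is enough for a role with a handful
    of policies.
    """
    managed = iam_client.list_attached_role_policies(RoleName=role_name)
    inline = iam_client.list_role_policies(RoleName=role_name)
    managed_names = [p["PolicyName"] for p in managed["AttachedPolicies"]]
    return managed_names, list(inline["PolicyNames"])
```

For example, role_policy_names(boto3.client("iam"), "AWSGlueServiceRole-S3IAMRole") should include AWSGlueServiceRole among the managed names.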
