Machine Learning & Big Data Blog

How to Copy JSON Data to an Amazon Redshift Table

4 minute read
Walker Rowe

Here we show how to load JSON data into Amazon Redshift. In this example, Redshift parses the JSON data into individual columns. (It is possible to store JSON in char or varchar columns, but that’s another topic.)

First, review this introduction on how to stage the JSON data in S3 and instructions on how to get the Amazon IAM role that you need to copy the JSON file to a Redshift table.

In this example, we load 20 years of temperature data for Paphos, Cyprus. We purchased that data for $10 from OpenWeather. Of course, you could use any data.

Create a Redshift Table

First we create a table. We only want the date and these three temperature columns. We will give Redshift a JSONParse parsing configuration file, telling it where to find these elements so it will discard the others.

create table paphos (
dt_iso timestamp not null distkey sortkey,
temp real,
temp_min  real,
temp_max real
);

The weather data looks like this:

{
"city_name": "Paphos Castle",
"lat": 34.753637,
"lon": 32.406951,
"main": {
"temp": 55.35,
"temp_min": 51.8,
"temp_max": 65.53,
"feels_like": 49.44,
"pressure": 1016,
"humidity": 73
},
"wind": {
"speed": 9.17,
"deg": 20
},
"clouds": {
"all": 1
},
"weather": [{
"id": 800,
"main": "Clear",
"description": "sky is clear",
"icon": "01n"
}],
"dt": 946684800,
"timezone": 7200
}

Here is one record in the JSON Power Editor for Mac.

Note: I recommend this editor if you work with JSON a lot, as it makes editing JSON files a lot easier. You can work with objects in the right-hand screen which will create the text in the left-hand screen. That saves you the trouble of having to fix syntax error and line up curly brackets.

Create JSONPath file

We create a JSONPath file, which tells Redshift which elements to get. We have to give it the path of the item all the way down to the item. In other words, we can’t put just the top-level key weather and it will get temp, temp_min, and temp_max. We have to give it the full path of JSON keys main->temp.

We don’t have any arrays in this example, but it supports that using [array index] notation.

{
"jsonpaths": [
"$['dt_iso']",
"$['main']['temp']",
"$['main']['temp_min']",
"$['main']['temp_max']"
]
}

Prepare and upload JSON file to S3

This text file is 64 MB of daily weather records for the past 20 years. Unfortunately, it’s made all as one JSON array. So, we have to remove the first bracket and last bracket characters from the file.

Also, Redshift seems to require for the JSONP format that each record have a line feed at the end. Then means we need to insert a line feed for each. So, use these three sed statements to do that.

Note: JSONP file format means having one record right after another. So, taken together it’s not a valid JSON object. Instead it’s just a way to put many JSON objects into a file for bulk loading.

sed 's/^.//'
sed 's/.$//'
sed s'/,{"city/\n{"city/g'

Copy this file and the JSONPaths file to S3 using:

aws s3 cp (file)  s3://(bucket)

Load the data into Redshift

We use this command to load the data into Redshift. paphosWeather.json is the data we uploaded. paphosWeatherJsonPaths.json is the JSONPath file.

copy paphos
from 's3://gluebmcwalkerrowe/paphosWeather.json'
credentials 'aws_iam_role=arn:aws:iam::xxxxx:role/Redshift'
region 'eu-west-3'
json 's3://gluebmcwalkerrowe/paphosWeatherJsonPaths.json';

Common errors

If you have formatted the text or JSONPaths table wrong or illogically you will get any of these errors.

The first says to look into select * from stl_load_errors for more details:

ERROR: Load into table 'paphos' failed. Check 'stl_load_errors' system table for details.

This error says we have five elements in the JSONPath file but have created a database table with 13 columns. Go back and fix those.

ERROR: Number of jsonpaths and the number of columns should match. JSONPath size: 5, Number of columns in table or column list: 13 Detail: ----------------------------------------------- error: Number of jsonpaths and the number of columns should match. JSONPath size: 5, Number of columns in table or column list: 13 code: 8001 context: query: 273 location: s3_utility.cpp:780 process: padbmaster [pid=20575] -----------------------------------------------

If you put all your JSON data into an array instead of the JSONP format it will be too large. Then you might get:

String length exceeds DDL length

Check the loaded data

Here we look at the first 10 records:

select * from paphos limit 10;

Here we count them. As you can see there are 181,456 weather records.

select count(*) from paphos;

Additional resources

For more on this topic, explore these resources:

Learn ML with our free downloadable guide

This e-book teaches machine learning in the simplest way possible. This book is for managers, programmers, directors – and anyone else who wants to learn machine learning. We start with very basic stats and algebra and build upon that.


These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.

See an error or have a suggestion? Please let us know by emailing blogs@bmc.com.

Business, Faster than Humanly Possible

BMC empowers 86% of the Forbes Global 50 to accelerate business value faster than humanly possible. Our industry-leading portfolio unlocks human and machine potential to drive business growth, innovation, and sustainable success. BMC does this in a simple and optimized way by connecting people, systems, and data that power the world’s largest organizations so they can seize a competitive advantage.
Learn more about BMC ›

About the author

Walker Rowe

Walker Rowe is an American freelancer tech writer and programmer living in Cyprus. He writes tutorials on analytics and big data and specializes in documenting SDKs and APIs. He is the founder of the Hypatia Academy Cyprus, an online school to teach secondary school children programming. You can find Walker here and here.