One of Snowflake’s greatest strengths is how easily it can ingest data, regardless of format. In this article we’ll look at how we can easily send a large JSON file to a Snowflake stage, directly query the JSON data, and copy it into a table.
The instructions assume you have some familiarity with Snowflake already. We also make use of the SnowSQL CLI tool – you can find instructions on installing this here:
In a real situation we would take more care over the organisation and permissions of our database objects and user roles, but here we just want to show how easy it is to get started with semi-structured data.
Getting The Data
Bitcoin is over 10 years old and is often in the news, particularly when its price makes sudden large movements up or down. BitMEX is a popular Bitcoin derivatives exchange with comprehensive APIs. For this demonstration we’ll use some data about the Limit Order Book. Traders often look at the order book to help determine whether the price is likely to go up or down. With many traders using the exchange as well as algorithmic trading bots, the order book changes many times per second, generating a lot of data which is hard for humans to analyse. The Limit Order Book changes very frequently.
We used a simple NodeJS script and BitMEX’s own API reference connector to dump our JSON data to a file ready for loading. You can get the script and see a sample of the output data here: https://gist.github.com/taa-simon/02d2be4b7664ffce80b727f6586fd575
The JSON data is an array of objects (the orderbook snapshots). The objects have a timestamp property, and another property called ‘book’. The latter is an array of objects representing the total of the buy and sell orders at specific price levels. This nested structure could be fiddly to load into a two-dimensional table structure, but as we’ll see Snowflake makes it easy.
Creating Snowflake File Format and Stage objects
Loading data in Snowflake is done via stages and specified file formats. Make sure you have selected a database and schema to operate in, using USE statements.
Our file format tells Snowflake that the data is JSON and that it should remove the outer array of the data structure: