Learning Snowflake Python API

Augusto Kiniama Rosa - Jun 17 - Dev Community

Using Snowflake Python API to Manage Snowflake Objects

Writing Credits: Augusto Kiniama Rosa and Balkaran Brar

Most of the time, we use SQL or Terraform to manage Snowflake objects, with transformations handled in SQL or dbt, along with a few other alternatives.

Remember that this is a Public Preview feature, meaning it will continue to evolve. If you are taking this to production, be prepared to rewrite a few things as it matures.

What Is Snowflake Python API?

The Snowflake Python API is a library that enables Python to interact with Snowflake workloads. It offers a wide range of APIs for managing Snowflake resources, including data engineering, Snowpark, and application workloads, without having to use SQL commands. You can find a full list of supported objects here.

With the Snowflake Python API, you can create, delete, or modify Snowflake resources, and perform tasks that would otherwise require Snowflake SQL commands using Python.

Currently, it allows you to manage the following objects:

  • Tasks, including SQL statements, stored procedures, and Snowflake Scripting logic
  • Databases, schemas, and tables
  • Virtual warehouses
  • Snowpark Container Services, including compute pools, image repositories, and services

Limitation:

  • The Snowflake Python API is not currently available as a Third-Party Package from Anaconda.

Costs:

To minimize costs for both usage credit and network activity, the Snowflake Python API communicates with Snowflake only when you call methods intended to synchronize with Snowflake. As a result, when you’re working with in-memory references, such as when accessing attributes, your work is done locally and doesn’t incur any additional costs.

How Do I Use It?

Installation is simple with pip; I'm assuming you already have Python 3.10 installed.

pip install snowflake -U

# In case you need the ML components
pip install snowflake[ml] -U

To connect, there are two ways to create the Root object: through a Snowpark Session object or a Snowflake Python Connector Connection object.

To connect, I chose to follow the instructions here, with a toml file that roughly looks like this.


[cli.logs]
save_logs = true
path = "/Users/augustorosa/Library/Application Support/snowflake/logs"
level = "info"

[dev]
account = "https:// ******.snowflakecomputing.com"
user = " ***** @infostrux.com"
password = " ******"
warehouse = " **** _WAREHOUSE"
database = "dw_db"
schema = "dw_schema"

The connection:

import snowflake.connector

with snowflake.connector.connect(
      connection_name="dev",
) as conn:
    ...

# Overwrite some settings if you need to
with snowflake.connector.connect(
      connection_name="myconnection",
      warehouse="DEV_WAREHOUSE",
      database="dev_marketing_db"
) as conn:
    ...

Why the Root Object?

With a Root object, created from the connection we opened above, you access the objects and methods of the Snowflake Python API. The Root object is the root of the resource tree modeled by the API; you use it to interact with the Snowflake objects the API represents.

Code in the following example uses the Root object to access Snowflake objects in order to resume the task named midnight_run. The task is in the schema named dw_schema, which is in the database named dw_db.

Let’s look at some code that resumes the midnight_run task:

from snowflake.core import Root

root = Root(conn)  # conn is the connection created earlier
tasks = root.databases["dw_db"].schemas["dw_schema"].tasks
mytask = tasks["midnight_run"]
mytask.resume()

A Real Use-Case

Let’s take a real-world example and understand how to use the Snowflake API with Python. The example consists of two parts. The first part demonstrates how to use Python to deploy Snowflake resources like a warehouse, database, and schema, and the second part demonstrates how to ingest data into a Snowflake table and then do some analysis.

Deploy Resources

In this example, we will see how to organize the setup for different environments, so that resources are deployed into the intended environment.

In the connections.toml file, create profiles based on environments, e.g. dev, test, prod, etc. We keep only three parameters in the connections file: account, user, and password. The credentials used here are those of our master/admin account. Here is what the file looks like:

connections.toml

[dev]
account = "xxxxxx-xxxxxxx"
user = "xxxxx"
password = "xxxxxx"

Next we will create a config file. Let’s call it config.ini. We will use this to store environment-specific resources.

config.ini

[dev]
warehouse = dev_wh
database = sales_db
schema = sales_raw
# role is referenced later by data_operations.py; adjust the name to your setup
role = dev_role

Make sure that the profile name defined in connections.toml matches the profile name defined in config.ini. You can create multiple profiles based on the environments you want to deploy resources to.
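A quick way to catch a mismatch early is to compare the section names of the two files before deploying anything. This is a hypothetical helper, not part of the Snowflake API; it uses configparser, which can read both simple files shown here:

```python
import configparser


def common_profiles(connections_path, config_path):
    """Return profile names present in both files, sorted."""
    conns = configparser.ConfigParser()
    conns.read(connections_path)
    cfg = configparser.ConfigParser()
    cfg.read(config_path)
    # A profile is usable only if it exists in both files
    return sorted(set(conns.sections()) & set(cfg.sections()))
```

If `common_profiles("connections.toml", "config.ini")` does not contain the environment you are about to deploy, the scripts below will fail at connect or config-lookup time.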

Now we create a Python script called setup_resources.py. As the name suggests, this script will deploy Snowflake resources into the specified environment. Below is a breakdown of the script.

setup_resources.py

We start with importing required libraries:

from snowflake.core import Root, CreateMode
from snowflake.core.warehouse import Warehouse
from snowflake.core.database import Database
from snowflake.core.schema import Schema
from snowflake.snowpark import Session
import configparser
import os

Let’s create a function to load the config.ini file. We pass an environment argument to load the config for a specific profile. We will use a Snowpark Session for the connection.

def load_config(env):
   config = configparser.ConfigParser()
   config.read('config.ini')
   return config[env]

Then, a function to deploy the resources.

def setup_resources(env):
    config = load_config(env)
    session = Session.builder.config("connection_name", env).create()
    root = Root(session)

    # create warehouse
    try:
        wh = Warehouse(name=config['warehouse'])
        root.warehouses.create(wh, mode=CreateMode.or_replace)
        print(f"Warehouse {config['warehouse']} created")
    except Exception as e:
        print(f"Failed to create warehouse {config['warehouse']}: {e}")

    # create database
    try:
        db = Database(name=config['database'])
        root.databases.create(db, mode=CreateMode.or_replace)
        print(f"Database {config['database']} created")
    except Exception as e:
        print(f"Failed to create database {config['database']}: {e}")

    # create schema
    try:
        schema = Schema(name=config['schema'])
        root.databases[config['database']].schemas.create(schema, mode=CreateMode.or_replace)
        print(f"Schema {config['schema']} created")
    except Exception as e:
        print(f"Failed to create schema {config['schema']}: {e}")

The above function loads the config and connection profiles from their respective files and creates the Snowflake resources: the warehouse, database, and schema.

Let’s define a main entry point and pass the environment as an argument. If no ENV environment variable is set, dev is used as the default.

if __name__ == "__main__":
    env = os.getenv('ENV', 'dev')
    setup_resources(env)

Run the script using the command python setup_resources.py; if all goes well, you’ll see output similar to this:

Warehouse dev_wh created
Database sales_db created
Schema sales_raw created

The second part of this demo interacts with a Snowflake table: we ingest data and then read it back for some analysis. Let’s create another file and call it data_operations.py.

data_operations.py

Start by importing the required libraries and loading the config file. We will use the Snowflake Python connector for this part.

import snowflake.connector
import configparser
import os

def load_config(env):
   config = configparser.ConfigParser()
   config.read('config.ini')
   return config[env]

Let’s define a function to ingest the data:

def insert_data(env):
   config = load_config(env)
   with snowflake.connector.connect(connection_name=env) as conn:
       cur = conn.cursor()

       # Set the context using the custom role and other resources
       cur.execute(f"USE ROLE {config['role']}")
       cur.execute(f"USE WAREHOUSE {config['warehouse']}")
       cur.execute(f"USE DATABASE {config['database']}")
       cur.execute(f"USE SCHEMA {config['schema']}")

       # Create a table and insert data
       cur.execute("""
           CREATE OR REPLACE TABLE sales_data (
               date DATE,
               product_name VARCHAR,
               sales_amount FLOAT,
               region VARCHAR
           )
       """)
       cur.execute("""
           INSERT INTO sales_data (date, product_name, sales_amount, region)
           VALUES
           ('2023-01-01', 'Widget', 100, 'North America'),
           ('2023-01-01', 'Gadget', 150, 'Europe')
       """)
       cur.close()
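As a side note, for anything beyond two hard-coded sample rows you would normally bind parameters rather than inline literals in the INSERT. The connector’s default paramstyle is pyformat, i.e. %s placeholders. A minimal sketch; the executemany call at the end assumes a live cursor like the one in insert_data above, so it is left commented out:

```python
# Sample rows to load: (date, product_name, sales_amount, region).
# In practice these would come from a file or an API.
rows = [
    ("2023-01-01", "Widget", 100, "North America"),
    ("2023-01-01", "Gadget", 150, "Europe"),
]

# %s placeholders: the connector's default pyformat paramstyle
insert_sql = (
    "INSERT INTO sales_data (date, product_name, sales_amount, region) "
    "VALUES (%s, %s, %s, %s)"
)

# With a live connection, as in insert_data above:
# cur.executemany(insert_sql, rows)
```

Parameter binding keeps values out of the SQL text, which avoids quoting bugs and SQL injection when the rows come from untrusted input.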

Finally, read the data back and do some analysis.

def query_data(env):
   config = load_config(env)
   with snowflake.connector.connect(connection_name=env) as conn:
       cur = conn.cursor()

       cur.execute(f"USE ROLE {config['role']}")
       cur.execute(f"USE WAREHOUSE {config['warehouse']}")
       cur.execute(f"USE DATABASE {config['database']}")
       cur.execute(f"USE SCHEMA {config['schema']}")

       # Execute a query to analyze sales data
       cur.execute("""
           SELECT product_name, SUM(sales_amount) AS total_sales
           FROM sales_data
           GROUP BY product_name
           ORDER BY total_sales DESC
       """)
       results = cur.fetchall()
       print("Sales Analysis Results:")
       for row in results:
           print(f"Product: {row[0]}, Total Sales: {row[1]}")

       cur.close()

if __name__ == "__main__":
    env = os.getenv('ENV', 'dev')
    insert_data(env)
    query_data(env)

Run the script using the command python data_operations.py, and you’ll see output similar to what is shown below:

Sales Analysis Results:
Product: Gadget, Total Sales: 150.0
Product: Widget, Total Sales: 100.0
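To make the GROUP BY above concrete, here is the same aggregation done in plain Python over the two sample rows from the INSERT statement; it should produce the same numbers the query returns:

```python
from collections import defaultdict

# The two sample rows inserted earlier: (date, product_name, sales_amount, region)
rows = [
    ("2023-01-01", "Widget", 100.0, "North America"),
    ("2023-01-01", "Gadget", 150.0, "Europe"),
]

# SUM(sales_amount) ... GROUP BY product_name
totals = defaultdict(float)
for _, product, amount, _ in rows:
    totals[product] += amount

# ORDER BY total_sales DESC
for product, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"Product: {product}, Total Sales: {total}")
    # prints Gadget (150.0) first, then Widget (100.0)
```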

As the example shows, properly structuring and mapping the config and connection profiles streamlines not only the deployment process but also ingestion and analytical operations.

Conclusion

So what is the end game here? Until now, many of us set up our Snowflake environments with SQL, Terraform, Snow CLI, or some other tool, with the majority falling under SQL or Terraform. But many of us also use CDK with Python to run our cloud environments and would prefer Python for deploying everything in Snowflake too; for example, a SaaS service that needs to provision Snowflake resources whenever a new client comes on board.

The Snowflake Python API is a solid option for anyone who wants everything in one language, and it clearly does the job well. On top of that, even dbt supports Python, so your transformations and modelling can be done fully in Python as well.

Give it a try with one of Snowflake’s quickstart labs.

I am Augusto Rosa, VP of Engineering for Infostrux Solutions, Snowflake Data Super Hero, and SME. Thanks for reading my blog post. You can follow me on LinkedIn.

Subscribe to Infostrux Medium Blogs https://medium.com/infostrux-solutions for the most interesting Data Engineering and Snowflake news.
