Configuring Google Cloud Storage for Mage

Cris Crawford - Jan 27 - Dev Community

In this post I'll talk about how I configured Google Cloud Storage so that Mage could transfer data to a Google Cloud Storage bucket. First I created the bucket. I went to the Cloud Console (console.cloud.google.com) and, in the menu on the left (the hamburger menu in the top left), selected Cloud Storage->Buckets.

To create a bucket, I clicked on the CREATE button. I had to give my bucket a globally unique name; I named it mage-zoomcamp-cris-crawford, and nobody else had used that name. I hit return. You can click "CONTINUE" to review the default settings, which are fine for this bucket. A dialog appeared titled "Public access will be prevented". I clicked "CONFIRM".
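For reference, the same thing can be done from the command line with the Google Cloud SDK. This is just a sketch using my bucket name; substitute your own.

```bash
# Create the bucket from the command line (bucket names must be globally unique).
gsutil mb gs://mage-zoomcamp-cris-crawford
```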

Next, you should create a service account. I'd already done this for Terraform, but I'll describe the steps anyway. From the left menu, choose IAM & Admin->Service Accounts and click "CREATE SERVICE ACCOUNT". Choose a name for your service account; this name doesn't have to be unique. Click "CREATE AND CONTINUE". Now grant the service account a role. For now we'll be generous with permissions and choose the role "Owner". Click "CONTINUE" and then "DONE".
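If you'd rather not click through the console, something like the following gcloud commands should do the same thing. The project ID and service account name here are placeholders, not the ones I actually used.

```bash
# Create a service account (the name does not have to be globally unique).
gcloud iam service-accounts create mage-zoomcamp-sa --display-name="mage-zoomcamp-sa"

# Be generous with permissions for now: grant the Owner role on the project.
gcloud projects add-iam-policy-binding MY_PROJECT_ID \
  --member="serviceAccount:mage-zoomcamp-sa@MY_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/owner"
```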

Now we need a key for authorization. Click on the "KEYS" tab, click "ADD KEY", and select "Create new key". Choose JSON in the popup window (the default). This will download a key file to your computer. You now want to copy these credentials to your Mage project, mage-zoomcamp, which for me is on my VM instance. I already had my key in a directory on my VM instance called ~/.gc, but I needed to copy the JSON file into mage-zoomcamp so that docker-compose would make it available inside the container. You can use sftp from your computer to copy the key to your VM instance.
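The key can also be created and copied over from the command line. This is only a sketch; the service account email, the VM host alias, and the destination path are placeholders for whatever you set up.

```bash
# Create a JSON key for the service account and save it as keys.json.
gcloud iam service-accounts keys create keys.json \
  --iam-account=mage-zoomcamp-sa@MY_PROJECT_ID.iam.gserviceaccount.com

# Copy the key into the mage-zoomcamp directory on the VM with sftp.
sftp my-vm-instance
sftp> put keys.json mage-zoomcamp/keys.json
```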

Now that Mage had my service account key, I could use Mage to copy files to my Google Cloud Storage bucket. I opened Mage on localhost:6789. (I had this port forwarded in the port settings in VSCode, which should be connected to the VM instance.) I navigated to "Files" in the left side menu and opened io_config.yaml. There are two ways to set up Google Cloud credentials. The first is to paste the contents of the key file directly into io_config.yaml; the second, which I used, is to give the path to the key file. I deleted the first block of settings ("GOOGLE_SERVICE_ACC_KEY" and everything under it) and entered the path to my key in "GOOGLE_SERVICE_ACC_KEY_FILEPATH". I typed "/home/src/keys.json". It's necessary to use /home/src/filename.json, because that's where the project files are mapped in the Mage Docker container.
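After that edit, the Google section of the default profile in io_config.yaml looked roughly like this (the exact set of keys varies a bit between Mage versions):

```yaml
default:
  # Path to the service account key as seen from inside the Mage Docker container
  GOOGLE_SERVICE_ACC_KEY_FILEPATH: "/home/src/keys.json"
  GOOGLE_LOCATION: US  # Optional
```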

Now Mage will use this key any time it needs to read data from or write data to Google Cloud Storage.

I used the pipeline test_config from before to test this. I selected "Pipelines" from the left side menu, chose "test_config", and edited it. I changed the connection to BigQuery and the profile to "default", and ran it. I could see a message that BigQuery was reached, which means Mage was able to access Google Cloud using the service account key.

To test Google Cloud Storage, I used "example_pipeline". I opened it in the editor and ran all the blocks. This put a "clean" version of the Titanic dataset, titanic_clean.csv, into my home directory. Now I had to upload titanic_clean.csv into my bucket on Google Cloud Storage. I could not drag and drop the file the way the instructor did in the video, so I had to ask ChatGPT how to do it. I used the Google Cloud SDK on the VM instance: I typed gcloud auth login (which may not have been necessary), followed all the instructions to log in, and then ran gsutil cp titanic_clean.csv gs://[MY_BUCKET_NAME]/. The file appeared in my bucket.
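Putting those steps together, the upload from the VM instance looked roughly like this (substitute your own bucket name):

```bash
# Authenticate the gcloud CLI (may not be necessary if you are already logged in).
gcloud auth login

# Copy the cleaned CSV into the bucket.
gsutil cp titanic_clean.csv gs://mage-zoomcamp-cris-crawford/
```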

I went back to the pipeline "test_config" and deleted the data loader block. I added a new data loader block, choosing Python->Google Cloud Storage as the template, and called it test_gcs. I edited the template to set my bucket name as the bucket_name and the .csv file as the object_key. I was able to run the block and see the data appear, fetched from the Google Cloud Storage bucket.
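The generated data loader, with my bucket and object filled in, looked roughly like this. The exact imports depend on the Mage version, so treat this as a sketch rather than a verbatim copy of the template.

```python
from os import path

from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.google_cloud_storage import GoogleCloudStorage

if 'data_loader' not in dir():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_from_google_cloud_storage(*args, **kwargs):
    # Uses the GOOGLE_SERVICE_ACC_KEY_FILEPATH set in io_config.yaml.
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    bucket_name = 'mage-zoomcamp-cris-crawford'
    object_key = 'titanic_clean.csv'

    # Fetch the object from the bucket and return it as a DataFrame.
    return GoogleCloudStorage.with_config(ConfigFileLoader(config_path, config_profile)).load(
        bucket_name,
        object_key,
    )
```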
