YugabyteDB in Jupyter Notebook in Google Colab

Franck Pachot - Oct 3 '23 - Dev Community

When you want to experiment with a database, especially when connecting from Python, Jupyter notebooks are quite handy. Google Colab even provides a free runtime environment with 12.7 GB of RAM and 107.7 GB of disk, which is sufficient to run YugabyteDB, especially when it is started with yugabyted.

Here is an example of such a notebook, which you can look at and run:

Here is the IPython cell I use to download and install the latest preview version of YugabyteDB:

# Install YugabyteDB if not already there (get the latest preview version from docs.yugabyte.com)
! [ -f ./yugabyte/bin/yugabyted ] || { \
  rm -rf yugabyte-* && \
  apt install -y gawk && \
  tgz=$(curl -Ls https://docs.yugabyte.com/preview/quick-start/linux/ | awk '$0~re{print gensub(re,"\\1",1,$0)}' re="^.*wget (.*$(uname -m)[.]tar[.]gz)") && \
  wget -O yugabyte.tar.gz "$tgz" && \
  tar xfz yugabyte.tar.gz && \
  rm -rf yugabyte.tar.gz && \
  mv ./yugabyte-* yugabyte && \
  ./yugabyte/bin/post_install.sh >/dev/null 2>&1 ; \
  }
! [ -f ./yugabyte/bin/yugabyted ] || echo YugabyteDB is installed
! ./yugabyte/bin/yugabyted status
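
The awk expression in the cell above picks, from the quick-start page, the wget target that ends with the current architecture and .tar.gz. The same extraction can be sketched in Python. This is only an illustration: the function name find_tarball_url and the sample page fragment are made up, the URLs are not real download links, and the pattern is slightly stricter than the awk one because it does not let the URL contain spaces.

```python
import re


def find_tarball_url(page_text, arch):
    """Return the first 'wget <url>' target ending in '<arch>.tar.gz', or None."""
    # \S* keeps the match inside a single whitespace-free token (the URL)
    pattern = re.compile(r"wget\s+(\S*" + re.escape(arch) + r"\.tar\.gz)")
    match = pattern.search(page_text)
    return match.group(1) if match else None


# Hypothetical page fragment, for illustration only (not real download URLs)
sample = """
<code>wget https://example.invalid/releases/yugabyte-0.0.0.0-b0-linux-x86_64.tar.gz</code>
<code>wget https://example.invalid/releases/yugabyte-0.0.0.0-b0-el8-aarch64.tar.gz</code>
"""

print(find_tarball_url(sample, "x86_64"))
print(find_tarball_url(sample, "aarch64"))
```

In a notebook, platform.machine() from the standard library would play the role of $(uname -m) when choosing the arch argument.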

Here is the IPython cell I use to start it (on the same port as the PostgreSQL default):

%env PGHOST=127.0.0.1
%env PGPORT=5432
# Start YugabyteDB
! ./yugabyte/bin/yugabyted start --advertise_address=$PGHOST --ysql_port=$PGPORT & \
  echo "Starting in the background because it seems iPython doesn't detect when done..."
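
An alternative to the shell & is to launch the same command from Python with subprocess.Popen, which returns immediately. This is a sketch I have not compared with the shell version in Colab. The helper names are hypothetical, and only the two flags used in the cell above are passed.

```python
import os
import subprocess


def yugabyted_start_command(host, port, binary="./yugabyte/bin/yugabyted"):
    """Build the same command line as the shell cell (flags taken from that cell)."""
    return [binary, "start", f"--advertise_address={host}", f"--ysql_port={port}"]


def start_in_background():
    """Launch yugabyted without waiting for it; returns the Popen handle."""
    # PGHOST and PGPORT are assumed to be set with %env, with the same defaults otherwise
    host = os.environ.get("PGHOST", "127.0.0.1")
    port = os.environ.get("PGPORT", "5432")
    return subprocess.Popen(
        yugabyted_start_command(host, port),
        stdout=subprocess.DEVNULL,  # discard output so the cell does not wait on it
        stderr=subprocess.DEVNULL,
    )
```

Calling start_in_background() in a cell would then replace the ! line, with the readiness check still done in a separate cell.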

Here is the cell I use to wait for the PostgreSQL endpoint to be available:

# Wait until the PostgreSQL-compatible endpoint accepts connections (PGHOST and PGPORT are set)
! until ./yugabyte/postgres/bin/pg_isready ; do sleep 1 ; done | uniq ; ./yugabyte/bin/yugabyted status
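
The same wait loop can be approximated in pure Python by polling the TCP port. Note that this is a weaker check than pg_isready: it only verifies that something accepts connections on the port, not that the server is ready to answer queries. The function name wait_for_port and its default timeout are assumptions for this sketch.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection succeeds, False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a listener is present on host:port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the settings used here, the call would be wait_for_port("127.0.0.1", 5432).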

You can run ./yugabyte/bin/ysqlsh, which is a fork of psql with the same features. You can also use psycopg2, SQLAlchemy, or the IPython SQL magic, as you would with a PostgreSQL database.
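
As a minimal sketch of the psycopg2 route, the connection settings can be read from the same PGHOST and PGPORT variables. The yugabyte user and database used as fallbacks are the YugabyteDB defaults, which I am assuming were not changed. The helper names are hypothetical, and psycopg2 is imported lazily because it is a third-party package.

```python
import os


def connection_params(env=os.environ):
    """Collect connection settings from PG* variables, with assumed defaults."""
    return {
        "host": env.get("PGHOST", "127.0.0.1"),
        "port": int(env.get("PGPORT", "5432")),
        "user": env.get("PGUSER", "yugabyte"),        # assumed default user
        "dbname": env.get("PGDATABASE", "yugabyte"),  # assumed default database
    }


def server_version():
    """Connect with psycopg2 and return the output of select version()."""
    import psycopg2  # third-party driver, needs a running cluster to connect

    conn = psycopg2.connect(**connection_params())
    try:
        with conn.cursor() as cur:
            cur.execute("select version()")
            return cur.fetchone()[0]
    finally:
        conn.close()
```

Calling server_version() once the endpoint accepts connections should return the version string reported by the server.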

The status of the YugabyteDB cluster is also available from the HTTP console. Here is an example listing the servers:

import pandas
pandas.read_html('http://localhost:7000/tablet-servers')[0].replace(":9000 [0-9a-f]{32}","",regex=True)
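
The regular expression passed to replace() removes a ":9000" port followed by a space and a 32-character hexadecimal identifier. The effect can be checked on a single string with the standard re module. The sample value below is made up, and I am assuming from the pattern that the server column looks like this.

```python
import re

# Same pattern as in the pandas call above
TSERVER_SUFFIX = re.compile(r":9000 [0-9a-f]{32}")


def strip_port_and_uuid(cell):
    """Remove the ':9000 <32 hex chars>' suffix, leaving other text untouched."""
    return TSERVER_SUFFIX.sub("", cell)


# Made-up cell content: an address, the port, and a 32-character hex identifier
sample_cell = "127.0.0.1:9000 " + "0123456789abcdef" * 2

print(strip_port_and_uuid(sample_cell))
```

Cells that do not contain the pattern, such as a status column, pass through unchanged.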


The SQL magic %%sql is a convenient way to run simple SQL, connecting to YugabyteDB and running PostgreSQL queries.
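
The SQL magic connects through a SQLAlchemy-style URL. Here is a sketch of how such a URL could be built from the PGHOST and PGPORT settings used earlier, assuming the ipython-sql extension with a PostgreSQL driver is installed and the default yugabyte user and database are in use. The helper name sql_magic_url is hypothetical.

```python
import os
from urllib.parse import quote


def sql_magic_url(env=os.environ):
    """Build a postgresql:// URL from PG* variables, with assumed defaults."""
    host = env.get("PGHOST", "127.0.0.1")
    port = env.get("PGPORT", "5432")
    user = env.get("PGUSER", "yugabyte")        # assumed default user
    dbname = env.get("PGDATABASE", "yugabyte")  # assumed default database
    return f"postgresql://{quote(user)}@{host}:{port}/{quote(dbname)}"


print(sql_magic_url({}))

# In a notebook, the URL would then be used like this (magics shown as comments):
#   %load_ext sql
#   %sql postgresql://yugabyte@127.0.0.1:5432/yugabyte
# and, in a following cell:
#   %%sql
#   select version();
```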

