Python and Jupyter Notebooks

Rose Day - Mar 28 '18 - Dev Community

I recently began using Jupyter notebooks with Python, but I kept struggling with the constant need to download dependencies, or with downloads that did not install correctly. Seeing this as a continuing trend, and wanting a development environment that is portable between computers, I turned to learning how Docker works.

Working on a machine with a 2.8 GHz Intel Core i7 processor, I researched different ways to set up a Docker environment on this computer and on any other I might switch to later. I found two ways to set up the Intel Distribution for Python in Docker, with Jupyter notebooks as the front end for code, equations, and visualizations. This is the setup I currently use for classes, and it works well when I need to share code with team members.

To set this up, as mentioned, I wanted to use Docker, which packages the notebooks and everything they need into a container. This gives an easily transferable environment to code in. When using Docker to set up Jupyter notebooks for the Intel Python distribution, you can either use a prepared image as it is or use an image as a base for your own customized one. Below I look at both ways to set up a Docker image for Intel Python with Jupyter notebooks.

Docker Image

The Intel distribution has both Python 2 and Python 3 images in Docker with core or full configurations. The core configurations contain NumPy/SciPy with dependencies while full contains everything that Intel distributes. For my purposes I used the full version of Intel Python 2.

To get started using a Docker image with Jupyter notebooks, I downloaded the image I wanted from Docker Hub and set up a volume to use with it. A volume is optional, but it gives the container persistent storage. I used one here as the place to store all the notebooks I wanted to run. Files written only inside a container are lost when that container is removed, and they are awkward to get at when another process on the host needs them. Therefore, I created a folder on the host machine and mounted it into the container. To set up this Docker container, I followed the steps below:

  1. Download the Docker image from Docker Hub.
  2. Set up a folder to act as a volume for Docker. I created ~/Documents/notebooks on the host and attached it to /home/notebooks in the Jupyter notebooks container, so files stay easily accessible and can be version controlled after the notebook is shut down.
  3. Open a terminal and run the notebook.
# Pull image 
docker pull intelpython/intelpython2_full

# Set up folder 
mkdir -p ~/Documents/notebooks/

# Run the notebook 
docker run -v ~/Documents/notebooks:/home/notebooks -p 8888:8888 intelpython/intelpython2_full jupyter notebook --ip='*' --port=8888 --allow-root --no-browser 
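
To confirm that the mounted folder really persists data, a small check can be run from a notebook cell inside the container. This is only a sketch: the `write_marker` helper and the `persistence_check.txt` file name are hypothetical names chosen for illustration, and it assumes the container was started with the `/home/notebooks` mount shown above. It sticks to calls that exist in both Python 2 and Python 3, since the image used here is Python 2.

```python
import io
import os
import time


def write_marker(directory, name="persistence_check.txt"):
    """Write a timestamped marker file into `directory` and return its path."""
    path = os.path.join(directory, name)
    # io.open accepts an encoding on both Python 2 and Python 3.
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(u"written at %s\n" % time.strftime("%Y-%m-%d %H:%M:%S"))
    return path


# Assumption: the container mounts the host folder at /home/notebooks.
NOTEBOOK_DIR = "/home/notebooks"
if os.path.isdir(NOTEBOOK_DIR) and os.access(NOTEBOOK_DIR, os.W_OK):
    print(write_marker(NOTEBOOK_DIR))
else:
    print("%s not found or not writable; is the volume mounted?" % NOTEBOOK_DIR)
```

If the mount is working, the marker file should also show up in ~/Documents/notebooks on the host, and it should still be there after the container is stopped.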

This may work for many applications, but this is where I ran into a problem. The code I was running in the Jupyter notebook called seaborn, a Python visualization library built on matplotlib that produces more attractive statistical graphics. The full Intel Python image from Docker Hub did not include it. To get around this, I customized the Docker image with a Dockerfile that adds seaborn.
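
Before customizing an image, it can help to see exactly which libraries a notebook needs that the image lacks. The sketch below is one way to do that from a notebook cell. The `missing_packages` helper and the `wanted` list are hypothetical, not part of Intel Python or Jupyter, and the code avoids anything specific to Python 3 so it also runs on the Python 2 image.

```python
import importlib


def missing_packages(names):
    """Return the subset of `names` that cannot be imported here."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Hypothetical list of libraries a notebook might need.
wanted = ["numpy", "scipy", "matplotlib", "seaborn"]
print("Missing: %s" % missing_packages(wanted))
```

Anything reported as missing is a candidate for a conda install line in the Dockerfile.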

Dockerfile for Customization

To create a customized Docker image based on Intel Python that can be run with Jupyter notebooks, I set up a Dockerfile based on the Intel Python Dockerfiles on Docker Hub. It uses continuumio/miniconda3 as the base image. Miniconda provides the conda dependency and environment manager, which can install the popular data science packages for Python and R that Anaconda distributes. With this base, any packages not included in Intel Python can be installed with conda when creating the customized image.

 

# Set the base image using miniconda 
FROM continuumio/miniconda3:4.3.27

# Add metadata
LABEL version="1.0" \
      description="Intel Python 2 using Jupyter Notebooks" \
      date_created="01march2018" \
      date_modified="28march2018"

Next, the ENV instruction sets the environment variable ACCEPT_INTEL_PYTHON_EULA to 'yes'. This accepts the End User License Agreement (EULA) for Intel Python, which has to be accepted every time a new environment is created. After setting this variable, the RUN instruction executes shell commands, and each RUN instruction creates a new layer in the image. Here, conda installs Intel Python, seaborn, and any other data science libraries you may need or want, and apt-get then updates its package lists and installs g++. Chaining the commands with && keeps them all in a single layer. With the custom image configured, it can be built and run.

# Set environmental variable(s)
ENV ACCEPT_INTEL_PYTHON_EULA=yes

# Install packages, update, and clean
# -y answers conda's confirmation prompt, which cannot be answered during a build
RUN conda config --add channels intel \
    && conda install -y -q intelpython2_full=2018.0.1 python=2 \
    && conda install -y -q seaborn \
    && apt-get update -qqq \
    && apt-get install -y -q g++ \
    && apt-get clean

Build an Image

After completing the Dockerfile, check that you are in the directory that contains it before running any commands. I have often found myself in the wrong directory after going to look at something else before coming back to build an image.

$ ls
Dockerfile

Then, to build the image, run the build command with a tag, -t, for the image. The tag gives the image an easy-to-use name; I called mine test_intel so I could pick it out of a list quickly. The trailing . tells Docker to use the current directory as the build context. The build may take a few minutes.

docker build -t test_intel .

Run an Image

After the image is built, you can list the images on your local machine to confirm it is there. The command shows the repository name, tag, image ID, time created, and size of each image; the example below leaves out the creation time. This is a good check that the image built before moving forward.

docker image ls
REPOSITORY        TAG      IMAGE ID        SIZE
test_intel        latest   ce5d8aa2966d    6.52GB

Once that check passes, it is time to run the image. Running it works much like the first example with the stock core or full image. Use the same command, but replace the image name with the one you just built, test_intel.

docker run -v ~/Documents/notebooks:/home/notebooks -p 8888:8888 test_intel jupyter notebook --ip='*' --port=8888 --allow-root --no-browser

After running this command in the terminal, a URL should appear for you to copy and paste into the browser to connect to Jupyter notebook with the Intel Python distribution installed and ready to go. Once connected, you can begin using your customized environment. To shut down the server and all kernels, press Control-C in the terminal.
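
Once connected, a quick sanity check in the first notebook cell can show which interpreter the kernel is actually running. The sketch below uses only the standard library. The `describe_interpreter` helper is a hypothetical name, and the "Intel" check rests on the assumption that Intel builds mention Intel in the `sys.version` banner, so treat that field as a hint and not as proof.

```python
import sys


def describe_interpreter():
    """Collect basic facts about the Python running this kernel."""
    return {
        "version": "%d.%d.%d" % sys.version_info[:3],
        "executable": sys.executable,
        # Assumption: Intel builds mention "Intel" in the version banner.
        "mentions_intel": "Intel" in sys.version,
    }


for key, value in sorted(describe_interpreter().items()):
    print("%s: %s" % (key, value))
```

For the image built above, the reported version should start with 2, since the Dockerfile pins python=2.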

References

Intel Optimized Packages for the Intel Distribution for Python
Docker
seaborn
miniconda
