How to Set Up and Run Ollama on a GPU-Powered VM (vast.ai)
In this tutorial, we'll walk through setting up and using Ollama for private model inference on a GPU-equipped VM, whether that's your own machine or a VM rented from Vast.ai or Runpod. Ollama lets you run models privately, keeping your data on hardware you control, and a GPU-backed VM significantly shortens inference times compared with running on CPU.
Outline
- Set up a VM with GPU on Vast.ai
- Start Jupyter Terminal
- Install Ollama
- Run Ollama Serve
- Test Ollama with a model
- (Optional) Use your own model
Setting Up a VM with GPU on Vast.ai
1. Create a VM with GPU:
- Visit Vast.ai to create your VM.
- Choose a VM with at least 30 GB of storage so there is enough space for the Ollama installation and the model weights.
- Select a VM that costs less than $0.30 per hour to keep the setup cost-effective.
2. Start Jupyter Terminal:
- Once your VM is up and running, start Jupyter and open a terminal within it. This is the easiest way to get a shell on the VM.
- Alternatively, you can connect over SSH from your local machine, for example with VSCode, but you will need to create an SSH key and add it to your Vast.ai account first (see the sketch below).
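If you go the SSH route, the flow is roughly: generate a key locally, add the public key to your Vast.ai account, then connect using the address and port shown for your instance. A minimal sketch; the key path, port, and IP below are placeholders you'll need to replace with your own values:
```bash
# Generate a key pair locally (skip if you already have one)
ssh-keygen -t ed25519 -f ~/.ssh/vast_ai -C "vast-ai"

# Print the public key, then add it to the SSH keys section of your Vast.ai account
cat ~/.ssh/vast_ai.pub

# Connect using the address and port shown on your instance (placeholders below)
ssh -i ~/.ssh/vast_ai -p <PORT> root@<INSTANCE_IP>
```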
Downloading and Running Ollama
1. Install Ollama:
- Open the terminal in Jupyter and run the following command to install Ollama:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
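Once the script finishes, a quick sanity check is to confirm the ollama binary is on your PATH before moving on:
```bash
# Confirm the install worked; this prints the client version
ollama --version
```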
2. Run Ollama Serve:
- After installation, start the Ollama service in the background by running:
```bash
ollama serve &
```
Watch the startup output for GPU errors: if Ollama cannot use the GPU, it falls back to CPU and responses will be noticeably slow when you interact with the model.
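To double-check that the server is up and that the GPU is visible, two quick checks help. The port below is Ollama's default (11434); adjust it if you have changed the server's bind address:
```bash
# The server answers on localhost:11434 by default and should reply "Ollama is running"
curl http://localhost:11434

# Confirm the GPU is visible to the driver; once a model is loaded,
# the ollama process should appear in the process table here
nvidia-smi
```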
3. Test Ollama with a Model:
- Test the setup by pulling and running a small model such as Mistral:
```bash
ollama run mistral
```
The first run downloads the model weights; after that you can chat with the model interactively to confirm everything is working correctly.
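Besides the interactive chat, you can pass a one-shot prompt directly or call Ollama's local HTTP API, which is handy for scripting. A minimal sketch against the default endpoint (the prompt text is just an example):
```bash
# One-shot prompt from the shell
ollama run mistral "Explain what a GPU does in one sentence."

# The same request over the local REST API (non-streaming)
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Explain what a GPU does in one sentence.",
  "stream": false
}'
```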
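(Optional) Using Your Own Model
If you want to run your own weights instead of a model from the Ollama library, Ollama supports this through a Modelfile that points at a local file (for example a GGUF) which you then register with ollama create. A minimal sketch, assuming you have already copied a GGUF file to the VM; the file name and model name are placeholders:
```bash
# A minimal Modelfile pointing at a local GGUF file (placeholder path)
cat > Modelfile <<'EOF'
FROM ./my-model.gguf
EOF

# Register the model under a name of your choosing, then run it
ollama create my-model -f Modelfile
ollama run my-model
```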