Unleashing the Power of AI: Running Large Language Models on Your Own Cloud Server (DigitalOcean)

Sonam Choeda - Oct 30 - Dev Community

Introduction

In the rapidly evolving world of artificial intelligence, large language models (LLMs) have become increasingly accessible to developers and enthusiasts. This blog post will guide you through setting up Ollama, a tool for running LLMs, on a cloud-based Linux server using DigitalOcean’s Droplets.

Why Run Your Own LLM?

Running your own LLM offers several advantages:

  • Complete control over your AI model
  • Enhanced privacy and data security
  • Customization possibilities
  • Cost-effective for long-term use

Setting Up Your Cloud Server

We’ll be using DigitalOcean’s Droplets for this tutorial. Here’s a quick overview of the setup process:

  1. Create a DigitalOcean account (https://www.digitalocean.com/)
  2. Choose a Droplet configuration (Ubuntu recommended)
  3. Select appropriate resources (roughly 8 GB RAM minimum for 7B-parameter models; larger models need more)
  4. Set up authentication (password or SSH key)
  5. Launch your Droplet

Connecting to Your Droplet

Once your Droplet is running, you’ll need to connect to it via SSH. Use the following command in your terminal:

ssh root@your_droplet_ip_address
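
Once connected, it can be worth confirming that the Droplet really has the memory you selected in step 3. The sketch below reads the MemTotal line from /proc/meminfo, which Linux reports in kB, and rounds it to whole GiB. The function name total_ram_gib is made up for this example, and its optional file argument exists only so the function can be tried against a sample file.

```shell
# Hypothetical helper: print total RAM rounded to the nearest GiB.
# Reads /proc/meminfo by default; pass another file path to override.
total_ram_gib() {
  awk '/^MemTotal:/ { printf "%.0f\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Only print on Linux systems where /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
  echo "Total RAM: $(total_ram_gib) GiB"
fi
```

A Droplet sold as 8 GB usually reports slightly less than 8 GiB because the kernel reserves some memory, which is why the value is rounded rather than truncated.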

Installing Ollama

Ollama is an easy-to-use framework for running LLMs. To install it, run this command:

curl -fsSL https://ollama.com/install.sh | sh
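
Before moving on, you may want to confirm that the install worked. The following is a small sketch, not part of the official instructions: check_ollama is a hypothetical helper name, and it only checks that an ollama binary is on your PATH before asking it for its version.

```shell
# Hypothetical helper: report whether the ollama binary is on PATH
# and, if it is, print its version.
check_ollama() {
  if command -v ollama >/dev/null 2>&1; then
    ollama --version
  else
    echo "ollama not found on PATH; try re-running the install script" >&2
    return 1
  fi
}
```

Calling check_ollama right after the install should print a version line; a non-zero exit status means the binary was not found.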

Running Your First LLM

With Ollama installed, you can now run an LLM. For example, to run the Llama 2 model (the first run downloads the model, which can take a few minutes):

ollama run llama2

Interacting with Your LLM

Once the model is loaded, you can start interacting with it by typing prompts. For example:

“Why is the sky blue?”

Running Ollama as a Server

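By default, Ollama runs as a server on port 11434, and you can access it from the Droplet itself at http://localhost:11434. On Linux, the install script typically registers Ollama as a systemd service already, so check with systemctl status ollama first; if that service is active, starting a second server will fail because the port is already in use. Otherwise, to keep Ollama running continuously on your server, even after you’ve logged out, you can use a process manager like PM2. Here’s how to set it up: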

  1. Install PM2 if you haven’t already (this requires Node.js and npm):
npm install pm2 -g
  2. Start the Ollama server under PM2. The process is named ollama here, but any name works:
pm2 start "ollama serve" -n ollama
  3. Make sure PM2 restarts on system reboot:
pm2 startup systemd
pm2 save

Now Ollama will run continuously as a server, allowing you to interact with it even after closing your SSH session.
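
With the server running, other programs can reach it over HTTP rather than through the interactive prompt. The following is a minimal sketch, assuming Ollama's /api/generate endpoint and the llama2 model pulled earlier. The function names (ollama_is_up, build_payload, ask_ollama) and the OLLAMA_URL variable are hypothetical helpers for this example, not part of Ollama.

```shell
# Assumed default address from the section above; override by exporting OLLAMA_URL.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Succeeds if something answers HTTP on the Ollama port within 5 seconds.
ollama_is_up() {
  curl -fsS --max-time 5 "$OLLAMA_URL/" >/dev/null 2>&1
}

# Build the JSON body for /api/generate from a model name ($1) and a prompt ($2).
# "stream": false asks for one complete JSON reply instead of a stream of chunks.
# Only backslashes and double quotes are escaped, so keep prompts to a single line.
build_payload() {
  escaped_prompt=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$escaped_prompt"
}

# Send the prompt to the server and print the raw JSON reply.
ask_ollama() {
  curl -fsS "$OLLAMA_URL/api/generate" -d "$(build_payload "$1" "$2")"
}
```

For example, ollama_is_up && ask_ollama llama2 "Why is the sky blue?" should print a JSON object whose response field holds the model's answer.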

Conclusion

Setting up and running your own LLM on a cloud server opens up a world of possibilities for AI experimentation and development. As you become more comfortable with the process, you can explore different models, fine-tune them for specific tasks, or even create your own AI-powered applications.

Next Steps

Consider exploring:

  • Different LLM models available through Ollama
  • Fine-tuning models for specific use cases
  • Integrating your LLM into other applications or services