Getting Started with Llama 3.3 on Ubuntu Linux with Ollama

By Wei Ming T. on Dec 9, 2024

Llama 3.3 is a 70B-parameter model with multilingual dialogue capabilities that rivals much larger models, such as Llama 3.1 405B, on many benchmarks. It is compact and efficient by comparison, but still requires a capable workstation for good performance. This guide covers everything you need to set up and run Llama 3.3 on Ubuntu Linux with Ollama.

System Requirements

Operating System

  • Ubuntu 20.04 or later

Hardware

  • Recommended Base System: 8-Core CPU or better, 32GB RAM
  • Storage: At least 100GB of free space for model storage (see the quick check below)
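
Quantized variants range from roughly 26GB to 141GB on disk (see the table below), so it is worth confirming free space up front. Models are typically stored under ~/.ollama for a user install, or /usr/share/ollama when Ollama runs as a systemd service:

# free space on the filesystem that will hold downloaded models
df -h ~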

GPU Variants and Requirements

Variant                VRAM    Recommended Hardware     Best Use Case
latest                 43GB    NVIDIA A5000/A6000       Production environments requiring latest features
70b                    43GB    NVIDIA A5000/A6000       General purpose usage
70b-instruct-fp16      141GB   Multi-GPU with NVLink    Research requiring maximum precision
70b-instruct-q2_K      26GB    NVIDIA RTX 3090/4090     Home users, basic inference
70b-instruct-q3_K_M    34GB    NVIDIA A5000/A6000       Production deployments
70b-instruct-q3_K_S    31GB    NVIDIA RTX 4090/A5000    Balanced performance/quality
70b-instruct-q4_0      40GB    NVIDIA A6000             Higher quality inference
70b-instruct-q4_1      44GB    NVIDIA A6000             High-quality inference
70b-instruct-q4_K_M    43GB    NVIDIA A6000             Production quality inference
70b-instruct-q4_K_S    40GB    NVIDIA A6000             Balanced inference speed/quality
70b-instruct-q5_0      49GB    NVIDIA A6000             Near FP16 quality
70b-instruct-q5_1      53GB    NVIDIA A100              High-precision inference
70b-instruct-q5_K_M    50GB    NVIDIA A6000/A100        Production quality inference
70b-instruct-q6_K      58GB    NVIDIA A100              High-precision inference
70b-instruct-q8_0      75GB    Multiple A100s           Maximum quality inference
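
Before choosing a variant, check how much VRAM your GPU actually has and pick the largest quantization that fits with some headroom. On systems with NVIDIA drivers installed:

# prints each GPU's name and total memory
nvidia-smi --query-gpu=name,memory.total --format=csv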

Installation and Setup

Step 1: Installing Ollama

Quick Install (Recommended)

Run the following command to install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

Verify the installation:

ollama --version
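
On Ubuntu, the install script normally registers Ollama as a systemd service that starts automatically. Assuming systemd is managing it, you can confirm the service is up:

# should print "active" once the installer has finished
systemctl is-active ollama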

Manual Install

For manual installation:

  1. Download and extract the package:
    curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
    sudo tar -C /usr -xzf ollama-linux-amd64.tgz
    
  2. Start Ollama (this runs in the foreground and occupies the terminal):
    ollama serve
    
  3. In a separate terminal, verify it is running:
    ollama -v
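
Unlike the install script, the manual install does not register a service, so Ollama stops when you close the terminal. Here is a minimal sketch for putting the manually installed binary under systemd; it assumes the binary landed at /usr/bin/ollama (as it does after the tar command above) and runs the service as root, whereas creating a dedicated ollama user is a common hardening step:

sudo tee /etc/systemd/system/ollama.service >/dev/null <<'EOF'
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
Restart=always

[Install]
WantedBy=default.target
EOF

# reload units, then start Ollama now and on every boot
sudo systemctl daemon-reload
sudo systemctl enable --now ollama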
    

AMD GPU Support

For systems with AMD GPUs, download and extract the additional ROCm package alongside the standard one:

curl -L https://ollama.com/download/ollama-linux-amd64-rocm.tgz -o ollama-linux-amd64-rocm.tgz
sudo tar -C /usr -xzf ollama-linux-amd64-rocm.tgz

ARM64 Install

For ARM64 architectures, use this package:

curl -L https://ollama.com/download/ollama-linux-arm64.tgz -o ollama-linux-arm64.tgz
sudo tar -C /usr -xzf ollama-linux-arm64.tgz
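
If you are unsure which architecture your machine uses, uname reports it directly:

# x86_64 means the AMD64 packages above; aarch64 means the ARM64 package
uname -m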

Step 2: Downloading Llama 3.3

Download the Llama 3.3 model using Ollama's pull command:

ollama pull llama3.3

To download a specific variant optimized for your GPU, use Ollama's model:tag syntax:

ollama pull llama3.3:70b-instruct-q3_K_M
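
Once a pull completes, you can confirm which models and tags are available locally:

# lists downloaded models with their tags and sizes
ollama list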

Step 3: Running Llama 3.3

Start using the model interactively:

ollama run llama3.3

Example interaction:

User: What is the capital of Japan?
Assistant: The capital of Japan is Tokyo.
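
Type /bye (or press Ctrl+D) to leave the interactive session. You can also pass the prompt as an argument for one-shot, scriptable use:

# prints the completion and exits instead of opening a REPL
ollama run llama3.3 "What is the capital of Japan?"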

Step 4: Serving the Model

Starting Ollama Server

If Ollama is not already running as a systemd service, start the server manually:

ollama serve

If you built Ollama from source, run the local binary instead:

./ollama serve

Then, in a separate shell, run a model:

./ollama run llama3.3
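
By default the server listens on 127.0.0.1:11434 and is only reachable from the local machine. To accept connections from elsewhere on your network, set the OLLAMA_HOST environment variable before starting the server (keep in mind this exposes an unauthenticated API):

# bind the API to all interfaces on the default port
OLLAMA_HOST=0.0.0.0:11434 ollama serve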

Using the REST API

Ollama provides a REST API for running and managing models. Here are two common requests against a running server:

Generate a Response

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.3",
  "prompt": "Why is the sky blue?"
}'

Chat with the Model

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.3",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ]
}'
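
Both endpoints stream the reply as a series of JSON objects by default. To receive one complete JSON response instead, add "stream": false to the request body:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'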

For a complete list of API endpoints and options, refer to the Ollama API documentation.

Troubleshooting

Common Issues

  • Installation Issues: Ensure curl is installed:

    sudo apt update && sudo apt install curl -y
    
  • Performance Lags: Use a more heavily quantized variant (e.g., q3_K_M) or move to a GPU-enabled system.

  • Service Logs: When Ollama runs under systemd, inspect recent logs with:

    journalctl -e -u ollama
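
  • GPU Not Being Used: With NVIDIA drivers installed, the ollama process should appear in nvidia-smi while a model is loaded; if it does not, inference is falling back to the CPU:

    # live view of GPU memory use and the processes occupying it
    nvidia-smi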
    

With these detailed instructions, you can confidently install and customize Llama 3.3 for your hardware and use case. Ollama simplifies deployment, allowing you to focus on leveraging Llama 3.3's powerful capabilities for multilingual dialogue and beyond.
