How to Run DeepSeek V3

By Ryan A. on Jan 6, 2025

Guest Author

DeepSeek V3 is a state-of-the-art Mixture-of-Experts (MoE) model with 671 billion total parameters, of which roughly 37 billion are activated per token. It excels at reasoning, code generation, and multilingual tasks, making it one of the top-performing open-source models available. This guide walks through deploying DeepSeek V3, covering recommended hardware configurations and tools like ollama that simplify setup.

Why Choose DeepSeek V3?

Key features that make DeepSeek V3 stand out:

  • Auxiliary-Loss-Free Strategy: Keeps expert load balanced without the performance penalty of an auxiliary loss.
  • Multi-Token Prediction (MTP): Boosts inference efficiency and speed.
  • FP8 Precision Training: Provides cost-effective scalability for large-scale models.
  • Framework Flexibility: Compatible with multiple hardware and software stacks.

Deploying DeepSeek V3 locally provides complete control over its performance and maximizes hardware investments.

Updated System Requirements (Full Base Model)

Hardware

To deploy the full base model of DeepSeek V3 efficiently, use the following configurations:

  • GPU:
    • Minimum: NVIDIA A100 (80GB) with BF16 precision support (native FP8 requires Hopper-class GPUs such as the H100).
    • Recommended: 16 or more NVIDIA H100 (80GB) GPUs for distributed setups.
    • Alternatives:
      • AMD GPUs supporting FP8/BF16 (via frameworks like SGLang).
      • Huawei Ascend NPUs with BF16 support.
  • CPU: Multi-core processors for pre/post-processing tasks.
  • Memory:
    • Minimum: 64GB RAM.
    • Recommended: 128GB RAM for larger datasets or multi-GPU configurations.
  • Storage: At least 1TB of high-speed SSD or NVMe storage for model weights and intermediate files.

Software

  • Operating System: Linux-based OS (Ubuntu 20.04+ recommended).
  • Python Version: Python 3.8 or higher.
  • Essential Libraries:
    • PyTorch (torch >= 1.12.0)
    • Transformers, huggingface_hub, numpy, scipy, scikit-learn
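
To confirm the core libraries are installed and importable, a quick sanity check from the shell can help (a minimal check, not part of the official requirements):

# Verify that PyTorch and the Hugging Face libraries import and report their versions.
python -c "import torch, transformers, huggingface_hub; print(torch.__version__, transformers.__version__)"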

For the full list of system requirements, including the distilled models, visit the system requirements guide.

Deployment Steps

Quick Setup Using Ollama

For the simplest deployment, use ollama. Ensure ollama is installed on your system, then start the model with a single command:

ollama run deepseek-v3

This command starts an interactive session, letting you converse with the model without any further configuration.
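
Once the model is running, you can also query it programmatically through ollama's local HTTP API (a minimal sketch assuming ollama's default port 11434; adjust the prompt to your use case):

# Send a single, non-streaming prompt to the locally running model.
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-v3",
  "prompt": "Summarize what a Mixture-of-Experts model is.",
  "stream": false
}'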

Commands for Distilled Models

If the full base model exceeds your hardware, you can run the smaller distilled DeepSeek-R1 variants instead, which have far more modest GPU requirements; a quick command to check your local downloads follows the list.

  • 1.5B version:
    ollama run deepseek-r1:1.5b
    
  • 8B version:
    ollama run deepseek-r1:8b
    
  • 14B version:
    ollama run deepseek-r1:14b
    
  • 32B version:
    ollama run deepseek-r1:32b
    
  • 70B version:
    ollama run deepseek-r1:70b
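
To confirm which variants are available locally and which are currently loaded, ollama's built-in commands can be used:

# List models downloaded to this machine.
ollama list

# Show models currently loaded in memory.
ollama ps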
    

Advanced Deployment Steps

1. Clone the Repository

Clone the official DeepSeek V3 repository:

git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3

2. Install Dependencies

Navigate to the inference folder and install required dependencies:

cd inference

# Optional: Isolate dependencies
virtualenv env
source env/bin/activate

pip install -r requirements.txt

3. Download Model Weights

Download the weights from Hugging Face:

mkdir -p /path/to/DeepSeek-V3-Demo
# Save model weights to the directory above.
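
One straightforward way to fetch the weights is with the Hugging Face CLI (a sketch; the download is several hundred gigabytes, so make sure the target path has sufficient space):

# Download the official weights into the demo directory.
huggingface-cli download deepseek-ai/DeepSeek-V3 --local-dir /path/to/DeepSeek-V3-Demo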

4. Alternative Inference Commands

For advanced inference scenarios, use distributed PyTorch commands:

Interactive Inference

torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py \
  --ckpt-path /path/to/DeepSeek-V3-Demo \
  --config configs/config_671B.json \
  --interactive --temperature 0.7 --max-new-tokens 200

Batch Inference

torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py \
  --ckpt-path /path/to/DeepSeek-V3-Demo \
  --config configs/config_671B.json \
  --input-file $FILE
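
The $RANK and $ADDR variables must be set on every node before launching. For a hypothetical two-node cluster (the master IP below is only an example), the setup might look like:

# On node 0 (the master node):
export RANK=0
export ADDR=10.0.0.1   # IP address of node 0, reachable from both nodes

# On node 1:
export RANK=1
export ADDR=10.0.0.1   # every node points at the same master address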

Frameworks for Enhanced Deployment

SGLang

A specialized framework for MoE models like DeepSeek V3, offering:

  • FP8 precision.
  • Distributed inference across multiple GPUs and nodes.
  • Advanced Multi-Token Prediction (MTP).

Detailed guide here.
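
As an illustration, launching an SGLang server for DeepSeek V3 might look like the following (the flags are assumptions based on SGLang's standard CLI; consult the linked guide for the authoritative options):

# Launch an OpenAI-compatible SGLang server with tensor parallelism across 8 GPUs.
python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 \
  --tp 8 --trust-remote-code --port 30000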

LMDeploy

A versatile inference framework supporting FP8 and BF16 precision, ideal for scaling DeepSeek V3.

Setup guide here.

NVIDIA TensorRT-LLM

Optimize your deployment with TensorRT-LLM, featuring quantization and precision tuning (BF16 and INT4/INT8).

Learn more here.

Key Tips for Optimal Performance

  1. Use FP8 Precision: Maximize efficiency for both training and inference.
  2. Deploy on Distributed Systems: Use frameworks like TensorRT-LLM or SGLang for multi-node setups.
  3. Monitor Resources: Leverage tools like nvidia-smi for real-time utilization tracking.
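
For example, GPU utilization and memory can be logged every few seconds while the model is serving requests:

# Print per-GPU utilization and memory usage every 5 seconds in CSV form.
nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv -l 5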

Conclusion

Deploying DeepSeek V3 is now more streamlined than ever, thanks to tools like ollama and frameworks such as TensorRT-LLM and SGLang. By leveraging high-end GPUs like the NVIDIA H100 and following this guide, you can unlock the full potential of this powerful MoE model for your AI workloads.
