How to Run DeepSeek V3


By Ryan A. on Jan 6, 2025

DeepSeek V3 is a state-of-the-art Mixture-of-Experts (MoE) model designed for scalable, efficient inference, with 671 billion total parameters (roughly 37 billion activated per token). Its benchmark results place it at the forefront of open-source models, particularly for advanced reasoning, code generation, and multilingual tasks. This guide provides a detailed walkthrough for deploying DeepSeek V3 on high-end hardware.

Why DeepSeek V3?

DeepSeek V3 sets itself apart with features like:

  • Auxiliary-Loss-Free Strategy: Improves load balancing without performance degradation.
  • Multi-Token Prediction (MTP): Enhances both performance and inference speed.
  • Innovative Training: Utilizes FP8 precision to achieve cost-efficient, large-scale training.
  • Flexibility: Supports multiple frameworks and hardware configurations.

Running DeepSeek V3 locally gives you full control over the model’s performance and allows you to leverage your hardware investments efficiently.

System Requirements

Before diving into the setup, ensure your system meets the following requirements:

Hardware Requirements

  • GPU:
    • Minimum: NVIDIA A100 (80GB) for BF16 precision; native FP8 inference requires Hopper-class GPUs such as the H100/H800.
    • Recommended: NVIDIA H800 GPUs for distributed, multi-node setups.
    • Alternatives: AMD GPUs with FP8/BF16 support (via SGLang) or Huawei Ascend NPUs (BF16 precision).
  • CPU: A modern multi-core processor to handle pre/post-processing tasks.
  • Memory:
    • Minimum: 64GB RAM.
    • Recommended: 128GB RAM or more for multi-GPU setups.
  • Storage: At least 1TB of SSD or NVMe storage for model weights and data (a rough sizing estimate follows this list).
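
As a rough, back-of-envelope check (an estimate, not an official figure), the weight footprint follows from the parameter count at about one byte per parameter in FP8, which is what drives the storage and multi-GPU requirements above:

# Back-of-envelope: 671B params x 1 byte (FP8) ≈ 671 GB of weights; sharded across
# 16 x 80GB GPUs that is ~42 GB per GPU, leaving headroom for KV cache and activations.
python -c "total=671e9*1/1e9; print(f'{total:.0f} GB total, {total/16:.0f} GB per GPU across 16 GPUs')"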

Software Requirements

  • Operating System: Linux-based OS, preferably Ubuntu 20.04 or newer.
  • Python Version: Python 3.8 or higher.
  • Key Libraries:
    • PyTorch (torch >= 1.12.0)
    • Transformers, huggingface_hub, numpy, scipy, scikit-learn

Step-by-Step Setup

1. Clone the Repository

Start by cloning the official DeepSeek V3 repository from GitHub:

git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3

2. Install Dependencies

Navigate to the inference folder and install the required Python libraries:

cd inference
pip install -r requirements.txt
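
As an optional sanity check (assuming a CUDA-enabled PyTorch build), confirm that PyTorch can see your GPUs before moving on:

python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"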

3. Download Model Weights

Download the model weights from Hugging Face into a dedicated directory. In the commands below, /path/to/DeepSeek-V3 is a placeholder for the downloaded Hugging Face checkpoint, and /path/to/DeepSeek-V3-Demo is where the converted weights produced in step 4 are written:

mkdir -p /path/to/DeepSeek-V3 /path/to/DeepSeek-V3-Demo
# Download the Hugging Face weights into /path/to/DeepSeek-V3.
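
One way to fetch the checkpoint is with the Hugging Face CLI that ships with huggingface_hub (the target path is the same placeholder used throughout this guide):

# Downloads the full FP8 checkpoint (several hundred gigabytes) into the placeholder directory.
huggingface-cli download deepseek-ai/DeepSeek-V3 --local-dir /path/to/DeepSeek-V3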

4. Convert Model Weights

The demo inference scripts expect the weights in their own sharded layout, so convert the Hugging Face checkpoint first. Here, --n-experts 256 matches the model's routed-expert count and --model-parallel 16 matches the total number of GPUs used for inference (two nodes with eight GPUs each in the commands below):

python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 \
    --save-path /path/to/DeepSeek-V3-Demo \
    --n-experts 256 \
    --model-parallel 16

5. Run Inference

For interactive inference, use the following command:

torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py \
    --ckpt-path /path/to/DeepSeek-V3-Demo \
    --config configs/config_671B.json \
    --interactive \
    --temperature 0.7 \
    --max-new-tokens 200

For batch inference using a file:

torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py \
    --ckpt-path /path/to/DeepSeek-V3-Demo \
    --config configs/config_671B.json \
    --input-file $FILE
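
The $RANK, $ADDR, and $FILE placeholders are not set by the commands above; each node needs them defined before launching. A minimal sketch for a two-node run, with example values you would replace with your own:

export ADDR=10.0.0.1      # example master-node IP; use the same value on both nodes
export RANK=0             # 0 on the master node, 1 on the second node
export FILE=prompts.txt   # batch mode only: assumed to contain one prompt per line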

Alternative Frameworks for Deployment

SGLang

SGLang is optimized for DeepSeek V3 with support for FP8 precision, distributed parallelism, and Multi-Token Prediction (MTP). It is compatible with both NVIDIA and AMD GPUs.
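
As an illustration rather than a prescribed configuration, a single-node SGLang deployment with tensor parallelism across eight GPUs could be launched roughly like this (the port and parallel degree are example values):

# Assumes a recent SGLang release with DeepSeek V3 support installed.
python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code --port 30000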

LMDeploy

LMDeploy is a versatile framework for efficient inference, offering FP8 and BF16 precision options.
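
A minimal sketch of serving the model through LMDeploy's OpenAI-compatible API server, assuming lmdeploy is installed and the tensor-parallel degree matches your GPU count:

# --backend pytorch selects LMDeploy's PyTorch engine; adjust --tp to your GPU count.
lmdeploy serve api_server deepseek-ai/DeepSeek-V3 --backend pytorch --tp 8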

TensorRT-LLM

NVIDIA TensorRT-LLM supports BF16 and INT4/INT8 quantization. FP8 support is in progress.

Key Tips for Optimal Performance

  1. Utilize FP8 Precision: FP8 weights halve the memory footprint relative to BF16 and improve inference throughput with minimal accuracy loss.
  2. Deploy Multi-Node Setups: For larger models or high throughput, consider using frameworks like SGLang or TensorRT-LLM for distributed setups.
  3. Monitor Resource Usage: Tools like nvidia-smi can help ensure balanced GPU and CPU utilization.
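
For example, one simple way to watch per-GPU utilization and memory during a run (the 5-second refresh interval is arbitrary):

nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv -l 5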
