How to Run DeepSeek V3-0324: Updated Weights

By Ryan A. on Mar 25, 2025

Guest Author

DeepSeek V3-0324 is the updated checkpoint for DeepSeek's 685B-parameter MoE (Mixture of Experts) model, originally released in December 2024. The new version, tagged "0324" after its release date (March 24, 2025), improves coding performance while keeping the same architecture and model size.

The model is open-source and hosted on HuggingFace:
https://huggingface.co/deepseek-ai/DeepSeek-V3-0324

The system requirements remain unchanged since it's a checkpoint update and not a new architecture.

System Requirements

The VRAM requirements are consistent with the original DeepSeek V3 release. For reference:

Model Version    Minimum VRAM Requirement
Full Model       ~1532 GB
4-bit Model      ~386 GB

These are baseline requirements to load the model weights. For stable inference or use with extended context, additional memory headroom is recommended.
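
As a rough sanity check, you can estimate weight memory from the parameter count and the bytes per parameter. The Python sketch below is approximate and ignores activations, KV cache, and framework overhead:

params = 685e9  # total parameters, all MoE experts included

for label, bytes_per_param in [("16-bit", 2.0), ("4-bit", 0.5)]:
    weight_gb = params * bytes_per_param / 1e9
    print(f"{label}: ~{weight_gb:.0f} GB for weights alone")

# Prints roughly 1370 GB and 343 GB, in line with the table above once
# non-quantized layers and runtime overhead are added.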

Multi-GPU Setup

Due to its size, running the full DeepSeek V3 model on a single machine is generally impractical. In most environments, it requires multiple high-memory GPUs. Libraries such as Accelerate, DeepSpeed, or transformers with device_map="auto" support partitioning the model across devices.

If you have a multi-GPU environment, you can load the model with:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "deepseek-ai/DeepSeek-V3-0324"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # shard layers across all visible GPUs
    torch_dtype="auto"   # use the dtype stored in the checkpoint
)
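
Once loaded, generation follows the standard transformers pattern. A minimal sketch (the prompt and max_new_tokens value are illustrative):

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))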

For more control over placement, you can use Accelerate or DeepSpeed to distribute layers across GPUs or offload them to CPU. Ensure each GPU has enough headroom for its assigned layers plus activation memory.
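
A lightweight option is the max_memory argument that transformers (via Accelerate) accepts alongside device_map="auto". The per-device limits below are placeholders to tune for your own hardware:

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
    # Illustrative caps, not recommendations; keys are GPU indices and "cpu"
    max_memory={0: "75GiB", 1: "75GiB", 2: "75GiB", 3: "75GiB", "cpu": "200GiB"},
)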

Running the 4-bit Model on Mac (MLX)

For developers on Apple Silicon, particularly a Mac Studio with M3 Ultra (512 GB unified memory), the 4-bit quantized model offers a more accessible alternative. This version was converted using mlx-lm 0.22.2.

Install the library:

pip install mlx-lm

Run with:

from mlx_lm import load, generate

# Downloads (if needed) and loads the 4-bit weights and tokenizer
model, tokenizer = load("mlx-community/DeepSeek-V3-0324-4bit")

prompt = "hello"

# Apply the model's chat template when one is available
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
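
mlx-lm also ships a command-line entry point, which is convenient for a quick smoke test. A minimal invocation (prompt and token limit are illustrative):

mlx_lm.generate --model mlx-community/DeepSeek-V3-0324-4bit \
  --prompt "Write a haiku about quantization" \
  --max-tokens 128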

Performance is quite usable as long as the model fits in unified memory and macOS is not forced to swap heavily.

Conclusion

DeepSeek V3-0324 delivers better coding performance through updated weights while keeping the same system requirements as earlier versions. Running the full model realistically requires a multi-GPU environment, and the transformers library simplifies that deployment with automatic device mapping. For those on macOS, the 4-bit model offers a solid compromise between performance and hardware constraints.

If you're already using DeepSeek V3, upgrading to the 0324 checkpoint is straightforward and gives better results, especially for structured prompts and completion accuracy.
