
Updated on Feb 4, 2025

DeepSeek-R1 Now Live as an NVIDIA NIM Microservice



DeepSeek-R1, a 671-billion-parameter AI model, is now available as an NVIDIA NIM microservice preview. It offers unparalleled efficiency in logical inference, reasoning, and coding.

Key takeaways

  • Unrivaled Performance: R1 achieves up to 3,872 tokens per second on NVIDIA HGX H200.
  • Massive Scale: The model boasts 671 billion parameters and supports a 128,000-token context length.
  • Enterprise Ready: Available as an NVIDIA NIM microservice, ensuring secure, scalable deployment.
  • Next-Gen Optimization: Poised for a significant leap with NVIDIA Blackwell architecture and FP4 compute performance.

DeepSeek-R1: The future of AI reasoning and inference

DeepSeek-R1 is more than just an AI model—it’s a glimpse into the future of intelligent automation, setting new standards for performance, security, and scalability.

Designed as a large mixture-of-experts (MoE) model, it incorporates 256 experts per layer and processes each token through eight parallel experts. 

This structure enables unmatched logical inference, reasoning, coding, and mathematical problem-solving capabilities.
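The routing idea behind a mixture-of-experts layer can be sketched in a few lines: a router scores every expert for each token, and only the top-k experts (here 8 of 256, matching the figures above) actually process that token. This is a toy illustration of top-k routing, not DeepSeek-R1's actual router implementation:

```python
import heapq
import random

def route_token(router_scores, k=8):
    """Return the indices of the k highest-scoring experts for one token."""
    return heapq.nlargest(k, range(len(router_scores)), key=router_scores.__getitem__)

random.seed(0)
num_experts, k = 256, 8
# Stand-in for the router's learned logits for a single token.
scores = [random.random() for _ in range(num_experts)]
experts = route_token(scores, k)
print(len(experts))  # 8 experts selected out of 256
```

Because only 8 of the 256 experts run per token, the model activates a small fraction of its 671 billion parameters on each forward pass, which is what makes inference at this scale tractable.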

NVIDIA NIM Microservice: Secure & scalable AI deployments

The DeepSeek-R1 NIM microservice is now available as a preview on build.nvidia.com, allowing developers to experiment securely. 

Enterprises can deploy the model on their preferred accelerated computing infrastructure, ensuring maximum security and privacy. 

With NVIDIA AI Enterprise and NeMo software, businesses can tailor AI models to their needs.
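NIM microservices expose an OpenAI-compatible chat-completions API, so calling the preview typically amounts to posting a JSON payload to the endpoint listed on build.nvidia.com. The endpoint URL and model identifier below are assumptions for illustration; check the model card on build.nvidia.com for the actual values. This sketch only assembles the request body and does not make a network call:

```python
import json

# Assumed values -- verify against the DeepSeek-R1 card on build.nvidia.com.
NIM_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL_ID = "deepseek-ai/deepseek-r1"

def build_request(prompt: str, max_tokens: int = 512) -> str:
    """Assemble an OpenAI-style chat-completions payload for a NIM endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)

body = build_request("Prove that the sum of two even numbers is even.")
print(json.loads(body)["model"])  # deepseek-ai/deepseek-r1
```

In practice you would POST this body to the endpoint with an API key in the `Authorization` header; the same payload shape works whether the microservice runs in NVIDIA's preview environment or on your own infrastructure.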

DeepSeek-R1 on NVIDIA HGX H200

DeepSeek-R1 is designed for extreme efficiency, achieving up to 3,872 tokens per second on a single NVIDIA HGX H200 system. 

This makes it one of the fastest inference models available, ideal for real-time applications requiring high-speed computation.
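As a rough back-of-envelope check, an HGX H200 baseboard carries eight H200 GPUs, so the quoted aggregate figure works out to a per-GPU throughput like this:

```python
total_tps = 3872       # reported tokens/sec on one HGX H200 system
gpus_per_system = 8    # an HGX H200 baseboard holds eight H200 GPUs
per_gpu_tps = total_tps / gpus_per_system
print(per_gpu_tps)  # 484.0 tokens/sec per GPU
```

That per-GPU number is only an average across the system, but it helps when sizing deployments against an expected request load.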

How DeepSeek-R1 leverages NVIDIA Blackwell

The upcoming NVIDIA Blackwell architecture will push R1 even further. 

Equipped with fifth-generation Tensor Cores, Blackwell promises up to 20 petaflops of peak FP4 compute performance. 

The 72-GPU NVLink domain is engineered for AI inference, ensuring R1 operates efficiently during reasoning-heavy tasks.

The AI race

Compared with many open-source LLMs, R1 offers roughly 10x the parameters, enabling deeper understanding and greater precision. 

Its 128,000-token context length enables it to handle long-form content and complex prompts more effectively than competitors.

Downloadable NIM Microservice coming soon!

The DeepSeek-R1 NIM microservice is currently a preview, but it will soon be downloadable as part of the NVIDIA AI Enterprise software platform. 

This will streamline AI development and empower organizations to customize AI agents easily.
