DeepSeek V3 is an open-source 671B-parameter Mixture-of-Experts (MoE) model that activates 37B parameters per token. It features an innovative load-balancing strategy and multi-token prediction, and was trained on 14.8T tokens. The model achieves state-of-the-art results among open models across standard benchmarks, incorporates reasoning capabilities distilled from DeepSeek-R1, and supports a 128K context window.
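The "37B activated parameters" figure follows from sparse expert routing: a gating network scores all experts per token but only the top-k actually run. The sketch below illustrates that idea in miniature; the dimensions, weights, and top-2 routing are illustrative assumptions, not DeepSeek V3's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: 8 experts, hidden width 16. DeepSeek V3 itself uses far
# more experts and larger widths -- these numbers are for illustration.
n_experts, d = 8, 16
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # expert weights
gate = rng.normal(size=(n_experts, d))                         # router weights

def moe_forward(x, k=2):
    """Sparse MoE forward pass: only the top-k experts run per token,
    which is how a model can be huge overall yet cheap per token."""
    scores = gate @ x                          # router affinity for each expert
    topk = np.argsort(scores)[-k:]             # indices of the k best experts
    w = np.exp(scores[topk] - scores[topk].max())
    w /= w.sum()                               # softmax over the selected experts
    # Combine only the chosen experts' outputs, weighted by the router.
    return sum(wi * (experts[i] @ x) for i, wi in zip(topk, w))

y = moe_forward(rng.normal(size=d))
```

Because unselected experts never execute, compute per token scales with k rather than with the total expert count, mirroring the 37B-active-of-671B split described above.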
Speed Improvement: DeepSeek V3 processes 60 tokens per second, representing a 3x speed increase over its predecessor
Enhanced Capabilities: The model demonstrates improved overall performance across various tasks
Architecture: A 671B-parameter Mixture-of-Experts (MoE) design, with 37B parameters activated per token
Training Scale: Trained on 14.8 trillion high-quality tokens
API Compatibility: Maintains compatibility with previous versions for seamless transition
Open Source: Both the model and associated research papers are freely available to the community.
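Given the API-compatibility point above, a request to DeepSeek V3 can reuse existing OpenAI-style client code. The sketch below builds such a request; the endpoint URL and the "deepseek-chat" model identifier are assumptions drawn from DeepSeek's public documentation, not from this text, so verify them before use.

```python
import json

API_KEY = "sk-..."  # placeholder; supply a real key
BASE_URL = "https://api.deepseek.com/chat/completions"  # assumed endpoint

# OpenAI-style chat-completions payload. "deepseek-chat" is the assumed
# model identifier for DeepSeek V3 -- check the provider docs.
payload = {
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
body = json.dumps(payload)

# Send with any HTTP client, e.g.:
# import urllib.request
# req = urllib.request.Request(BASE_URL, data=body.encode(), headers=headers)
# print(urllib.request.urlopen(req).read().decode())
```

Because the request shape matches the OpenAI chat-completions format, switching an existing integration typically only requires changing the base URL, API key, and model name.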