DeepSeek has launched Janus-Pro, an upgraded version of its multimodal AI model, Janus.
This new version uses an optimized training strategy, expanded training data, and larger model sizes, leading to better text-to-image generation and multimodal understanding.
Key Improvements in Janus-Pro

Janus-Pro decouples visual encoding, using separate pathways for visual understanding and for image generation.
According to DeepSeek, this addresses stability and performance issues seen in earlier approaches and makes the model more efficient.
It also uses synthetic aesthetic data to improve the quality of text-to-image outputs.
Despite these separate visual pathways, Janus-Pro still runs both tasks through a single, unified transformer architecture.
This setup avoids conflicts between the visual encoder’s roles in understanding and in generation, and makes the model more flexible while performing on par with task-specific models.
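To make the "separate pathways, one transformer" idea concrete, here is a toy sketch in plain Python. It is not DeepSeek's code: the class name `ToyJanusStyleModel` and all of its fields are hypothetical, the callables are stand-ins for real neural modules (a SigLIP-style encoder on the understanding side, a VQ-tokenizer embedding lookup on the generation side), and details such as adaptors, the image generation head, and sampling are omitted. It only shows the routing: each task enters through its own visual pathway, and both end up in the same shared transformer.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class ToyJanusStyleModel:
    """Hypothetical illustration of decoupled visual encoding with a shared transformer."""

    # Understanding pathway: raw image -> continuous feature vectors
    # (stand-in for a SigLIP-style vision encoder plus adaptor).
    understanding_encoder: Callable[[Vector], List[Vector]]
    # Generation pathway: discrete image token id -> embedding
    # (stand-in for a VQ codebook lookup plus adaptor).
    generation_embedder: Callable[[int], Vector]
    # A single transformer shared by both tasks.
    shared_transformer: Callable[[List[Vector]], List[Vector]]

    def understand(self, image: Vector, prompt: List[Vector]) -> List[Vector]:
        # Image features come first, then the text prompt.
        visual = self.understanding_encoder(image)
        return self.shared_transformer(visual + prompt)

    def generate_step(self, prompt: List[Vector], image_tokens: List[int]) -> List[Vector]:
        # Image tokens generated so far are embedded via the separate generation
        # pathway and appended after the prompt before the shared transformer runs.
        visual = [self.generation_embedder(t) for t in image_tokens]
        return self.shared_transformer(prompt + visual)


# Usage: an identity "transformer" that records the length of every sequence it sees.
calls: List[int] = []


def toy_transformer(seq: List[Vector]) -> List[Vector]:
    calls.append(len(seq))
    return seq


model = ToyJanusStyleModel(
    understanding_encoder=lambda img: [[x] for x in img],
    generation_embedder=lambda t: [float(t)],
    shared_transformer=toy_transformer,
)

understanding_out = model.understand([0.1, 0.2], [[1.0]])
generation_out = model.generate_step([[1.0]], [5, 7])
print(calls)  # [3, 3]: both tasks went through the same transformer
```

The design choice the sketch mirrors is that the two visual pathways never share an encoder, so neither task has to compromise the other's visual representation, while the language-model backbone is reused for both.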
Performance and Evaluation
Janus-Pro delivers strong results in both multimodal understanding and visual generation.
Its performance is measured using various benchmarks:
- Multimodal Understanding: Evaluated on POPE, MME-Perception (scaled), GQA, and MMMU.
- Visual Generation: Evaluated on GenEval and DPG-Bench.
Compared with previous unified multimodal models, and even some specialized models, Janus-Pro achieves higher scores on these benchmarks for both understanding and image generation.
Technical Specifications

The model comes in two sizes, built on the DeepSeek-LLM-1.5B and DeepSeek-LLM-7B language models. The larger 7B version performs particularly well on MMBench and GenEval.
It uses SigLIP-L as its vision encoder and can process images up to 384×384 in resolution.
The image generation process is powered by a tokenizer with a 16x downsampling rate.
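As a quick worked example of what these two numbers imply, the sketch below computes how many discrete tokens the model would produce per image. It assumes that the 16× downsampling applies to each spatial side and that generated images use the same 384×384 resolution; `image_token_grid` is a hypothetical helper written for this illustration, not part of the Janus codebase.

```python
from typing import Tuple


def image_token_grid(resolution: int, downsample: int) -> Tuple[int, int]:
    """Return (tokens per side, total tokens) for a square image.

    Assumes the tokenizer downsamples each spatial side by `downsample`.
    """
    if resolution % downsample != 0:
        raise ValueError("resolution must be divisible by the downsampling rate")
    side = resolution // downsample  # 384 // 16 = 24 tokens along each side
    return side, side * side         # 24 * 24 = 576 tokens per image


side, total = image_token_grid(384, 16)
print(side, total)  # 24 576
```

Under these assumptions, generating one image means autoregressively predicting a 24×24 grid of 576 image tokens.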
Outperforming the Competition

Janus-Pro-7B has been directly compared to OpenAI’s DALL-E 3 in text-to-image benchmarks.
According to DeepSeek, Janus-Pro-7B outperforms DALL-E 3 on GenEval and DPG-Bench.
The improved training methods, higher-quality data, and larger model size contribute to more stable and detailed images.
Industry Reactions
The release of Janus-Pro has generated significant buzz. Vedang Vatsa FRSA shared:
DeepSeek’s Janus-Pro-7B is here. Outperforms DALL-E 3 & Stable Diffusion on GenEval/DPG-Bench. Separates understanding/generation, scales data/models for stable image gen. Unified, flexible, cost-efficient. Open-source win!
AI expert Huzaifa Shoukat posted:
DeepSeek’s new Janus Pro model is impressive. It’s a multimodal LLM that understands images and generates them too. The 1B model runs in the browser using WebGPU via Transformers.js.
Availability and Licensing

The Janus-Pro code is open source and available on GitHub under the MIT License.
However, use of the model itself is subject to the DeepSeek Model License. Setup instructions are available in the repository.
DeepSeek continues to push the boundaries of AI with Janus-Pro, offering a powerful and flexible tool for multimodal applications.
Whether you’re working with text, images, or both, this model brings new possibilities to AI-driven content creation.