
AMD Unveils Instella-VL-1B Vision Language Model
AMD has announced its first vision language model, Instella-VL-1B, trained on AMD Instinct MI300X GPUs. The model is part of the Instella family of language models that AMD introduced in March 2025. Instella-VL-1B is a multimodal model with 1.5 billion parameters in total, combining a 300-million-parameter vision encoder with a 1.2-billion-parameter language model.
The model was trained on datasets such as LLaVA, Cambrian, and Pixmo, supplemented with document-focused datasets like M-Paper and DocStruct4M. Using a new pre-training dataset of 7 million examples and a supervised fine-tuning dataset of 6 million examples, Instella-VL-1B outperforms similarly sized open-source models on general vision-language tasks and OCR-related benchmarks.
AMD has made Instella-VL-1B fully open source, releasing not only the model weights but also the detailed training configurations, datasets, and code. The release underscores AMD's commitment to advancing open-source multimodal AI.