Apple Introduces Manzano for Image Understanding and Generation
Apple Inc. has introduced Manzano, a new AI model designed to perform both image understanding and image generation within a single system. This dual capability addresses a significant technical challenge that has historically limited most open-source models, which often excel in one area but fall short in the other. Manzano aims to match the performance of commercial systems like OpenAI's GPT-4o and Google's Nano Banana.
Manzano, named after the Spanish word for "apple tree," has not been released publicly, and there is no demo yet. However, Apple researchers have published a paper showcasing low-resolution image samples generated from complex prompts. The results are compared against outputs from open-source models such as DeepSeek's Janus Pro and commercial systems like GPT-4o and Gemini 2.5 Flash Image Generation.
The core issue Apple identifies lies in how models process images. Image understanding works best with continuous representations, while autoregressive image generation requires discrete tokens. Manzano resolves this with a hybrid image tokenizer: a single shared image encoder produces two types of tokens, continuous tokens for comprehension and discrete tokens for generation. Because both token types come from the same encoder, the mismatch between the two tasks is minimized.
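The paper does not ship reference code, but the split can be illustrated with a short sketch. The PyTorch module below is a hypothetical toy example, not Apple's implementation: all class names, dimensions, and the codebook size are assumptions. One shared encoder patchifies the image; a light adapter keeps the embeddings continuous for understanding, while a nearest-codebook lookup quantizes the same embeddings into discrete token ids for generation.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a hybrid image tokenizer (not Apple's code):
# one shared encoder feeds a continuous branch and a discrete branch.
class HybridImageTokenizer(nn.Module):
    def __init__(self, embed_dim=256, codebook_size=1024):
        super().__init__()
        # Shared encoder: patchify the image into a common latent space.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, embed_dim, kernel_size=16, stride=16),  # 16x16 patches
            nn.Flatten(2),                                       # (B, D, N)
        )
        # Continuous branch: a light projection, kept in continuous space.
        self.continuous_adapter = nn.Linear(embed_dim, embed_dim)
        # Discrete branch: a learned codebook for vector quantization.
        self.codebook = nn.Embedding(codebook_size, embed_dim)

    def forward(self, images):
        z = self.encoder(images).transpose(1, 2)        # (B, N, D)
        continuous_tokens = self.continuous_adapter(z)  # for understanding
        # Nearest-codebook lookup yields discrete ids for generation.
        codes = self.codebook.weight.unsqueeze(0).expand(z.size(0), -1, -1)
        discrete_tokens = torch.cdist(z, codes).argmin(dim=-1)  # (B, N)
        return continuous_tokens, discrete_tokens

tokenizer = HybridImageTokenizer()
imgs = torch.randn(2, 3, 256, 256)  # batch of 256-pixel images
cont, disc = tokenizer(imgs)
print(cont.shape, disc.shape)  # torch.Size([2, 256, 256]) torch.Size([2, 256])
```

The key design point this illustrates is that both branches read the same encoder output, so the understanding and generation pathways never diverge at the representation level.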
The model's architecture consists of three key components: the hybrid tokenizer, a unified language model, and a dedicated image decoder. Apple developed three versions of the decoder with 0.9 billion, 1.75 billion, and 3.52 billion parameters, supporting image resolutions from 256 to 2048 pixels. Training occurred in three stages using 2.3 billion image-to-text pairs from public and internal sources, along with one billion internal text-to-image pairs. The total training data amounted to 1.6 trillion tokens, including synthetic data from systems like DALL-E 3 and ShareGPT-4o.
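To make the three-part architecture concrete, here is a toy-scale sketch of how the components could connect, again a hypothetical illustration rather than Apple's code: the unified language model consumes text embeddings together with continuous image tokens and predicts discrete image ids, and a separate image decoder maps those ids back to pixels. Layer counts, dimensions, and the patch-based decoder are assumptions, orders of magnitude smaller than the 0.9-billion to 3.52-billion parameter decoders described above.

```python
import torch
import torch.nn as nn

class UnifiedLM(nn.Module):
    """Toy unified backbone: one transformer consumes text plus image tokens."""
    def __init__(self, vocab_size=32000, dim=256, n_layers=2, codebook_size=1024):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.image_head = nn.Linear(dim, codebook_size)  # predicts discrete ids

    def forward(self, text_ids, image_tokens):
        # Continuous image tokens are concatenated with text embeddings,
        # so understanding and generation share one sequence model.
        seq = torch.cat([self.text_embed(text_ids), image_tokens], dim=1)
        return self.image_head(self.backbone(seq))

class ImageDecoder(nn.Module):
    """Toy decoder: maps predicted discrete ids back to pixel patches."""
    def __init__(self, codebook_size=1024, dim=256):
        super().__init__()
        self.embed = nn.Embedding(codebook_size, dim)
        self.to_pixels = nn.Linear(dim, 3 * 16 * 16)  # one 16x16 RGB patch per token

    def forward(self, ids):
        patches = self.to_pixels(self.embed(ids))  # (B, N, 768)
        b, n, _ = patches.shape
        side = int(n ** 0.5)
        # Reassemble the flat patch sequence into an image grid.
        patches = patches.view(b, side, side, 3, 16, 16)
        return patches.permute(0, 3, 1, 4, 2, 5).reshape(b, 3, side * 16, side * 16)

lm, decoder = UnifiedLM(), ImageDecoder()
text = torch.randint(0, 32000, (1, 8))       # 8 text tokens
img_tokens = torch.randn(1, 16, 256)         # 16 continuous image tokens
logits = lm(text, img_tokens)                # (1, 24, 1024)
image = decoder(logits.argmax(-1)[:, -16:])  # (1, 3, 64, 64)
```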
On benchmark tests, Manzano outperformed other models in several key areas. The 30-billion-parameter version achieved top results on ScienceQA, MMMU, and MathVista—tasks that require strong text and diagram comprehension. Performance improved steadily as model size increased from 300 million to 30 billion parameters, with the 3-billion-parameter version scoring over 10 points higher than the smallest version on multiple tasks.