Apple Introduces Manzano for Image Understanding and Generation
Apple Inc. has introduced Manzano, a new AI model designed to perform both image understanding and image generation within a single system. This dual capability addresses a significant technical challenge that has historically limited most open-source models, which often excel in one area but fall short in the other. Manzano aims to match the performance of commercial systems like OpenAI's GPT-4o and Google's Nano Banana.
Manzano, named after the Spanish word for "apple tree," has not been released publicly, and there is no demo yet. However, Apple researchers have published a paper showcasing low-resolution image samples generated from complex prompts. The results are compared against outputs from open-source models such as DeepSeek's Janus Pro and commercial systems like GPT-4o and Gemini 2.5 Flash Image Generation.
The core issue Apple identifies lies in how models process images. Image understanding works best with continuous representations, while autoregressive image generation requires discrete tokens. Manzano resolves this with a hybrid image tokenizer: a single shared image encoder produces two types of tokens, continuous tokens for comprehension and discrete tokens for generation. Because both token types come from the same encoder, the mismatch between the two tasks is minimized.
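The paper does not ship reference code, but the split can be illustrated with a short sketch. The PyTorch module below is a hypothetical toy example, not Apple's implementation: all class names, dimensions, and the codebook size are assumptions. One shared encoder patchifies the image; a light adapter keeps the embeddings continuous for understanding, while a nearest-codebook lookup quantizes the same embeddings into discrete token ids for generation.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a hybrid image tokenizer (not Apple's code):
# one shared encoder feeds a continuous branch and a discrete branch.
class HybridImageTokenizer(nn.Module):
    def __init__(self, embed_dim=256, codebook_size=1024):
        super().__init__()
        # Shared encoder: patchify the image into a common latent space.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, embed_dim, kernel_size=16, stride=16),  # 16x16 patches
            nn.Flatten(2),                                       # (B, D, N)
        )
        # Continuous branch: a light projection, kept in continuous space.
        self.continuous_adapter = nn.Linear(embed_dim, embed_dim)
        # Discrete branch: a learned codebook for vector quantization.
        self.codebook = nn.Embedding(codebook_size, embed_dim)

    def forward(self, images):
        z = self.encoder(images).transpose(1, 2)        # (B, N, D)
        continuous_tokens = self.continuous_adapter(z)  # for understanding
        # Nearest-codebook lookup yields discrete ids for generation.
        codes = self.codebook.weight.unsqueeze(0).expand(z.size(0), -1, -1)
        discrete_tokens = torch.cdist(z, codes).argmin(dim=-1)  # (B, N)
        return continuous_tokens, discrete_tokens

tokenizer = HybridImageTokenizer()
imgs = torch.randn(2, 3, 256, 256)  # batch of 256-pixel images
cont, disc = tokenizer(imgs)
print(cont.shape, disc.shape)  # torch.Size([2, 256, 256]) torch.Size([2, 256])
```

The key design point this illustrates is that both branches read the same encoder output, so the understanding and generation pathways never diverge at the representation level.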
The model's architecture consists of three key components: the hybrid tokenizer, a unified language model, and a dedicated image decoder. Apple developed three versions of the decoder with 0.9 billion, 1.75 billion, and 3.52 billion parameters, supporting image resolutions from 256 to 2048 pixels. Training occurred in three stages using 2.3 billion image-to-text pairs from public and internal sources, along with one billion internal text-to-image pairs. The total training data amounted to 1.6 trillion tokens, including synthetic data from systems like DALL-E 3 and ShareGPT-4o.
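To make the three-part architecture concrete, here is a toy-scale sketch of how the components could connect, again a hypothetical illustration rather than Apple's code: the unified language model consumes text embeddings together with continuous image tokens and predicts discrete image ids, and a separate image decoder maps those ids back to pixels. Layer counts, dimensions, and the patch-based decoder are assumptions, orders of magnitude smaller than the 0.9-billion to 3.52-billion parameter decoders described above.

```python
import torch
import torch.nn as nn

class UnifiedLM(nn.Module):
    """Toy unified backbone: one transformer consumes text plus image tokens."""
    def __init__(self, vocab_size=32000, dim=256, n_layers=2, codebook_size=1024):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.image_head = nn.Linear(dim, codebook_size)  # predicts discrete ids

    def forward(self, text_ids, image_tokens):
        # Continuous image tokens are concatenated with text embeddings,
        # so understanding and generation share one sequence model.
        seq = torch.cat([self.text_embed(text_ids), image_tokens], dim=1)
        return self.image_head(self.backbone(seq))

class ImageDecoder(nn.Module):
    """Toy decoder: maps predicted discrete ids back to pixel patches."""
    def __init__(self, codebook_size=1024, dim=256):
        super().__init__()
        self.embed = nn.Embedding(codebook_size, dim)
        self.to_pixels = nn.Linear(dim, 3 * 16 * 16)  # one 16x16 RGB patch per token

    def forward(self, ids):
        patches = self.to_pixels(self.embed(ids))  # (B, N, 768)
        b, n, _ = patches.shape
        side = int(n ** 0.5)
        # Reassemble the flat patch sequence into an image grid.
        patches = patches.view(b, side, side, 3, 16, 16)
        return patches.permute(0, 3, 1, 4, 2, 5).reshape(b, 3, side * 16, side * 16)

lm, decoder = UnifiedLM(), ImageDecoder()
text = torch.randint(0, 32000, (1, 8))       # 8 text tokens
img_tokens = torch.randn(1, 16, 256)         # 16 continuous image tokens
logits = lm(text, img_tokens)                # (1, 24, 1024)
image = decoder(logits.argmax(-1)[:, -16:])  # (1, 3, 64, 64)
```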
On benchmark tests, Manzano outperformed other models in several key areas. The 30-billion-parameter version achieved top results on ScienceQA, MMMU, and MathVista—tasks that require strong text and diagram comprehension. Performance improved steadily as model size increased from 300 million to 30 billion parameters, with the 3-billion-parameter version scoring over 10 points higher than the smallest version on multiple tasks.