xAGI AI Briefing: August 27, 2025
Nano-banana (Gemini 2.5 Flash Image Preview), open-source reasoning models, and the battle for AI supremacy
Google DeepMind has established a new benchmark for image generation and editing with the unveiling of Gemini 2.5 Flash Image Preview. This model's commanding lead on the LMArena leaderboard, backed by millions of votes, underscores its state-of-the-art performance and its immediate integration into major creative industry platforms. Its core innovation lies in a “thinking before drawing” approach, which applies a reasoning pass to prompts to generate more accurate and coherent outputs.
In parallel, the open-source community is moving beyond a pure scale race. Nous Research's Hermes 4 and NVIDIA's Nemotron Nano 9B V2 introduce sophisticated reasoning mechanisms and hybrid architectures to solve critical bottlenecks. Nemotron Nano, a hybrid Mamba-Transformer, directly tackles long-context throughput with a novel architecture, while InternVL3.5 introduces a suite of innovations to boost multimodal efficiency. These models signal a maturation in design, where the focus is on practical utility and performance under real-world constraints, not just raw size.
The democratization of these powerful models is also accelerating, as evidenced by Ollama's v0.11.7 update. By adding support for models like DeepSeek v3.1 and introducing a "Turbo mode," Ollama is pioneering a new distribution model that blends local-first access with remote, high-performance compute. This approach addresses the paradox of running massive models on consumer-grade hardware, making frontier AI more accessible to a wider audience.
Collectively, today's developments illustrate an industry-wide push toward more deliberate, efficient, and user-centric AI systems. The focus has shifted from mere generative capacity to the fundamental challenges of reasoning, efficiency, and broad accessibility.
1. The New Frontier in Image Generation: Gemini 2.5 Flash Image Preview
On August 26, 2025, Google DeepMind unveiled Gemini 2.5 Flash Image, a state-of-the-art image generation and editing model internally referred to as "nano-banana" [1]. This new model is an upgrade to the existing Gemini family, offering a suite of advanced capabilities designed for both creative and enterprise use cases. The update became available immediately to developers through the Gemini API and Google AI Studio, as well as to enterprise customers via Vertex AI [1].
1.1 Core Capabilities and Innovations: Beyond the Basics
Gemini 2.5 Flash Image is not merely another image model; it introduces a foundational shift in how it processes and understands visual prompts. Its core capabilities are built on four pillars:
Character Consistency: A long-standing challenge in image generation is maintaining the appearance of a character or product across multiple generations and edits. The model is capable of preserving a subject's identity while placing it in different environments or showcasing it from multiple angles, a crucial feature for applications like consistent branding or visual storytelling [1].
Natural Language Edits: The model allows for targeted transformations and precise local edits using simple, conversational language. A user can, for example, blur a background, remove an object, or change a subject's pose with a single, straightforward prompt (a minimal API sketch follows this list). This functionality is demonstrated through a photo editing template app in Google AI Studio [1].
Multi-Image Fusion: A particularly powerful feature is the model's ability to understand and merge multiple input images into a single, seamless visual. This enables new creative workflows, such as placing an object into a new scene or restyling a room based on a texture from another image [1].
Native World Knowledge: Many image models have excelled at aesthetics but lacked a deep, semantic understanding of the real world. Gemini 2.5 Flash Image benefits from the expansive knowledge of the broader Gemini family, which allows it to handle more complex, real-world tasks like interpreting and editing hand-drawn diagrams or following intricate, multi-step instructions [1].
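For developers, these capabilities are exposed through the Gemini API. Below is a minimal sketch of a natural-language edit call, assuming the google-genai Python SDK and a GEMINI_API_KEY in the environment; exact request and response shapes are per Google's documentation, and the file names and prompt are placeholders.

```python
# Hedged sketch: a conversational image edit via the Gemini API using the
# google-genai SDK. File names and the prompt are illustrative placeholders.
from google import genai
from PIL import Image

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[
        Image.open("product-photo.png"),            # input image to edit
        "Blur the background and remove the mug.",  # conversational instruction
    ],
)

# Generated images come back as inline-data parts alongside any text parts.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("edited.png", "wb") as f:
            f.write(part.inline_data.data)
```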
A key architectural distinction of this model is its "thinking before drawing" paradigm [4]. Instead of directly generating an image from a text prompt, it applies a prior reasoning pass. This internal deliberation process, a hallmark of the Gemini family of models, is designed to reduce nonsensical outputs and improve overall accuracy and semantic understanding. This design choice aligns with the broader industry trend of integrating explicit reasoning mechanisms to enhance model reliability.
The model is competitively priced for developers, at $30.00 per 1 million output tokens. Since each generated image is billed as 1,290 output tokens, the cost per image is a predictable $0.039 [1].
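The per-image figure follows directly from the token pricing:

```python
# Worked arithmetic from the announced pricing: $30.00 per 1M output tokens,
# with each generated image billed as 1,290 output tokens.
PRICE_PER_MILLION_TOKENS = 30.00
TOKENS_PER_IMAGE = 1290

cost_per_image = TOKENS_PER_IMAGE / 1_000_000 * PRICE_PER_MILLION_TOKENS
print(f"${cost_per_image:.4f} per image")  # $0.0387, rounded to ~$0.039
```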
1.2 Competitive Analysis: Dominating the Image Edit Arena
The performance of Gemini 2.5 Flash Image is not just a marketing claim but is empirically validated on a prominent crowdsourced leaderboard. The LMArena Image Edit leaderboard serves as a critical, real-world benchmark for evaluating the quality of image models based on user preferences and direct comparisons [5].
According to the LMArena leaderboard, Gemini 2.5 Flash Image Preview has achieved a Model Score of 1362 with a staggering total of 2,521,035 votes [7]. The sheer volume of votes is a powerful indicator of the model's public adoption and the statistical robustness of its ranking. The widely cited "~170-180 Elo lead" is not a direct leaderboard field but a figure derived from the gap between Gemini's score and that of its nearest competitor. The significant margin between Gemini and other leading models, such as Alibaba's Qwen-Image-Edit, is a clear signal of its market dominance [8]. Community sentiment on platforms like Reddit echoes this, with users describing the model as "genuinely game-changing" and "in a whole different league" compared to its rivals [9]. This confluence of objective data and user feedback strongly suggests that Gemini 2.5 Flash Image has established a new performance benchmark for the industry.
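To put that margin in perspective, the standard Elo expected-score formula converts a rating gap into a head-to-head win probability:

```python
# Elo expected-score formula: a ~170-180 point lead implies roughly a
# 73-74% expected win rate in pairwise comparisons.
def expected_win_rate(lead: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-lead / 400.0))

for lead in (170, 180):
    print(f"{lead}-point lead -> ~{expected_win_rate(lead):.0%} expected win rate")
# 170-point lead -> ~73% expected win rate
# 180-point lead -> ~74% expected win rate
```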
1.3 The Ecosystem: A Foundation for Third-Party Innovation
A key indicator of a model's maturity and real-world utility is its integration into established platforms and services. Google's announcement highlights a foundational strategy to empower third-party innovation by making the model available to key players in the creative, marketing, and design sectors. The following table summarizes some of the most prominent partnerships announced today:
| Partner | Core Use Case | Quote or Key Benefit |
| --- | --- | --- |
| Adobe | Image generation and editing within Adobe Firefly and Adobe Express. | "Greater flexibility to explore their ideas with industry-leading generative AI models and create stunning content with ease." |
| Poe (by Quora) | Real-time, conversational image editing applications. | "Notable strengths in maintaining cross-edit coherence" and "low response times... supports deployment in real-time image-based applications." |
| WPP | Creative and marketing services platform for major clients. | "Powerful use cases across multiple sectors, particularly retail... with its ability to combine multiple products into single frames." |
| Freepik | AI-powered image generation and editing suite for creatives. | "The model handles complex edits easily, producing results that look polished and professional instantly." |
| Leonardo.ai | AI image generation platform for creative professionals. | "Extreme flexibility... will enable entirely new workflows and creative possibilities, representing a true step-change in capability." |
| Figma | AI image tools for designers. | "Enabling designers to generate and refine images using text prompts—creating realistic content that helps communicate design vision." |
Source: [3]
This widespread and immediate adoption by market leaders demonstrates the model's practical readiness and commercial viability. The partnerships showcase that Gemini's value proposition extends beyond raw performance to its ability to streamline professional workflows and solve specific business problems, from marketing to UI/UX prototyping.
2. The Rise of Reasoning and Efficiency in Open-Source AI
While Google's Gemini 2.5 Flash Image marks a significant milestone in proprietary AI, the open-source community is simultaneously pushing the boundaries of model design with a new emphasis on specialized capabilities and architectural efficiency. Today's news highlights three pivotal releases that embody this trend.
2.1 Nous Research's Hermes 4: A Hybrid Approach to Thought
Nous Research, a prominent name in the open-source community, has released Hermes 4, a large-scale reasoning model built on the Meta-Llama-3.1-405B base [10]. This model introduces a "hybrid reasoning mode" in which it can internally deliberate on a problem using explicit <think>...</think> traces before providing a final answer [10]. A user can control this behavior via the chat template, allowing for a strategic balance between fast, direct responses and more thorough, multi-step reasoning [10].
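Below is a minimal sketch of steering that behavior from Python, assuming a transformers-compatible chat template on the Hugging Face checkpoints; the exact control prompt and tokens are defined in the Hermes 4 model card, and the system prompt here is illustrative rather than official.

```python
# Hedged sketch: toggling Hermes 4's reasoning mode via the system prompt.
# The system prompt wording is an assumption; use the model card's official one.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Hermes-4-70B"  # smaller sibling of the 405B release
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are a deep-thinking assistant. Reason "
     "step by step inside <think>...</think> tags before giving your answer."},
    {"role": "user", "content": "Is 97 prime? Answer briefly."},
]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=512)
print(tok.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```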
Hermes 4 is a direct response to a key community demand: models that offer greater control and less censorship. It is explicitly trained for "steerability" and "lower refusal rates," with the developers introducing a new benchmark, "RefusalBench," to measure a model's willingness to be helpful across a variety of scenarios [11]. This focus on user-directed behavior is a critical differentiator from many of the more restrictive proprietary models [12]. The model was fine-tuned with a post-training corpus of approximately 60 billion tokens, with a specific emphasis on reasoning traces to enhance its performance in critical domains like math, code, and logical reasoning [10].
2.2 NVIDIA's Nemotron Nano 9B V2: The Architectural Vanguard
NVIDIA’s release of Nemotron Nano 9B V2 is a landmark contribution that merges architectural innovation with a strategic community-building effort. This model is a hybrid Mamba-Transformer, a novel architectural choice that directly addresses the performance bottlenecks of long-context reasoning tasks [13].
The core innovation lies in its Nemotron-H architecture, which replaces the majority of the computationally intensive self-attention layers of a standard Transformer with efficient Mamba-2 layers [14]. This architectural pivot is a deliberate solution to the quadratic complexity of attention mechanisms, which makes processing and generating long sequences of text extremely costly. Because Mamba-2 layers scale linearly with sequence length, the model achieves up to 6x higher inference throughput on long-context tasks compared to similarly sized models like Alibaba's Qwen3-8B [13]. This performance gain is particularly impactful for tasks that require extensive internal deliberation or the processing of large documents. The model retains a handful of attention layers to maintain strong performance on reasoning benchmarks while achieving this efficiency.
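The intuition is easy to sketch: attention's cost grows with the square of the sequence length, while state-space layers grow roughly linearly, so the gap widens as contexts get longer. A back-of-the-envelope illustration, not a benchmark:

```python
# Back-of-the-envelope scaling only: real throughput depends on kernels,
# memory bandwidth, and the few attention layers Nemotron-H retains.
for n in (4_096, 32_768, 131_072):
    attention_ops = n * n   # O(n^2) pairwise token interactions
    mamba_ops = n           # O(n) recurrent state updates
    print(f"n={n:>7}: attention/Mamba op ratio ~{attention_ops // mamba_ops:,}x")
# The ratio equals n itself, so it grows linearly with context length.
```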
Furthermore, in a move that signals a profound shift in the open-source landscape, NVIDIA has released the majority of the pretraining data used for the Nemotron models. The Nemotron-Pretraining-Dataset-v1 is a massive collection of 6.6 trillion tokens, encompassing web crawl, code, math, and multilingual data [13]. By releasing both the model and the data, NVIDIA is providing the core ingredients for developers not only to use the model but also to fine-tune it or train their own. This contribution extends beyond a simple product release to position NVIDIA as a central facilitator of the open-source ecosystem, with its hardware becoming the de facto platform for large-scale training.
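For developers who want to inspect the corpus, here is a hedged sketch using the Hugging Face datasets library; the dataset ID below is an assumption, so check NVIDIA's Hugging Face collection for the exact identifiers and configs.

```python
# Hedged sketch: streaming a sample of the released pretraining data.
# The dataset ID below is hypothetical; substitute the real one from
# NVIDIA's Hugging Face collection.
from datasets import load_dataset

ds = load_dataset(
    "nvidia/Nemotron-Pretraining-Dataset-sample",  # hypothetical ID
    split="train",
    streaming=True,  # avoid materializing trillions of tokens locally
)
for record in ds.take(3):
    print(record)
```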
2.3 InternVL3.5: Multimodal Excellence for Efficiency
InternVL3.5 is a new family of open-source multimodal large language models (MLLMs) that prioritizes efficiency and versatility. The models are built on high-quality, pre-existing language model backbones, including the Qwen3 series and GPT-OSS [17]. The release of this model family is accompanied by three key architectural innovations:
Cascade Reinforcement Learning (Cascade RL): This is a scalable, two-stage framework for enhancing reasoning. An offline stage provides a stable foundation, while an online stage refines the model's alignment, leading to significant gains on complex reasoning benchmarks like MMMU and MathVista [17].
Visual Resolution Router (ViR): To address the high computational cost of processing high-resolution images, ViR dynamically selects an optimal resolution for visual tokens. This reduces inference costs with a negligible performance sacrifice, making the model far more efficient in practice [17].
Decoupled Vision-Language Deployment (DvD): This strategy allows the vision encoder and the language model to be deployed on separate GPUs. By decoupling these components, the computational load is balanced and hardware utilization is maximized, leading to a significant boost in overall inference speed [17] (a conceptual sketch follows this list).
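The sketch below conveys the DvD idea only; the class and method names are illustrative, not the InternVL3.5 API.

```python
# Conceptual sketch of Decoupled Vision-Language Deployment (DvD): the vision
# encoder and the LLM run on separate GPUs, and only compact visual features
# cross the device boundary. Names here are illustrative, not InternVL3.5's API.
import torch

class DecoupledVLMPipeline:
    def __init__(self, vision_encoder: torch.nn.Module, llm: torch.nn.Module):
        self.vision = vision_encoder.to("cuda:0")  # "vision server"
        self.llm = llm.to("cuda:1")                # "language server"

    @torch.no_grad()
    def answer(self, image: torch.Tensor, prompt_ids: torch.Tensor):
        features = self.vision(image.to("cuda:0"))  # heavy ViT forward pass
        features = features.to("cuda:1")            # ship features, not pixels
        # Hypothetical helper: condition generation on the visual features.
        return self.llm.generate_with_features(features, prompt_ids.to("cuda:1"))
```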
These innovations collectively deliver impressive performance gains. The research indicates that InternVL3.5 achieves a +16.0% gain in overall reasoning performance and a 4.05x inference speedup compared to its predecessor, InternVL3 [17]. Furthermore, its largest model, InternVL3.5-241B-A28B, attains state-of-the-art results among open-source MLLMs and narrows the performance gap with leading commercial models like GPT-5 to just 3.9% [17].
The following table provides a clear, side-by-side comparison of the three key open-source models announced today:
| Model Name | Organization | Architecture | Core Innovation | Key Performance Claim | Base Model/Backbone |
| --- | --- | --- | --- | --- | --- |
| Hermes 4 | Nous Research | Hybrid-reasoning Transformer | User-controllable `<think>` traces for internal deliberation. | Enhanced performance in math, code, and STEM benchmarks. | Meta-Llama-3.1-405B |
| Nemotron Nano 9B V2 | NVIDIA | Hybrid Mamba-Transformer | Replaces most attention layers with Mamba-2 layers for linear scalability. | Up to 6x higher throughput on long-context reasoning tasks. | Nemotron-Nano-12B-v2-Base |
| InternVL3.5 | OpenGVLab | ViT-MLP-LLM | Cascade RL, Visual Resolution Router, Decoupled Vision-Language Deployment. | +16.0% reasoning gain and 4.05x inference speedup over InternVL3. | Qwen3, GPT-OSS |
3. The Toolkit Ecosystem: Ollama and the Strategic Access to Models
The distribution and accessibility of AI models are as critical as their performance. Today's update from Ollama, with the release of v0.11.7, highlights a new, pragmatic approach to model delivery that bridges the gap between powerful models and consumer-grade hardware [23].
The v0.11.7 update adds support for DeepSeek v3.1, a massive 685B-parameter model with a Mixture-of-Experts (MoE) architecture [26]. DeepSeek v3.1 is notable for its "hybrid thinking mode," which, like the other models discussed, allows it to generate a detailed chain of thought to arrive at a better answer [25]. Its performance on coding benchmarks is particularly impressive: it scores 71.6% on the Aider benchmark and is reported to be 68x cheaper to run than competitors like Claude Opus 4 [26].
The addition of DeepSeek v3.1 is made possible by Ollama's new "Turbo mode" preview, which marks an evolution in the model-distribution paradigm [25]. Traditionally, Ollama has focused on running models locally, directly on a user's machine. However, the hardware requirements for a 685B-parameter model (with 37B parameters active per token) are prohibitive for most consumers [26]. Turbo mode addresses this by offloading the computational burden of running these large models to datacenter-grade hardware, allowing users to access them through the Ollama app, CLI, and API without needing a powerful local GPU [25].
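From the developer's side the workflow is unchanged: the same REST API fronts both local and Turbo-served models. A minimal sketch, assuming a local Ollama instance with the model pulled (how Turbo routing is enabled is per Ollama's docs):

```python
# Minimal sketch against Ollama's local REST API (default port 11434).
# Assumes `ollama pull deepseek-v3.1` has been run, or Turbo is enabled.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "deepseek-v3.1",
        "messages": [
            {"role": "user", "content": "Summarize Mixture-of-Experts in two sentences."}
        ],
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=600,
)
print(resp.json()["message"]["content"])
```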
This development represents a nuanced shift in the pursuit of AI democratization. While it sacrifices the complete local autonomy that defines Ollama's core mission, it achieves a more practical form of accessibility. It recognizes that for many of the most powerful models, the immediate priority is not to make them run locally on any device, but to make them accessible to anyone, regardless of their local hardware constraints. This creates a new, hybrid distribution model—a pragmatic blend of open-source tooling and centralized, cloud-based compute—that is likely to define a new phase of the AI ecosystem.
4. Broader Market and Strategic Implications
The collective announcements from Google, Nous Research, NVIDIA, and Ollama on this day reveal several profound, interconnected trends that are shaping the future of the AI industry.
The first is a convergence on the importance of explicit reasoning. The idea that a model should "think" before it acts—whether for image generation (Gemini), long-context understanding (Nemotron Nano), or general reasoning (Hermes 4, DeepSeek v3.1)—is no longer a niche feature but a foundational design principle across both proprietary and open-source ecosystems. This shared focus suggests that raw output generation is giving way to a more mature and reliable form of AI.
Second, the open-source community is now competing on architectural specialization and efficiency, not just on a race to the largest parameter count. The innovations in Nemotron Nano's Mamba-Transformer architecture and InternVL3.5's decoupled deployment are not incremental improvements but purpose-built solutions to specific, difficult problems like quadratic complexity and computational load balancing. This signals a new stage of maturity where competition is a multi-faceted contest across speed, efficiency, and utility.
Finally, the competitive landscape itself is shifting. The battle for supremacy is now fought on three distinct fronts:
Raw Intelligence and Quality: As exemplified by Gemini's commanding dominance on the LMArena leaderboard.
Architectural Efficiency and Specialization: As demonstrated by the innovative approaches of Nemotron Nano and InternVL3.5.
Ecosystem and Accessibility: As highlighted by NVIDIA's strategic dataset release, which positions its hardware as an ecosystem, and Ollama's hybrid distribution model, which solves the practical challenge of making large models widely accessible.
Curated Resources
Google DeepMind / Gemini 2.5 Flash Image
Official Blog Post: https://developers.googleblog.com/en/introducing-gemini-2-5-flash-image/
Google Cloud Blog: https://cloud.google.com/blog/products/ai-machine-learning/gemini-2-5-flash-image-on-vertex-ai
LMArena Image Edit Leaderboard: https://lmarena.ai/leaderboard/image-edit
Nous Research / Hermes 4
Model Page on OpenRouter: https://openrouter.ai/nousresearch/hermes-4-405b
Hugging Face Model Card: https://huggingface.co/NousResearch/Hermes-4-70B
GitHub Repository: https://github.com/nousresearch
NVIDIA / Nemotron Nano 9B V2
Official Research Page: https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2/
ArXiv Technical Paper: https://arxiv.org/abs/2508.14444
Hugging Face Model Card: https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2
NVIDIA AI Developer X Post: https://twitter.com/NVIDIAAIDev/status/1944675710214652932
InternVL3.5
ArXiv Technical Paper: https://arxiv.org/abs/2508.18265
YouTube Demo Video:
Hugging Face Model Card: https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B
Ollama v0.11.7
Ollama Blog Post: https://ollama.com/blog/new-app
DeepSeek v3.1 Model Page: https://ollama.com/library/deepseek-v3.1
Ollama Turbo Mode: https://ollama.com/turbo
Hugging Face DeepSeek Model Page: https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base/
Works cited
1. Introducing Gemini 2.5 Flash Image, our state-of-the-art image ..., accessed August 27, 2025, https://developers.googleblog.com/en/introducing-gemini-2-5-flash-image/
2. Nano banana is here: Google unveils Gemini 2.5 Flash Image upgrade, accessed August 27, 2025, https://economictimes.indiatimes.com/tech/artificial-intelligence/nano-banana-is-here-google-unveils-gemini-2-5-flash-image-upgrade/articleshow/123529187.cms
3. Gemini 2.5 Flash Image on Vertex AI | Google Cloud Blog, accessed August 27, 2025, https://cloud.google.com/blog/products/ai-machine-learning/gemini-2-5-flash-image-on-vertex-ai
4. Google Gemini 2.5 Flash-Image: How Google Is Pushing AI Boundaries - DEV Community, accessed August 27, 2025, https://dev.to/alifar/google-gemini-25-flash-image-how-google-is-pushing-ai-boundaries-2dkh
5. Image Arena Leaderboard - a Hugging Face Space by ..., accessed August 27, 2025, https://huggingface.co/spaces/ArtificialAnalysis/Text-to-Image-Leaderboard
6. lmarena.ai, accessed August 27, 2025, https://lmarena.ai/leaderboard/image-edit#:~:text=Compare%20models%20based%20on%20their%20ability%20to%20generate%20and%20edit%20images.
7. Image Edit Arena | LMArena, accessed August 27, 2025, https://lmarena.ai/leaderboard/image-edit
8. Qwen-Image-Edit Released! : r/LocalLLaMA - Reddit, accessed August 27, 2025, https://www.reddit.com/r/LocalLLaMA/comments/1mttgrf/qwenimageedit_released/
9. Gemini 2.5 Flash Image Preview releases with a huge lead on image editing on LMArena : r/singularity - Reddit, accessed August 27, 2025, https://www.reddit.com/r/singularity/comments/1n0n3mb/gemini_25_flash_image_preview_releases_with_a/
10. Hermes 4 405B - API, Providers, Stats | OpenRouter, accessed August 27, 2025, https://openrouter.ai/nousresearch/hermes-4-405b
11. NousResearch/Hermes-4-70B - Hugging Face, accessed August 27, 2025, https://huggingface.co/NousResearch/Hermes-4-70B
12. Hermes 4 (70B & 405B) Released by Nous Research : r/SillyTavernAI - Reddit, accessed August 27, 2025, https://www.reddit.com/r/SillyTavernAI/comments/1n0x36k/hermes_4_70b_405b_released_by_nous_research/
13. Nvidia Enters Small AI Model Race with Nemotron-Nano 9B V2 with Toggleable Reasoning, accessed August 27, 2025, https://winbuzzer.com/2025/08/19/nvidia-enters-small-ai-model-race-with-nemotron-nano-9b-v2-with-toggleable-reasoning-xcxwbn/
14. NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid ... - arXiv, accessed August 27, 2025, https://arxiv.org/abs/2508.14444
15. Nemotron-Nano-9B-v2: Efficient Long-Context LLM - Emergent Mind, accessed August 27, 2025, https://www.emergentmind.com/topics/nemotron-nano-9b-v2
16. NVIDIA Nemotron Nano 2 and the Nemotron Pretraining Dataset v1 ..., accessed August 27, 2025, https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2/
17. InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency - arXiv, accessed August 27, 2025, https://arxiv.org/html/2508.18265v1
18. InternVL3.5: Advancing Open-Source Multimodal Models in ... - arXiv, accessed August 27, 2025, https://arxiv.org/pdf/2508.18265
19. InternVL3.5: Open Multimodal LLM With Cascade RL - YouTube, accessed August 27, 2025
20. LocalAI models, accessed August 27, 2025, https://localai.io/gallery.html
21. Upload README.md with huggingface_hub · OpenGVLab/InternVL3_5-241B-A28B at efa4002, accessed August 27, 2025, https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B/commit/efa4002569dd3178e6c864756c465b0b03e8e38f
22. Paper page - InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency - Hugging Face, accessed August 27, 2025, https://huggingface.co/papers/2508.18265
23. Ollama's new app · Ollama Blog, accessed August 27, 2025, https://ollama.com/blog/new-app
24. ollama 0.11.7-1 (x86_64) - Arch Linux, accessed August 27, 2025, https://archlinux.org/packages/extra-testing/x86_64/ollama/
25. deepseek-v3.1 - Ollama, accessed August 27, 2025, https://ollama.com/library/deepseek-v3.1
26. DeepSeek V3.1 Base : The ChatGPT killer is back | by Mehul Gupta | Data Science in Your Pocket | Aug, 2025 | Medium, accessed August 27, 2025, https://medium.com/data-science-in-your-pocket/deepseek-v3-1-base-the-chatgpt-killer-is-back-1c0f05530677
27. DeepSeek-V3.1 (Thinking and Non Thinking) : r/LocalLLaMA - Reddit, accessed August 27, 2025, https://www.reddit.com/r/LocalLLaMA/comments/1mw3kmd/deepseekv31_thinking_and_non_thinking/