Gemini 3.5 Flash Falls Short in Android Coding Benchmarks

Google’s latest AI model, Gemini 3.5 Flash, has been evaluated in the company’s recent Android Bench rankings, with weaker-than-expected results on both Android development performance and cost efficiency.

As AI-assisted software engineering matures, companies are increasingly focused on models that excel at coding tasks. Google’s Android Bench measures how well AI models handle Android development challenges: each model receives a score out of 100, indicating the percentage of coding cases it solved successfully across multiple runs.

Contrary to expectations, Gemini 3.5 Flash placed sixth in the latest rankings, behind models such as GPT 5.5 and Gemini 3.1 Pro Preview. The result is notable because Gemini 3.5 Flash was promoted as a cheaper, faster alternative to its predecessor, Gemini 3.1 Pro. Yet the benchmark shows Gemini 3.5 Flash with a success rate about 9 percentage points lower than Gemini 3.1 Pro Preview (63.7 versus 72.4) and a higher average latency (14.2 versus 11.1).
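
Because Android Bench scores are percentages, the gap between the two Gemini models can be stated either in percentage points or as a relative drop. The short Python sketch below works out both from the reported scores, plus the relative latency change. The variable and function names are illustrative only, not part of Android Bench, and the latency figures are used as reported without assuming a unit.

```python
# Reported Android Bench averages for the two Gemini models being compared.
# Scores are the percentage of coding cases solved; latency is used as
# reported, without assuming a unit.
flash = {"score": 63.7, "latency": 14.2}
pro_preview = {"score": 72.4, "latency": 11.1}


def point_gap(lower: float, higher: float) -> float:
    """Absolute difference, in percentage points, rounded to one decimal."""
    return round(higher - lower, 1)


def relative_change(new: float, baseline: float) -> float:
    """Change from baseline to new as a percentage of baseline, one decimal."""
    return round((new - baseline) / baseline * 100, 1)


score_gap = point_gap(flash["score"], pro_preview["score"])
score_rel = relative_change(flash["score"], pro_preview["score"])
latency_rel = relative_change(flash["latency"], pro_preview["latency"])

print(f"Score gap: {score_gap} points ({score_rel}% relative)")
print(f"Latency change: {latency_rel:+}%")
```

An 8.7-point drop from a baseline of 72.4 is about a 12% relative decline, so "percentage points" and "percent" give noticeably different figures here.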

The cost figures widen the gap. On average, a single benchmark run with Gemini 3.5 Flash has a total token count of 355.9 and costs $147.1. Gemini 3.1 Pro Preview, by contrast, averages 73.3 tokens per run at $47.9, roughly one-third of the cost. This difference raises questions about the newer model’s efficiency and value proposition.
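
The cost comparison is simple ratio arithmetic over the reported per-run averages. The hypothetical snippet below divides the figures for the two Gemini models; token counts are used exactly as reported, with no unit assumed.

```python
# Reported per-run averages; token counts are used as reported (no unit assumed).
flash = {"tokens": 355.9, "cost_usd": 147.1}
pro_preview = {"tokens": 73.3, "cost_usd": 47.9}

# Fraction of Gemini 3.5 Flash's cost that Gemini 3.1 Pro Preview incurs.
cost_fraction = pro_preview["cost_usd"] / flash["cost_usd"]
# How many times more tokens Gemini 3.5 Flash uses per run.
token_multiple = flash["tokens"] / pro_preview["tokens"]

print(f"Pro Preview cost as a share of Flash cost: {cost_fraction:.2f}")
print(f"Flash uses {token_multiple:.1f}x the tokens of Pro Preview")
```

The ratios come out at roughly one-third of the cost, with Gemini 3.5 Flash using close to five times as many tokens per run.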

For context, the top-performing models in the latest Android Bench rankings are as follows:

Model                   Score  Average Latency  Average Total Tokens  Average Cost
GPT 5.5                 74     15.7             64.7                  $134.2
GPT 5.4                 72.4   21.2             64.2                  $91.7
Gemini 3.1 Pro Preview  72.4   11.1             73.3                  $47.9
Claude Opus 4.7         68.7   11.6             90.0                  $124.3
Claude Opus 4.6         66.6   9.9              69.5                  $84.4
Gemini 3.5 Flash        63.7   14.2             355.9                 $147.1

These findings suggest that while Gemini 3.5 Flash may offer advantages in other applications, its performance and cost-effectiveness in Android development currently trail several competitors, including Google’s own Gemini 3.1 Pro Preview. Developers selecting an AI model for Android projects should weigh benchmark scores against operational costs and latency.
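
One rough way to balance performance against cost is to divide each model’s score by its average cost per run. This is an illustrative heuristic, not a metric Android Bench reports. The `BenchResult` type, the `rank_by_value` function, and the `min_score` cutoff are hypothetical names chosen for this sketch; the numbers come from the rankings table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    """One row of the Android Bench table (hypothetical container type)."""
    name: str
    score: float         # percentage of coding cases solved
    avg_cost_usd: float  # average cost per benchmark run


RESULTS = [
    BenchResult("GPT 5.5", 74.0, 134.2),
    BenchResult("GPT 5.4", 72.4, 91.7),
    BenchResult("Gemini 3.1 Pro Preview", 72.4, 47.9),
    BenchResult("Claude Opus 4.7", 68.7, 124.3),
    BenchResult("Claude Opus 4.6", 66.6, 84.4),
    BenchResult("Gemini 3.5 Flash", 63.7, 147.1),
]


def score_per_dollar(result: BenchResult) -> float:
    """Benchmark points obtained per dollar of average run cost."""
    return result.score / result.avg_cost_usd


def rank_by_value(results: list[BenchResult], min_score: float = 0.0) -> list[BenchResult]:
    """Drop models below min_score, then sort by score per dollar, best first."""
    eligible = [r for r in results if r.score >= min_score]
    return sorted(eligible, key=score_per_dollar, reverse=True)


for r in rank_by_value(RESULTS, min_score=65.0):
    print(f"{r.name:<24} {score_per_dollar(r):.2f} points per $")
```

On these figures Gemini 3.1 Pro Preview leads comfortably on score per dollar, while Gemini 3.5 Flash ranks last. The `min_score` filter lets a team set an accuracy floor first and compare cost only among models that clear it; with a floor of 65, Gemini 3.5 Flash drops out entirely.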