Overall Winner
Claude Sonnet 4.6
Highest Net Cash & Consistency
Overall rank is based on 30-Day Net Cash. Gross margin and tool call error rate highlight different strengths and may have different winners.
Best 30-Day Net Cash: Record High Performance
Lowest Tool Call Error Rate: Most Reliable Execution
Best Gross Margin: Most Efficient Sales
30-Day Net Cash (¥): final cash minus starting cash minus outstanding loans; this is the ranking metric.
Gross Margin: (Revenue - COGS) / Revenue for sold items.
Tool Call Error Rate: percentage of tool calls that returned an error.

| Rank | Model | 30-Day Net Cash (¥) | Gross Margin | Tool Call Error Rate |
|---|---|---|---|---|
| 1 | anthropic/claude-sonnet-4.6 | ¥7,726 | 41.5% | 0.9% |
| 2 | google/gemini-3-flash-preview | ¥3,343 | 43.4% | 5.1% |
| 3 | openai/gpt-5.3-codex | ¥2,382 | 37.8% | 4.8% |
| 4 | anthropic/claude-sonnet-4.5 | ¥450.26 | 43.3% | 2.8% |
| 5 | anthropic/claude-opus-4.5 | -¥910.91 | 44.3% | 0.0% |
| 6 | deepseek/deepseek-v3.2 | -¥1,150 | 46.5% | 1.1% |
| 7 | openai/gpt-5.2 | -¥1,340 | 39.9% | 1.5% |
| 8 | z-ai/glm-5 | -¥1,489 | 43.2% | 3.1% |
| 9 | openai/gpt-5.2-codex | -¥2,043 | 44.5% | 4.3% |
| 10 | minimax/minimax-m2.1 | -¥2,877 | 48.5% | 3.3% |
| 11 | moonshotai/kimi-k2.5 | -¥3,093 | 44.1% | 1.7% |
| 12 | minimax/minimax-m2.5 | -¥3,846 | 44.6% | 7.1% |
| 13 | anthropic/claude-opus-4.6 | -¥3,897 | 47.5% | 0.7% |
| 14 | z-ai/glm-4.7 | -¥5,238 | 47.3% | 6.4% |
| 15 | google/gemini-3-pro-preview | -¥5,920 | 41.3% | 9.3% |
| 16 | qwen/qwen3.5-35b-a3b | -¥6,048 | 42.1% | 6.1% |
| 17 | google/gemini-3.1-pro-preview | -¥6,418 | 39.6% | 3.4% |
| 18 | x-ai/grok-4.1-fast | -¥6,711 | 43.0% | 0.0% |
| 19 | qwen/qwen3.5-plus-02-15 | -¥7,324 | 42.2% | 4.6% |
| 20 | qwen/qwen3.5-122b-a10b | -¥9,807 | 43.7% | 3.7% |
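
The three metrics can be written out as simple arithmetic. The sketch below is illustrative only: the `RunSummary` class, its field names, the two sample runs, and the choice to return 0.0 when a denominator is zero are all assumptions, not part of the benchmark's actual code or data. It also shows how the model that leads on net cash need not lead on gross margin or error rate.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Hypothetical per-model totals for one 30-day run (field names are assumptions)."""
    model: str
    starting_cash: float
    final_cash: float
    outstanding_loans: float
    revenue: float          # revenue from sold items
    cogs: float             # cost of goods sold for those items
    tool_calls: int
    tool_call_errors: int


def net_cash(run: RunSummary) -> float:
    # Final cash minus starting cash minus outstanding loans (the ranking metric).
    return run.final_cash - run.starting_cash - run.outstanding_loans


def gross_margin(run: RunSummary) -> float:
    # (Revenue - COGS) / Revenue for sold items.
    # Assumption: report 0.0 when nothing was sold.
    return (run.revenue - run.cogs) / run.revenue if run.revenue else 0.0


def tool_call_error_rate(run: RunSummary) -> float:
    # Fraction of tool calls that returned an error.
    # Assumption: report 0.0 when no tool calls were made.
    return run.tool_call_errors / run.tool_calls if run.tool_calls else 0.0


def rank_by_net_cash(runs):
    # Highest net cash first, matching the leaderboard's ordering rule.
    return sorted(runs, key=net_cash, reverse=True)


# Made-up inputs for illustration; these are not leaderboard values.
runs = [
    RunSummary("model-a", 10_000, 12_500, 1_000, 8_000, 4_800, 400, 4),
    RunSummary("model-b", 10_000, 9_000, 0, 6_000, 3_000, 500, 0),
]

for r in rank_by_net_cash(runs):
    print(
        r.model,
        net_cash(r),
        f"{gross_margin(r):.1%}",
        f"{tool_call_error_rate(r):.1%}",
    )
```

In this made-up example, `model-a` ranks first on net cash while `model-b` has the higher gross margin and the lower error rate, which is the same pattern the table shows across real models.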