
5 days ago
The Best Open Source US Model (Right behind China)
Anthropic has officially closed a $65 billion Series H at a $965 billion valuation, roughly 2.5x its valuation from just 100 days ago. Meanwhile, funding is flowing across the ecosystem: Fireworks AI at $15B, Baseten at $11B, OpenRouter's $113M Series B, and Cognition AI's $1B Series D.
NVIDIA went on an open-source super week with Nemotron 3 Ultra, Cosmos 3, and Nemotron 3.5 ASR. Microsoft dropped 5 new MAI models. Google released Gemma 4 12B, and Anthropic shipped Opus 4.8.
On the benchmarks front, DeepSWE crowns GPT-5.5 as the leader in long-horizon coding tasks, while ITBench shows even frontier models struggle with real-world SRE incidents — Claude Opus 4.7 tops out at just 47%.
Plus: Cloudflare acquires VoidZero to build the future of AI-native edge development, and Google is paying SpaceX $920M/month for compute.
Topics covered: • Anthropic's $65B Series H and path to $1T • Fireworks AI, Baseten, OpenRouter & Cognition funding rounds • Microsoft's 5 new MAI models • NVIDIA's open-source super week (Nemotron, Cosmos 3) • MiniMax M3, Gemma 4 12B, JetBrains Mellum2, Opus 4.8 • DeepSWE benchmark: GPT-5.5 leads long-horizon coding • ITBench: Frontier models under 50% on real SRE tasks • Cloudflare + VoidZero for AI-native edge dev • Google's $920M/month SpaceX compute deal
#AI #Anthropic #NVIDIA #OpenAI #AInews #TechNews #LLM
Funding rounds
Anthropic confirmed the close of its $65 billion Series H funding round at a post-money valuation of $965 billion. That is roughly 2.5 times its $380 billion Series G valuation from February 2026, adding $585 billion in value in approximately 100 days.
https://www.anthropic.com/news/series-h
Fireworks AI is raising at a $15 billion valuation, a near fourfold increase from its $4 billion Series C valuation in October 2025.
It processes 15 trillion tokens daily for major production clients including Cursor, Notion, and Perplexity.
https://finance.yahoo.com/sectors/technology/articles/fireworks-ai-eyes-15-billion-174609357.html
Baseten is raising $1 billion at an $11 billion valuation.
Its annualized revenue tripled from $200 million to $600 million over a single quarter.
OpenRouter has secured a $113 million Series B.
Its weekly production throughput grew fivefold over six months, from 5 trillion to 25 trillion tokens.
Further up the stack: Cognition AI secured a $1 billion Series D round led by Lux Capital and 8VC
https://cognition.ai/blog/series-d
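The multiples quoted in this section are easy to sanity-check. Below is a minimal Python sketch that recomputes them from the figures above. The helper names (`multiple`, `monthly_rate`) are made up for illustration, and the monthly growth figure assumes steady compounding over the six months, which the source does not state.

```python
# Figures are copied from the notes above; none are independently verified.

def multiple(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


def monthly_rate(total_multiple: float, months: int) -> float:
    """Monthly growth rate implied by a total multiple.

    Assumption: growth compounds at a constant rate each month.
    """
    return total_multiple ** (1 / months) - 1


anthropic = multiple(965, 380)    # Series H vs. Series G valuation, in $B
fireworks = multiple(15, 4)       # reported raise vs. Series C valuation, in $B
baseten = multiple(600, 200)      # annualized revenue, in $M, one quarter apart
openrouter = multiple(25, 5)      # weekly tokens, in trillions, six months apart

print(f"Anthropic valuation multiple: {anthropic:.2f}x (+${965 - 380}B)")
print(f"Fireworks AI valuation multiple: {fireworks:.2f}x")
print(f"Baseten revenue multiple: {baseten:.1f}x")
print(f"OpenRouter implied monthly growth: {monthly_rate(openrouter, 6):.1%}")
```

Under the steady-compounding assumption, a fivefold increase over six months works out to roughly 31% growth per month.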
Model Releases
MAI models:
- MAI-Code-1-Flash: A 5-billion active parameter model optimized for ultra-low latency within GitHub Copilot and VS Code.
- MAI-Image-2.5: A high-fidelity image generation model ranking third on global image evaluation arenas, outperforming competing architectures like Nano Banana Pro.
- MAI-Transcribe-1.5: A multi-lingual speech processing engine offering fivefold speed improvements across 43 languages.
- MAI-Voice-2: Natural audio and voice generation across 15 languages, available at a highly competitive price point.
- Web IQ: A search-grounding API engineered to directly compete with Perplexity.
NVIDIA ran an "Open-Source Super Week," positioning itself as a major software and model publisher:
- Nemotron 3 Ultra (the best US open-weights model, but still behind China's): A 550-billion parameter MoE (55 billion active) with a 1-million token context window, optimized for high-throughput agent loops. It reached peak throughput of 400 tokens per second on day-zero optimized clusters.
- Cosmos 3: A physical AI world-modeling framework comprising 16-billion Nano and 64-billion Super variants. Built on a Mixture-of-Transformers (MoT) architecture, Cosmos 3 natively binds textual, visual, auditory, and physical kinetic vectors.
- Nemotron 3.5 ASR: A highly compact 0.6-billion parameter streaming speech recognition model pushing sub-100 millisecond latencies across 40 language locales.
https://www.minimax.io/models/text/m3
- MiniMax M3: A 1-million token context model hitting 59.0% on SWE-Bench Pro and 74.2% on MCP Atlas, though noted for high token consumption due to intensive internal self-validation loops.
https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemma-4-12b/
- Gemma 4 12B: Google's Apache 2.0 on-device model, which utilizes an encoder-free architecture that projects vision and audio vectors directly into the text-token space, bypassing separate CLIP-style encoders to minimize local memory footprints.
https://www.jetbrains.com/mellum/
- JetBrains Mellum2: A compact 12-billion parameter MoE (2.5 billion active) engineered for ultra-low latency routing and retrieval-augmented generation (RAG) sub-agents within developer IDEs.
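Two of the models above are described as MoE with a total and an active parameter count. In a Mixture-of-Experts model only a subset of the weights is used for each token, so the active count is the better guide to per-token compute, while the total count still determines how much memory the weights need. A small Python sketch of that ratio, using only the numbers quoted above (the `active_fraction` helper is hypothetical, and the comparison ignores everything else that affects real latency):

```python
# (total parameters, active parameters), both in billions, as quoted above.
MOE_MODELS = {
    "Nemotron 3 Ultra": (550, 55),
    "JetBrains Mellum2": (12, 2.5),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used per token."""
    return active_b / total_b


for name, (total_b, active_b) in MOE_MODELS.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {active_b}B of {total_b}B active per token ({share:.0%})")
```

By this measure Nemotron 3 Ultra activates about a tenth of its weights per token and Mellum2 about a fifth.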
Opus 4.8
Benchmarks:
- https://deepswe.datacurve.ai/blog
- https://venturebeat.com/technology/deepswe-blows-up-the-ai-coding-leaderboard-crowns-gpt-5-5-and-finds-claude-opus-exploiting-a-benchmark-loophole (GPT-5.5 is the winner on long-horizon tasks)
- DeepSWE is a software engineering benchmark focused on original, long-horizon tasks across five programming languages. It comprises 113 chaotic tasks across 91 live, production-grade repositories, and it forces agents to generate 5.5 times more code than standard evaluations and to modify an average of 7 files per task. GPT-5.5 leads the leaderboard with a score of 70%, 16 percentage points ahead of competing models.
- I think older benchmarks where models reach ~90% accuracy can be considered saturated. A gap of a few percentage points doesn't give us any good signal.
https://research.ibm.com/publications/developing-ai-agents-for-it-automation-tasks-with-itbench
ITBench-AA is an evaluation framework focused on live Kubernetes incident response and Site Reliability Engineering (SRE) operations. Across its 59 live, containerized SRE incident snapshots, the results are sobering: every frontier model scored under 50% on successful incident resolution, with Claude Opus 4.7 leading at 47% and GPT-5.5 close behind at 46%.
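The point about small score gaps carrying little signal can be made concrete with a rough noise estimate. The Python sketch below treats each task as an independent pass/fail trial and computes the binomial standard error of a score. That independence is a simplifying assumption, not something either benchmark claims, and the function names are illustrative only.

```python
import math


def standard_error(score: float, n_tasks: int) -> float:
    """Binomial standard error of a pass rate measured on n_tasks tasks."""
    return math.sqrt(score * (1 - score) / n_tasks)


def interval_95(score: float, n_tasks: int) -> tuple[float, float]:
    """Approximate 95% interval using the normal approximation."""
    half_width = 1.96 * standard_error(score, n_tasks)
    return score - half_width, score + half_width


# Task counts and scores are the ones quoted in the notes above.
for name, score, n_tasks in [
    ("DeepSWE, GPT-5.5", 0.70, 113),
    ("ITBench-AA, Claude Opus 4.7", 0.47, 59),
]:
    low, high = interval_95(score, n_tasks)
    print(f"{name}: {score:.0%} on {n_tasks} tasks, "
          f"approx. 95% interval {low:.0%} to {high:.0%}")
```

With only 59 incidents, the standard error on a 47% score is about 6.5 percentage points, so the one-point gap between Claude Opus 4.7 and GPT-5.5 on ITBench-AA is well inside the noise under this model.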
Edge AI announcements:
- The consolidation of the AI-native developer stack has reached the runtime virtualization layer. Cloudflare recently completed the acquisition of VoidZero, the development group responsible for Vite, Vitest, Rolldown, and Oxc, backing the transaction with a $1 million open-source ecosystem fund. This acquisition is highly strategic; as autonomous agents write an increasing proportion of production software, local development environments, compilation pipelines, and bundlers must be optimized for execution speeds that match agent speeds.
- Cloudflare's goal is to construct a localized, full-stack edge playground. In this sandbox, AI agents can generate, test, bundle (utilizing the highly parallelized, Rust-based Oxc and Rolldown engines), and deploy entire web applications end-to-end within milliseconds. This architecture completely bypasses traditional local machine container bottlenecks, enabling high-velocity agent loops to execute in a fully sandboxed, web-scale edge runtime.