AI Weekly Report: Agent Memory, Cerebras IPO, OpenAI DeployCo

Analyst: Caroline
Quantum Bit Intelligence | WeChat: AI123All

This week, the AI industry sent several key signals.

Memory Management and Strategic Shifts

Memory management is widely seen as the core bottleneck for deploying agents at scale, and vendors are responding with a range of low-cost long-context memory solutions. Leading companies have begun positioning 'AI as the operating system' as a new strategic direction, as Googlebook and Gemini Intelligence show. Vertical integration is also becoming a crucial competitive advantage. This week OpenAI completed a major reorganization, merging its ChatGPT, Codex, and API product lines under Greg Brockman to build a closed loop from model to application.

Another key signal is the founding of OpenAI DeployCo. With an initial investment of more than $4 billion and about 150 embedded engineers, OpenAI is signaling that simply providing APIs is no longer enough, and that helping customers run production workflows is the next frontier. How well large model companies close the gap between model capability and deployment capability will directly determine their future market share.

On the policy front, China has for the first time made AI labeling mandatory for short-form videos, raising the compliance bar that every AI content platform must meet.

Overall, the competitive focus has shifted from model capability ceilings to effective intelligence per unit cost. Lower token consumption, lighter deployment, and tighter engineering integration will dictate customer choices.

Infrastructure

Cerebras IPO 20x Oversubscribed, Soars 68% on First Day

Cerebras Systems listed on Nasdaq at $185 per share, opened at $350, and closed at $311.07, a first-day gain of 68%. The offering raised $5.55 billion, the largest IPO globally so far in 2026. Institutional orders exceeded 20 times the shares offered, and the company's market cap briefly topped $100 billion intraday.

In 2025, Cerebras posted revenue of $510 million (up 76% year over year) and net profit of $87.9 million. Before the IPO, the company faced high customer concentration: G42 contributed 87% of its 2024 revenue. Partnerships with OpenAI and AWS cut G42's share to 24% within just six months. OpenAI will remain its most important revenue source for the next few years, with a potential agreement worth more than $20 billion for AI compute covering 2026 to 2028. AWS also announced, just before the IPO, that it will integrate Cerebras CS-3 systems into Bedrock. The IPO's reception is a clear vote of confidence in the AI infrastructure sector.
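As a quick check, the headline percentages follow directly from the reported prices and financials. The net margin is derived here and is not a reported figure:

```latex
\begin{aligned}
\text{first-day gain} &= \frac{311.07 - 185}{185} = \frac{126.07}{185} \approx 0.681 \;(68.1\%)\\
\text{gain at the open} &= \frac{350 - 185}{185} = \frac{165}{185} \approx 0.892 \;(89.2\%)\\
\text{2025 net margin} &= \frac{87.9}{510} \approx 0.172 \;(17.2\%)
\end{aligned}
```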

Models

Tencent Open-Sources Agent Memory, Reducing Token Consumption by Up to 61%

Tencent Cloud open-sourced TencentDB Agent Memory, which provides short-term memory compression and long-term personalized memory for long-horizon agent tasks. It combines two mechanisms, 'context offloading' and a structured task canvas. Non-real-time information is offloaded to external storage, while the core task state stays in context. In tests, token consumption fell by up to 61% in multi-task sessions and task success rate rose by 52% in web search scenarios. Code repair and long-document analysis saw completion rates improve by 10% and accuracy by 8%. The release offers important technical validation for commercial agent applications, and token efficiency looks set to be a key commercialization signal in H2 2026.
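To make 'context offloading plus a task canvas' concrete, here is a minimal Python sketch built on stated assumptions. It is not TencentDB Agent Memory's API. The class `OffloadingContext`, the word-count token proxy, the in-memory list standing in for external storage, and the keyword recall are all hypothetical choices made for illustration:

```python
from dataclasses import dataclass, field


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (assumption): count whitespace-separated words.
    return len(text.split())


@dataclass
class OffloadingContext:
    """Hypothetical agent context: bounded raw history plus an always-kept task canvas."""

    budget: int                                  # max approx. tokens of raw history kept in the prompt
    canvas: dict = field(default_factory=dict)   # structured task state, never offloaded
    recent: list = field(default_factory=list)   # messages currently in the prompt
    store: list = field(default_factory=list)    # offloaded messages (external-storage stand-in)

    def add(self, message: str) -> None:
        self.recent.append(message)
        # Move the oldest messages out of the prompt until the history fits the budget;
        # the newest message always stays in context.
        while len(self.recent) > 1 and sum(map(approx_tokens, self.recent)) > self.budget:
            self.store.append(self.recent.pop(0))

    def recall(self, keyword: str) -> list:
        # Naive case-insensitive keyword lookup over offloaded messages.
        needle = keyword.lower()
        return [m for m in self.store if needle in m.lower()]

    def render(self) -> str:
        # Prompt = task canvas first, then the in-context history.
        canvas = "; ".join(f"{k}={v}" for k, v in self.canvas.items())
        return "\n".join([f"[canvas] {canvas}", *self.recent])


ctx = OffloadingContext(budget=6)
ctx.canvas["goal"] = "fix parser bug"
ctx.add("read file main.py")        # 3 words
ctx.add("found error in parser")    # 4 words: total 7 > 6, so the first message is offloaded
print(ctx.render())
print(ctx.recall("main"))
```

The canvas sits outside the budget on purpose. Core task state is never offloaded, while older raw history leaves the prompt but can still be recalled when needed.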

MiniCPM-V 4.6 Open-Sourced: 1.3B Parameters, Runs in 6GB of Memory

ModelBest open-sourced MiniCPM-V 4.6, a multimodal model with only 1.3B parameters that surpasses Qwen3.5-0.8B on benchmarks and ranks first among models of its size. It needs only 6GB of memory for smooth on-device inference and delivers 1.5x the throughput of competing models at 1/43 of their compute cost. Its LLaVA-UHD v4 scheme cuts image-encoding computation by 50%. Support for iOS, Android, and HarmonyOS should broaden community adoption and bring multimodal AI to more devices.
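The report does not state the precision behind the 6GB figure. Assuming 16-bit (bf16) weights, a back-of-the-envelope estimate shows how much of that budget the weights alone would take, with a 4-bit build for comparison:

```latex
\begin{aligned}
\text{weights (16-bit)} &= 1.3\times10^{9} \times 2\ \text{bytes} = 2.6\times10^{9}\ \text{bytes} \approx 2.6\ \text{GB}\\
\text{weights (4-bit)} &= 1.3\times10^{9} \times 0.5\ \text{bytes} = 0.65\times10^{9}\ \text{bytes} \approx 0.65\ \text{GB}\\
\text{remaining budget (16-bit)} &\approx 6 - 2.6 = 3.4\ \text{GB}
\end{aligned}
```

Under the 16-bit assumption, about 3.4GB would be left for activations, the KV cache, image buffers, and runtime overhead.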

Ant Ring-2.6-1T Open-Sourced and on OpenRouter with Discounts Until End of May

Ant Group's Bailing team open-sourced Ring-2.6-1T, its trillion-parameter thinking model. Weights went live on Hugging Face and ModelScope on May 15. With ~630B activated parameters, it features a tunable Reasoning Effort mechanism (high, xhigh) for on-demand reasoning. Over the past month, Ant has released multiple models focused on token efficiency.

NVIDIA Open-Sources 2.6B Parameter World Model SANA-WM: 1-Minute 720p Video

NVIDIA's NVlabs open-sourced SANA-WM, a lightweight 2.6B-parameter world model that generates long, controllable 720p videos from a single image and a camera trajectory on one GPU. Its key innovations are a hybrid linear Diffusion Transformer, dual-branch camera control, and a two-stage generation pipeline that uses a 17B refinement model. Training used only 213K public video clips on 64 H100s over 15 days. The result shows that world models can achieve good visual quality and control with far fewer parameters.
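The stated training budget converts to GPU time as follows:

```latex
64\ \text{GPUs} \times 15\ \text{days} = 960\ \text{GPU-days},\qquad
960 \times 24\ \text{h/day} = 23{,}040\ \text{H100-hours}
```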

Thinking Machines Lab Releases Interactive Multimodal Model

Thinking Machines Lab released TML-Interaction-Small, a 276B-parameter MoE model (12B active) designed for real-time multimodal human-machine collaboration. It can start generating a response while the user is still speaking, processing audio, video, and text in 200ms units. The system pairs a surface interaction model with a background model that handles deeper reasoning.
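The report does not describe how TML-Interaction-Small slices its input. The idea of processing audio, video, and text 'in 200ms units' can still be illustrated by grouping timestamped events into fixed 200ms windows. The function `bucket_events` and the `(timestamp_ms, modality, payload)` event format below are hypothetical:

```python
from collections import defaultdict


def bucket_events(events, window_ms=200):
    """Group (timestamp_ms, modality, payload) events into fixed time windows.

    Returns a list of (window_start_ms, [(modality, payload), ...]) sorted by time.
    Windows with no events are omitted; ties keep their input order (stable sort).
    """
    buckets = defaultdict(list)
    for t_ms, modality, payload in sorted(events, key=lambda e: e[0]):
        buckets[t_ms // window_ms].append((modality, payload))
    return [(i * window_ms, buckets[i]) for i in sorted(buckets)]


events = [
    (0, "audio", "a0"),
    (150, "text", "hi"),
    (210, "video", "frame1"),
    (390, "audio", "a1"),
    (400, "text", "!"),
]
for start, items in bucket_events(events):
    print(start, items)
# 0 [('audio', 'a0'), ('text', 'hi')]
# 200 [('video', 'frame1'), ('audio', 'a1')]
# 400 [('text', '!')]
```

In a design like this, each window is a small step the model can react to, so it does not have to wait for a complete user turn.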

Nous' Token Superposition Training Speeds Up Pre-Training 2-3x Without Architecture Changes

Nous Research proposed Token Superposition Training (TST), which speeds up LLM pre-training by ~2.5x without changing the architecture, tokenizer, optimizer, or parallelism strategy. In a first phase, multiple tokens are merged into a set and trained with a multi-hot cross-entropy loss. Training then reverts to the standard objective. The method was validated at scales from 270M to 10B parameters and improves data utilization efficiency.
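The report gives only the outline of TST. The sketch below illustrates one reading of its first phase, built on explicit assumptions:

- Tokens are grouped into fixed-size bags of `k`.
- Each bag is paired with the following bag, which serves as the multi-hot target.
- The loss averages negative log-likelihood over the target set with equal weights.

The code does not show how TST merges the input tokens, for example by combining their embeddings, and the function names are hypothetical. This is not Nous Research's implementation:

```python
import math


def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def multi_hot_cross_entropy(logits, target_ids):
    # Average negative log-likelihood over a *set* of target token ids (equal weights,
    # an assumption). With a single target this reduces to ordinary cross-entropy.
    targets = set(target_ids)
    if not targets:
        raise ValueError("target_ids must be non-empty")
    log_probs = log_softmax(logits)
    return -sum(log_probs[t] for t in targets) / len(targets)


def superposed_pairs(tokens, k):
    # Hypothetical grouping: cut the sequence into bags of k tokens and pair each bag
    # (input) with the next bag (multi-hot target). Leftover tokens that do not fill
    # a bag are dropped, so the number of training positions shrinks by about k.
    usable = len(tokens) - len(tokens) % k
    bags = [tokens[i:i + k] for i in range(0, usable, k)]
    return list(zip(bags[:-1], bags[1:]))


print(superposed_pairs([1, 2, 3, 4, 5, 6, 7, 8, 9], k=2))
# [([1, 2], [3, 4]), ([3, 4], [5, 6]), ([5, 6], [7, 8])]
print(round(multi_hot_cross_entropy([0.0, 0.0, 0.0, 0.0], [0, 1]), 4))
# 1.3863  (uniform logits over 4 tokens give log 4)
```

Because each position covers `k` tokens, a sequence of length N becomes roughly N/k positions. That offers one intuitive account of how a superposition phase could cut compute before training returns to ordinary next-token cross-entropy.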

Applications

DeepSeek Adds Chat History Search in Limited Rollout

DeepSeek introduced chat history search in version 2.1.0 (213) of its app as a limited (gray) rollout. Users can search by keyword to locate specific conversations, and the web version supports the feature as well. ChatGPT and Claude already offer this feature, but it fills a gap for heavy DeepSeek users.

Kimi WebBridge Extension: Agents Operate Browser Directly

Kimi launched WebBridge, a browser extension that lets agents perform human-like actions (searching, scrolling, clicking, typing) on real web pages, using the user's existing login state and cookies. It works with Claude Code, Cursor, Codex, Hermes, and other agents, and runs silently in the background. It supports information gathering, form filling, and cross-site data consolidation, and it can package fixed workflows as CLI tools. Meanwhile, Kimi K2.6 topped the Finance Agent Benchmark V2 with 44.87% accuracy, demonstrating its agentic task-execution capability.

Alibaba Qoder 1.0: IDE Upgraded to Autonomous Workbench

Alibaba Cloud released Qoder 1.0, upgrading it from an AI coding assistant to an agent-based autonomous development workbench. Developers define requirements in a standalone 'Quest' window. The Agent Harness then handles task breakdown, coding, testing, and delivery, and tasks can run in parallel across projects. The release also introduces a structured task runtime and knowledge engineering through a team-level knowledge engine, improving code retention by 11% and reducing token consumption by 40%. Qoder now serves more than 5 million users worldwide, with ARR above $60 million.

OpenAI Launches DeployCo and Daybreak for Enterprise AI Production

OpenAI announced the OpenAI Deployment Company (DeployCo), with an initial investment of more than $4 billion at a $10 billion valuation. DeployCo embeds ~150 engineers inside client organizations to connect AI models with their data, tools, permissions, and workflows. OpenAI also launched Daybreak, a software security defense program that competes with Anthropic's Glasswing. Daybreak combines OpenAI models and Codex security agents with partners such as Intel, Cisco, and CrowdStrike. Cisco's 2025 AI Readiness Index found that only 13% of enterprises are fully ready, and both initiatives aim to bridge that gap between demo and production.