Today in AI engineering.

№ 02 / 03 · AGENTS & ORCHESTRATION

Scaling Item Knowledge with JD's Oxygen AIIC Platform

JD.com's Oxygen AIIC uses a hybrid LLM/VLM architecture to automate item-knowledge production at scale, achieving 94.2% precision and 82.8% recall across tens of billions of SKUs.

arXiv cs.AI · Jun 29, 2026
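
The summary above quotes precision and recall separately. As a rough illustration only, the two figures can be combined into an F1 score (their harmonic mean). The F1 value is derived here from the quoted numbers and is not a figure reported in the summary; the function name `f1_score` is a hypothetical helper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the summary: 94.2% precision, 82.8% recall.
f1 = f1_score(0.942, 0.828)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.881"
```

Because the harmonic mean is pulled toward the lower of the two values, the combined score sits closer to the recall figure than a plain average would.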

№ 03 / 03 · AGENTS & ORCHESTRATION

Agent-Native Immune System (ANIS): Architecture for Runtime Defense

The Agent-Native Immune System (ANIS) shifts AI security from static training-time alignment to dynamic, runtime defense, using a six-layer 'Immune Tower' to protect autonomous agents against memory poisoning and tool-chain manipulation.

arXiv cs.AI · Jun 29, 2026

arXiv cs.AI · MLOps & Infrastructure · Jun 29, 2026

ATOD: Hybrid Distillation for Autonomous Agent Training

ATOD combines on-policy distillation with reinforcement learning using an annealed schedule and turn-level reweighting to train small agent models that outperform their larger teacher models.

arXiv cs.AI · Jun 29, 2026

№ 02 / 03

The stream — chronological

0 today · 67 this week

DAY 01 · Today · JUN 29 · 2026 · 41 SUMMARIES

Reducing LLM Agent Hallucinations with Grounded Iterative Planning

Grounded Iterative Language Planning (GILP) combines LLM-based reasoning with a small, trained transition-predictor backbone to catch and correct hallucinated state changes, significantly improving planning reliability.

arXiv cs.AI · RAG & Retrieval · Jun 29, 2026

Odyssey: A Categorical Framework for Verifiable Foundation Models

Odyssey uses categorical sheaf theory to compose modular 'foundries'—verifiable, truth-preserving architectural components—that allow for structured, queryable, and auditable LLM-based systems.

DysLexLens: Analyzing Dyslexic AI User Experiences via LLMs

DysLexLens is an end-to-end framework that extracts, structures, and validates insights from noisy online forum data to understand how dyslexic learners interact with AI tools.

ToE: Hierarchical Claim Verification Against Adversarial Misinformation

Tree of Evidence (ToE) is a fact-checking framework that uses a reinforcement-learning-driven agent to decompose claims into hierarchical argument trees, significantly improving verification accuracy against adversarially poisoned inputs.

Improving Long-Horizon LLM Planning via Symbolic Feedback

This framework enhances LLM planning reliability by using a symbolic verifier to identify errors and provide corrective, interpretable instructions for iterative self-refinement.
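
The verify-and-refine loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's method: the door-world rule in `verify`, the feedback format, and the `stub_fix` function (a stand-in for the LLM's revision step) are all hypothetical.

```python
from typing import Callable, List, Optional

Plan = List[str]


def verify(plan: Plan) -> Optional[str]:
    """Toy symbolic verifier: a door must be opened before walking through it.

    Returns None when the plan is valid, otherwise a corrective instruction.
    """
    door_open = False
    for i, step in enumerate(plan):
        if step == "open_door":
            door_open = True
        elif step == "walk_through" and not door_open:
            return f"insert 'open_door' before step {i}"
    return None


def refine(plan: Plan, propose_fix: Callable[[Plan, str], Plan], max_rounds: int = 3) -> Plan:
    """Alternate verification and revision until the plan passes or the budget runs out."""
    feedback = verify(plan)
    rounds = 0
    while feedback is not None:
        if rounds == max_rounds:
            raise ValueError("no valid plan within the round budget")
        plan = propose_fix(plan, feedback)  # in practice, an LLM call given the feedback
        rounds += 1
        feedback = verify(plan)
    return plan


def stub_fix(plan: Plan, feedback: str) -> Plan:
    """Hypothetical stand-in for the LLM: applies the verifier's instruction literally."""
    idx = int(feedback.rsplit(" ", 1)[1])
    return plan[:idx] + ["open_door"] + plan[idx:]


print(refine(["walk_through"], stub_fix))  # prints "['open_door', 'walk_through']"
```

The point of the structure is that the verifier, not the model, decides when a plan is acceptable, and its feedback is a readable instruction rather than a bare pass/fail signal.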

AI-ModelNet: A Networked Architecture for Collaborative AI

AI-ModelNet proposes a hierarchical, Internet-inspired architecture to enable interconnection and collaborative reasoning among heterogeneous, domain-specific models, addressing the fragmentation of the current AI landscape.

Personality Prompting in Multi-Agent Teams: Task-Dependent Impact

Personality manipulation in LLM agents significantly alters communication style but degrades task performance only in open-ended or collaborative domains; it remains largely neutral in structured coding tasks.

The Pragmatic Engineer (Gergely Orosz) · Coding Agents & Dev Productivity · Jun 29, 2026

Internalizing Future-Aware Planning in LLM Agents

To move LLM agents beyond reactive behavior, this paper introduces a three-stage training paradigm that enables agents to perform grounded 'what-if' simulations and success estimation.

The Shift in Software Engineering: AI Agents and Production Risk

AI agents have fundamentally transformed software development in six months, enabling massive increases in code output. However, this shift risks quality and security when organizations prioritize AI adoption over core engineering rigor, as evidenced by recent high-profile outages.

Ahead of AI (Sebastian Raschka) · Agents & Orchestration · Jun 29, 2026

Building and Auditing Local Coding Agents

A practical guide to setting up a local coding agent stack using Ollama and open-weight models, emphasizing performance benchmarking, secure auditing of agent harnesses, and the trade-offs of running local vs. proprietary infrastructure.

Interconnects (Nathan Lambert) · Models & Frontier Labs · Jun 29, 2026

The Diversification of the Open Model Ecosystem

The open model landscape is shifting from a few dominant players to a diverse ecosystem of niche, product-focused, and sovereign AI developers, signaling a move toward a long-tail of specialized models.

Interconnects (Nathan Lambert) · Models & Frontier Labs · Jun 29, 2026

GLM-5.2: A New Benchmark for Open-Weight Agentic Coding

GLM-5.2 marks a pivotal shift in the open-weight landscape, offering the first credible, high-performance alternative to frontier closed models like Claude Opus for complex agentic coding tasks.

Latent Space (Newsletter) · Agents & Orchestration · Jun 29, 2026

Claude Tag: Moving AI from Chat to Team-Based Delegation

Claude Tag shifts LLM interaction from synchronous chat to asynchronous, team-wide delegation within Slack, positioning Claude as a persistent, proactive coworker rather than a standalone tool.

Latent Space (Newsletter) · Inference & Serving · Jun 29, 2026

SpaceX's Neocloud and the Rise of Owned Intelligence

SpaceX is emerging as a massive compute provider with $28B/year in annualized GPU rental deals, while developers increasingly prioritize 'owned intelligence' via open-weight models like GLM-5.2 to gain control over their AI stacks.

Latent Space (Newsletter) · Models & Frontier Labs · Jun 29, 2026

OpenAI's GPT-5.6 Launch: Frontier Models as Managed Assets

OpenAI released the GPT-5.6 family (Sol, Terra, Luna) as a restricted, government-mediated preview, signaling a shift where release governance is now a core component of the model specification.

Latent Space (Newsletter) · Agents & Orchestration · Jun 29, 2026

The Rise of Meta-Harnesses and Vertical AI Integration

The AI industry is shifting toward 'meta-harnesses'—standardized agent orchestration layers—while frontier labs move toward vertical integration of custom silicon and agent-native UX.

Latent Space (Newsletter) · Agents & Orchestration · Jun 29, 2026

Internal AI Adoption & The Rise of Agentic Workflows

OpenAI reports massive internal token growth across all departments, signaling that agentic workflows—supported by review loops and persistent infrastructure—are moving from experimental to core production patterns.

Simon Willison's Weblog · Inference & Serving · Jun 29, 2026

Porting PyTorch Models to the Browser with Claude Code

By leveraging Claude Code to convert PyTorch models to ONNX, developers can run sophisticated AI features like image inpainting directly in the browser using WebGPU and the CacheStorage API.

Claude Code Changelog · Frameworks & Tooling · Jun 29, 2026

Claude Code Changelog: Production Reliability & Agentic Control

Recent updates to Claude Code focus on hardening production workflows, improving agentic reliability through stricter permissioning and background task management, and enhancing the developer experience in terminal-based environments.

Claude Code Changelog · Frameworks & Tooling · Jun 29, 2026

Claude Code Changelog: Production Reliability and Agentic Control

Recent updates to Claude Code focus on hardening agentic workflows through improved background task management, granular permission controls, enhanced MCP reliability, and significant performance optimizations for terminal-based AI development.

Claude Code Changelog · Frameworks & Tooling · Jun 29, 2026

Claude Code Changelog: Production Reliability & Agentic Control

Recent updates to Claude Code focus on hardening agentic workflows, improving background task management, and refining safety controls for autonomous shell and MCP operations.

Claude Code Changelog · Frameworks & Tooling · Jun 29, 2026

Claude Code Changelog: System Reliability and Agentic UX

Recent updates to Claude Code focus on hardening background agent reliability, improving TUI responsiveness, and refining safety controls for autonomous operations.

Claude Code Changelog · Frameworks & Tooling · Jun 29, 2026

Claude Code Changelog: Production Reliability and Agentic Control

Recent updates to Claude Code focus on hardening background agent reliability, refining safety controls for auto-mode, and optimizing terminal performance for professional engineering workflows.

Together AI Blog · Inference & Serving · Jun 29, 2026

ParallelKernelBench: Frontier LLMs Struggle with Multi-GPU Kernels

While LLMs excel at single-GPU kernel generation, they currently struggle with multi-GPU tasks where communication bottlenecks and complex rank coordination dominate performance.

Hugging Face Blog · Inference & Serving · Jun 29, 2026

Deploying vLLM Endpoints on Hugging Face Jobs

Hugging Face Jobs allows engineers to spin up private, OpenAI-compatible vLLM endpoints on demand using a single command, providing a pay-per-second alternative for testing and experimentation.

Anthropic News · Agents & Orchestration · Jun 29, 2026

Claude Tag: Collaborative Agentic Workflows in Slack

Claude Tag integrates Claude into Slack as a persistent, multiplayer agent capable of autonomous task execution, cross-channel context awareness, and proactive collaboration.

Import AI (Jack Clark) · Agents & Orchestration · Jun 29, 2026

Agentic Robotics, Large-Scale Infra, and Future Uncertainty

Recent developments in agentic robot self-improvement, large-scale GPU cluster telemetry, and legal data infrastructure highlight the rapid maturation of AI systems, even as experts debate the long-term implications for human autonomy.

TechCrunch — AI · MLOps & Infrastructure · Jun 29, 2026

Real-Time Fluid Monitoring for Data Center Cooling Efficiency

Omen AI is deploying miniaturized spectrometers to monitor coolant chemistry in real-time, preventing bacterial outbreaks and hardware wear that cause costly data center downtime.

IBM Technology · Coding Agents & Dev Productivity · Jun 29, 2026

Optimizing Software Workflows with AI Code Review

AI code review accelerates development by automating static and dynamic analysis, but it requires human oversight to manage context, mitigate false positives, and ensure architectural alignment.

OpenAI News · Evals & Reliability · Jun 29, 2026

Building Interoperable Standards for Advanced AI Systems

OpenAI is co-founding the Appia Foundation to translate high-level AI safety frameworks into modular, open technical specifications that enable consistent, third-party evaluation across the global AI supply chain.

AI Engineer · Inference & Serving · Jun 29, 2026

Prototype Big, Deploy Small: A Framework for Local LLM Adoption

Stop overpaying for frontier models. By using a 'prototype big, deploy small' framework and rigorous capability evals, you can identify 'Sage' (Small and Good Enough) models that provide production-grade performance on-device, saving costs and improving latency.

AI Engineer · AI News & Trends · Jun 29, 2026

The Future of AI: Shifting from Monolithic Agents to Composition

Justin Schroeder argues that the future of AI lies in 'domain-specific agents'—small, specialized, composable units—rather than monolithic agents, to solve the reliability, cost, and complexity issues inherent in current agentic architectures.

Moving Upstream: Why Product Strategy Beats Prompting

As AI makes coding cheap, the bottleneck has shifted to product discovery. Success now depends on human-centric techniques like story mapping and value-based requirements to ensure you build what is actually worth building.

AI Engineer · MLOps & Infrastructure · Jun 29, 2026

Building Deterministic Infrastructure for Autonomous AI Agents

Reliability in agentic systems is an infrastructure challenge, not a model one. To scale agents, you must build a 'control plane' that separates model reasoning from production execution via validation, policy enforcement, and circuit breakers.
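
One way to picture the control plane described above is a thin wrapper that every agent-proposed action must pass through before it runs. The sketch below is an assumption-laden illustration, not the talk's implementation: the `ControlPlane` class, its fields, the thresholds, and the simplified breaker (which closes fully after the cooldown, with no half-open probing) are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional, Set


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to run an action."""


@dataclass
class ControlPlane:
    # Hypothetical names and thresholds, chosen only for illustration.
    allowed_tools: Set[str]
    max_failures: int = 3
    cooldown_s: float = 30.0
    clock: Callable[[], float] = time.monotonic
    _failures: int = field(default=0, init=False)
    _opened_at: Optional[float] = field(default=None, init=False)

    def execute(self, tool: str, args: dict, run: Callable[[dict], object]) -> object:
        # Circuit breaker: refuse everything while open and still cooling down.
        if self._opened_at is not None:
            if self.clock() - self._opened_at < self.cooldown_s:
                raise CircuitOpenError("circuit open; action refused")
            self._opened_at, self._failures = None, 0  # cooldown elapsed: close again
        # Policy enforcement: only allow-listed tools may run.
        if tool not in self.allowed_tools:
            raise PermissionError(f"tool {tool!r} is not permitted")
        # Validation: arguments must be a dict with string keys.
        if not isinstance(args, dict) or not all(isinstance(k, str) for k in args):
            raise ValueError("args must be a dict with string keys")
        try:
            result = run(args)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self.clock()  # trip the breaker
            raise
        self._failures = 0  # any success resets the failure count
        return result
```

In this sketch the model only proposes `tool` and `args`; whether the action runs is decided by deterministic code, which is the separation of reasoning from execution that the summary describes.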

The Agentic AI Engineer: Scaling Agent Development via Loops

To scale agent development, teams must move from manual iteration to an 'Agentic AI Engineer' model: a multi-agent system that automates the entire lifecycle of spec, build, eval, diagnose, and optimize.

The Prompt as a Platform: Agentic Engineering for Distributed Systems

Dominik Tornow argues that software engineering is shifting from general-purpose implementations to bespoke systems synthesized by agents from abstract specifications, using deterministic simulation as the critical feedback loop for design.