Prompt Notes - Issue #3

March 22, 2026

Big Story

From Cloud APIs to Agent PCs: Why GTC 2026 is the End of the "Token Tax"

NVIDIA's GTC 2026 marked a pivotal shift: the tools to build production-grade AI systems—RAG pipelines, autonomous agents, and custom fine-tuned models—are now available entirely on local RTX hardware.

  • Cost & Privacy Sovereignty: With the release of NemoClaw and the Nemotron 3 family (specifically the 120B Super and 4B Nano), you can now build production-grade RAG systems and autonomous agents that run entirely on local RTX hardware. This removes the "token tax" and allows for the processing of sensitive data without it ever leaving the device.
  • Infrastructure Optimization: The introduction of NVFP4 and FP8 distilled models means you can achieve a 2.1x performance boost on local video and image generation. This allows for high-throughput creative workflows on a single workstation that previously required data-center-grade clusters.
  • Rapid Fine-Tuning: The launch of Unsloth Studio provides a graph-based UI that simplifies the fine-tuning of over 500 models. It reduces VRAM usage by 70%, enabling you to customize high-parameter models on consumer GPUs—essential for building niche assistants like a "Podcast Researcher."
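
As a concrete sketch of dropping the cloud dependency: many local runtimes expose an OpenAI-compatible HTTP endpoint, so swapping a cloud API for local RTX inference can be as small as changing a base URL. The model name and port below are placeholders for illustration, not confirmed identifiers from the Nemotron 3 release.

```python
import json
import urllib.request

def build_local_chat_request(prompt: str,
                             model: str = "nemotron-3-nano-4b",
                             base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a chat-completion request for a local OpenAI-compatible server.

    The model name and port are placeholders; substitute whatever your
    local runtime actually serves.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it (requires a running local server):
# with urllib.request.urlopen(build_local_chat_request("Summarize this file")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because only the base URL changes, the same client code can fall back to a hosted endpoint when the local GPU is busy.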

Beyond Wireframes: How Google Stitch is Automating the Design-to-Code Pipeline

Google Stitch redefines how engineers think about UI generation—moving beyond static mockups to a system where design rules are expressed as agent-readable markdown and entire user journeys are auto-generated.

  • Design-as-Code (DESIGN.md): Google introduced a markdown-based design system (DESIGN.md). This allows you to import/export design rules as agent-friendly files, making it easier to programmatically apply consistent UI across different projects or AI-generated applications.
  • Automated User Journeys: Stitch can "stitch" screens together and predict logical next steps. For engineers building automated workflows, this provides a visual debugging layer where you can preview interactive app flows generated by AI in real time.
  • Seamless Integration (MCP & SDK): With the new Stitch MCP (Model Context Protocol) server, you can leverage Stitch's design capabilities directly within your own developer tools and skills. This enables a more synchronized workflow between the AI, the design agent, and your deployment environment (like AI Studio).
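
To make the design-as-code idea concrete, here is a minimal sketch of parsing bullet-style rules from an agent-readable markdown file. The file layout is an assumption for illustration; Stitch's actual DESIGN.md schema may differ.

```python
def parse_design_rules(markdown: str) -> dict[str, str]:
    """Parse `key: value` rules from bullet lines of a DESIGN.md-style file.

    The format here is a guess at what an agent-readable design system
    might look like, not Stitch's confirmed schema.
    """
    rules = {}
    for line in markdown.splitlines():
        line = line.strip()
        if line.startswith("- ") and ":" in line:
            key, _, value = line[2:].partition(":")
            rules[key.strip()] = value.strip()
    return rules

DESIGN_MD = """\
# Design System
- primary-color: #1A73E8
- font-family: Roboto
- corner-radius: 8px
"""

print(parse_design_rules(DESIGN_MD))
# {'primary-color': '#1A73E8', 'font-family': 'Roboto', 'corner-radius': '8px'}
```

Because the rules live in plain markdown, the same file can be read by a human reviewer, committed to version control, and injected into an agent's context without conversion.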

Agent Framework Updates

LangGraph v1.1.3

  • Runtime Transparency: A new feature adds detailed execution information to the runtime, giving you deeper visibility into how nodes are processing in real time.
  • Stable Persistence: Includes a major update to checkpoint-postgres (v3.0.5), strengthening state management for long-running, multi-turn agent sessions.

OpenAI Agents v0.12.5

  • Asynchronous Tooling: Native support for non-blocking tool calls enables agents to handle multiple I/O tasks simultaneously without stalling.
  • Reliable Serialization: Optimized message handling ensures complex conversation histories are passed to the API with lower overhead and fewer errors.
  • Infrastructure Parity: Updated internal dependencies align the framework with the latest OpenAI API features for better production stability.
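
The non-blocking pattern looks like this in plain asyncio: two simulated tool calls whose I/O waits overlap instead of running back to back. This is a generic sketch of asynchronous tooling, not the Agents SDK's actual interface.

```python
import asyncio

async def fetch_weather(city: str) -> str:
    await asyncio.sleep(0.1)   # stand-in for a network call
    return f"{city}: sunny"

async def fetch_stock(ticker: str) -> str:
    await asyncio.sleep(0.1)
    return f"{ticker}: 101.5"

async def run_tools_concurrently() -> list[str]:
    # Non-blocking tool calls: both I/O waits overlap, so total latency
    # is roughly the slowest call, not the sum of all calls.
    return list(await asyncio.gather(
        fetch_weather("Berlin"),
        fetch_stock("NVDA"),
    ))

results = asyncio.run(run_tools_concurrently())
print(results)  # ['Berlin: sunny', 'NVDA: 101.5']
```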

Interesting Reads

OpenAI is planning a desktop 'superapp'

  • The Browser as an Agent: Project Atlas isn't just a Chrome competitor; it's a browser built specifically for autonomous agents to navigate, click, and interact with the web directly, bypassing the need for separate "browser-use" libraries.
  • Deep OS Integration: By moving into the browser space, OpenAI is gaining direct access to the "runtime" of the internet. This allows for low-latency tool use where the agent doesn't just read a page, but maintains a persistent state across multiple tabs and apps.
  • The "Superapp" Ecosystem: This suggests a future where AI engineers won't just build "GPTs," but will develop web-native skills that run inside the Atlas environment, similar to how developers build for the Chrome Web Store today.
  • Identity & Auth: Atlas aims to solve the "login" problem for agents, providing a secure way for AI to handle authentication on behalf of the user, a major hurdle in current RAG and agent workflows.

Research & Techniques

New Tools for Surgery Robots: NVIDIA and Hugging Face Release Open-H

  • The First Shared Dataset: Open-H-Embodiment offers 778 hours of data from real robots. It tracks how they move, what they see, and the force they use.
  • A "Brain" for Surgery: The GR00T-H model is the first AI that turns pictures and words into actual surgical actions. It is designed to work across many different types of robot arms.
  • Better Training in Simulation: The new Cosmos-H Simulator creates realistic videos of surgery (like blood and tissue) just from robot commands.
  • Massive Speed Gains: You can now test 600 robot runs in 40 minutes using the simulator. This used to take two full days of physical testing.
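
A quick back-of-envelope check of the claimed speedup, assuming "two full days" means 2 x 24 hours of physical testing:

```python
# Speedup implied by the figures above (assumption: a "full day" is 24h).
sim_minutes = 40
physical_minutes = 2 * 24 * 60          # 2880 minutes
speedup = physical_minutes / sim_minutes
runs_per_minute = 600 / sim_minutes
print(f"{speedup:.0f}x faster, {runs_per_minute:.0f} runs/min in simulation")
# 72x faster, 15 runs/min in simulation
```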

Industry & Applications

High-Power AI Made Easy: NVIDIA Nemotron 3 Super Arrives on Amazon Bedrock

  • Serverless Power: You can now use Nemotron 3 Super as a fully managed service on AWS. This means you get top-tier AI performance without the headache of setting up infrastructure.
  • Top Performance for Agents: This model is specially built for AI agents. It is up to 5x faster than older versions and excels at complex tasks like coding, planning, and multi-step reasoning.
  • Huge Memory for Context: With a 256K token window, the model can "remember" and process massive amounts of data—equivalent to several thick books—in a single conversation.
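
A hedged sketch of what a serverless call might look like via Bedrock's Converse request shape. The model ID below is a guess at the naming convention, not a confirmed identifier; check the Bedrock model catalog before use.

```python
def build_converse_request(prompt: str,
                           model_id: str = "nvidia.nemotron-3-super-v1:0") -> dict:
    """Assemble a Bedrock Converse-style request body.

    The model ID is an assumed placeholder; look up the real identifier
    in the Bedrock model catalog.
    """
    return {
        "modelId": model_id,
        "messages": [
            {"role": "user", "content": [{"text": prompt}]},
        ],
        "inferenceConfig": {"maxTokens": 1024, "temperature": 0.2},
    }

# With boto3 and AWS credentials configured, this would be sent as:
# client = boto3.client("bedrock-runtime")
# response = client.converse(**build_converse_request("Plan a deploy"))
```

Because Bedrock is fully managed, there is no endpoint to provision: the request above is all the "infrastructure" the caller maintains.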

Developer Tools

LangChain's New Framework for Production Coding Agents

Open SWE formalizes the "Coding Agent" patterns used by companies like Stripe and Coinbase into an open-source framework built on LangGraph and Deep Agents.

  • Plug-and-Play Sandboxing: It supports isolated cloud execution environments (Modal, Daytona, Runloop) out of the box, allowing agents to run shell commands and tests safely without manual intervention.
  • Deterministic Orchestration: By combining LLM-driven subagents with deterministic middleware, you can ensure critical steps—like opening a PR or injecting mid-run Slack messages—happen reliably every time.
  • Context Engineering via AGENTS.md: You can now encode team-specific conventions and architectural rules into an AGENTS.md file at the repo root, which the agent automatically injects into its system prompt.
  • Native Integration: Meets developers where they are by supporting Slack-first invocation, Linear issue comments, and GitHub PR reviews as primary interfaces.
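
The AGENTS.md pattern reduces to a small amount of prompt plumbing: read the file at the repo root, if present, and prepend its conventions to the system prompt. This is a sketch of the idea; Open SWE's actual injection logic may differ.

```python
from pathlib import Path

BASE_PROMPT = "You are a coding agent working in this repository."

def build_system_prompt(repo_root: Path) -> str:
    """Prepend repo conventions from AGENTS.md, when present, to the
    agent's system prompt. A pattern sketch, not Open SWE's code."""
    agents_md = repo_root / "AGENTS.md"
    if agents_md.is_file():
        return f"{BASE_PROMPT}\n\n# Repository conventions\n{agents_md.read_text()}"
    return BASE_PROMPT

# Example with a throwaway repo root:
import tempfile
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "AGENTS.md").write_text("Always run `make test` before opening a PR.")
    prompt = build_system_prompt(root)
    print("make test" in prompt)  # True
```

Keeping the conventions in a committed file means the whole team edits the agent's standing instructions through ordinary code review.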