Raindrop Workshop Agent Debugging Guide

TL;DR

Raindrop Workshop is a local debugger for AI agents: install the daemon and UI, instrument your agent with /instrument-agent, then watch tokens, tool calls, and spans stream into http://localhost:5899, and use replay to turn failures into repeatable tests.

Source and Accuracy Notes

This article draws on public material from the raindrop-ai/workshop repository, the project's install docs, and the local debugging instructions linked from the repository documentation. Commands below are limited to documented Workshop commands and examples.

Workshop is young agent tooling. Its value depends on how well it integrates with the SDKs, providers, and coding agents you already use. Verify its behavior inside your own repo before making it a required part of your team's workflow.

What Is Workshop?

Raindrop Workshop is a local debugger for AI agents. Instead of treating agent execution as an opaque chat transcript, Workshop streams operational traces: every token, tool call, span, and decision as execution happens. The project's pitch is direct: watch agent behavior locally as it happens, then let a coding agent read the traces, write evals against your codebase, and fix the failures.

That moves agent debugging closer to ordinary software debugging. You see more than the final assistant answer: you can inspect trace shape, tool ordering, latency, failed assumptions, and replay behavior. For teams building agents, this distinction matters, because most agent bugs hide in orchestration: the wrong tool selected, stale context, a malformed intermediate result, poor retry logic, or brittle prompts. A local trace debugger makes those failures inspectable.
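To make "trace shape" and "tool ordering" concrete, here is a minimal TypeScript sketch that rebuilds a span tree from flat trace events and prints per-span latency. The `SpanEvent` shape and its field names (`parentId`, `startMs`, `endMs`) are assumptions for illustration, not Workshop's actual trace schema.

```typescript
// Hypothetical event shape for illustration; not Workshop's trace schema.
interface SpanEvent {
  id: string;
  parentId: string | null; // null for a root span
  name: string; // e.g. "llm.call" or "tool.read_file"
  startMs: number;
  endMs: number;
}

interface SpanNode extends SpanEvent {
  children: SpanNode[];
}

// Rebuild nesting from flat events using parentId links.
function buildSpanTree(events: SpanEvent[]): SpanNode[] {
  const nodes = new Map<string, SpanNode>();
  for (const e of events) nodes.set(e.id, { ...e, children: [] });

  const roots: SpanNode[] = [];
  for (const node of Array.from(nodes.values())) {
    const parent = node.parentId === null ? undefined : nodes.get(node.parentId);
    if (parent) parent.children.push(node);
    else roots.push(node); // spans whose parent is missing are treated as roots
  }

  // Sort siblings by start time so tool ordering reads chronologically.
  const sortByStart = (list: SpanNode[]): void => {
    list.sort((a, b) => a.startMs - b.startMs);
    for (const n of list) sortByStart(n.children);
  };
  sortByStart(roots);
  return roots;
}

// Render an indented outline with per-span latency.
function renderTree(nodes: SpanNode[], depth = 0, out: string[] = []): string[] {
  for (const n of nodes) {
    out.push(`${"  ".repeat(depth)}${n.name} (${n.endMs - n.startMs} ms)`);
    renderTree(n.children, depth + 1, out);
  }
  return out;
}

const events: SpanEvent[] = [
  { id: "a", parentId: null, name: "agent.run", startMs: 0, endMs: 900 },
  { id: "c", parentId: "a", name: "tool.read_file", startMs: 400, endMs: 450 },
  { id: "b", parentId: "a", name: "llm.call", startMs: 10, endMs: 380 },
];
console.log(renderTree(buildSpanTree(events)).join("\n"));
```

Sorting siblings by start time is the design choice that matters here: it turns an unordered set of spans into the order in which the agent actually acted, which is where orchestration bugs such as a tool called too early tend to show up.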

The Workshop repository is implemented in TypeScript, but its compatibility list is broader. Project material names TypeScript, Python, Go, and Rust. SDK integrations include Vercel AI SDK, OpenAI Agents SDK, Anthropic SDK, Claude Agent SDK, LangChain, LangGraph, CrewAI, Mastra, Pydantic AI, DSPy, Google ADK, Strands, Agno, and Deep Agents. The provider list includes AWS Bedrock, Azure OpenAI, and Vertex AI. That breadth suggests Workshop aims to be an instrumentation layer rather than a framework replacement.

Repo-Specific Setup Workflow

Step 1: Install Workshop locally

Normal install path:

curl -fsSL https://raindrop.sh/install | bash

This installs the command-line entrypoint used to start the local Workshop daemon and UI.

Step 2: Build from source when changing Workshop itself

Source workflow:

git clone https://github.com/raindrop-ai/workshop.git
cd workshop
bun install
bun run dev

bun run dev starts the local Workshop daemon and the Vite UI. After startup, open the local UI (open is the macOS command; on Linux, use xdg-open or paste the URL into a browser):

open http://localhost:5899

Use source mode only if you need to inspect the implementation, patch UI behavior, or debug Workshop internals.

Step 3: Instrument your agent from the target repository

Open your coding agent in the repository you want to debug and run:

/instrument-agent

This is important: instrumentation happens where the target agent's code lives, not inside the Workshop repository. After instrumentation, traces stream into the local UI as the agent runs.
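/instrument-agent does the instrumentation for you, so you should not need to write this by hand. Still, it helps to know roughly what a span records. The sketch below is a hypothetical wrapper, `withSpan`, that times an async step and records whether it failed; it is not Workshop's SDK, and the `RecordedSpan` fields are assumptions.

```typescript
// Hypothetical span record; not Workshop's SDK or trace format.
interface RecordedSpan {
  name: string;
  startMs: number;
  durationMs: number;
  error: string | null; // null when the step succeeded
}

// In-memory sink for the sketch; real instrumentation would export spans instead.
const recorded: RecordedSpan[] = [];

// Time an async step (model call, tool call) and record whether it failed.
async function withSpan<T>(name: string, fn: () => Promise<T>): Promise<T> {
  const startMs = Date.now();
  let error: string | null = null;
  try {
    return await fn();
  } catch (err) {
    error = err instanceof Error ? err.message : String(err);
    throw err; // re-throw so the agent's own error handling still runs
  } finally {
    recorded.push({ name, startMs, durationMs: Date.now() - startMs, error });
  }
}
```

In an agent loop you would wrap each model or tool call, for example `await withSpan("tool.read_file", () => readFile(path, "utf8"))`. Re-throwing keeps the agent's own error handling intact while still recording the failure.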

Step 4: Start and inspect the local trace UI

Workshop's default local UI address is:

http://localhost:5899

Use it while the agent runs. Watch the token stream, tool calls, spans, and timing. When debugging, resist the urge to jump straight to the final response: the trace sequence often reveals a failure earlier than the final output does.
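Reading the sequence rather than the final answer can be sketched as two small queries over a flattened trace: the first failing step in stream order, and the order of tool calls. The `TraceEntry` type and its `seq`, `kind`, and `ok` fields are hypothetical; map them to whatever your exported traces actually contain.

```typescript
// Hypothetical flattened trace entry; field names are illustrative only.
type EntryKind = "token" | "tool_call" | "tool_result" | "final";

interface TraceEntry {
  seq: number; // order in which the entry was streamed
  kind: EntryKind;
  name: string;
  ok: boolean; // false if the step reported an error
}

const bySeq = (entries: TraceEntry[]): TraceEntry[] =>
  [...entries].sort((a, b) => a.seq - b.seq);

// First failing entry in stream order, or null if nothing failed.
function firstFailure(entries: TraceEntry[]): TraceEntry | null {
  return bySeq(entries).find((e) => !e.ok) ?? null;
}

// Tool names in the order the agent called them.
function toolOrder(entries: TraceEntry[]): string[] {
  return bySeq(entries)
    .filter((e) => e.kind === "tool_call")
    .map((e) => e.name);
}

const sampleTrace: TraceEntry[] = [
  { seq: 3, kind: "tool_call", name: "edit_file", ok: true },
  { seq: 1, kind: "tool_call", name: "search_files", ok: true },
  { seq: 2, kind: "tool_result", name: "search_files", ok: false },
  { seq: 4, kind: "final", name: "answer", ok: true },
];

console.log(firstFailure(sampleTrace)); // logs the failed search_files tool_result (seq 2)
console.log(toolOrder(sampleTrace)); // logs search_files, then edit_file
```

In this made-up sample the final entry reports success, yet the trace shows that the `search_files` result failed at step 2, well before the answer was produced.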

Step 5: Explore the replay workflow

Project material describes local replay through a slash command:

/setup-agent-replay

That scaffolds an HTTP endpoint for replaying a production trace against your real agent code. This is where Workshop becomes more than a visual logger: replay turns a captured failure into repeatable local test input.
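The core idea of replay, independent of the HTTP endpoint that /setup-agent-replay scaffolds, is to run the real agent code on the captured input while answering its tool calls from the recording. The sketch below assumes a simplified synchronous agent signature (`Agent`) and a `CapturedTrace` shape of its own; the generated endpoint will look different, and the HTTP layer is omitted.

```typescript
// Hypothetical shapes; the endpoint /setup-agent-replay scaffolds will differ.
interface CapturedTrace {
  userMessage: string;
  toolResponses: { tool: string; output: string }[]; // in the order they were returned
}

type CallTool = (tool: string, input: string) => string;
type Agent = (userMessage: string, callTool: CallTool) => string;

// Run the real agent on the captured input, answering tool calls from the recording.
function replay(agent: Agent, trace: CapturedTrace): string {
  let cursor = 0;
  const recordedTool: CallTool = (tool) => {
    const next = trace.toolResponses[cursor];
    cursor += 1;
    if (next === undefined || next.tool !== tool) {
      // The code under test no longer follows the recorded path; fail loudly.
      throw new Error(
        `replay diverged at tool call ${cursor}: recorded ${next ? next.tool : "nothing"}, agent called ${tool}`,
      );
    }
    return next.output;
  };
  return agent(trace.userMessage, recordedTool);
}

// Toy agent standing in for real agent code.
const toyAgent: Agent = (message, callTool) => {
  const hits = callTool("search_files", message);
  return hits.includes("config.ts") ? "Config lives in src/config.ts" : "Config not found";
};

const captured: CapturedTrace = {
  userMessage: "Where is the config?",
  toolResponses: [{ tool: "search_files", output: "src/config.ts" }],
};

console.log(replay(toyAgent, captured));
```

Failing loudly on divergence is deliberate: if a code change makes the agent call a different tool, the recording no longer describes the run, and silently returning the wrong recorded output would hide that.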

Step 6: Keep tracing local during your first evaluation

Workshop is local-first. Start with one developer machine, one agent repo, and non-sensitive traces. Add team sharing only after you understand what trace payloads include.

Deeper Analysis

Workshop addresses a real gap in agent development: observability at the boundary between reasoning and tool use. Traditional logs capture application events. LLM provider dashboards capture API calls. Neither fully explains how the agent chose its tools, how context changed, how spans nested, or whether the token stream hinted at an upcoming failure. Workshop focuses on that missing layer.

The self-healing eval loop is ambitious. Project material describes Claude writing evals, running the agent, seeing a failure, fixing the code, and rerunning until the assertions pass. In practice, treat this as a workflow pattern rather than a guarantee. Strong evals still require scoped assertions, deterministic fixtures, stable replay inputs, and human review. Workshop can make the loop visible and faster; it cannot make poor eval design safe.
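Scoped assertions are what keep that loop honest. Rather than exact-matching a whole model answer, a minimal sketch checks specific, stable properties of a run, such as tool ordering and a required fact in the answer. `RunResult`, `EvalCheck`, and the helper names are hypothetical; they are not part of Workshop.

```typescript
// Hypothetical run summary and check types; not part of Workshop.
interface RunResult {
  toolCalls: string[]; // tool names in call order
  finalAnswer: string;
}

interface EvalCheck {
  description: string;
  check: (run: RunResult) => boolean;
}

// Passes if `first` is called and `second` is called at some point after it.
function calledBefore(first: string, second: string): EvalCheck {
  return {
    description: `${first} is called before ${second}`,
    check: (run) => {
      const i = run.toolCalls.indexOf(first);
      return i !== -1 && run.toolCalls.indexOf(second, i + 1) !== -1;
    },
  };
}

// Passes if the final answer contains a required substring.
function answerMentions(text: string): EvalCheck {
  return {
    description: `final answer mentions "${text}"`,
    check: (run) => run.finalAnswer.includes(text),
  };
}

// Descriptions of failed checks; an empty array means the eval passed.
function runEval(run: RunResult, checks: EvalCheck[]): string[] {
  return checks.filter((c) => !c.check(run)).map((c) => c.description);
}

const sampleRun: RunResult = {
  toolCalls: ["read_file", "edit_file", "run_tests"],
  finalAnswer: "Fixed the null check in parser.ts",
};

console.log(runEval(sampleRun, [calledBefore("read_file", "edit_file"), answerMentions("parser.ts")]));
```

Each check tolerates harmless variation in wording, which matters when a coding agent reruns the eval after every fix: the assertions should fail because of the bug, not because the model rephrased its answer.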

Local replay is especially valuable for teams running agents in production. Production traces often contain the exact shape of a failure: the user message, tool responses, model output, timing, and state. If /setup-agent-replay turns that into a local endpoint that runs against real code, teams can reproduce bugs without manually reconstructing the entire conversation.
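Before a production trace becomes a replay input, it helps to strip fields that change on every capture so the fixture stays stable in version control and diffs cleanly. The sketch below assumes a hypothetical `ProductionTrace` shape; real Workshop payloads may use different fields.

```typescript
// Hypothetical captured-trace record; real Workshop payloads may differ.
interface ToolResponse {
  tool: string;
  output: string;
}

interface ProductionTrace {
  traceId: string;
  timestamp: string;
  latencyMs: number;
  userMessage: string;
  toolResponses: ToolResponse[];
  modelOutput: string;
}

interface ReplayFixture {
  userMessage: string;
  toolResponses: ToolResponse[];
  observedOutput: string; // the failing output the fix should change
}

// Keep what a replay needs; drop ids and timing that differ on every capture.
function toFixture(trace: ProductionTrace): ReplayFixture {
  return {
    userMessage: trace.userMessage,
    toolResponses: trace.toolResponses.map((r) => ({ tool: r.tool, output: r.output })),
    observedOutput: trace.modelOutput,
  };
}

// Key order follows object construction order, so identical fixtures serialize identically.
function fixtureJson(trace: ProductionTrace): string {
  return JSON.stringify(toFixture(trace), null, 2);
}
```

Timing is useful for diagnosis but is dropped here so repeated captures of the same failure produce identical fixtures; if a bug is timing-dependent, keep the relevant timing fields.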

Compatibility breadth is both a strength and a risk. Supporting many SDKs means Workshop can meet teams where they are, but it also means instrumentation quality may vary by stack. During evaluation, test your actual framework and provider combination. Do not assume that behavior observed with the Vercel AI SDK carries over to LangGraph or the OpenAI Agents SDK.

Practical Evaluation Checklist

Install Workshop locally and verify that the UI opens at http://localhost:5899.
Instrument one non-critical agent repo with /instrument-agent.
Run an agent task that makes at least one tool call and one model call.
Confirm the trace includes tokens, tool calls, spans, and enough metadata for debugging.
Capture one known failure and test whether the replay path helps reproduce it.
Check whether traces expose secrets, prompts, file paths, or customer data.
Compare the workflow with your existing logs, LangSmith, provider logs, or custom telemetry.
Decide whether Workshop stays developer-only or becomes part of your CI/eval workflow.
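The "confirm the trace includes" item can be made repeatable with a small coverage check over exported trace entries. The entry kinds (`token`, `tool_call`, `span`) and the required metadata keys are assumptions for this sketch; adjust them to whatever Workshop actually exports for your stack.

```typescript
// Hypothetical exported entry; adapt kinds and keys to what your traces contain.
interface ExportedEntry {
  kind: string; // e.g. "token", "tool_call", "span"
  metadata: Record<string, unknown>;
}

const REQUIRED_KINDS = ["token", "tool_call", "span"];
const REQUIRED_SPAN_METADATA = ["name", "durationMs"]; // illustrative minimum

// List missing entry kinds and spans that lack required metadata keys.
function coverageGaps(entries: ExportedEntry[]): string[] {
  const gaps: string[] = [];
  const kinds = new Set(entries.map((e) => e.kind));
  for (const kind of REQUIRED_KINDS) {
    if (!kinds.has(kind)) gaps.push(`missing kind: ${kind}`);
  }
  entries
    .filter((e) => e.kind === "span")
    .forEach((span, i) => {
      for (const key of REQUIRED_SPAN_METADATA) {
        if (!(key in span.metadata)) gaps.push(`span #${i} missing metadata: ${key}`);
      }
    });
  return gaps;
}
```

An empty result means the trace met this minimum bar; a non-empty one gives a concrete list to compare across the framework and provider combinations you evaluate.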

Security Notes

Agent traces can contain sensitive data: source code, file paths, prompts, tool outputs, API responses, user messages, and credentials accidentally printed by tools. Treat the local trace database and browser UI as sensitive developer data.

Do not instrument production-like agents against real customer data until you know the storage path, retention behavior, access model, and redaction options. Local-only does not mean risk-free: malware, shared machines, backups, and screenshots can still leak traces.
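Until you know what redaction Workshop offers, a cheap precaution is to scan exported trace text for obvious secrets before sharing it. The sketch below uses a few illustrative regular expressions; it is not a complete secret scanner and not a Workshop feature.

```typescript
// Illustrative patterns only; a real secret scanner needs a much broader rule set.
const SECRET_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "AWS access key id", pattern: /\bAKIA[0-9A-Z]{16}\b/g },
  { label: "bearer token", pattern: /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g },
  { label: "private key header", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----/g },
];

// Replace matches with a labeled placeholder and report what was found.
function redact(text: string): { redacted: string; findings: string[] } {
  const findings: string[] = [];
  let redacted = text;
  for (const { label, pattern } of SECRET_PATTERNS) {
    redacted = redacted.replace(pattern, () => {
      findings.push(label);
      return `[REDACTED:${label}]`;
    });
  }
  return { redacted, findings };
}
```

Pattern matching catches only well-known credential formats. Prompts, file paths, and customer data still need policy decisions about what may be traced at all.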

The install command fetches a remote script and pipes it straight into bash. Review the installer source first, or run it in a controlled environment, if your organization requires supply-chain validation.
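One way to satisfy supply-chain review is to download the script to a file instead of piping it to bash (for example `curl -fsSL https://raindrop.sh/install -o install.sh`), review it, record its SHA-256, and verify that hash before each run. The sketch below shows the verification step with Node's `crypto` module. The pinned hash is a hypothetical value your team records after review, and the approach assumes the script behaves the same when run from a file as when piped.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hex-encoded SHA-256 of some bytes.
function sha256Hex(data: string | Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

// True if the file at `path` matches a checksum your team recorded after review.
// The pinned value is hypothetical; record your own after reading the script.
function installerMatchesPin(path: string, pinnedSha256: string): boolean {
  return sha256Hex(readFileSync(path)) === pinnedSha256.trim().toLowerCase();
}

// Usage: <runner> verify.ts install.sh <pinned-sha256>
const [, , scriptPath, pinned] = process.argv;
if (scriptPath && pinned) {
  console.log(installerMatchesPin(scriptPath, pinned) ? "checksum OK" : "checksum MISMATCH: do not run");
}
```

A changed hash does not prove the new script is malicious, only that it differs from what was reviewed, which is exactly the moment to review it again.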

If you scaffold a replay endpoint, protect it. Replay endpoints can execute real agent code with captured inputs. Keep the endpoint bound to localhost unless you have explicit authentication and network controls in place.
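A minimal sketch of the two controls named above, assuming you wrap a replay handler yourself in Node: bind to 127.0.0.1 rather than all interfaces, and require a shared token compared in constant time. The `x-replay-token` header, `startReplayServer`, and its signature are hypothetical; this is not the code /setup-agent-replay generates.

```typescript
import { createServer } from "node:http";
import { timingSafeEqual } from "node:crypto";

// Constant-time comparison of the presented token with the expected one.
function tokenMatches(presented: string | undefined, expected: string): boolean {
  if (presented === undefined) return false;
  const a = Buffer.from(presented);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so check length first.
  return a.length === b.length && timingSafeEqual(a, b);
}

// Defense in depth: accept only loopback peers even though we bind to 127.0.0.1.
function isLoopback(address: string | undefined): boolean {
  return address === "127.0.0.1" || address === "::1" || address === "::ffff:127.0.0.1";
}

// Hypothetical wrapper; `handle` stands in for your replay logic.
function startReplayServer(port: number, token: string, handle: (body: string) => string) {
  if (token.length === 0) throw new Error("refusing to start a replay server without a token");
  const server = createServer((req, res) => {
    const header = req.headers["x-replay-token"];
    const presented = typeof header === "string" ? header : undefined;
    if (!isLoopback(req.socket.remoteAddress) || !tokenMatches(presented, token)) {
      res.writeHead(403).end();
      return;
    }
    let body = "";
    req.setEncoding("utf8");
    req.on("data", (chunk: string) => {
      body += chunk;
    });
    req.on("end", () => {
      res.writeHead(200, { "content-type": "text/plain" }).end(handle(body));
    });
  });
  server.listen(port, "127.0.0.1"); // loopback only, never 0.0.0.0
  return server;
}
```

Binding to the loopback address is the primary control; the peer-address check and token only matter if someone later changes the bind address, which is a common way local tools end up exposed.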

FAQ

Q: Is Workshop an agent framework? A: No. It is a debugging and tracing layer for agents built with existing SDKs and providers.

Q: Which port does Workshop use? A: Port 5899. The source dev workflow serves the local UI at http://localhost:5899 once the daemon and Vite UI start.

Q: Where do I run /instrument-agent? A: Run it inside your coding agent session in the target repository you want to trace.

Q: Does Workshop only work with TypeScript? A: No. The repository is written in TypeScript, but the project's compatibility list includes TypeScript, Python, Go, and Rust, plus many agent SDKs.

Q: Why use Workshop if I already log model calls? A: Model-call logs show requests and responses. Workshop aims to show live tokens, tool calls, spans, decisions, and replayable failures.

Conclusion

Raindrop Workshop is compelling because it makes agent execution visible where failures actually happen: between model output, tool invocation, span structure, and codebase-specific behavior. Its setup is small, but evaluation should be serious. Install locally, instrument one real repo, inspect trace quality, and try replay on a known failure.

For agent builders, the best use is not passive watching. Use Workshop to turn vague failures into trace-backed evals. If those evals become repeatable and actionable, Workshop earns a place in your debugging stack. If traces are too noisy or too sensitive, keep it experimental until controls improve.