dev-tools 7 min read

agents-best-practices for AI Agent Harnesses

Denis Sergeevitch's agents-best-practices packages provider-neutral guidance for building AI agent harnesses with typed tools, approval gates, context compaction, MCP connectors, and rollout safety.

By
Share: X in
agents-best-practices GitHub tool guide thumbnail

TL;DR

TL;DR: DenisSergeevitch/agents-best-practices is not another prompt pack. It is a skill-sized architecture reference for building safer AI agent harnesses where runtime code, permissions, budgets, and validation stay outside model improvisation.

Source and Accuracy Notes

Last reviewed for this post: 2026-06-10.

What Is agents-best-practices?

This repo packages a provider-neutral Agent Skill for designing and auditing agent harnesses. The key phrase is agent harness, not agent prompt. The author keeps returning to one boundary: the model proposes actions, while harness code validates, authorizes, executes, records, and feeds back observations.

That sounds abstract until you read shipped materials. The skill covers:

  • tool design and risk classes;
  • approval-gated vs autonomous actions;
  • planning and goal loops;
  • context memory and compaction;
  • MCP and external connector governance;
  • observability, evals, and launch gates.

In other words, it is documentation for control plane around an agent, not only personality text inside an agent.

Repo-Specific Setup Workflow

Step 1: Install the skill where your coding agent can discover it

The repository documents three install paths. Fastest is skills-based install:

npx skills add DenisSergeevitch/agents-best-practices -g

Manual Codex install is also documented:

mkdir -p "${CODEX_HOME:-$HOME/.codex}/skills"
git clone https://github.com/DenisSergeevitch/agents-best-practices.git \
  "${CODEX_HOME:-$HOME/.codex}/skills/agents-best-practices"

That puts SKILL.md, references, and icon in place for discovery.

Step 2: Use it for concrete design problems, not vague inspiration

The README gives three strong entry points:

  • generate an MVP agent blueprint;
  • audit a brittle harness;
  • design tools, permissions, and connectors.

That is good scoping. The skill is most valuable when you already know domain and need runtime shape.

Step 3: Start with MVP Builder Mode

SKILL.md explicitly says new agent requests should default to smallest useful production-safe harness. It asks the agent to infer:

  1. domain,
  2. autonomy level,
  3. risk level,
  4. state duration,
  5. tool surface,
  6. validation.

Then it maps that to reference files like mvp-agent-blueprint.md, tools-and-permissions.md, context-memory-compaction.md, and security-evals-observability.md.

Step 4: Apply its non-negotiable rules to your own runtime

Several rules are worth lifting directly into design reviews:

  • model does not execute actions directly;
  • every tool call must get a tool result;
  • risky side effects need runtime policy outside model;
  • draft and commit should split for high-risk actions;
  • long-running goals need budgets and checkpoints.

These are simple rules, but they kill many “demo worked, production failed” agent designs.

Deeper Analysis

This repo is useful because it refuses to confuse prompt engineering with systems engineering. That sounds obvious, yet many agent repos still depend on giant instructions while leaving permissions, state, and approvals implicit.

The architecture reference is especially strong on component boundaries. It lists instruction manager, context builder, model adapter, tool registry, permission engine, execution engine, state store, memory layer, compactor, planner, workflow scheduler, skill registry, connector manager, approval manager, and trace/eval system. You do not need every component on day one, but seeing them named clearly makes gaps obvious.

The tools-and-permissions reference is another high-signal section. It argues against broad verbs like execute_anything(command) or call_api(url, method, body) and pushes narrow typed tools such as draft_customer_email or request_refund_approval. That is practical advice, not theoretical purity. Narrow tools make approval logic, telemetry, and retries sane.

The repo is also good on maturity levels. It distinguishes answer-only assistant, retrieval agent, drafting agent, approval-gated actor, policy-bounded autonomous actor, and long-running goal worker. Many teams jump directly to Level 4 language without surviving Level 2. This skill is a strong corrective.

Another useful feature is reference mapping. SKILL.md does not dump every document into context up front. It tells host agents which file to load for which problem: coding-agents.md for repository-facing work, provider-api-patterns.md for OpenAI or Anthropic integration choices, workflow-orchestration.md for decomposed long-running tasks, and prompt-caching-and-cost.md for cache-aware context layout. That progressive-disclosure pattern is itself a good design lesson. If you expose every workflow and every connector all at once, the agent spends context budget on possibilities instead of action.

The repository’s permission model is also concrete enough to reuse directly. It classifies tools across read-only, search-only, compute-only, draft-only, write-local, write-internal, write-external, financial, communication, identity-access, security-sensitive, destructive, and privileged-admin. That list does two things well. First, it forces product teams to admit that not all “tools” are equal. Second, it provides a stable vocabulary for approval and telemetry. When incident review happens, “tool call failed” is much less informative than “external-write tool was blocked by approval gate.”

There is practical writing value too. The repo’s default output template for new harnesses gives teams a checklist-shaped design document: objective, scope, autonomy, loop, instructions, tools, planning, context, connectors, safety, observability, and rollout. That structure is strong enough to use in architecture reviews even if no coding agent consumes the skill directly.

Tradeoff is clear too: this is reference layer, not framework. It will not scaffold your Worker, your Python runtime, or your database schemas. You still need implementation choices. But as planning and audit artifact, that is often better. It stays portable across OpenAI, Anthropic, MCP-based hosts, and custom harnesses.

If you are already using agent workflow repos like /blog/mercury-agent-skills-registry/ or quality-gate skill sets like /blog/guard-skills-coding-agent-quality-gates/, this repo fits one layer deeper. It helps answer whether your harness itself is disciplined enough to deserve more tools and more autonomy.

Practical Evaluation Checklist

  • [ ] Use the maturity model to classify your current agent honestly.
  • [ ] Audit whether risky actions are split into draft vs commit tools.
  • [ ] Check if approvals, plans, and active state live outside prompt context.
  • [ ] Review whether tool schemas are narrow and typed instead of generic.
  • [ ] Add at least one budget and one stop condition to long-running agent loops.

Where It Fits in Real Projects

This skill is most useful in three common moments.

First, before a build starts. Product teams often know they want “an agent,” but not whether they want retrieval assistant, drafting assistant, or approval-gated worker. The maturity model helps stop category mistakes early.

Second, during painful second iteration. Many agent projects feel good for two demos, then degrade once connectors, approvals, and real workloads appear. The references on compaction, permissions, and workflow boundaries are aimed exactly at that stage.

Third, during audit. If you inherit a working but opaque agent, this repo gives you language for asking better questions: where do approvals live, what are tool risk classes, what survives compaction, what are stop conditions, and what traces exist when something goes wrong?

That makes it useful beside implementation-heavy tooling rather than in competition with it. Frameworks help you build. This skill helps you avoid building sloppy autonomy around those frameworks.

Security Notes

The skill’s main security value is architectural: keep authority, credentials, approvals, and final commits outside model-controlled text. That is safest default whether your agent edits files, calls APIs, or drafts external messages.

The repo also treats retrieved content as untrusted instructions. That is important for MCP and connector-heavy systems because tickets, docs, webpages, and PDFs often contain language that looks like policy but is only data.

Finally, “allow all tools” is implicitly flagged as development-only posture. In production, narrow tools, explicit risk classes, and runtime permission decisions should dominate.

FAQ

Q: Is this repo only for coding agents?
A: No. The README explicitly expands it to research, support, finance, legal, healthcare, education, and workflow automation agents.

Q: What is most valuable single takeaway?
A: Treat harness as control plane. The model should propose, but runtime code should validate, authorize, execute, and record.

Q: Does it replace an SDK or framework?
A: No. It complements them by giving safer architecture patterns for whatever SDK or framework you choose.

Q: Who should read this first?
A: Teams moving from single-turn assistants to approval-gated or long-running agents.

Conclusion

agents-best-practices is one of better small repos to read before you add more power to an AI agent. It will not dazzle you with benchmarks or UI. It does something more useful: it names runtime boundaries that keep agent systems legible when novelty wears off and real operations begin.