Twill AI – Self-Improving Software Engineer
Twill keeps a persistent dev environment for each repo, forks it for every task, and lets coding agents write back memories, setup fixes, and skills while verifying PRs with tests and screenshots.
TL;DR
Twill is a cloud-based AI coding agent that maintains a persistent per-repo dev environment, learns project-specific memories and skills over time, and delivers PRs verified by tests and screenshots.
Source and Accuracy Notes
What Is Twill?
Twill is an AI software engineer that lives in the cloud and gets better at your codebase the more you use it. Unlike one-shot coding agents that start fresh every session, Twill maintains a persistent dev environment per repository — forked for every task so parallel work doesn’t conflict.
The core loop: you open a task, Twill forks your repo environment, makes changes, and writes back learnings (memories, setup fixes, skills) that future agents can reuse. PRs are verified with your actual test suite and screenshots before being marked done.
Setup Workflow
Step 1: Connect Your Repository
Visit twill.ai and sign in with GitHub. Authorize Twill to access your repos.
Step 2: Create a New Task
# Describe the task in plain English
# Twill forks the repo environment and starts working
Step 3: Review the PR
Twill opens a PR with its changes. The agent run log shows what it did, what it learned, and which tests passed.
Deeper Analysis
How the Persistent Environment Works
Each repo gets its own sandboxed dev environment in Twill’s cloud. When you file a task, Twill forks that environment — so if a previous task left half-configured tooling, the new fork starts clean. But the knowledge layer persists across forks: memories, skill configs, and setup fixes written by prior agents are shared.
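The fork-versus-shared-knowledge split described above can be pictured with a small Python model. This is only a sketch under stated assumptions: the `Environment` and `KnowledgeLayer` classes, their fields, and the `acme/web` repo name are hypothetical illustrations, not Twill's actual implementation or API.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class KnowledgeLayer:
    """Hypothetical per-repo knowledge that survives across forks."""
    memories: list[str] = field(default_factory=list)
    setup_fixes: list[str] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)


@dataclass
class Environment:
    """Hypothetical sandbox: mutable tooling state plus a knowledge reference."""
    repo: str
    tooling: dict[str, str]
    knowledge: KnowledgeLayer

    def fork(self) -> "Environment":
        # Tooling is deep-copied so one task cannot disturb the base or siblings.
        # The knowledge layer is passed by reference so learnings are shared.
        return Environment(self.repo, copy.deepcopy(self.tooling), self.knowledge)


base = Environment("acme/web", {"node": "20"}, KnowledgeLayer())
task_a = base.fork()
task_b = base.fork()

task_a.tooling["node"] = "22-broken"          # local experiment in task A
task_a.knowledge.memories.append("use pnpm")  # learning written back

print(base.tooling["node"])       # 20
print(task_b.knowledge.memories)  # ['use pnpm']
```

In this model, copying the tooling state while sharing the knowledge object is what lets parallel tasks stay isolated yet still benefit from each other's learnings.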
Memory and Skill System
Agents write three types of knowledge back:
- Memories — project-specific context (naming conventions, architecture patterns)
- Setup fixes — corrections to broken CI configs or missing deps
- Skills — reusable prompt snippets for recurring task types
Future agents read this knowledge before touching code, so the second (and third, and nth) PR should be faster and more accurate than the first.
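One way to picture the three knowledge types is as tagged entries in a per-repo store that gets rendered into a briefing an agent reads before it starts. The sketch below is illustrative only: `Kind`, `Entry`, `KnowledgeStore`, and the sample entries are assumed names and data, not Twill's real schema.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    MEMORY = "memory"
    SETUP_FIX = "setup_fix"
    SKILL = "skill"


@dataclass(frozen=True)
class Entry:
    kind: Kind
    text: str


class KnowledgeStore:
    """Hypothetical store; entries are keyed by repo so they never mix."""

    def __init__(self) -> None:
        self._by_repo: dict[str, list[Entry]] = {}

    def write(self, repo: str, entry: Entry) -> None:
        self._by_repo.setdefault(repo, []).append(entry)

    def briefing(self, repo: str) -> str:
        """Render everything known about one repo as text an agent reads first."""
        entries = self._by_repo.get(repo, [])
        return "\n".join(f"[{e.kind.value}] {e.text}" for e in entries)


store = KnowledgeStore()
store.write("acme/web", Entry(Kind.MEMORY, "Components use PascalCase file names"))
store.write("acme/web", Entry(Kind.SETUP_FIX, "Install libvips before npm ci"))
store.write("acme/api", Entry(Kind.SKILL, "Template for adding a REST endpoint"))

print(store.briefing("acme/web"))
```

Keying every read and write by repo name also mirrors the per-repo isolation the article describes: a briefing for one repo never includes entries written for another.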
Verification: Tests + Screenshots
Twill runs your existing test suite against every PR. For UI projects, it also captures screenshots and diffs them against the base branch — catching visual regressions that unit tests miss.
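To make the screenshot-diff idea concrete, here is a minimal sketch of a pixel-level comparison between a base-branch screenshot and a PR screenshot. It assumes images are already decoded into grids of RGB tuples. The `diff_ratio` function, the tolerance parameter, and the 1% threshold are hypothetical choices for illustration, and the source does not say how Twill actually computes its diffs.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def diff_ratio(base: Image, head: Image, tolerance: int = 0) -> float:
    """Return the fraction of pixels with any channel differing by more than `tolerance`."""
    if len(base) != len(head) or any(len(a) != len(b) for a, b in zip(base, head)):
        raise ValueError("screenshots must have identical dimensions")
    total = sum(len(row) for row in base)
    if total == 0:
        return 0.0
    changed = 0
    for row_a, row_b in zip(base, head):
        for pa, pb in zip(row_a, row_b):
            if any(abs(ca - cb) > tolerance for ca, cb in zip(pa, pb)):
                changed += 1
    return changed / total


WHITE, RED = (255, 255, 255), (255, 0, 0)
base_shot = [[WHITE, WHITE], [WHITE, WHITE]]
head_shot = [[WHITE, RED], [WHITE, WHITE]]

ratio = diff_ratio(base_shot, head_shot)
print(ratio)                                   # 0.25
print("regression" if ratio > 0.01 else "ok")  # regression
```

A small tolerance is useful in practice because anti-aliasing and font rendering can shift channel values slightly between runs without any real visual change.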
Practical Evaluation Checklist
- [ ] Connects to GitHub and lists repos correctly
- [ ] Forks environment on new task without conflicts
- [ ] Writes memories that subsequent agents read
- [ ] Test suite runs and reports pass/fail per PR
- [ ] Screenshot diff captures visual changes
- [ ] Knowledge layer improves across multiple tasks
- [ ] No credential leakage — agents operate in sandboxed envs
Security Notes
Twill operates in isolated cloud environments per repository. Credentials are managed through OAuth, not stored as secrets. Agents run in sandboxed containers and cannot access other repos without explicit authorization. The memory layer is per-repo — a memory written in Repo A is not accessible to Repo B.
FAQ
Q: How does Twill differ from a session-based coding assistant like Cursor or Copilot? A: Cursor and Copilot mostly work session by session and carry little project context forward by default. Twill maintains a knowledge layer across tasks, so the more you use it, the more it knows about your codebase. It also runs your actual test suite against each PR before presenting it for review.
Q: Does Twill work with private repos? A: Yes. You authorize Twill via GitHub OAuth and control access per-repo. It only sees repos you explicitly grant.
Q: What happens if an agent writes bad memories? A: Memories are stored but not automatically applied. You can review and edit them before they influence future agent runs. The system is designed to be additive, not destructive.
Conclusion
Twill solves the “fresh agent, cold context” problem by keeping a persistent, learning dev environment per repo. The memory and skill layers make each successive PR faster and more accurate. If you’re managing a codebase with recurring tasks — refactors, test coverage, documentation — Twill’s cloud agents can offload that work while learning your project’s conventions over time.
For small teams or solo devs tired of re-explaining context to every new agent session, Twill is worth evaluating. The YC S25 backing gives it runway to mature.