Crustdata – Token-Efficient Web Search API for AI Agents
Crustdata maps web search results to real-world entities (people, companies, events) so AI agents spend tokens on reasoning, not on cleaning up messy HTML.
TL;DR
Crustdata is a B2B data API that routes web search through an entity graph (people, companies, events) so AI agents get clean, structured data instead of raw HTML that needs expensive token clean-up.
Source and Accuracy Notes
- Product: https://crustdata.com
- HN Launch: https://news.ycombinator.com/item?id=47146819 (10 points, YC F24)
- Docs: https://docs.crustdata.com
What Is Crustdata?
Running AI agents at scale makes tokens a visible line item. Web data is the worst input: long pages, repeated boilerplate, mixed entities, stale claims. The conventional pipeline—web search, scrape, summarize, structure—forces every agent to do janitorial work before it can act.
Crustdata inverts this. It maintains a canonical entity graph of people and companies with stable internal IDs, aliases, and relationships. It continuously indexes the web and attaches each document to the right entity. The result is that when your agent queries “recent funding rounds by AI infrastructure startups,” it gets back structured, deduplicated records tied to verified company identities—not a pile of HTML that it has to parse itself.
The product is aimed at teams running AI agents in sales, recruitment, and investment workflows. The company is part of the YC F24 batch.
Setup Workflow
Step 1: Get an API Key
Sign up at crustdata.com and grab your API key from the dashboard. The free tier gives a reasonable allocation for trying things out.
Step 2: Install the Client
pip install crustdata
Or use the REST API directly:
curl -H "Authorization: Token YOUR_API_KEY" \
"https://api.crustdata.com/v1/search?query=AI+infrastructure+startups"
Step 3: Make Your First Search
from crustdata import Crustdata

client = Crustdata(api_key="YOUR_API_KEY")

# Search with entity resolution
results = client.search(
    query="recent AI infrastructure funding rounds",
    entity_types=["company", "person"],
    include_raw_html=False,
)

# Print the canonical name and funding amount of each matched entity
for result in results:
    print(result["canonical_name"], result["funding_amount"])
The entity_types filter lets you scope results to companies, people, or events. By default, the API returns structured records—no HTML blobs, no duplicated nav content, no scraping required.
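Since a search can mix companies and people, a record may not carry every field, and indexing with result["funding_amount"] would then raise a KeyError. The sketch below shows one defensive way to read records. The sample data and the entity_type key are hypothetical; only canonical_name and funding_amount appear in the example above, and the real response schema may differ.

```python
from typing import Any, Dict, List

# Hypothetical records shaped like the fields used in the example above;
# the real response schema may differ.
sample_results: List[Dict[str, Any]] = [
    {"canonical_name": "Acme AI", "entity_type": "company", "funding_amount": 25_000_000},
    {"canonical_name": "Jane Doe", "entity_type": "person"},  # no funding field
]


def summarize(record: Dict[str, Any]) -> str:
    """Format one record, tolerating a missing funding_amount."""
    name = record.get("canonical_name", "<unknown>")
    amount = record.get("funding_amount")
    return f"{name}: {amount:,}" if amount is not None else f"{name}: n/a"


lines = [summarize(r) for r in sample_results]
print("\n".join(lines))
```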
Step 4: Connect to an AI Agent
Crustdata ships an MCP (Model Context Protocol) server, so you can wire it directly into Claude:
crustdata_mcp_config.json:
{
  "mcpServers": {
    "crustdata": {
      "command": "npx",
      "args": ["@crustdata/mcp-server", "--api-key", "YOUR_API_KEY"]
    }
  }
}
Drop that into your Claude config and the agent can query live B2B data mid-conversation without you having to pre-fetch and stuff it into context.
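If you would rather not paste the key into a file by hand, the config can be generated from an environment variable. This is a minimal sketch: the structure mirrors the config above, while the variable name CRUSTDATA_API_KEY and the helper build_mcp_config are assumptions made for illustration.

```python
import json
import os


def build_mcp_config(api_key: str) -> dict:
    """Return the same structure as the config above, with the key filled in."""
    return {
        "mcpServers": {
            "crustdata": {
                "command": "npx",
                "args": ["@crustdata/mcp-server", "--api-key", api_key],
            }
        }
    }


# CRUSTDATA_API_KEY is a hypothetical variable name chosen for this sketch.
api_key = os.environ.get("CRUSTDATA_API_KEY", "YOUR_API_KEY")
config_text = json.dumps(build_mcp_config(api_key), indent=2)
print(config_text)
```

Write config_text to wherever your MCP client reads its configuration, and keep that file out of version control since it contains the key.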
Deeper Analysis
Token Economics
The core pitch is efficiency. A naive web search pipeline pays twice: once to fetch and parse the raw HTML, once to summarize it into something usable. Crustdata says its entity graph eliminates the first parse step because the data arrives pre-structured. For high-volume agentic workflows, that difference compounds quickly into real cost savings.
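A back-of-the-envelope calculation makes the compounding concrete. Every number below is an illustrative assumption, not a Crustdata measurement; substitute figures from your own pipeline.

```python
def tokens_per_query(docs: int, tokens_per_doc: int, summary_tokens: int = 0) -> int:
    """Input tokens for the documents plus any tokens spent summarizing each one."""
    return docs * (tokens_per_doc + summary_tokens)


# All numbers below are illustrative assumptions, not measurements.
raw = tokens_per_query(docs=5, tokens_per_doc=8_000, summary_tokens=500)
structured = tokens_per_query(docs=5, tokens_per_doc=300)

queries_per_day = 10_000
saved_per_day = (raw - structured) * queries_per_day
print(raw, structured, saved_per_day)
```

Under these assumed figures, a raw pipeline spends 42,500 tokens per query against 1,500 for structured records, which adds up to 410 million tokens a day at 10,000 queries.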
Entity Graph Depth
The graph covers people and companies with aliases, historical name changes, and relationship links. Queries for “Steven Baker CTO at Perplexity” and “Steve Baker, CTO Perplexity” resolve to the same canonical ID. This matters for any agent doing deduplication or cross-referencing across multiple data sources.
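The idea of mapping name variants to one canonical ID can be illustrated with a toy lookup table. This is a sketch of the general technique only, not Crustdata's resolution method; the alias table, the ID format, and the function names are all hypothetical.

```python
import re
from typing import Optional

# Toy alias table: the entries and the ID format are hypothetical.
ALIASES = {
    "steven baker": "person:001",
    "steve baker": "person:001",
}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def resolve(name: str) -> Optional[str]:
    """Return the canonical ID for a name variant, or None if unknown."""
    return ALIASES.get(normalize(name))


print(resolve("Steven Baker"), resolve("steve  BAKER,"))
```

A production resolver also has to handle titles, employers, and name changes, which is where a maintained graph earns its keep over a static table like this one.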
Competitive Landscape
Companies like Clearbit, Apollo, and Crunchbase occupy adjacent space but are more oriented toward human lookup workflows. Crustdata is explicitly agent-first: the API shape, the MCP server, the token-efficient response format all point toward programmatic consumption rather than manual research.
Practical Evaluation Checklist
- Does the entity resolution correctly deduplicate variant names?
- Is the data fresh enough for your use case (sales signals, recruiting, investment research)?
- Does the free tier give enough to validate the API shape before committing?
- Does the MCP server integration work cleanly with your agent setup?
- How does pricing scale with query volume at the scale you need?
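The freshness item on the checklist can be turned into a quick measurement over a sample of results. The sketch below assumes each record carries an ISO-formatted last_updated date; that field name and the sample records are hypothetical and may not match the real schema.

```python
from datetime import date, timedelta
from typing import Dict, List


def fresh_fraction(records: List[Dict[str, str]], today: date, max_age_days: int) -> float:
    """Share of records whose last_updated date falls within max_age_days of today."""
    if not records:
        return 0.0
    cutoff = today - timedelta(days=max_age_days)
    fresh = sum(1 for r in records if date.fromisoformat(r["last_updated"]) >= cutoff)
    return fresh / len(records)


# Hypothetical records; last_updated is an assumed field name.
sample = [
    {"canonical_name": "Acme AI", "last_updated": "2025-01-10"},
    {"canonical_name": "Beta Labs", "last_updated": "2024-06-01"},
]
print(fresh_fraction(sample, today=date(2025, 1, 31), max_age_days=30))
```

Pick max_age_days per use case: sales signals usually need a tighter window than investment research.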
Security Notes
- API keys are per-user; rotate them from the dashboard if compromised.
- Data is delivered over HTTPS only.
- The entity graph contains publicly sourced web data—verify accuracy for any compliance-sensitive use cases.
FAQ
Q: What data does Crustdata cover?
A: People and companies, with coverage across B2B sales, recruitment, and investment signals. The entity graph includes aliases and relationship links, not just the canonical name.
Q: How is this different from just scraping with an agent?
A: Scraping forces your agent to consume raw HTML and pay token costs for the parsing work. Crustdata pre-structures the data through its entity graph before it reaches your agent, shifting the token cost from your API budget to theirs.
Q: Does it work with non-Claude AI agents?
A: Yes. The REST API is agent-agnostic and works with any agent that can make HTTP calls. The MCP setup above targets Claude, but MCP is an open protocol, so other MCP-compatible clients should be able to use the server as well.
Q: What is the pricing model?
A: Credit-based. The free tier provides enough allocation to evaluate the API. Volume pricing kicks in as agent usage scales. Check the pricing page for current rates.
Q: Does Crustdata replace a CRM enrichment tool like Clearbit?
A: Not exactly. Crustdata is more geared toward real-time web data and entity resolution. CRM enrichment tools tend to focus on static firmographic and contact data. The two can complement each other.
Conclusion
Crustdata targets a specific pain point: AI agents that burn tokens parsing messy web data instead of doing useful reasoning. By routing search through an entity graph before results reach your agent, it shifts the cleanup cost off your API budget. If you are running agents in sales, recruiting, or investment contexts and watching token costs, this is worth evaluating against a raw scraper-plus-summarize pipeline.
The YC F24 backing and MCP server integration suggest the team is serious about the agentic use case. Try the free tier, point an agent at it, and see if the token savings justify the credit cost.