OpenNutrition - A 300K Food Database Built by an AI Stack

OpenNutrition is a free, ODbL-licensed nutrition database of 300K+ foods, built with a multi-model AI pipeline that ingests USDA, AUSNUT, FRIDA, and CNF data and fills gaps with reasoning models.

TL;DR

OpenNutrition is a free, ODbL-licensed nutrition dataset of 300,000+ everyday, branded, and restaurant foods, plus a free iOS macro tracker. It is built by a multi-model AI pipeline that blends USDA, AUSNUT, FRIDA, and CNF data with reasoning models that explain every value they fill in.

What Is OpenNutrition?

Most nutrition apps depend on one of a handful of data sources, and each has a real problem.

USDA FoodData Central     # Public, but smaller scope after the
                          # "Foundation Foods" cut (under 300 items)
                          # vs. the legacy SR dataset.

Open Food Facts            # Crowdsourced, but deliberately does NOT
                          # estimate micronutrient gaps.

MyFitnessPal              # Massive coverage, but proprietary, no
                          # public API, no transparency, no re-use.

Commercial APIs (NUTRINO,  # Accurate and rich, but priced per-seat
Nutritionix, ESHA)         # with licensing that blocks re-distribution
                          # in new products.

OpenNutrition is a different option: a free, openly licensed dataset built by an AI pipeline that starts with the public sources, then uses large reasoning models to fill in the gaps with auditable explanations.

The current release covers 300,000+ food items, split across four categories that the founder documents on the /about page:

Generic everyday foods   # 5,287 entries: fruits, vegetables, meats,
                         # grains, legumes. Best covered by USDA SR
                         # and CNF.

Prepared foods           # 3,836 entries: home-cooked and generic
                         # restaurant dishes. Underserved by public
                         # data due to ingredient variability.

Branded grocery products # 313,442 entries: packaged goods identified
                         # by UPC. Grounded in USDA Branded Foods,
                         # with naming and serving-size data from
                         # Open Food Facts.

Restaurant menu items    # 4,182 entries from ~50 US chains.
                         # Extracted from official PDFs and sites
                         # with DeepResearch + o1-pro.

Each row carries a per-100g nutritional profile, a standardised serving size, a description, and (where possible) a citation back to USDA, AUSNUT, FRIDA, or CNF. Items generated by the AI pipeline carry a structured explanation of the reasoning instead of a citation.
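Because every profile is stored per 100 g, turning a row into per-serving numbers is a single scaling step. The sketch below illustrates that arithmetic only. The field names (`calories`, `protein_g`, `fat_g`) and the sample values are hypothetical placeholders, not the dataset's actual schema, so check the README in the download for the real column names.

```python
def per_serving(per_100g: dict[str, float], serving_g: float) -> dict[str, float]:
    """Scale per-100g nutrient values to a serving of `serving_g` grams."""
    factor = serving_g / 100.0
    return {name: round(value * factor, 2) for name, value in per_100g.items()}


# Hypothetical row: the keys and numbers are illustrative only.
row = {"calories": 250.0, "protein_g": 12.0, "fat_g": 8.0}
print(per_serving(row, serving_g=40))
```

Rounding happens only at the end so that small nutrient values are not distorted before scaling.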

How the AI Pipeline Actually Builds the Data

The /about page walks through the build in unusual detail. The short version: every row in the dataset has a paper trail, and the founder used four model families to keep the paper trail honest.

# 1. Itemization & naming
# Claude 3.5 Sonnet handled initial itemization, naming
# standardisation, and serving-size alignment. The founder
# preferred it at low temperature for rule-following.

# 2. Coverage expansion
# OpenAI o1-pro with persona-based prompts surfaced the foods
# real users would actually search for (especially for
# non-Western cuisines). Embedding-based dedupe via
# text-embedding-3-large kept the additions non-redundant.

# 3. Restaurant data extraction
# OpenAI DeepResearch pulled official PDFs from ~50 chains.
# o1-pro turned the unstructured text into standardised JSON.
# Timeouts and refusals limited how much of this made the
# v1 release.

# 4. Final values
# Reasoning models (o3-mini-high, o1-preview) generated
# per-100g values, with field-by-field text explanations
# followed by a summarised JSON. Two-step process so the
# model reasoning was auditable line by line.

# 5. Auditing
# Random audits with o1-pro consistently flagged minor
# micronutrient drift; nothing surfaced that fell outside
# reasonable ranges for typical diets.
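The embedding-based dedupe in step 2 can be pictured as a greedy filter over cosine similarity. The following is a minimal sketch of that general technique, not the founder's actual code: the 0.95 threshold, the function names, and the toy three-dimensional vectors (standing in for real `text-embedding-3-large` vectors) are all assumptions made for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def dedupe(candidates: dict[str, list[float]], threshold: float = 0.95) -> list[str]:
    """Keep a name only if it is not too similar to any name already kept."""
    kept: list[str] = []
    for name, vec in candidates.items():
        if all(cosine(vec, candidates[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy vectors standing in for real embeddings (assumed values).
foods = {
    "chicken tikka masala": [0.90, 0.10, 0.00],
    "tikka masala, chicken": [0.89, 0.12, 0.01],
    "miso soup": [0.00, 0.20, 0.95],
}
print(dedupe(foods))
```

The greedy pass compares each candidate against everything already kept, which is quadratic in the worst case. At the scale of a full food library, an approximate nearest-neighbour index would be the more likely choice.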

The same playbook also runs inside the iOS app. When you scan a barcode that the database has not seen before, the in-app DeepSearch feature applies it in real time: it reads the package, runs a small model call, saves the new row, and pushes it back into the public dataset on the next release.

Using the Dataset

Three access patterns ship today, all free:

# 1. TSV download (offline, the most useful for engineers)
wget https://www.opennutrition.app/opennutrition_foods.zip
unzip opennutrition_foods.zip
# opennutrition_foods.tsv   : main dataset
# LICENSE-ODbL.txt          : required attribution
# LICENSE-DbCL.txt          : database contents license
# README.md                 : schema and field definitions

# 2. Web search (no signup)
# Browse https://www.opennutrition.app/search
# The web UI is a thin client over the same TSV.
# The founder's Go-based prefix search engine returns
# ranked matches across the full library.

# 3. MCP server (for AI agents)
# 187-star TypeScript MCP server that exposes the dataset
# to Claude, Cline, and any MCP-compatible agent.
docker run --rm -p 9113:3000 deadletterq/mcp-opennutrition
# Then add to your MCP client config:
#   "mcp-opennutrition": {
#       "type": "streamable-http",
#       "url": "http://localhost:9113"
#   }
# Tools: search_foods, browse_foods, get_food, lookup_barcode

The TSV is the most useful surface for builders. If you are building any kind of meal planner, fitness coach, recipe app, or research dataset, this gives you a starting point that is more complete than USDA alone and free of MyFitnessPal’s redistribution restrictions.
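As a starting point for working with the download, here is a hedged sketch of loading a tab-separated file with Python's standard `csv` module and running a simple prefix lookup over food names. The `id` and `name` columns and the three sample rows are assumptions for illustration; the real schema is defined in the bundled README. The lookup is a plain sorted-list scan and is not meant to represent the Go-based search engine behind the web UI.

```python
import bisect
import csv
import io


def load_rows(handle) -> list[dict[str, str]]:
    """Parse a tab-separated file object into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def prefix_search(rows: list[dict[str, str]], prefix: str, column: str = "name") -> list[str]:
    """Return the sorted, lowercased values of `column` that start with `prefix`."""
    needle = prefix.lower()
    names = sorted(row[column].lower() for row in rows)
    start = bisect.bisect_left(names, needle)  # first name >= prefix
    matches = []
    for name in names[start:]:
        if not name.startswith(needle):
            break
        matches.append(name)
    return matches


# Stand-in data with hypothetical columns. With the real file, use:
#   with open("opennutrition_foods.tsv", newline="", encoding="utf-8") as f:
#       rows = load_rows(f)
sample = io.StringIO("id\tname\n1\tOat milk\n2\tOatmeal\n3\tOlive oil\n")
rows = load_rows(sample)
print(prefix_search(rows, "oat"))
```

For repeated queries, sort the names once and reuse the list rather than re-sorting inside every call.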

Deeper Analysis

A few things stand out about the build that are worth understanding before you depend on the data.

The license is the real point. The dataset is ODbL, the same license that OpenStreetMap uses. Any product that publicly uses an adapted version of a substantial part of the data has to offer that derivative database under the same license, and has to attribute the source. That is a constraint, but it is also the constraint that makes the data safe to depend on. There is no per-seat fee, no API rate limit, no redaction risk, and no ambiguity about whether a public-facing feature is allowed.

The data quality story is honest, not magical. The founder does not claim perfect micronutrient coverage on branded and restaurant items. He says the gaps are small, and the user feedback loop in the iOS app feeds corrections back into the next release. The /about page is explicit that micronutrient values for branded items are estimates, and that the pipeline is not a substitute for professional medical advice. That kind of caveat in a launch post is rare and worth weighting positively.

The “first, best customer” pattern is the secret to its sustainability. The same dataset that powers a free iOS app powers a paid tier ($49/year) that unlocks more agentic searches, data backup, and prioritised micronutrient coverage. Revenue from the iOS app funds the next round of dataset expansion. If the iOS app gets traction, the public dataset gets bigger. If the public dataset gets bigger, the iOS app becomes more useful. The dataset and the product are not separate things; the dataset is a side-effect of building a nutrition app that the founder wanted to use himself.

The MCP surface is the most interesting near-term hook. With 187 stars in under a year and a maintained TypeScript codebase, the MCP server is the easiest way to plug OpenNutrition into a Claude or Cline workflow. A coding agent that can ask “what is the protein per 100g of Trader Joe’s soyrizo” without a custom integration is meaningfully more useful than one that cannot.

Practical Evaluation Checklist

Before you wire OpenNutrition into a product, run through these checks:

# 1. Schema fit
head -1 opennutrition_foods.tsv
# Verify the columns you need (calories, protein, fat, carbs,
# fibre, sodium, micronutrients) are present for your use case.

# 2. Attribution path
# ODbL requires attribution. Plan where the dataset credit
# will live in your product (settings page, about page,
# API docs, README).

# 3. Coverage test
# Sample 50 random UPCs from your target market. Check
# coverage in the TSV. The /search web UI lets you do
# this without downloading the full file.

# 4. Micronutrient caveat
# If you depend on vitamins or minerals (not just macros),
# audit at least 20 branded products. The estimates are
# good but not perfect; the founder says so himself.

# 5. Update cadence
# Watch the GitHub releases or the founder's HN replies for
# the next dataset drop. New items from the iOS app flow
# in through DeepSearch on a rolling basis.
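Checks 1 and 3 are easy to script. The sketch below shows one way to do it under stated assumptions: the column names in `required` and the UPC strings are placeholders rather than real dataset values, and `known_upcs` is assumed to be a set you have already built from whichever TSV column holds the barcode.

```python
def missing_columns(header: list[str], required: set[str]) -> set[str]:
    """Return the required column names that the TSV header does not contain."""
    return required - set(header)


def coverage(target_upcs: list[str], known_upcs: set[str]) -> float:
    """Fraction of target UPCs found in the dataset (0.0 for an empty sample)."""
    if not target_upcs:
        return 0.0
    hits = sum(1 for upc in target_upcs if upc in known_upcs)
    return hits / len(target_upcs)


# Placeholder header and barcodes; substitute values from the real file.
header = ["id", "name", "calories"]
print(missing_columns(header, required={"name", "calories", "protein"}))

sample = ["000000000001", "000000000002", "000000000003", "000000000004"]
known = {"000000000001", "000000000003"}
print(f"coverage: {coverage(sample, known):.0%}")
```

What counts as acceptable coverage depends on the product; a barcode scanner probably needs a much higher hit rate than a recipe app that falls back to generic foods.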

Security Notes

The dataset is TSV with public food data. There is no PII, no auth, no API key to leak, and no executable content. The TSV does not need to be quarantined.

The MCP server runs locally and is fine to expose inside a dev container. It does not phone home; all queries resolve against the bundled snapshot of the dataset. If you build on top of it, treat the ODbL attribution requirement as a hard constraint on the public-facing surfaces of your product, not a footnote in the README.

FAQ

Q: Is the data really free to use in a commercial product?

A: Yes, under the Open Database License (ODbL). You can build a paid product on top of it, redistribute the data, or fork it. The only hard requirement is that any “substantial” derivative database is published under the same license, and you must attribute OpenNutrition. This is the same trade-off you make with OpenStreetMap.

Q: How does it compare to the official USDA FoodData Central?

A: USDA is the source of truth for many entries, especially the new “Foundation Foods” set, but that set covers under 300 generic foods and adds no restaurant data. OpenNutrition folds the USDA entries into a library of 5,287 generic foods, 3,836 prepared foods, 4,182 restaurant items, and 313,442 branded products, with the gap-filling values documented in the same row.

Q: Can I use the dataset to train a model?

A: The ODbL allows it, with the same attribution and share-alike rules. The dataset is small (TSV, not a corpus) and is better suited for ground-truth evaluation than pre-training.

Q: Is the iOS app required to use the dataset?

A: No. The TSV download and the web search interface are both fully usable without the iOS app. The iOS app is a separate product that uses the dataset; the dataset is independent.

Q: How does the MCP server get new data?

A: It bundles a snapshot of the public dataset at build time. New versions of the MCP server ship when a new public dataset release lands. The server does not call back to OpenNutrition servers at runtime.

Q: What about the “AI-generated values are estimates” caveat?

A: It is real. The /about page is unusually explicit about the limitations of micronutrient estimates on branded and restaurant items. For macros (calories, protein, carbs, fat), the values are reliable. For micronutrients, treat them as informed estimates, not ground truth, and verify against the source PDF or USDA entry when accuracy matters.
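One way to act on that advice is to compare estimated micronutrients against a trusted reference and flag large relative differences. This is a minimal sketch: the nutrient keys, the sample numbers, and the 20% tolerance are all assumptions chosen for illustration, not values taken from the dataset or from the /about page.

```python
def flag_deviations(
    estimated: dict[str, float],
    reference: dict[str, float],
    tolerance: float = 0.20,
) -> dict[str, float]:
    """Return nutrients whose estimate differs from the reference by more
    than `tolerance`, mapped to the relative difference."""
    flagged = {}
    for nutrient, ref in reference.items():
        est = estimated.get(nutrient)
        if est is None or ref == 0:
            continue  # nothing to compare, or relative difference undefined
        rel = abs(est - ref) / abs(ref)
        if rel > tolerance:
            flagged[nutrient] = round(rel, 3)
    return flagged


# Illustrative numbers only.
estimated = {"iron_mg": 2.0, "calcium_mg": 118.0, "vitamin_c_mg": 9.0}
reference = {"iron_mg": 1.2, "calcium_mg": 120.0, "vitamin_c_mg": 10.0}
print(flag_deviations(estimated, reference))
```

A relative threshold treats trace nutrients and bulk nutrients alike; for values near zero, an absolute tolerance may be the better check.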

Conclusion

OpenNutrition is the most ambitious open-data nutrition project in a long time. The build is unusually transparent: the founder documents the model pipeline, the licensing choice, the known limitations, and the feedback loop that funds future expansion. The dataset is already large enough to be useful for real products, the license is open enough to build on (with attribution and share-alike obligations), and the MCP surface is the easiest on-ramp for AI agents.

If you are building any kind of consumer health tool, recipe app, or research dataset that touches nutrition, this is the free baseline you should start from.