Bumblebee Go Package Inventory Guide

TL;DR

bumblebee is a read-only Go inventory scanner for developer endpoints; it turns local package metadata, lockfiles, extension manifests, browser extension data, and selected MCP configs into NDJSON so responders can check known exposure quickly without executing package managers.

Source and Accuracy Notes

This guide uses the official repository perplexityai/bumblebee and its linked inventory-source documentation as the source for scope, install commands, profiles, output format, exposure catalog behavior, and safety model. The project is explicit about being read-only and about avoiding package-manager execution such as npm ls, pip show, or go list.

Commands below are preserved from the official documentation. I do not add invented flags, scheduler examples, or deployment wrappers. Bumblebee is useful for supply-chain response, but it is not an SBOM generator for shipped software and not an EDR replacement. It answers a narrower question: which developer machines currently show matching on-disk metadata for packages, extensions, versions, or MCP server entries?

What Is bumblebee?

bumblebee is a single static binary, built with Go 1.25 or later and depending only on the Go standard library. Its job is inventory collection on macOS and Linux developer endpoints. Instead of asking package managers what is installed, it reads known metadata files directly: npm lockfiles, pnpm and Yarn data, Bun locks, Python dist-info and egg-info metadata, Go module files, Gemfile.lock and gemspecs, Composer files, MCP JSON host configs, editor extension manifests, and browser extension data.

The output is newline-delimited JSON. Each component record can include ecosystem, name, version, source file, source type, root kind, profile, and related metadata. Diagnostics go to stderr as NDJSON, which makes the tool friendlier for fleet collection pipelines that separate inventory from scanner warnings.
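
To make the record shape concrete, here is a minimal Python sketch that tallies component records per ecosystem. The key names `ecosystem`, `name`, and `version` are assumptions based on the field list above, and the sample records are made up; check a real inventory file for the exact keys before relying on them.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_ecosystem(lines: Iterable[str]) -> Counter:
    """Count component records per ecosystem from NDJSON lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # "ecosystem" is an assumed key name; records may omit fields,
        # so fall back to a placeholder instead of raising KeyError.
        counts[record.get("ecosystem", "unknown")] += 1
    return counts


# Made-up sample records, not real scanner output.
sample = [
    '{"ecosystem": "npm", "name": "example-pkg-a", "version": "1.0.0"}',
    '{"ecosystem": "pypi", "name": "example-pkg-b", "version": "2.1.0"}',
    '{"ecosystem": "npm", "name": "example-pkg-c", "version": "0.3.2"}',
]
print(count_by_ecosystem(sample))
```

Because every line is an independent JSON document, the same function works on an open file handle as well as on a list of strings.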

The project’s threat model is practical. During an advisory, responders often know a bad package name, version range, browser extension, or MCP server indicator. They need to know which developer machines have evidence of it right now. SBOMs describe built artifacts. EDR describes execution and network behavior. bumblebee inspects messy local developer state.

Repo-Specific Setup Workflow

Step 1: Install or build the binary

bumblebee requires Go 1.25+. Install the latest tagged release or pin a known version:

# Install the latest tagged release into $GOBIN.
go install github.com/perplexityai/bumblebee/cmd/bumblebee@latest

# Or pin a specific tag.
go install github.com/perplexityai/bumblebee/cmd/bumblebee@v0.1.1

To build from checkout, use the documented build and test commands:

go build -o bumblebee ./cmd/bumblebee
go test ./...

For traceable production builds, stamp an explicit version:

go build -ldflags "-X main.Version=v0.1.1" -o bumblebee ./cmd/bumblebee

Step 2: Run the embedded self-test

The self-test uses embedded fixtures with fake package names and makes no network calls. Use it before deploying to laptops or CI runners:

bumblebee selftest
# selftest OK (2 findings in 1ms)

Step 3: Pick the right scan profile

The baseline profile scans common global and user package roots, toolchains, editor extensions, browser extensions, and MCP configs. The project profile scans configured development directories. The deep profile scans explicit roots and can walk broad paths such as $HOME; baseline and project refuse bare-home roots.

Step 4: Generate inventory or targeted findings

The quick-start commands show four common patterns:

# Baseline global inventory.
bumblebee scan --profile baseline > inventory.ndjson

# Daily project sweep with explicit roots.
bumblebee scan --profile project --root ~/code --root ~/work > projects.ndjson

# Limit a run to selected emitted ecosystems.
bumblebee scan --profile deep --root ~/code \
  --ecosystem npm,pypi \
  > npm-pypi.ndjson

# On-demand exposure scan against a published advisory.
bumblebee scan --profile deep --root ~/code \
  --exposure-catalog exposures.json \
  --findings-only \
  > findings.ndjson

Step 5: Inspect planned roots before scanning

The project documents a roots command that prints root kind and path pairs. Use it to confirm what population a run will cover:

bumblebee roots --profile baseline
# prints "<root_kind>\t<path>" lines
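
If the roots listing is captured for review, the tab-separated lines are easy to parse. The Python sketch below assumes only the documented `<root_kind>\t<path>` shape; the root kind names in the sample are hypothetical placeholders, not values taken from the tool.

```python
from typing import Iterable, List, Tuple


def parse_roots(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse '<root_kind>\\t<path>' lines into (root_kind, path) pairs."""
    roots = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore empty lines
        # Split on the first tab only, so a path containing a tab survives.
        root_kind, sep, path = line.partition("\t")
        if not sep:
            raise ValueError(f"expected '<root_kind>\\t<path>', got {line!r}")
        roots.append((root_kind, path))
    return roots


# Hypothetical root kinds and paths, for illustration only.
sample_roots = [
    "example_global\t/usr/local/lib/node_modules\n",
    "example_user\t/home/dev/.local/lib\n",
]
for kind, path in parse_roots(sample_roots):
    print(kind, path)
```

Reviewing the parsed pairs before a fleet run is a cheap way to confirm that a profile covers the directories you expect and nothing broader.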

Step 6: Design exposure catalogs carefully

Exposure catalogs are JSON inputs that define known indicators. Use them for exact response questions, not vague risk scoring. Keep advisory source, package ecosystem, name, version constraints, and date in separate incident notes so findings remain auditable.

Deeper Analysis

bumblebee’s most useful engineering decision is refusing to execute package managers. During incident response, running npm, pip, go, or another tool across many endpoints can be slow, inconsistent, and sometimes unsafe. Local package-manager commands may trigger scripts, reach out to the network, depend on project-specific environment settings, or produce results that vary with local caches. A read-only scanner that parses files creates a cleaner collection story.

The second strong decision is profile separation. Baseline, project, and deep scans answer different operational questions. Baseline is good for recurring endpoint hygiene. Project scans target developer workspaces where lockfiles and vendored metadata live. Deep scans are heavier and should be reserved for campaigns or incident response, especially with ecosystem filters and findings-only output.

The MCP coverage is timely. Developer endpoints increasingly contain AI tool configuration files. Some MCP host configs can include environment blocks with credentials. bumblebee parses server inventory but does not emit those environment values. That behavior is important: responders can detect relevant MCP server presence without turning inventory output into a secret dump.
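
The idea of inventorying servers without leaking secrets can be sketched in a few lines of Python. This is an illustration of the principle, not bumblebee's implementation: it assumes the `mcpServers` layout that several MCP hosts use in their JSON configs, and the function name, output keys, and sample config are hypothetical.

```python
import json
from typing import Any, Dict, List


def list_mcp_servers(config_text: str) -> List[Dict[str, Any]]:
    """Return server names and commands from an MCP host config, without env values."""
    config = json.loads(config_text)
    servers = config.get("mcpServers", {})  # assumed top-level key
    inventory = []
    for name, entry in sorted(servers.items()):
        inventory.append({
            "server": name,
            "command": entry.get("command"),
            # Record only whether an env block exists, never its contents.
            # Args are also omitted because they can carry tokens.
            "has_env": bool(entry.get("env")),
        })
    return inventory


# Made-up config with a fake credential.
SAMPLE_CONFIG = """
{
  "mcpServers": {
    "example-server": {
      "command": "npx",
      "args": ["-y", "example-mcp-server"],
      "env": {"API_TOKEN": "not-a-real-secret"}
    }
  }
}
"""
print(list_mcp_servers(SAMPLE_CONFIG))
```

The design choice worth copying is the allow-list: the output is built from named fields rather than by copying the entry and deleting `env`, so an unexpected secret-bearing field cannot slip through.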

NDJSON is also a good fit. Fleet systems, SIEM pipelines, object storage, and command-line tools can process line-oriented JSON without loading a whole report into memory. It also lets diagnostics stay separate on stderr while component records stream to stdout.
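
As a sketch of that streaming property, the Python generator below reads one line at a time and yields only records matching an exact name and version. The keys `name` and `version` are assumed from the field list earlier in this guide, and the sample data is invented.

```python
import io
import json
from typing import Iterator, TextIO


def iter_matches(stream: TextIO, name: str, version: str) -> Iterator[dict]:
    """Yield records with an exact name and version, one line at a time."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip a truncated or corrupt line rather than abort
        if not isinstance(record, dict):
            continue  # a valid JSON value that is not an object
        # "name" and "version" are assumed key names.
        if record.get("name") == name and record.get("version") == version:
            yield record


# Invented inventory with one corrupt line in the middle.
sample_stream = io.StringIO(
    '{"ecosystem": "npm", "name": "example-pkg", "version": "1.0.0"}\n'
    '{"ecosystem": "npm", "name": "example-pkg", "ver\n'
    '{"ecosystem": "npm", "name": "example-pkg", "version": "1.0.1"}\n'
)
for hit in iter_matches(sample_stream, "example-pkg", "1.0.1"):
    print(hit)
```

Memory use stays flat regardless of file size because no list of records is ever built. In a real pipeline you would likely count skipped lines instead of dropping them silently.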

Limitations follow from the read-only model. If metadata is stale, missing, custom, or outside supported paths, bumblebee may not see it. It does not prove execution. It does not prove exploitability. It does not replace a full dependency graph built during CI. It gives responders fast evidence of on-disk exposure indicators across developer machines.

Practical Evaluation Checklist

Confirm Go 1.25+ and run bumblebee selftest on each target platform.
Compare baseline, project, and deep results on a test machine before fleet rollout.
Validate roots with bumblebee roots --profile baseline before collecting data.
Keep stdout inventory and stderr diagnostics separate in runners.
Use pinned versions for production collection so records map to a known scanner build.
Test exposure catalogs against fixture repos before incident use.
Document false negatives caused by unsupported ecosystems, custom paths, or missing metadata.

Security Notes

bumblebee is intentionally read-only, but inventory data can still be sensitive. Package names, internal project paths, browser extensions, editor extensions, and MCP server names can reveal company technology choices and incident scope. Store NDJSON outputs with access controls and retention limits.
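
One small, concrete form of access control is to write collected output with owner-only permissions. The Python sketch below shows that pattern for macOS and Linux; the helper name and the file name are hypothetical, and it covers local file mode only, not retention or central storage policy.

```python
import os
import tempfile


def write_private(path: str, data: str) -> None:
    """Write data to path so that only the owning user can read or write it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    # Force the mode even if the file already existed with wider permissions
    # or the process umask altered the mode passed to os.open.
    os.fchmod(fd, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(data)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "inventory.ndjson")
    write_private(target, '{"name": "example-pkg"}\n')
    print(oct(os.stat(target).st_mode & 0o777))
```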

Pay special attention to MCP configs. The tool avoids emitting env values from supported JSON configs, but raw source files may still contain secrets. Do not attach full configs to tickets unless necessary. If using bumblebee during a live advisory, avoid broad deep scans until filters and output handling are tested.

Because the binary is likely to run across developer endpoints, pin scanner versions and verify build provenance. The bumblebee version output includes build details that help tie collected records back to a specific build.

FAQ

Q: Does bumblebee execute package managers? A: No. It reads supported metadata files and avoids package-manager execution, which keeps scans predictable and read-only.

Q: Is bumblebee an SBOM tool? A: Not in the traditional shipped-artifact sense. It inventories developer endpoint state for response questions.

Q: Which scan profile should I start with? A: Start with baseline for global endpoint inventory, then use project for known workspace roots. Reserve deep for targeted response.

Q: Can it scan MCP configs? A: Yes, selected JSON MCP host configs are parsed for server inventory. Non-JSON configs are outside the documented v0.1 parsing scope.

Q: What format does it output? A: Component records are NDJSON on stdout, while diagnostics are NDJSON on stderr.

Conclusion

bumblebee is a focused supply-chain response tool for developer machines. Its value comes from narrow scope: static Go binary, read-only metadata parsing, explicit profiles, NDJSON output, and exposure-catalog matching. Use it when you already know what indicator you are looking for and need fast endpoint evidence. Pair it with SBOMs, EDR, and CI dependency controls for a full supply-chain picture.