Mosaic – Agentic Video Editing With a Node-Based Canvas
Mosaic lets you build and run multimodal AI video editing agents on a node-based canvas. Upload raw footage, design a workflow, and let AI handle the heavy lifting.
TL;DR
Mosaic is a node-based AI video editing platform where you compose multimodal editing agents from visual building blocks, then let them autonomously cut, enhance, and reframe raw footage, all within a canvas you control.
Source and Accuracy Notes
- Product: https://edit.mosaic.so (free tier available)
- Docs: https://docs.mosaic.so
- HN Discussion: Launch HN: Mosaic (YC W25) – Agentic Video Editing
What Is Mosaic?
Mosaic is a multimodal AI video editing platform built by two former Tesla engineers. The core idea is simple: instead of wrestling with a timeline and manually applying edits, you drop nodes onto a canvas — each node is a configurable video operation — wire them together, and let a multimodal AI agent execute the entire workflow end-to-end.
The node-based approach comes from visual programming tools like Blender’s compositor or Unreal’s Blueprints. Each node represents a discrete step: trim clips, add text overlays, detect shot types, apply reframes, generate captions. You can branch and parallelize — run the same raw footage through two different prompt variants simultaneously to A/B test the output.
Under the hood, the platform uses visual intelligence to understand what’s actually in your video: saliency maps, audio transcriptions, emotion detection, spoken-word timestamps, and shot-type classification all feed into the editing agent’s decisions.
Setting Up Your First Editing Agent
Step 1: Create a Free Account
Sign up at edit.mosaic.so. The free tier lets you upload videos, build workflows on the canvas, and use the inline timeline editor. Paid plans unlock node execution runs to cover model inference costs.
Step 2: Upload Raw Footage
Upload one or more video files. Mosaic processes them through its analysis pipeline, extracting:
- Saliency maps (where the viewer’s eye naturally goes)
- Audio transcripts and word-level timestamps
- Shot type classification (wide, medium, close-up)
- Object and action detection
- Light level analysis
This analysis is what makes the agent “smart” — it knows not just what frames contain, but which moments are worth keeping.
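To make the role of saliency concrete, here is a minimal sketch of how a single horizontal focus point could position a 9:16 crop inside a 16:9 frame. It is an illustration only, not Mosaic's implementation: `CropWindow`, `vertical_crop`, and the normalized `focus_x` input are hypothetical names, and a real saliency map would first have to be reduced to one focus value per frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CropWindow:
    x: int       # left edge, in source pixels
    y: int       # top edge, in source pixels
    width: int
    height: int


def vertical_crop(src_w: int, src_h: int, focus_x: float,
                  ratio_w: int = 9, ratio_h: int = 16) -> CropWindow:
    """Full-height crop with aspect ratio ratio_w:ratio_h.

    focus_x is a hypothetical normalized saliency position
    (0.0 = left edge, 1.0 = right edge). The window is centered on it
    and clamped so it never leaves the frame.
    """
    crop_w = min(src_w, src_h * ratio_w // ratio_h)
    crop_w -= crop_w % 2  # keep the width even, as 4:2:0 encoders expect
    left = round(focus_x * src_w - crop_w / 2)
    left = max(0, min(left, src_w - crop_w))
    return CropWindow(x=left, y=0, width=crop_w, height=src_h)


centered = vertical_crop(1920, 1080, focus_x=0.5)
# → CropWindow(x=657, y=0, width=606, height=1080)
near_edge = vertical_crop(1920, 1080, focus_x=0.95)
# → CropWindow(x=1314, y=0, width=606, height=1080), clamped to the right edge
```

The clamp is the part that matters in practice: a subject near the edge of the frame still yields a valid crop rather than one that runs off the picture.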
Step 3: Design Your Workflow on the Canvas
Drag nodes from the palette onto the canvas and wire them together. Key node types:
- Clip Trimmer — define in/out points on a source clip
- Text Overlay — add dynamic captions, titles, or subtitles
- Reframe — auto-crop to different aspect ratios (16:9 → 9:16 for Reels)
- AI Enhance — apply upscaling, color grading, or stabilization
- Audio Sync — align voiceover or music tracks
- Clip Generator — create B-roll from text prompts
You can branch a single workflow to produce multiple outputs in parallel.
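A workflow like this is a directed acyclic graph, and branching amounts to two nodes consuming the same upstream output. The sketch below uses Python's standard `graphlib` module to show which nodes could run together. The node names are made up for illustration, and nothing here reflects how Mosaic schedules work internally.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each key is a node, each value is the set of
# nodes whose output it consumes. The names are illustrative only.
workflow = {
    "trim": set(),
    "captions": {"trim"},
    "reframe_16x9": {"captions"},
    "reframe_9x16": {"captions"},
}


def execution_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into batches. Nodes in the same batch do not depend
    on each other, so they could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph contains a loop
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(workflow)
# → [['trim'], ['captions'], ['reframe_16x9', 'reframe_9x16']]
```

The last batch holds both reframe nodes, which is the branch: one trimmed and captioned clip feeds two aspect-ratio outputs.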
Step 4: Run and Review
Hit “Run” and watch the canvas animate through each node step. The agent processes the video according to your node configuration. Once complete, open the timeline editor to review and make fine adjustments before exporting.
Step 5: Export
Mosaic can export timeline state to DaVinci Resolve, Adobe Premiere Pro, and Final Cut Pro — so you’re not locked in if you want to do manual finishing work in a traditional NLE.
Deeper Analysis
Why Node-Based Rather Than Chat?
The founders tried a pure chat interface first. For video, it fell apart in two ways: long videos produce very large model contexts that slow down responses, and professional editors have repeatable workflows they want to save and reuse. A node graph addresses both: you can save a “podcast-to-Reels” workflow as a template, branch it for variants, and re-run it on new footage without conversation overhead.
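The reuse argument is easier to see once a workflow is treated as plain data. The sketch below assumes a hypothetical JSON template format: the `Node` and `WorkflowTemplate` classes, the node type strings, and the parameter names are all invented for illustration and are not Mosaic's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Node:
    id: str
    type: str                                   # hypothetical type label
    params: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)  # ids of upstream nodes


@dataclass
class WorkflowTemplate:
    name: str
    nodes: list

    def to_json(self) -> str:
        # asdict() recurses into the list of Node dataclasses.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "WorkflowTemplate":
        data = json.loads(text)
        return cls(name=data["name"],
                   nodes=[Node(**n) for n in data["nodes"]])


template = WorkflowTemplate(
    name="podcast-to-reels",
    nodes=[
        Node("trim", "clip_trimmer", {"max_seconds": 60}),
        Node("captions", "text_overlay", {"style": "bold"}, ["trim"]),
        Node("vertical", "reframe", {"aspect": "9:16"}, ["captions"]),
    ],
)

saved = template.to_json()                    # store alongside the project
restored = WorkflowTemplate.from_json(saved)  # reload for new footage
```

Because the template carries no reference to a specific video, the same saved graph can be applied to each new upload, which a chat transcript cannot do as cleanly.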
The Multimodal Intelligence Layer
What sets Mosaic apart from traditional automation is the depth of visual understanding built into the agent. It’s not just running ASR on the audio track — it’s doing saliency analysis to know what draws attention, mean movement analysis to identify camera motion, and shot-type classification to maintain visual consistency. The agent makes editing decisions based on what a human editor would internalize unconsciously.
Use Cases
- Script-based cuts — feed the agent a transcript and have it create a talking-heads cut automatically
- Clip repurposing — break long-form podcasts or webinars into short social clips
- Dynamic captions — auto-generate styled captions synchronized to speech
- A/B content testing — branch a workflow to compare two different editing styles from the same source
- Content localization — dub with voice cloning and lip sync (on roadmap)
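For the dynamic captions case, word-level timestamps are the raw material. The following sketch shows one simple way to group timed words into short caption chunks, breaking on a word limit or a pause in speech. The `Word` and `Caption` types and the thresholds are assumptions chosen for illustration, not Mosaic's captioning logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass(frozen=True)
class Caption:
    text: str
    start: float
    end: float


def _to_caption(chunk: list[Word]) -> Caption:
    return Caption(" ".join(w.text for w in chunk),
                   chunk[0].start, chunk[-1].end)


def group_captions(words: list[Word], max_words: int = 4,
                   max_gap: float = 0.6) -> list[Caption]:
    """Start a new caption when the current one is full or when the
    speaker pauses for longer than max_gap seconds."""
    captions: list[Caption] = []
    current: list[Word] = []
    for word in words:
        full = len(current) >= max_words
        paused = bool(current) and word.start - current[-1].end > max_gap
        if current and (full or paused):
            captions.append(_to_caption(current))
            current = []
        current.append(word)
    if current:
        captions.append(_to_caption(current))
    return captions


words = [
    Word("Welcome", 0.0, 0.4), Word("to", 0.45, 0.55),
    Word("the", 0.6, 0.7), Word("show", 0.75, 1.1),
    Word("today", 1.15, 1.5), Word("we", 2.5, 2.6),
    Word("edit", 2.65, 3.0),
]
captions = group_captions(words)
# → "Welcome to the show" (0.0–1.1), "today" (1.15–1.5), "we edit" (2.5–3.0)
```

Each caption inherits its start from its first word and its end from its last, so the on-screen text stays aligned with speech.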
Practical Evaluation Checklist
- [ ] Sign up and upload a test video (2–5 minutes works well)
- [ ] Run the default “Auto-Edit” agent to see the platform’s baseline output
- [ ] Build a custom 3-node workflow (trim + caption + reframe) and run it
- [ ] Try branching the canvas to produce both 16:9 and 9:16 outputs in parallel
- [ ] Export to DaVinci Resolve and inspect the timeline
- [ ] Check the analysis panel to see what the model detected (saliency, shot types, transcript)
Security Notes
Mosaic processes user-uploaded video through cloud infrastructure. Video content is stored and analyzed on their servers. Review their privacy policy before uploading proprietary or sensitive footage. The platform does not claim end-to-end encryption for stored media.
API access via docs.mosaic.so allows programmatic workflow triggering — ensure any API keys are stored securely and scoped to minimal permissions.
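One common way to keep a key out of source control is to read it from the environment and never log it in full. The sketch below assumes an environment variable named `MOSAIC_API_KEY`, which is a placeholder and not a documented name; it deliberately makes no API calls, since endpoints and auth headers should come from the official docs.

```python
import os


def load_api_key(env_var: str = "MOSAIC_API_KEY") -> str:
    """Read the key from the environment.

    The variable name is an assumption for this example, not one
    taken from Mosaic's documentation.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key


def redact(key: str, visible: int = 4) -> str:
    """Return a log-safe form that shows only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Failing fast on a missing key avoids sending unauthenticated requests by accident, and `redact` lets you confirm which key is loaded without exposing it in logs.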
FAQ
Q: Does Mosaic replace a traditional NLE like Premiere Pro or DaVinci Resolve?
A: Not entirely. Mosaic is best thought of as an autopilot that gets you 80–90% of the way to a finished edit. The canvas handles the creative decision-making, but you still export to a traditional NLE for manual fine-tuning, color grading, and final delivery.
Q: What video formats does Mosaic support?
A: The platform accepts common formats including MP4, MOV, and WebM. Output is typically H.264 MP4. Specific codec support beyond this should be confirmed via their docs as the platform evolves.
Q: How does pricing work?
A: The free tier covers account creation, video upload, canvas access, and the timeline editor. Node execution runs (when the AI actually processes your video) consume credits on paid plans. The founders mention this is to cover multimodal model inference costs.
Q: Can I use my own AI models with Mosaic?
A: Currently the platform uses Mosaic’s own multimodal models for analysis and editing decisions. Custom model integration is not publicly documented — check their API docs for any upcoming capability here.
Q: Does the platform work for screen recordings and tutorials?
A: Yes. The multimodal analysis handles screen-recorded content well, particularly for detecting text overlays, speaker changes, and action sequences. It’s a strong fit for teams that produce recurring video content like walkthroughs or demo libraries.
Conclusion
Mosaic is a genuinely new take on video editing: not an AI chatbot layered on top of a traditional timeline, but a canvas where the editing agent is built from composable, visual building blocks. The node-based approach means workflows are reusable, inspectable, and parallelizable in ways that chat-based editing simply can’t match.
The free tier is enough to get a real feel for the platform. If you produce video content regularly — podcasts, tutorials, social clips — it’s worth 30 minutes of experimentation. The gap between “raw footage” and “publishable clip” shrinks considerably when the editing agent understands what’s actually in your video.