AI coding tools give every engineer the same productivity boost — write more code, faster. This produces diminishing returns because the bottleneck was never typing speed.
The real bottleneck is everything around the code: translating requirements into tasks, enforcing quality across iterations, detecting regressions, deciding when to ship. Today, humans do all of that manually. AI writes a function, a human reviews it. AI fixes a bug, a human verifies it. The agent is fast, but the human is still the loop.
The shift that matters is moving from "AI writes code, humans verify" to "humans specify intent and constraints, the system handles the rest." The engineer who can define better constraints and build better observability gets exponentially more from AI. Everyone else is left competing on typing speed.
Open Forge is infrastructure for that shift: an agent-first harness engineering system for fully autonomous software development — not a developer tool with AI features, but an autonomous system where agents are the primary actors.
You describe what you want built. The system plans, implements, tests, reviews, and ships working code. It self-corrects through quality gates, enforces cost and safety constraints via telemetry, and closes the feedback loop with statistical evaluation of agent outputs — enabling autonomous decisions about model changes, prompt tuning, and regression detection.
The differentiator is not the agents. It's the harness: constraints that prevent waste, observability that makes agent behavior measurable, and evaluation that makes output quality a data-driven decision instead of a human judgment call.
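To make the harness idea concrete, here is a minimal sketch of what a stage telemetry event and constraint check could look like. The real `@open-forge/telemetry` types are still in progress, so every name and field below is an illustrative assumption, not the published API.

```typescript
// Hypothetical sketch — the real @open-forge/telemetry types are not yet
// implemented; names, fields, and limits here are illustrative assumptions.

interface StageEvent {
  stage: string;      // e.g. "implement", "qa", "code-review"
  tokensUsed: number; // tokens consumed by this stage
  durationMs: number; // wall-clock duration of the stage
  retries: number;    // retry attempts so far
}

interface Constraint {
  name: string;
  violated: (e: StageEvent) => boolean;
}

// Example constraints: a token budget and a retry cap (limits are made up).
const constraints: Constraint[] = [
  { name: "token-budget", violated: (e) => e.tokensUsed > 50_000 },
  { name: "retry-limit", violated: (e) => e.retries > 3 },
];

// Evaluate all constraints after a stage; any violation halts the pipeline.
function checkStage(event: StageEvent): string[] {
  return constraints.filter((c) => c.violated(event)).map((c) => c.name);
}

const violations = checkStage({
  stage: "implement",
  tokensUsed: 62_000,
  durationMs: 120_000,
  retries: 1,
});
// violations: ["token-budget"]
```

The point of the shape is that constraints are data, evaluated mechanically after every stage, rather than judgment calls made during human review.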
Early development. The architecture is defined and core packages are scaffolded, but most implementations are in progress. Contributions and feedback are welcome — expect breaking changes.
| Package | Status |
|---|---|
| `@open-forge/pipeline` | Core types, roadmap generation, handoff management, and helper utilities implemented. Orchestrator in progress. |
| `@open-forge/mcp` | Functional — MCP server with spec tools, prompts, and change tracking. |
| `@open-forge/telemetry` | Type definitions exported. Implementations (event emitting, constraint evaluation) not yet started. |
| `@open-forge/evaluations` | Type definitions exported. Implementations planned — blocked on telemetry for §3.6/§3.7 integration. |
| Package | Description |
|---|---|
| `@open-forge/pipeline` | An OpenCode plugin that turns natural language requirements into a fully built application — autonomously, with minimal human intervention |
| `@open-forge/mcp` | MCP server implementation for OpenSpec archives |
| `@open-forge/telemetry` | Telemetry and constraint evaluation for open-forge pipeline stages |
| `@open-forge/evaluations` | Evaluation framework for measuring agent output quality, regression detection, and statistical analysis |
Open Forge is an autonomous software development pipeline. You give it a description of what you want built, and it runs a multi-phase agent workflow to produce working code with quality enforcement at every step.
```
Requirements (text)
        │
        ▼
 ┌───────────────┐
 │ Intent Router │ ── Classifies: new-project / feature / bug-fix / refactor / migration
 └───────┬───────┘
         │
         ▼
 ┌───────────────┐
 │ ROADMAP Gen   │ ── Produces phased, dependency-ordered task plan
 └───────┬───────┘
         │
         ▼
 ┌──────────────────────────────────────────────────┐
 │ Pipeline Orchestrator (per phase)                │
 │                                                  │
 │ 1. Read HANDOFF.md (restore context)             │
 │ 2. Generate & select plan (critic agent)         │
 │ 3. Write tests (test author — firewalled)        │
 │ 4. Implement (implementer — firewalled)          │
 │ 5. Quality gates (verify → QA → security →       │
 │    architect → code review → integration)        │
 │ 6. Cleanup agent (entropy management)            │
 │ 7. Update HANDOFF.md (persist context)           │
 │                                                  │
 │ On failure: retry up to 3x → drift sentinel      │
 │ On success: advance to next phase                │
 ├──────────────────────────────────────────────────┤
 │ ⚡ Telemetry (@open-forge/telemetry)             │
 │                                                  │
 │ • Captures structured events from every stage    │
 │ • Evaluates constraints (token budget, retries,  │
 │   duration limits) after each stage              │
 │ • Halts pipeline on constraint violations        │
 │ • Generates audit trails for human review        │
 └─────────────────────────┬────────────────────────┘
                           │
                           ▼
  ROADMAP_COMPLETE (or PIPELINE-ISSUES.md for blockers)
                           │
                           ▼
 ┌──────────────────────────────────────────────────┐
 │ 📊 Evaluations (@open-forge/evaluations)         │
 │                                                  │
 │ • Runs datasets against agent outputs            │
 │ • Deterministic, LLM-judge, and custom evals     │
 │ • Statistical analysis (confidence intervals,    │
 │   significance testing, regression detection)    │
 │ • Baseline vs candidate comparison reports       │
 │ • Consumes telemetry events + constraint results │
 │ • CLI and programmatic API                       │
 └──────────────────────────────────────────────────┘
```
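The "statistical analysis" step above can be illustrated with a small stand-alone sketch. The `@open-forge/evaluations` implementation does not exist yet, so this is not its API — it is one standard technique (a two-proportion z-test on eval pass rates) for deciding whether a candidate model or prompt has regressed from a baseline.

```typescript
// Two-proportion z-test on eval pass rates — a standard way to compare a
// baseline run against a candidate run. Illustrative only; the actual
// @open-forge/evaluations API is not yet implemented.

interface EvalRun {
  passed: number; // number of eval cases that passed
  total: number;  // total eval cases run
}

// Returns the z statistic; |z| > 1.96 is significant at roughly the 95% level.
function zTest(baseline: EvalRun, candidate: EvalRun): number {
  const p1 = baseline.passed / baseline.total;
  const p2 = candidate.passed / candidate.total;
  // Pooled proportion under the null hypothesis of no difference.
  const p =
    (baseline.passed + candidate.passed) / (baseline.total + candidate.total);
  const se = Math.sqrt(
    p * (1 - p) * (1 / baseline.total + 1 / candidate.total),
  );
  return (p2 - p1) / se;
}

// Baseline passed 180/200 cases; candidate passed 150/200.
const z = zTest({ passed: 180, total: 200 }, { passed: 150, total: 200 });
const regressed = z < -1.96; // true: the candidate is significantly worse
```

A test like this is what lets regression detection be a thresholded, automated decision rather than a reviewer's gut call on two pass rates.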
- Context firewalls — The test-writing agent and the implementing agent are deliberately isolated from each other to prevent hallucination reinforcement
- Self-correction loops — When quality gates fail, the system retries up to 3x before escalating to a human via `PIPELINE-ISSUES.md`
- Drift detection — If an agent gets stuck in a loop, a sentinel file is written and execution halts rather than burning tokens
- Constraint enforcement — Telemetry evaluates token budgets, retry limits, and duration caps after every stage, providing a hard stop before costs spiral
- Lessons feedback — Recurring mistakes from gate failures are captured in `LESSONS.md` and fed to future agents
- Evaluation-driven quality — Post-pipeline evaluation measures agent output quality with statistical rigor, enabling regression detection and data-driven decisions on model/prompt changes
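The self-correction loop can be sketched as a small control-flow helper: run a stage through its quality gate up to three times, then escalate. The real orchestrator lives in `@open-forge/pipeline` and is still in progress, so the function names and the gate-result shape below are assumptions for illustration.

```typescript
// Illustrative retry-then-escalate loop. All names here are hypothetical;
// the actual pipeline orchestrator is not yet implemented.

type GateResult = { passed: boolean; reason?: string };

function runWithRetries(
  runStage: () => GateResult,          // runs the stage and its quality gate
  escalate: (reason: string) => void,  // e.g. write a PIPELINE-ISSUES.md entry
  maxRetries = 3,
): boolean {
  let lastReason = "unknown failure";
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const result = runStage();
    if (result.passed) return true; // gate passed: advance to the next phase
    lastReason = result.reason ?? lastReason;
  }
  // Retries exhausted: stop burning tokens and hand off to a human.
  escalate(lastReason);
  return false;
}
```

The design choice worth noting is that escalation is a terminal outcome of the loop, not an exception path: the pipeline always ends a phase either advanced or explicitly blocked for human review.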
For the full design — including context firewall rules, gate sequences, and guardrails — see the pipeline REQUIREMENTS.md. For telemetry constraint definitions, see the telemetry requirements. For evaluation framework details, see the evaluations requirements.
```sh
# Install dependencies
npm install

# Build all packages
npx nx run-many -t build

# Run tests
npx nx run-many -t test

# Lint all projects
npx nx run-many -t lint

# Visualize the project graph
npx nx graph
```

```sh
npx nx graph                          # Interactive dependency graph
npx nx show project <name> --web      # View project details
npx nx run-many -t build              # Build all projects
npx nx run-many -t test --parallel=3  # Test in parallel
npx nx affected -t build              # Build only affected projects
npx nx release --dry-run              # Preview release changes
npx nx release                        # Create a new release
```

This workspace is connected to Nx Cloud for remote caching, distributed task execution, and self-healing CI.