Workflow Engine
Controls task loading, bounded routing, scoped context building, provider execution, validation, retry logic, logging, and final status.
Operational MVP
A local Python orchestrator for safe, bounded, cost-aware AI coding workflows. It uses a role_provider backend, supports an opt-in OpenAI provider through configuration, and keeps model output constrained to patch proposals or diagnosis output only.
The orchestrator keeps policy, command execution, validation, budgeting, and rollback under deterministic Python control. Model providers are used only to produce bounded patch proposals or diagnosis output.
Current Status
The runtime backend is role_provider. OpenAI is opt-in through
configuration, and the fake provider supports deterministic tests and no-network dry
runs. Phase 1, Phase 2, and Phase 2.5 are complete; Phase 3 is focused on OpenAI
multimodel cleanup, OpenCode runtime removal from the active path, backend
normalization, naming cleanup, docs consistency, and final validation.
role_provider backend with opt-in OpenAI and fake-provider test paths
Architecture
role_provider.
git apply --check before any patch lands.
ruff, mypy, and pytest run after patch application.
Implemented Components
Controls task loading, bounded routing, scoped context building, provider execution, validation, retry logic, logging, and final status.
Uses a typed role_provider contract with opt-in OpenAI execution and a fake provider for deterministic tests and dry runs.
Accepts only unified diff or structured JSON patch output, validates touched files, rejects forbidden paths or deletes, and runs git apply --check.
Runs ruff, mypy, pytest, and patch-safety checks through deterministic subprocess wrappers.
Creates checkpoints, applies safe patches, rolls back tracked changes on failure, removes unsafe new files, and preserves bounded failure recovery.
Estimates tokens and cost before calls, tracks spend in SQLite, and blocks work that would exceed task, role, runtime, context, output, patch, or project budgets.
Combines capability-role intent, configuration-driven pricing, reasoning effort, and service-tier policy to choose the correct builder or reviewer role.
Writes JSON state, JSONL logs, a local SQLite budget ledger, convergence reports, escalation packets, and operator-facing run history.
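The touched-file check described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual gateway: the names `ALLOWED_FILES`, `FORBIDDEN_PREFIXES`, `touched_files`, and `check_patch` are hypothetical, and the header parsing is deliberately simplified. A real gateway would still run git apply --check afterwards.

```python
from pathlib import PurePosixPath

# Hypothetical policy values, chosen only for illustration.
ALLOWED_FILES = {"src/app.py", "tests/test_app.py"}
FORBIDDEN_PREFIXES = (".git/", ".github/", "secrets/")


def touched_files(diff_text):
    """Return (old_path, new_path) pairs from the ---/+++ headers of a unified diff.

    Simplified: a removed content line that itself starts with "-- " would be
    misread as a header, so a production parser should track hunk boundaries.
    """
    pairs = []
    old = None
    for line in diff_text.splitlines():
        if line.startswith("--- "):
            old = line[4:].split("\t")[0].strip()
        elif line.startswith("+++ ") and old is not None:
            pairs.append((old, line[4:].split("\t")[0].strip()))
            old = None
    return pairs


def _strip_prefix(path):
    # git-style diffs prefix paths with a/ and b/
    return path[2:] if path.startswith(("a/", "b/")) else path


def check_patch(diff_text):
    """Return a list of policy violations; an empty list means the diff may proceed."""
    problems = []
    pairs = touched_files(diff_text)
    if not pairs:
        problems.append("no file headers found")
    for old, new in pairs:
        if new == "/dev/null":
            # A /dev/null target means the patch deletes the file.
            problems.append(f"delete rejected: {_strip_prefix(old)}")
            continue
        path = _strip_prefix(new)
        if path.startswith("/") or ".." in PurePosixPath(path).parts:
            problems.append(f"path escapes repository: {path}")
        elif path.startswith(FORBIDDEN_PREFIXES):
            problems.append(f"forbidden path: {path}")
        elif path not in ALLOWED_FILES:
            problems.append(f"file not in allowlist: {path}")
    return problems
```

Returning a list of violations instead of raising on the first one keeps the result easy to log and to include in an escalation packet.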
Model Roles
The active capability set includes mechanical_worker,
classification_worker, simple_builder,
general_builder, coding_optimized_builder,
frontier_reviewer, and emergency_pro_reviewer. Routing is
based on bounded task intent, validation needs, escalation depth, and manual approval
policy where required.
mechanical_worker, simple_builder, general_builder, and coding_optimized_builder cover deterministic edits, small local changes, guided repair, and coding-heavy integration work.
classification_worker supports classification and summarization, while frontier_reviewer and emergency_pro_reviewer stay diagnosis-first by default rather than patching freely.
Cost Controls
Input tokens are estimated from built context, and projected output cost is evaluated before a provider call is allowed.
Budgets are tracked by project, task, and capability role with explicit runtime and output caps.
SQLite records provider, model, service tier, role, estimated tokens, projected cost, status, task, and timestamp.
Calls are blocked before execution if projected spend would exceed task, role, runtime, context, output, or project limits.
budget:
  project_total_budget_usd: 5.0
  role_budgets_usd:
    mechanical_worker: 2.0
    general_builder: 2.0
    frontier_reviewer: 1.0
limits:
  max_input_tokens_per_call: 32000
  max_output_tokens_per_call: 4000
  max_context_chars_per_call: 128000
  max_patch_bytes: 256000
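As a rough sketch of how a pre-call budget gate with a SQLite ledger could work, consider the following. The table layout, the four-characters-per-token estimate, the `PRICE_PER_MTOK` values, and the function names are all assumptions made for illustration. Only the budget and limit numbers come from the sample configuration.

```python
import sqlite3
from datetime import datetime, timezone

# Values mirror the sample configuration; the dict layout itself is an assumption.
BUDGET = {
    "project_total_budget_usd": 5.0,
    "role_budgets_usd": {
        "mechanical_worker": 2.0,
        "general_builder": 2.0,
        "frontier_reviewer": 1.0,
    },
}
LIMITS = {
    "max_input_tokens_per_call": 32000,
    "max_output_tokens_per_call": 4000,
    "max_context_chars_per_call": 128000,
}
# Hypothetical USD prices per million tokens; real pricing comes from configuration.
PRICE_PER_MTOK = {"input": 1.00, "output": 4.00}


def open_ledger(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ledger (ts TEXT, task TEXT, role TEXT, "
        "est_input_tokens INTEGER, max_output_tokens INTEGER, "
        "projected_cost_usd REAL, status TEXT)"
    )
    return conn


def estimate_tokens(context):
    # Rough heuristic (about 4 characters per token); an assumption, not a tokenizer.
    return max(1, len(context) // 4)


def _spent(conn, where="", params=()):
    # Sum only calls that were allowed to run; blocked calls cost nothing.
    sql = ("SELECT COALESCE(SUM(projected_cost_usd), 0.0) FROM ledger "
           "WHERE status = 'allowed'" + where)
    return conn.execute(sql, params).fetchone()[0]


def precheck(conn, task, role, context, max_output_tokens,
             budget=BUDGET, limits=LIMITS):
    """Record the decision in the ledger and return (allowed, reason)."""
    tokens = estimate_tokens(context)
    cost = (tokens * PRICE_PER_MTOK["input"]
            + max_output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000
    reason = None
    if len(context) > limits["max_context_chars_per_call"]:
        reason = "context limit"
    elif tokens > limits["max_input_tokens_per_call"]:
        reason = "input token limit"
    elif max_output_tokens > limits["max_output_tokens_per_call"]:
        reason = "output token limit"
    elif (_spent(conn, " AND role = ?", (role,)) + cost
          > budget["role_budgets_usd"].get(role, 0.0)):
        reason = "role budget"
    elif _spent(conn) + cost > budget["project_total_budget_usd"]:
        reason = "project budget"
    status = "allowed" if reason is None else "blocked"
    conn.execute(
        "INSERT INTO ledger VALUES (?, ?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), task, role,
         tokens, max_output_tokens, cost, status),
    )
    conn.commit()
    return reason is None, reason or "ok"
```

In this sketch a role with no configured budget is treated as having a budget of zero, so unknown roles are blocked by default rather than allowed.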
Service Tier Policy
Non-urgent builder work prefers flex, falls back to default,
and keeps priority disabled by default. Reasoning effort is a
request-level control, and pricing is configuration-driven rather than hardcoded into
runtime logic.
policy:
  builder_service_tier_preference:
    preferred: flex
    fallback: default
    allow_priority: false
  reasoning_effort:
    default: medium
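One way the tier preference above might be resolved at request time is sketched below. `TierPolicy`, `choose_service_tier`, and `request_options` are hypothetical names, and the `available` set stands in for whatever tiers the configured model actually offers. The keys in the returned dict are illustrative and not a claim about any provider's request schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    # Defaults follow the sample policy; the flat field layout is an assumption.
    preferred: str = "flex"
    fallback: str = "default"
    allow_priority: bool = False
    default_reasoning_effort: str = "medium"


def choose_service_tier(policy, available, urgent=False):
    """Pick a tier: priority only if urgent and allowed, else preferred, else fallback."""
    if urgent and policy.allow_priority and "priority" in available:
        return "priority"
    for tier in (policy.preferred, policy.fallback):
        if tier in available:
            return tier
    raise LookupError("no permitted service tier is available")


def request_options(policy, available, urgent=False, reasoning_effort=None):
    """Build per-request options; reasoning effort can be overridden per call."""
    return {
        "service_tier": choose_service_tier(policy, available, urgent),
        "reasoning_effort": reasoning_effort or policy.default_reasoning_effort,
    }
```

With the default policy, an urgent request still resolves to flex because priority stays disabled until configuration turns it on.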
Convergence Logic
Progress shows as fewer validator failures, narrower diffs, or clearer movement toward the requested outcome.
A stall shows as repeated identical failures or minimal improvement across attempts despite valid bounded retries.
Divergence shows as growing diffs, new failure categories, unrelated files, or regressions, any of which indicates the route should stop.
Non-converging work creates structured escalation packets for diagnosis-first reviewer analysis.
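The signals above could be compared between two consecutive attempts along these lines. This is only a sketch: the `Attempt` fields, the `Verdict` labels, and the exact precedence of the rules are assumptions, and the real engine may weigh these signals differently.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    # Hypothetical labels for the three outcomes.
    IMPROVING = "improving"
    STALLED = "stalled"
    DIVERGING = "diverging"


@dataclass(frozen=True)
class Attempt:
    failures: frozenset   # validator failure categories, e.g. {"ruff", "pytest"}
    diff_lines: int       # size of the proposed diff
    files: frozenset      # files touched by the patch


def classify(prev, curr, allowed_files):
    """Compare two consecutive attempts and decide whether the route is converging."""
    # Touching files outside the task boundary is treated as divergence outright.
    if curr.files - allowed_files:
        return Verdict.DIVERGING
    # A failure category that was not present before is a regression.
    if curr.failures - prev.failures:
        return Verdict.DIVERGING
    fewer_failures = len(curr.failures) < len(prev.failures)
    # A growing diff that fixes nothing suggests the route is drifting.
    if curr.diff_lines > prev.diff_lines and not fewer_failures:
        return Verdict.DIVERGING
    if fewer_failures or curr.diff_lines < prev.diff_lines:
        return Verdict.IMPROVING
    return Verdict.STALLED
```

A caller might stop on the first DIVERGING verdict and escalate after a small number of consecutive STALLED verdicts.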
Operational Safety
The orchestrator allows only deterministic subprocess tools through explicit wrappers with allowlists and timeouts. Model output is treated as data, never as executable shell. Patches must pass contract parsing, boundary checks, validators, and rollback rules instead of relying on model confidence.
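A minimal sketch of such a wrapper, assuming a hypothetical `run_tool` helper and an allowlist keyed on the executable name, might look like this. The argument vector is passed as a list and never through a shell, so model output cannot be interpreted as shell syntax.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist; the real set of tools is defined by the project.
ALLOWED_TOOLS = frozenset({"git", "ruff", "mypy", "pytest"})


@dataclass(frozen=True)
class ToolResult:
    ok: bool
    returncode: Optional[int]
    output: str


def run_tool(argv, timeout_s=60.0, allowed=ALLOWED_TOOLS):
    """Run an allowlisted tool with a timeout and return a structured result."""
    if not argv or argv[0] not in allowed:
        # Refuse before anything is executed.
        return ToolResult(False, None, "tool not allowlisted")
    try:
        proc = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_s,
            shell=False,   # never hand a command string to a shell
            check=False,   # non-zero exits are reported, not raised
        )
    except subprocess.TimeoutExpired:
        return ToolResult(False, None, "timeout")
    except FileNotFoundError:
        return ToolResult(False, None, "tool not installed")
    return ToolResult(proc.returncode == 0, proc.returncode, proc.stdout + proc.stderr)
```

For example, `run_tool(["git", "apply", "--check", "change.patch"])` would validate a patch file without applying it, where `change.patch` is a placeholder file name.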
git apply --check before patch application
Operator Modes
The fake provider keeps the system testable without network calls, while the OpenAI
provider enables real runtime execution only when configuration opts it in and the
operator supplies OPENAI_API_KEY. This preserves a deterministic local
test path without weakening the actual provider-backed runtime.
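The opt-in gate can be pictured with a small sketch like the one below. The `provider` config key, the class names, and the `propose` method are hypothetical, and the OpenAI call itself is left out on purpose. Only the `OPENAI_API_KEY` variable name comes from the text above.

```python
import os
from typing import Mapping, Protocol


class Provider(Protocol):
    def propose(self, prompt: str) -> str: ...


class FakeProvider:
    """Deterministic, no-network provider for tests and dry runs."""

    def propose(self, prompt: str) -> str:
        # Always returns the same canned unified diff.
        return "--- a/example.txt\n+++ b/example.txt\n@@ -1 +1 @@\n-old\n+new\n"


class OpenAIProvider:
    """Placeholder for the real provider; the network call is omitted in this sketch."""

    def __init__(self, api_key: str) -> None:
        self._api_key = api_key

    def propose(self, prompt: str) -> str:
        raise NotImplementedError("network call omitted in this sketch")


def select_provider(config: Mapping[str, str],
                    env: Mapping[str, str] = os.environ) -> Provider:
    """Default to the fake provider; use OpenAI only when configured and keyed."""
    if config.get("provider", "fake") != "openai":
        return FakeProvider()
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("provider is 'openai' but OPENAI_API_KEY is not set")
    return OpenAIProvider(api_key)
```

Failing loudly when the key is missing, rather than silently falling back to the fake provider, avoids a run that looks real but never reached a model.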
Phase 3 Cleanup
The MVP is operational, but the Phase 3 cleanup is still in progress. The current work is focused on final OpenAI multimodel refactor cleanup, OpenCode runtime removal from the active path, backend normalization, naming cleanup, docs consistency, validation, and clean commit preparation.
Quickstart Summary
Use the fake provider path for local no-network validation and safety testing.
Enable the OpenAI provider explicitly for real smoke, integration, or normal runs with OPENAI_API_KEY configured.
Normal runs stay bounded by allowed files, budgets, timeouts, patch limits, and validator commands.
Validation and rollback remain the final authority over whether a patch is accepted.
Project Scope
A local CLI orchestration layer for controlled AI-assisted coding, not a web app or unrestricted autonomous shell agent.
OpenAI support is real but explicitly gated, while the fake provider handles deterministic dry runs and tests.
Provider usage accounting is policy-based and recorded in a SQLite ledger rather than reconciled to external billing APIs.
classification_worker is supported as a canonical role, but router auto-selection remains intentionally conservative.
Human review remains the final trust boundary before shipping generated patches.
OpenCode was part of the earlier prototype, but the active runtime path now uses direct provider abstractions.
Technologies