Mahalaxmi Terminal Orchestration
Multi-Agent AI Development at Scale
Version 1.0 | March 2026 | ThriveTech Services LLC
Modern software development increasingly relies on AI coding assistants, yet these tools are almost universally designed for one-agent, one-developer interaction. Mahalaxmi Terminal Orchestration breaks this constraint by running dozens of AI coding agents in parallel — each with isolated workspaces, intelligent task assignment, and full PTY-native terminal control — coordinated by a Manager-Worker directed acyclic graph (DAG) architecture. This paper describes the architectural principles, consensus mechanisms, context routing strategies, and enterprise governance features that make large-scale multi-agent development practical on a standard developer workstation.
Observed Performance
| Task Type | Sequential | Parallel (8 workers) | Speedup |
|---|---|---|---|
| 6-task REST API feature | ~18 min | ~4 min | 4.5× |
| 15-task test coverage | ~55 min | ~12 min | 4.6× |
| 22-task microservice | ~90 min | ~21 min | 4.3× |
1. Introduction
The AI coding assistant market has matured rapidly. Tools like Claude Code, GitHub Copilot, OpenAI Codex, and their open-source counterparts have demonstrated that LLMs can produce high-quality code given appropriate context and direction. Individual developers report 2–5× productivity gains from these tools.

The bottleneck is no longer whether AI can write code — it is whether a single AI agent can handle the full scope of a complex software project without becoming a single point of serialization. A real codebase has hundreds of independent modules, services, and components that could be developed in parallel. Yet every AI coding assistant available today works in exactly one conversation at a time, with one developer watching one terminal.

Mahalaxmi is the orchestration layer that changes this. Rather than one AI agent working sequentially through a task list, Mahalaxmi deploys a team of AI agents organized into a Manager-Worker hierarchy. Manager agents analyze the codebase, deliberate on an execution plan, and produce a dependency-ordered task graph. Worker agents execute tasks in parallel, each isolated in its own Git worktree, with context pre-filtered to only the code they need. The results are automatically integrated via pull requests, tested through a post-cycle validation pipeline, and surfaced through a real-time desktop UI.
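The dependency-ordered task graph described above can be sketched as a minimal Kahn-style scheduler that groups tasks into "waves" of work that may run in parallel. The `Task` type and its fields here are hypothetical simplifications for exposition, not Mahalaxmi's actual data model:

```rust
use std::collections::HashMap;

// Hypothetical simplified task: `deps` lists the indices of tasks
// that must finish before this one may start.
struct Task {
    name: &'static str,
    deps: Vec<usize>,
}

// Kahn-style topological layering: each returned "wave" holds tasks
// whose dependencies are all satisfied by earlier waves, so every
// task within a wave can be dispatched to a separate worker.
fn schedule_waves(tasks: &[Task]) -> Vec<Vec<&'static str>> {
    let mut indegree: Vec<usize> = tasks.iter().map(|t| t.deps.len()).collect();
    let mut dependents: HashMap<usize, Vec<usize>> = HashMap::new();
    for (i, t) in tasks.iter().enumerate() {
        for &d in &t.deps {
            dependents.entry(d).or_default().push(i);
        }
    }
    let mut waves = Vec::new();
    let mut ready: Vec<usize> = (0..tasks.len()).filter(|&i| indegree[i] == 0).collect();
    while !ready.is_empty() {
        waves.push(ready.iter().map(|&i| tasks[i].name).collect());
        let mut next = Vec::new();
        for &i in &ready {
            if let Some(list) = dependents.get(&i) {
                for &j in list {
                    indegree[j] -= 1;
                    if indegree[j] == 0 {
                        next.push(j);
                    }
                }
            }
        }
        ready = next;
    }
    waves
}

fn main() {
    let tasks = vec![
        Task { name: "schema", deps: vec![] },
        Task { name: "handlers", deps: vec![0] },
        Task { name: "tests", deps: vec![1] },
        Task { name: "docs", deps: vec![0] },
    ];
    // "handlers" and "docs" share a wave: both depend only on "schema".
    println!("{:?}", schedule_waves(&tasks));
}
```

In this toy graph, a single sequential agent would take four steps; two workers finish in three waves, which is the source of the speedups reported in the table above.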
2. Problem Statement
2.1 The Sequential AI Development Bottleneck
Current AI coding tools suffer from three fundamental constraints:

- **Single-agent serialization.** One conversation produces one stream of work. If a task has 20 independent subtasks, they execute one at a time. The parallelism inherent in the problem structure is discarded.
- **Context window saturation.** Coding agents degrade in quality as conversation context grows. Large codebases exhaust context windows, causing agents to "forget" earlier constraints or produce code that conflicts with work already done in the same session.
- **Tool lock-in.** Most enterprise environments already have contracts with multiple AI providers (Anthropic, OpenAI, AWS Bedrock, Google Vertex). Existing orchestration tools typically target a single provider, leaving significant purchased capacity idle.
2.2 The Human-in-the-Loop Gap
Teams using AI coding tools today operate them manually: a developer types a prompt, reviews the output, applies a change, and repeats. This keeps development fully serialized at the human level. Automating the orchestration layer — while preserving human oversight at key decision points — is the unsolved problem.
2.3 Enterprise Governance Challenges
As AI coding adoption grows, organizations face new governance requirements: tracking AI-generated code for compliance and audit purposes, controlling per-developer and per-project AI spend, enforcing security policies (no secrets in output, dependency audits), and managing multi-seat licensing for team deployments. No existing tool addresses all of these at the infrastructure level.
3. Architecture Overview
Mahalaxmi is a cross-platform desktop application built on Tauri 2.x (Rust + WebView), providing a native application experience on Windows, macOS, and Linux without requiring a cloud backend.
| Crate | Responsibility |
|---|---|
| mahalaxmi-core | Shared domain types, i18n (10 locales), config, error types, logging |
| mahalaxmi-pty | PTY spawning, stream I/O, VT100/ANSI parsing, event emission |
| mahalaxmi-providers | AiProvider trait + implementations for 8+ AI coding tools |
| mahalaxmi-orchestration | Cycle state machine, consensus engine, worker queue, DAG scheduling, Git worktree management |
| mahalaxmi-detection | State detection rules, pattern matching, auto-response for interactive prompts |
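To make the crate boundaries concrete, the provider abstraction attributed to mahalaxmi-providers might look roughly like the following. The trait name comes from the table above, but the method names, signatures, and the `ExampleCli` implementation are illustrative assumptions, not the crate's actual API:

```rust
// Illustrative sketch of a provider abstraction like the one the table
// attributes to mahalaxmi-providers. Method names and signatures are
// assumptions for exposition, not the real trait.
trait AiProvider {
    /// Stable identifier for the tool, e.g. "claude-code".
    fn id(&self) -> &str;
    /// Command line used to spawn the tool under a PTY.
    fn launch_command(&self) -> Vec<String>;
    /// Turn an orchestrator task into the prompt text written to the PTY.
    fn format_prompt(&self, task_description: &str, context_files: &[String]) -> String;
}

// Hypothetical provider implementation for a generic terminal AI tool.
struct ExampleCli;

impl AiProvider for ExampleCli {
    fn id(&self) -> &str {
        "example-cli"
    }
    fn launch_command(&self) -> Vec<String> {
        vec!["example-cli".into(), "--no-color".into()]
    }
    fn format_prompt(&self, task_description: &str, context_files: &[String]) -> String {
        format!("Task: {}\nRelevant files: {}", task_description, context_files.join(", "))
    }
}

fn main() {
    let provider = ExampleCli;
    println!("{}", provider.format_prompt("add healthcheck", &["src/main.rs".into()]));
}
```

A trait object (`Box<dyn AiProvider>`) behind a registry is what lets the orchestrator treat eight-plus heterogeneous CLI tools uniformly.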
4. The Consensus Engine
The Consensus Engine is the component responsible for merging multiple manager agents' execution plans into a single, coherent, deduplicated task graph. When the Manager Phase begins, N manager agents (configurable, 1–8) independently analyze the codebase and requirements document. Each manager produces a proposed execution plan — a set of tasks with names, descriptions, target files, complexity estimates, and dependencies. The Consensus Engine then merges these proposals using one of four strategies:

- **Union:** includes all unique tasks from all managers. Semantic deduplication identifies tasks that describe the same work with different words and merges them. Best for maximizing coverage.
- **Intersection:** includes only tasks that appeared in every manager's plan. Produces a minimal, high-confidence plan. Best for conservative or safety-critical work.
- **WeightedVoting:** tasks weighted by the historical performance reputation of the provider that proposed them. Best when provider quality varies significantly.
- **ComplexityWeighted:** tasks weighted by their complexity scores. Ensures high-complexity tasks from any manager are included. Best for complex projects where missing a hard task is costly.

The deduplication pipeline is CamelCase-aware — it tokenizes task names into constituent words before computing similarity. Multi-field Jaccard similarity is computed across the task name, description, and file scope. Tasks in the ambiguous zone are sent to LLM arbitration: a prompt asks the model to determine whether two candidate tasks should be merged, kept separate, or synthesized into a new task that captures the intent of both.
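The CamelCase-aware tokenization and single-field Jaccard step can be sketched as follows (the real pipeline also scores descriptions and file scopes, and this code is a simplified illustration rather than Mahalaxmi's implementation):

```rust
use std::collections::HashSet;

// CamelCase-aware tokenizer: splits "AddUserAuth" into
// {"add", "user", "auth"} before similarity is computed, so
// "AddUserAuth" and "user auth endpoint" can be compared fairly.
fn tokenize(name: &str) -> HashSet<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    for c in name.chars() {
        if c.is_uppercase() && !current.is_empty() {
            // Word boundary inside a CamelCase identifier.
            tokens.push(current.to_lowercase());
            current = String::new();
        }
        if c.is_alphanumeric() {
            current.push(c);
        } else if !current.is_empty() {
            // Whitespace or punctuation also ends a token.
            tokens.push(current.to_lowercase());
            current = String::new();
        }
    }
    if !current.is_empty() {
        tokens.push(current.to_lowercase());
    }
    tokens.into_iter().collect()
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, in [0, 1].
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    let inter = a.intersection(b).count() as f64;
    let union = a.union(b).count() as f64;
    if union == 0.0 { 0.0 } else { inter / union }
}

fn main() {
    let a = tokenize("AddUserAuth");
    let b = tokenize("UserAuthEndpoint");
    // Shares {"user", "auth"} out of 4 distinct tokens: similarity 0.5.
    println!("similarity: {:.2}", jaccard(&a, &b));
}
```

A pair scoring near 1.0 is merged automatically, a pair near 0.0 is kept separate, and the middle band is what the section above calls the ambiguous zone routed to LLM arbitration.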
5. PTY-Native Terminal Control
Mahalaxmi controls AI coding tools by taking ownership of their terminal (PTY — pseudoterminal). This is architecturally distinct from screen capture or clipboard injection approaches. When Mahalaxmi spawns an AI coding tool, it spawns the process with a PTY attached. Mahalaxmi's PTY engine (mahalaxmi-pty) reads raw bytes from the PTY output stream and writes commands to the PTY input stream. The byte stream is parsed using a VT100/ANSI state machine to extract semantic content — text, cursor positions, colors — without rendering the content to a screen.

This architecture provides several properties not achievable with screen capture:

- **Reliability.** The byte stream is deterministic for a given tool version. OCR errors do not occur because there is no image processing.
- **Universality.** Any terminal-based AI CLI tool can be controlled — the approach is not specific to a font, color scheme, or terminal emulator.
- **Latency.** Reading raw bytes from a PTY is faster than screen capture pipelines, which typically require frame capture, image decoding, and OCR.
- **Completeness.** The full output, including color codes and terminal control sequences, is available for analysis.
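To illustrate the parsing side, here is a deliberately minimal sketch of recovering plain text from a PTY byte stream by skipping CSI escape sequences. A production VT100/ANSI state machine (like the one the paper attributes to mahalaxmi-pty) also tracks cursor movement, OSC sequences, and screen state; this fragment handles only the common `ESC [ ... final-byte` case:

```rust
// Minimal sketch: strip CSI escape sequences (ESC '[' followed by
// parameter bytes, terminated by a final byte in 0x40..=0x7E) from a
// raw PTY byte stream, keeping only the printable text.
fn strip_csi(input: &[u8]) -> String {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        if input[i] == 0x1b && i + 1 < input.len() && input[i + 1] == b'[' {
            i += 2;
            // Skip parameter/intermediate bytes until the final byte.
            while i < input.len() && !(0x40..=0x7e).contains(&input[i]) {
                i += 1;
            }
            i += 1; // consume the final byte (e.g. 'm' for SGR color)
        } else {
            out.push(input[i]);
            i += 1;
        }
    }
    String::from_utf8_lossy(&out).into_owned()
}

fn main() {
    // "\x1b[32mok\x1b[0m" renders "ok" in green; the parser recovers "ok"
    // without any rendering or image processing.
    println!("{}", strip_csi(b"\x1b[32mok\x1b[0m"));
}
```

Because the input is the byte stream itself, the same code works regardless of terminal emulator, font, or color scheme, which is the universality property claimed above.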
6. Intelligent Context Routing
Workers do not receive the full codebase as context. The full codebase context problem is one of the fundamental failure modes of single-agent AI coding: as context grows, quality degrades. Mahalaxmi's context routing system selects a minimal, maximally relevant subset of files for each worker task. Files are scored using three signals:

- **Lexical Jaccard (α):** the Jaccard similarity between the vocabulary of the task description and the vocabulary of the file's content.
- **Import-graph proximity (β):** the BFS distance in the codebase's import dependency graph from the task's explicitly named target files to each candidate file. Files closer in the import graph to the task's targets score higher.
- **Historical co-occurrence (γ):** files that have been modified together in previous cycles involving similar tasks score higher.

The combined score is:

score(f) = α · lexical(f) + β · proximity(f) + γ · cooccurrence(f)

Files are ranked by this score and added to the worker's context window until the configured token budget is exhausted. Workers receive exactly the files they need — not the whole repository.
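The scoring and budgeted selection can be sketched as a greedy rank-and-fill loop. The `Candidate` struct, the field values, and the weights here are hypothetical; in practice the three signals would be computed by the lexical, import-graph, and co-occurrence analyses described above:

```rust
// Hypothetical candidate file with the three pre-computed signals
// from the context-routing section; values here are illustrative.
struct Candidate {
    path: &'static str,
    lexical: f64,      // Jaccard(task vocabulary, file vocabulary)
    proximity: f64,    // decays with BFS distance in the import graph
    cooccurrence: f64, // historical modified-together frequency
    tokens: usize,     // context cost of including the file
}

// score(f) = α·lexical(f) + β·proximity(f) + γ·cooccurrence(f),
// then greedy selection by descending score under a token budget.
fn select_context(
    mut files: Vec<Candidate>,
    (alpha, beta, gamma): (f64, f64, f64),
    budget: usize,
) -> Vec<&'static str> {
    files.sort_by(|a, b| {
        let sa = alpha * a.lexical + beta * a.proximity + gamma * a.cooccurrence;
        let sb = alpha * b.lexical + beta * b.proximity + gamma * b.cooccurrence;
        sb.partial_cmp(&sa).unwrap() // descending; signals are finite
    });
    let mut remaining = budget;
    let mut selected = Vec::new();
    for f in files {
        if f.tokens <= remaining {
            remaining -= f.tokens;
            selected.push(f.path);
        }
    }
    selected
}

fn main() {
    let files = vec![
        Candidate { path: "src/api.rs", lexical: 0.9, proximity: 0.8, cooccurrence: 0.5, tokens: 500 },
        Candidate { path: "src/util.rs", lexical: 0.2, proximity: 0.1, cooccurrence: 0.0, tokens: 800 },
        Candidate { path: "src/model.rs", lexical: 0.6, proximity: 0.9, cooccurrence: 0.7, tokens: 400 },
    ];
    // With a 1000-token budget, the two high-scoring files fit and the
    // weakly related src/util.rs is left out of the worker's context.
    println!("{:?}", select_context(files, (0.5, 0.3, 0.2), 1000));
}
```

Note the greedy loop keeps scanning after a file fails to fit, so a smaller lower-ranked file can still use leftover budget.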
7. Technology Stack
| Layer | Technologies |
|---|---|
| Core engine | Rust (portable-pty, tokio, rusqlite, tree-sitter, axum) |
| Desktop shell | Tauri 2.x (cross-platform, native webview, system keyring) |
| Frontend | Next.js + TypeScript |
| AI providers | Claude Code, OpenAI Foundry, AWS Bedrock, Google Gemini, Kiro, Goose, DeepSeek, Qwen Coder |
| IDE integrations | VS Code, JetBrains, Neovim, Visual Studio |
8. Security & Privacy Architecture
All orchestration runs locally on the developer's machine. AI provider calls go directly from the developer's machine to the provider's endpoint — ThriveTech never proxies, relays, or receives AI prompts, completions, or code. License validation transmits only a machine fingerprint and license token.
Ready to run a team of AI agents on your codebase?