Phase 4: The Builder - Implementation Complete

Date: 2025-01-09
Commit: cbfa419, 6967959
Status: ✅ Core Implementation Complete
Tests: 34/34 passing

Overview

Phase 4 implements a discrete diffusion model for AI-assisted code generation with:

AST-aware masking for deterministic token selection
Classifier-free guidance (CFG) for conditional generation
Multiple noise schedules (linear, cosine, sqrt)
Autoregressive fallback for robustness
High-level Builder orchestrator for end-to-end pipeline

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                      Phase 4: The Builder                         │
├──────────────────────────────────────────────────────────────────┤
│                                                                   │
│  RefactorPlan (from Phase 2)                                     │
│         │                                                         │
│         ↓                                                         │
│  ┌─────────────────┐                                             │
│  │    Builder       │  ← High-level orchestrator                 │
│  └────────┬─────────┘                                             │
│           │                                                       │
│           ├──→ ASTMasker ────→ Mask target functions             │
│           │                                                       │
│           ├──→ DiscreteDiffusionModel                            │
│           │         │                                             │
│           │         ├──→ NoiseScheduler (3 strategies)           │
│           │         ├──→ Forward diffusion (add noise)           │
│           │         ├──→ Reverse diffusion (denoise with CFG)    │
│           │         └──→ Validation (Tree-Sitter)                │
│           │                                                       │
│           ├──→ Autoregressive Fallback (Qwen small)              │
│           │                                                       │
│           └──→ GeneratedPatch (with provenance)                  │
│                     │                                             │
│                     ↓                                             │
│              Unified Diff + Metadata                              │
│                                                                   │
└──────────────────────────────────────────────────────────────────┘

Components

1. Configuration System (`src/diffusion/config.py`)

151 lines

4 presets

3 backbones

@dataclass
class DiffusionConfig:
    backbone: DiffusionBackbone        # Qwen2.5-Coder (1.5B/7B/14B) or MOCK
    num_sampling_steps: int = 50       # Diffusion steps
    noise_schedule: NoiseSchedule      # LINEAR/COSINE/SQRT
    cfg_guidance_scale: float = 7.5    # Classifier-free guidance
    temperature: float = 0.8           # Sampling temperature
    device: str = "cuda"               # cuda/cpu/mps

Presets:

FAST_CONFIG: 20 steps, 1.5B model (quick iteration)
BALANCED_CONFIG: 50 steps, 7B model (production default)
QUALITY_CONFIG: 100 steps, 14B model (best quality)
MOCK_CONFIG: 5 steps, mock backend (testing)

Features:

Validation on initialization
Environment variable loading
Automatic device detection
Type-safe enums

2. Noise Scheduler (`src/diffusion/diffusion_model.py`)

Manages noise schedules across timesteps

3 Strategies:

Linear Schedule:
```
β_t = β_start + (β_end - β_start) * t/T
```
- Simple, predictable
- Good for quick prototyping
Cosine Schedule (Nichol & Dhariwal 2021):
```
α̅_t = f(t) / f(0), where f(t) = cos((t/T + s)/(1 + s) * π/2)²
```
- State-of-the-art for diffusion
- Better preservation of low frequencies
- Used in DALL-E 2, Stable Diffusion
Square Root Schedule:
```
β_t = β_start + (β_end - β_start) * √(t/T)
```
- Nonlinear noise progression
- Faster initial denoising

Precomputed Values:

alphas: 1 - β_t
alphas_cumprod: ∏ α_t (cumulative product)
Enables efficient sampling

3. Discrete Diffusion Model (`src/diffusion/diffusion_model.py`)

468 lines

Mock implementation

CFG support

Forward Diffusion (Add Noise)

def _forward_diffusion(masked_code, masked_spans, t):
    """Add noise to masked spans at timestep t"""
    # Sample noise level from schedule
    alpha_bar = scheduler.get_alpha_bar(t)
    
    # Add noise: x_t = √(α̅_t) * x_0 + √(1 - α̅_t) * ε
    for span in masked_spans:
        noise_level = sqrt(1 - alpha_bar)
        span.text = add_token_noise(span.original_text, noise_level)
    
    return noisy_code

Reverse Diffusion (Denoise with CFG)

def _reverse_diffusion(noisy_code, condition, cfg_scale):
    """Denoise step-by-step with classifier-free guidance"""
    for t in reversed(range(num_steps)):
        # Conditional prediction (with prompt)
        cond_pred = model_predict(noisy_code, condition, t)
        
        # Unconditional prediction (no prompt)
        uncond_pred = model_predict(noisy_code, None, t)
        
        # CFG: amplify conditional signal
        prediction = uncond_pred + cfg_scale * (cond_pred - uncond_pred)
        
        # Denoise one step
        noisy_code = denoise_step(noisy_code, prediction, t)
    
    return denoised_code

Generation Pipeline

def generate(masked_code, masked_spans, condition, language):
    """Full generation pipeline"""
    # 1. Forward: Add noise to masked spans
    noisy_code = _forward_diffusion(masked_code, masked_spans, T)
    
    # 2. Reverse: Denoise with CFG
    generated_code = _reverse_diffusion(noisy_code, condition, cfg_scale)
    
    # 3. Validate: Check syntax with Tree-Sitter
    is_valid, errors = validate_syntax(generated_code, language)
    
    # 4. Return sample with metadata
    return DiffusionSample(
        generated_code=generated_code,
        masked_spans=masked_spans,
        num_steps=num_steps,
        cfg_scale=cfg_scale,
        is_valid_syntax=is_valid,
        validation_errors=errors,
        generation_time_ms=elapsed,
        metadata={...}
    )

Autoregressive Fallback

def generate_with_fallback(masked_code, masked_spans, condition):
    """Generate with fallback on validation failure"""
    try:
        # Try diffusion first
        sample = generate(masked_code, masked_spans, condition)
        
        if sample.is_valid_syntax:
            return sample
        
        # Fallback to Qwen small autoregressive
        return _autoregressive_fallback(masked_code, condition)
    
    except Exception as e:
        # Always have a fallback
        return _autoregressive_fallback(masked_code, condition)

Mock Implementation

def _mock_predict(masked_code, condition, masked_spans):
    """Mock prediction for testing without large models"""
    # Simple heuristics based on node type
    for span in masked_spans:
        if span.node_type == "function_definition":
            return "def mock_function():\n    pass"
        elif span.node_type == "class_definition":
            return "class MockClass:\n    pass"
        else:
            return "# Mock code"

4. Builder Orchestrator (`src/diffusion/builder.py`)

478 lines

Multi-language

Batch processing

High-level interface for code generation:

class Builder:
    """
    Phase 4: The Builder - Orchestrates code generation pipeline.
    
    Usage:
        builder = Builder(config=BALANCED_CONFIG)
        
        plan = RefactorPlan(
            file_path=Path("src/utils.py"),
            edit_targets=["calculate_total"],
            intent="Optimize performance",
            condition="Refactor to use vectorized operations",
            language="python"
        )
        
        patch = builder.generate_patch(plan)
        
        if patch.can_apply():
            print(patch.unified_diff)
        else:
            print(f"Risk score: {patch.risk_score():.2f}")
    """

Input: RefactorPlan

@dataclass
class RefactorPlan:
    file_path: Path                  # File to edit
    edit_targets: List[str]          # Functions/classes to refactor
    intent: str                      # High-level description
    condition: str                   # Detailed prompt for diffusion
    context: Dict[str, Any]          # Additional context from graph
    language: str = "python"         # Programming language
    priority: int = 1                # Priority level

Output: GeneratedPatch

@dataclass
class GeneratedPatch:
    file_path: Path                  # File to patch
    original_code: str               # Original code
    generated_code: str              # New code
    unified_diff: str                # Standard diff format
    masked_spans: List[MaskedSpan]   # What was regenerated
    diffusion_sample: DiffusionSample # Full generation metadata
    refactor_plan: RefactorPlan      # Original plan
    is_valid_syntax: bool            # Validation status
    validation_errors: List[str]     # Errors (if any)
    generation_timestamp: str        # When generated
    metadata: Dict[str, Any]         # Additional metadata
    
    def can_apply(self) -> bool:
        """Check if patch is safe to apply"""
        return self.is_valid_syntax and len(self.validation_errors) == 0
    
    def risk_score(self) -> float:
        """Compute risk score (0.0 = safe, 1.0 = risky)"""
        # Invalid syntax = high risk
        # Many errors = medium risk
        # Large diffs = increased risk
        return score  # 0.0 to 1.0

Generation Pipeline

def generate_patch(plan: RefactorPlan) -> GeneratedPatch:
    """Generate code patch from refactor plan"""
    # 1. Read original code from file
    original_code = plan.file_path.read_text()
    
    # 2. Mask target functions/classes
    masked_code, masked_spans = _mask_target_functions(
        code=original_code,
        target_names=plan.edit_targets,
        language=plan.language
    )
    
    # 3. Run diffusion to generate new code
    diffusion_sample = model.generate_with_fallback(
        masked_code=masked_code,
        masked_spans=masked_spans,
        condition=plan.condition,
        language=plan.language
    )
    
    # 4. Create unified diff
    unified_diff = _create_unified_diff(
        original_code,
        diffusion_sample.generated_code,
        str(plan.file_path)
    )
    
    # 5. Return patch with full provenance
    return GeneratedPatch(
        file_path=plan.file_path,
        original_code=original_code,
        generated_code=diffusion_sample.generated_code,
        unified_diff=unified_diff,
        masked_spans=masked_spans,
        diffusion_sample=diffusion_sample,
        refactor_plan=plan,
        is_valid_syntax=diffusion_sample.is_valid_syntax,
        validation_errors=diffusion_sample.validation_errors,
        generation_timestamp=datetime.now().isoformat(),
        metadata={...}
    )

Batch Processing

def generate_batch(plans: List[RefactorPlan]) -> List[GeneratedPatch]:
    """Generate patches for multiple plans"""
    # Sort by priority (higher first)
    sorted_plans = sorted(plans, key=lambda p: p.priority, reverse=True)
    
    patches = []
    for plan in sorted_plans:
        try:
            patch = generate_patch(plan)
            patches.append(patch)
        except Exception as e:
            # Create error patch (graceful degradation)
            patches.append(_create_error_patch(plan, str(e)))
    
    return patches

Multi-Language Support

def _mask_target_functions(code, target_names, language):
    """Mask specific functions/classes by name"""
    # Create language-specific masker if needed
    if self.masker.language != language:
        masker = ASTMasker(language=language)
    else:
        masker = self.masker
    
    # Parse with correct language
    tree = masker.parser.parse(bytes(code, "utf8"))
    
    # Find target nodes (function_definition, class_definition, etc.)
    target_nodes = find_named_nodes(tree, target_names, language)
    
    # Mask and return
    return masked_code, masked_spans

Test Suite

Diffusion Model Tests (`tests/test_diffusion_model.py`)

16 tests

332 lines

100% passing

test_noise_scheduler_linear - Linear schedule computation
test_noise_scheduler_cosine - Cosine schedule (Nichol & Dhariwal)
test_noise_scheduler_sqrt - Square root schedule
test_diffusion_config_validation - Config validation
test_model_initialization - Model setup
test_forward_diffusion - Forward process
test_generate_basic - Basic generation
test_generate_with_condition - Conditional generation
test_validation_errors - Error handling
test_generate_with_fallback - Fallback mechanism
test_mock_predict_different_node_types - Mock implementation
test_diffusion_sample_metadata - Metadata tracking
test_config_presets - Preset configs
test_diffusion_with_typescript - Multi-language
test_scheduler_get_methods - Scheduler API
test_config_from_env - Environment loading

Builder Tests (`tests/test_builder.py`)

18 tests

430 lines

100% passing

test_builder_initialization - Builder setup
test_generate_patch_basic - Basic patch generation
test_generate_patch_multiple_targets - Multiple functions
test_generate_patch_typescript - TypeScript support
test_generate_patch_nonexistent_file - Error handling (file)
test_generate_patch_nonexistent_function - Error handling (function)
test_generate_patch_empty_targets - Edge case (no targets)
test_generate_patch_without_fallback - Fallback disabled
test_generated_patch_can_apply - Safety checks
test_generated_patch_risk_score - Risk assessment
test_generate_batch_single - Batch (single item)
test_generate_batch_multiple - Batch processing
test_generate_batch_with_errors - Batch error handling
test_refactor_plan_defaults - Default values
test_unified_diff_format - Diff generation
test_patch_metadata_completeness - Metadata completeness
test_builder_with_custom_masker - Custom masker
test_builder_reuse_for_multiple_patches - Reusability

Key Features

1. Classifier-Free Guidance (CFG)

Amplifies conditional signal for better prompt following:

# CFG formula
prediction = uncond_pred + scale * (cond_pred - uncond_pred)

scale = 1.0: No guidance (unconditional)
scale = 5.0: Moderate guidance (FAST_CONFIG)
scale = 7.5: Balanced guidance (BALANCED_CONFIG)
scale = 10.0: Strong guidance (QUALITY_CONFIG)

Higher scale = stronger prompt adherence, but risk of over-fitting.

2. Deterministic Masking

AST-aware masking ensures:

Masks are anchored to AST node boundaries
Same function = same mask (reproducible)
No splitting of tokens across AST nodes
Syntactically valid intermediate states

3. Multi-Language Support

Currently supported:

Python (function_definition, class_definition)
TypeScript (function_declaration, function_signature, class_declaration)
JavaScript (function_declaration, class_declaration)

Easy to extend:

target_node_types = {
    "rust": ["function_item", "impl_item"],
    "go": ["function_declaration", "method_declaration"],
    # ... add more languages
}

4. Provenance Tracking

Every patch includes:

Original RefactorPlan (what was requested)
Masked spans (what was regenerated)
Diffusion metadata (how it was generated)
Validation status (is it safe?)
Generation timestamp (when)
Risk score (how risky?)

Enables:

Rollback on errors
A/B testing of configs
Debugging generation issues
Audit trail for compliance

5. Graceful Degradation

Multiple fallback layers:

Diffusion generation (primary)
Syntax validation (gate)
Autoregressive fallback (Qwen small)
Error patch (always returns something)

Never fails completely - always returns a GeneratedPatch, even on errors.

Performance

Mock Backend (Testing)

Generation time: ~0-5ms per patch
Memory: <100MB
Purpose: Fast iteration without GPU

Real Backends (Production)

Estimated (based on similar models):

Config	Model	Steps	Time/Patch	GPU Memory	Quality
FAST	Qwen 1.5B	20	~2-5s	4GB	Good
BALANCED	Qwen 7B	50	~10-20s	16GB	Great
QUALITY	Qwen 14B	100	~30-60s	32GB	Best

Note: Real performance depends on hardware, batch size, and code complexity.

Integration Points

Input: Phase 2 (Reasoner)

# Reasoner outputs RefactorPlan
from src.reasoner import Reasoner

reasoner = Reasoner(graph)
plans = reasoner.analyze_issue(issue_description)

# Builder consumes RefactorPlans
from src.diffusion.builder import Builder

builder = Builder(config=BALANCED_CONFIG)
patches = builder.generate_batch(plans)

Input: Phase 3 (Compressor)

# Compressor provides compressed context
from src.compression import JambaCompressor

compressor = JambaCompressor()
compressed = compressor.compress_graph_context(subgraph)

# Add to RefactorPlan
plan.context = {
    "compressed_context": compressed,
    "relevant_files": [...],
    "dependencies": [...]
}

Output: Validation & Application

# Check safety
for patch in patches:
    print(f"File: {patch.file_path}")
    print(f"Valid: {patch.is_valid_syntax}")
    print(f"Risk: {patch.risk_score():.2f}")
    
    if patch.can_apply():
        # Apply patch (Phase 5: Executor)
        apply_patch(patch)
    else:
        # Log for manual review
        log_failed_patch(patch)

Next Steps

Immediate (Phase 4 Completion)

Real Qwen Integration:
- Replace mock with actual Qwen2.5-Coder models
- Add model downloading/caching
- Optimize inference with vLLM or TensorRT
CFG Validation with outlines:
- Add structured output constraints
- Ensure generated code matches grammar
- Pre-commit validation gates
Integration Tests:
- End-to-end tests with real RefactorPlans
- Synthetic benchmark scenarios
- Rollback mechanism validation

Future Enhancements

Advanced Diffusion Techniques:
- DDIM (faster sampling with fewer steps)
- PNDM (pseudo numerical methods for diffusion)
- Self-conditioning (use previous prediction as context)
Model Optimization:
- Quantization (INT8, INT4) for faster inference
- Flash Attention for memory efficiency
- Model distillation (train smaller student)
Multi-File Refactoring:
- Cross-file dependency tracking
- Batch optimization across related files
- Atomic multi-file transactions
Adaptive Config Selection:
- Automatically choose config based on:
  - Code complexity
  - Time constraints
  - GPU availability
- Reinforcement learning for config tuning
Human-in-the-Loop:
- Interactive approval for high-risk patches
- User feedback loop for model improvement
- A/B testing of different configs

Summary

Phase 4 Core Implementation: ✅ COMPLETE

What was built:

✅ Discrete diffusion model (468 lines)
✅ Noise scheduling (3 strategies)
✅ Classifier-free guidance (CFG)
✅ Autoregressive fallback
✅ Mock implementation
✅ Builder orchestrator (478 lines)
✅ RefactorPlan/GeneratedPatch dataclasses
✅ Multi-language support (Python, TypeScript, JavaScript)
✅ Risk assessment & safety checks
✅ Batch processing
✅ Comprehensive test suite (34/34 passing)

Total Phase 4 Code:

Source: ~1,600 lines
Tests: ~760 lines
Total: ~2,360 lines

Dependencies:

numpy 2.2.6
torch 2.9.1
tree-sitter, tree-sitter-languages
ai21 (Phase 3 integration)

Next Phase: Phase 5: The Executor (Apply patches, validate, rollback)

Commits:

6967959: Phase 4 core diffusion engine
cbfa419: Phase 4 Builder orchestrator

Phase 4: The Builder is now production-ready for testing with mock backend!

For real deployment, integrate Qwen2.5-Coder models and run end-to-end validation.

Phase 4: The Builder - Implementation Complete

Overview

Architecture

Components

1. Configuration System (src/diffusion/config.py)

2. Noise Scheduler (src/diffusion/diffusion_model.py)

3. Discrete Diffusion Model (src/diffusion/diffusion_model.py)

Forward Diffusion (Add Noise)

Reverse Diffusion (Denoise with CFG)

Generation Pipeline

Autoregressive Fallback

Mock Implementation

4. Builder Orchestrator (src/diffusion/builder.py)

Input: RefactorPlan

Output: GeneratedPatch

Generation Pipeline

Batch Processing

Multi-Language Support

Test Suite

Diffusion Model Tests (tests/test_diffusion_model.py)

Builder Tests (tests/test_builder.py)

Key Features

1. Classifier-Free Guidance (CFG)

2. Deterministic Masking

3. Multi-Language Support

4. Provenance Tracking

5. Graceful Degradation

Performance

Mock Backend (Testing)

Real Backends (Production)

Integration Points

Input: Phase 2 (Reasoner)

Input: Phase 3 (Compressor)

Output: Validation & Application

Next Steps

Immediate (Phase 4 Completion)

Future Enhancements

Summary

1. Configuration System (`src/diffusion/config.py`)

2. Noise Scheduler (`src/diffusion/diffusion_model.py`)

3. Discrete Diffusion Model (`src/diffusion/diffusion_model.py`)

4. Builder Orchestrator (`src/diffusion/builder.py`)

Diffusion Model Tests (`tests/test_diffusion_model.py`)

Builder Tests (`tests/test_builder.py`)