Phase 2 Bridge Implementation

Status: ✅ Complete (4/4 tests passing)
Date: 2025-01-XX
Purpose: Bridge Phase 1 (The Librarian) to Phase 2 (The Reasoner)

Components Implemented

1. Call Graph Extraction (`src/librarian/parser.py`)

Methods Added:
- _extract_python_function_calls() - AST traversal for Python call expressions
- _extract_js_function_calls() - AST traversal for JavaScript/TypeScript calls
Features:
- Fuzzy matching - captures function names only (ignoring arguments)
- Handles method calls (obj.method()) and function calls (foo())
- Integrated into _extract_python_function_node() and _extract_js_function_node()
Test Results: ✅ Detects 3 calls in main_function (helper_function, print, process_data)

2. Call Edge Construction (`src/librarian/graph_constructor.py`)

Method: construct_call_edges() - Full implementation (~100 lines)
Features:
- Creates CALLS relationships between functions in Neo4j
- Fuzzy matching by function name across files
- Handles same-name functions with preference for same file
- Auto-skips built-in functions (print, len, etc.)
Integration: Used by build_complete_graph() in construction pipeline

3. Context Serializer (`src/librarian/context_serializer.py`)

Classes:
- CompressedContextBlock - Dataclass for serialized output
- ContextSerializer - Main serializer with XML/Markdown support
Methods:
- serialize_file_context() - Convert graph JSON to LLM format
- serialize_subgraph() - Multi-file context serialization
- serialize_symbol_definition() - Single symbol focused context
- create_context_window() - Token budget aware context assembly
Features:
- XML format: 143 tokens per file (structured, verbose)
- Markdown format: 57 tokens per file (compact, readable)
- Token estimation using tiktoken (cl100k_base encoding)
Test Results: ✅ Both formats working correctly

4. Pydantic Schemas (`src/architect/schemas.py`)

Models:
- RefactorPlan - Top-level refactor plan with risk assessment
- FileChange - Individual file modification specification
- DependencyImpact - Dependency analysis for affected files
- DiffSkeleton - Deprecated legacy format (retained for compatibility)
- ValidationResult - Schema validation output
Enums:
- RefactorOperation - create, modify, delete, rename, move, extract, inline
- ChangeType - import, class, function, method, variable, parameter
- ImpactType - call, inheritance, import, type_usage
- RiskLevel - low, medium, high, critical
Features:
- Full JSON serialization with Pydantic 2.12.5
- Validation helper: validate_refactor_plan()
- 350+ lines of schema definitions
Test Results: ✅ Validation and JSON serialization confirmed

Test Suite (`scripts/test_phase2_bridge.py`)

===========================================
Tests Passed: 4/4
✓ Call Graph Extraction
✓ Context Serializer
✓ Diff Skeleton Validation
✓ End-to-End Workflow
===========================================

Test Coverage:

test_call_graph_extraction - Parses Python functions, verifies call detection
test_context_serializer - Tests both XML and Markdown serialization
test_diff_skeleton_validation - Validates Pydantic schemas with JSON round-trip
test_end_to_end_workflow - Full pipeline: graph query → serialization → validation

Dependencies Added

pydantic>=2.0.0 - Schema validation (installed: 2.12.5)
tiktoken - Token counting (already present)

Integration Points

Phase 1 (The Librarian)

Input: Codebase files
Process: Parse → Extract calls → Build graph with CALLS edges
Output: Neo4j graph with File, Class, Function nodes + CONTAINS, IMPORTS, CALLS edges

Phase 2 Bridge (Current)

Input: Neo4j graph from Phase 1
Process: Query graph → Serialize context → Validate refactor plans
Output: LLM-ready context + validated refactor schemas

Phase 2 (The Reasoner) - Next

Input: Serialized context from bridge
Process: LLM reasoning → Generate refactor plan
Output: Validated RefactorPlan Pydantic model
Validation: Uses schemas.py for plan verification

File Changes Summary

New Files Created:

src/librarian/context_serializer.py (~400 lines)
src/architect/__init__.py (module init)
src/architect/schemas.py (~350 lines)
scripts/test_phase2_bridge.py (~250 lines)

Files Modified:

src/librarian/parser.py - Added call extraction methods (~85 lines)
src/librarian/graph_constructor.py - Implemented construct_call_edges() (~100 lines)
requirements.txt - Added pydantic>=2.0.0

Total Lines Added: ~1,185 lines

Known Limitations

INHERITS_FROM Warning: Neo4j warns about missing INHERITS_FROM relationships (expected - not created yet)
Call Graph Accuracy:
- Fuzzy matching may create false positives for common function names
- No type analysis or scope resolution yet
Context Serialization:
- Token estimates are approximate (actual LLM tokens may vary)
- Large codebases may exceed context windows

Next Steps for Phase 2

Implement The Reasoner:
- Integrate Jamba 1.5 Mini or similar LLM
- Use ContextSerializer output as LLM input
- Parse LLM output into RefactorPlan Pydantic models
Add Inheritance Tracking:
- Create INHERITS_FROM edges in graph_constructor.py
- Update context serializer to include inheritance chains
Enhance Call Graph:
- Add scope analysis to disambiguate function calls
- Implement type inference for method resolution
Context Window Management:
- Implement smart context pruning for large codebases
- Add relevance scoring for symbol prioritization

References

GitHub Repository: https://github.com/vivek5200/ouroboros
Neo4j Database: bolt://localhost:7687 (Docker container)
Pydantic Docs: https://docs.pydantic.dev/2.0/
Tree-sitter: https://tree-sitter.github.io/tree-sitter/

Validation Command:

cd 'g:\Just a Idea'
.\venv\Scripts\Activate.ps1
$env:PYTHONPATH='g:\Just a Idea'
python scripts\test_phase2_bridge.py

Expected output: Tests Passed: 4/4 ✅

Phase 2 Bridge Implementation

Components Implemented

1. Call Graph Extraction (src/librarian/parser.py)

2. Call Edge Construction (src/librarian/graph_constructor.py)

3. Context Serializer (src/librarian/context_serializer.py)

4. Pydantic Schemas (src/architect/schemas.py)

Test Suite (scripts/test_phase2_bridge.py)

Test Coverage:

Dependencies Added

Integration Points

Phase 1 (The Librarian)

Phase 2 Bridge (Current)

Phase 2 (The Reasoner) - Next

File Changes Summary

New Files Created:

Files Modified:

Total Lines Added: ~1,185 lines

Known Limitations

Next Steps for Phase 2

References

1. Call Graph Extraction (`src/librarian/parser.py`)

2. Call Edge Construction (`src/librarian/graph_constructor.py`)

3. Context Serializer (`src/librarian/context_serializer.py`)

4. Pydantic Schemas (`src/architect/schemas.py`)

Test Suite (`scripts/test_phase2_bridge.py`)