# Phase 2 Bridge Implementation

**Status:** ✅ Complete (4/4 tests passing)
**Date:** 2025-01-XX
**Purpose:** Bridge Phase 1 (The Librarian) to Phase 2 (The Reasoner)
## Components Implemented
### 1. Call Graph Extraction (src/librarian/parser.py)
- Methods Added:
  - `_extract_python_function_calls()` - AST traversal for Python call expressions
  - `_extract_js_function_calls()` - AST traversal for JavaScript/TypeScript calls
- Features:
  - Fuzzy matching - captures function names only (ignoring arguments)
  - Handles method calls (`obj.method()`) and function calls (`foo()`)
  - Integrated into `_extract_python_function_node()` and `_extract_js_function_node()`
- Test Results: ✅ Detects 3 calls in `main_function` (`helper_function`, `print`, `process_data`)
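The name-only ("fuzzy") extraction described above can be sketched with Python's stdlib `ast` module. This is an illustrative stand-in, not the project's actual parser code (which builds on tree-sitter):

```python
# Sketch: name-only call extraction, ignoring arguments, as described above.
# Captures both plain calls (foo()) and method calls (obj.method()).
import ast

def extract_function_calls(source: str) -> list[str]:
    """Return the names of all functions called in `source`."""
    calls = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):          # foo()
                calls.append(func.id)
            elif isinstance(func, ast.Attribute):   # obj.method()
                calls.append(func.attr)
    return calls

sample = """
def main_function():
    data = helper_function()
    print(data)
    process_data(data)
"""
print(extract_function_calls(sample))  # ['helper_function', 'print', 'process_data']
```

Note that name-only matching deliberately discards argument and receiver information, which is what makes the later edge construction "fuzzy".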
### 2. Call Edge Construction (src/librarian/graph_constructor.py)
- Method: `construct_call_edges()` - Full implementation (~100 lines)
- Features:
  - Creates CALLS relationships between functions in Neo4j
  - Fuzzy matching by function name across files
  - Handles same-name functions with a preference for the same file
  - Auto-skips built-in functions (`print`, `len`, etc.)
- Integration: Used by `build_complete_graph()` in the construction pipeline
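The resolution rules above (match by name across files, prefer the caller's own file, skip built-ins) can be sketched in isolation. The function name and data shapes here are illustrative assumptions, not the project's actual API:

```python
# Sketch: resolving a fuzzy call name to a CALLS edge target, following the
# rules described above. Illustrative only; the real construct_call_edges()
# works against Neo4j rather than in-memory tuples.
import builtins

BUILTIN_NAMES = set(dir(builtins))  # print, len, ...

def resolve_call_target(call_name, caller_file, functions):
    """functions: list of (function_name, file_path) pairs known to the graph.
    Returns the (name, file) to link with a CALLS edge, or None."""
    if call_name in BUILTIN_NAMES:
        return None  # auto-skip built-in functions
    candidates = [f for f in functions if f[0] == call_name]
    if not candidates:
        return None
    # Prefer a definition in the caller's own file when names collide.
    same_file = [f for f in candidates if f[1] == caller_file]
    return same_file[0] if same_file else candidates[0]

funcs = [("process", "a.py"), ("process", "b.py"), ("helper", "b.py")]
print(resolve_call_target("process", "b.py", funcs))  # ('process', 'b.py')
print(resolve_call_target("print", "b.py", funcs))    # None
```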
### 3. Context Serializer (src/librarian/context_serializer.py)
- Classes:
  - `CompressedContextBlock` - Dataclass for serialized output
  - `ContextSerializer` - Main serializer with XML/Markdown support
- Methods:
  - `serialize_file_context()` - Convert graph JSON to LLM format
  - `serialize_subgraph()` - Multi-file context serialization
  - `serialize_symbol_definition()` - Single-symbol focused context
  - `create_context_window()` - Token-budget-aware context assembly
- Features:
  - XML format: 143 tokens per file (structured, verbose)
  - Markdown format: 57 tokens per file (compact, readable)
  - Token estimation using `tiktoken` (cl100k_base encoding)
- Test Results: ✅ Both formats working correctly
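As a rough illustration of the compact Markdown format, here is a minimal serializer sketch. The function names, the output layout, and the chars/4 token heuristic are all assumptions; the real serializer counts tokens with `tiktoken`:

```python
# Sketch: compact Markdown serialization of one file's graph context, with a
# rough token estimate standing in for tiktoken so the example is dependency-free.
def serialize_file_markdown(file_path, classes, functions):
    lines = [f"## {file_path}"]
    if classes:
        lines.append("Classes: " + ", ".join(classes))
    if functions:
        lines.append("Functions: " + ", ".join(functions))
    return "\n".join(lines)

def estimate_tokens(text):
    return max(1, len(text) // 4)  # crude stand-in for tiktoken counting

block = serialize_file_markdown(
    "src/librarian/parser.py",
    classes=["CodeParser"],
    functions=["parse_file", "_extract_python_function_calls"],
)
print(block)
print("~tokens:", estimate_tokens(block))
```

The Markdown form wins on token count because it carries names only, with almost no structural overhead, which is why it comes out far smaller than the XML form.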
### 4. Pydantic Schemas (src/architect/schemas.py)
- Models:
  - `RefactorPlan` - Top-level refactor plan with risk assessment
  - `FileChange` - Individual file modification specification
  - `DependencyImpact` - Dependency analysis for affected files
  - `DiffSkeleton` - Deprecated legacy format (retained for compatibility)
  - `ValidationResult` - Schema validation output
- Enums:
  - `RefactorOperation` - create, modify, delete, rename, move, extract, inline
  - `ChangeType` - import, class, function, method, variable, parameter
  - `ImpactType` - call, inheritance, import, type_usage
  - `RiskLevel` - low, medium, high, critical
- Features:
  - Full JSON serialization with Pydantic 2.12.5
  - Validation helper: `validate_refactor_plan()`
  - 350+ lines of schema definitions
- Test Results: ✅ Validation and JSON serialization confirmed
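A trimmed-down sketch of the schema shapes described above, assuming Pydantic v2. The field names are illustrative; the real `schemas.py` defines much richer models (~350 lines):

```python
# Sketch: minimal RefactorPlan-style schemas with a validation helper,
# mirroring the models/enums listed above. Field names are assumptions.
from enum import Enum
from pydantic import BaseModel, ValidationError

class RiskLevel(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    critical = "critical"

class FileChange(BaseModel):
    path: str
    operation: str  # create, modify, delete, rename, move, extract, inline

class RefactorPlan(BaseModel):
    summary: str
    risk: RiskLevel
    changes: list[FileChange]

def validate_refactor_plan(raw_json: str):
    """Illustrative stand-in for the validate_refactor_plan() helper."""
    try:
        return True, RefactorPlan.model_validate_json(raw_json)
    except ValidationError as exc:
        return False, str(exc)

plan = RefactorPlan(
    summary="Extract helper",
    risk=RiskLevel.low,
    changes=[FileChange(path="src/app.py", operation="modify")],
)
ok, parsed = validate_refactor_plan(plan.model_dump_json())  # JSON round-trip
print(ok, parsed.risk.value)  # True low
```

Validating LLM output through `model_validate_json` is what lets the bridge reject malformed plans before any file is touched.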
## Test Suite (scripts/test_phase2_bridge.py)
===========================================
Tests Passed: 4/4
✓ Call Graph Extraction
✓ Context Serializer
✓ Diff Skeleton Validation
✓ End-to-End Workflow
===========================================
Test Coverage:
- `test_call_graph_extraction` - Parses Python functions, verifies call detection
- `test_context_serializer` - Tests both XML and Markdown serialization
- `test_diff_skeleton_validation` - Validates Pydantic schemas with a JSON round-trip
- `test_end_to_end_workflow` - Full pipeline: graph query → serialization → validation
## Dependencies Added
- `pydantic>=2.0.0` - Schema validation (installed: 2.12.5)
- `tiktoken` - Token counting (already present)
## Integration Points

### Phase 1 (The Librarian)
- Input: Codebase files
- Process: Parse → Extract calls → Build graph with CALLS edges
- Output: Neo4j graph with File, Class, Function nodes + CONTAINS, IMPORTS, CALLS edges
### Phase 2 Bridge (Current)
- Input: Neo4j graph from Phase 1
- Process: Query graph → Serialize context → Validate refactor plans
- Output: LLM-ready context + validated refactor schemas
### Phase 2 (The Reasoner) - Next
- Input: Serialized context from bridge
- Process: LLM reasoning → Generate refactor plan
- Output: Validated `RefactorPlan` Pydantic model
- Validation: Uses `schemas.py` for plan verification
## File Changes Summary

New Files Created:
- `src/librarian/context_serializer.py` (~400 lines)
- `src/architect/__init__.py` (module init)
- `src/architect/schemas.py` (~350 lines)
- `scripts/test_phase2_bridge.py` (~250 lines)

Files Modified:
- `src/librarian/parser.py` - Added call extraction methods (~85 lines)
- `src/librarian/graph_constructor.py` - Implemented `construct_call_edges()` (~100 lines)
- `requirements.txt` - Added `pydantic>=2.0.0`
Total Lines Added: ~1,185 lines
## Known Limitations
- INHERITS_FROM Warning: Neo4j warns about missing `INHERITS_FROM` relationships (expected - these edges are not created yet)
- Call Graph Accuracy:
  - Fuzzy matching may create false positives for common function names
  - No type analysis or scope resolution yet
- Context Serialization:
  - Token estimates are approximate (actual LLM tokenization may vary)
  - Large codebases may exceed context windows
## Next Steps for Phase 2
- Implement The Reasoner:
  - Integrate Jamba 1.5 Mini or similar LLM
  - Use `ContextSerializer` output as LLM input
  - Parse LLM output into `RefactorPlan` Pydantic models
- Add Inheritance Tracking:
  - Create `INHERITS_FROM` edges in `graph_constructor.py`
  - Update context serializer to include inheritance chains
- Enhance Call Graph:
  - Add scope analysis to disambiguate function calls
  - Implement type inference for method resolution
- Context Window Management:
  - Implement smart context pruning for large codebases
  - Add relevance scoring for symbol prioritization
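The proposed pruning and relevance-scoring step could take a shape like the following sketch. The scoring inputs and the chars/4 token estimate are assumptions, not a decided design:

```python
# Sketch: token-budget context assembly with relevance scoring - greedily pack
# the highest-scoring blocks until the budget is exhausted.
def build_context_window(blocks, budget_tokens):
    """blocks: list of (relevance_score, text) pairs."""
    chosen, used = [], 0
    for score, text in sorted(blocks, key=lambda b: b[0], reverse=True):
        cost = max(1, len(text) // 4)  # rough token estimate
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return "\n\n".join(chosen), used

blocks = [
    (0.9, "def target(): ..." * 5),
    (0.2, "unrelated module docs " * 50),
    (0.7, "class Caller: ..." * 5),
]
context, used = build_context_window(blocks, budget_tokens=60)
print(used)  # 42 - the low-relevance block did not fit and was pruned
```

A greedy pass like this is simple but not optimal (it is a knapsack-style problem); it would be a reasonable first cut before anything smarter.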
## References
- GitHub Repository: https://github.com/vivek5200/ouroboros
- Neo4j Database: bolt://localhost:7687 (Docker container)
- Pydantic Docs: https://docs.pydantic.dev/2.0/
- Tree-sitter: https://tree-sitter.github.io/tree-sitter/
Validation Command:
cd 'g:\Just a Idea'
.\venv\Scripts\Activate.ps1
$env:PYTHONPATH='g:\Just a Idea'
python scripts\test_phase2_bridge.py
Expected output: Tests Passed: 4/4 ✅