# AI21 Jamba Cloud Setup Guide

## Why AI21 Cloud?
**✅ Recommended for Production**
- Reliable 99.9% uptime
- No local GPU required
- Automatic scaling
- 256k token context window
- Official support from AI21

## Step 1: Get Your API Key

1. Go to AI21 Studio (studio.ai21.com)
2. Sign up / log in
3. Navigate to Account → API Key
4. Click “Create New API Key”
5. Copy your API key (starts with `AI21_...`)

**Free Tier:**
- $10 free credits on signup
- ~200,000 tokens free
- Good for testing Phase 3

**Pricing (after free tier):**
- Jamba-1.5-Mini: $0.20 per 1M input tokens
- Jamba-1.5-Large: $2.00 per 1M input tokens

## Step 2: Configure Ouroboros

### Option A: Environment Variable (Recommended)

```powershell
# Windows PowerShell
$env:AI21_API_KEY="your_api_key_here"
```

```bash
# Linux/Mac
export AI21_API_KEY="your_api_key_here"
```

### Option B: .env File

```bash
# Copy the example file
cp .env.example .env
```

Then edit `.env` and add your key:

```ini
AI21_API_KEY=your_actual_api_key_here
JAMBA_MODE=cloud
```
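
Once the variable is set, it is worth confirming Python can actually see it. A minimal stdlib-only check; the `get_api_key` helper name is illustrative, not part of Ouroboros:

```python
import os

def get_api_key() -> str:
    """Read the AI21 key from the environment, failing loudly if missing."""
    key = os.environ.get("AI21_API_KEY", "")
    if not key:
        raise RuntimeError("AI21_API_KEY is not set - see Step 1")
    return key
```

Passing the result into `JambaConfig(cloud_api_key=get_api_key())` avoids hardcoding the key in source files.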

## Step 3: Test the Connection

```python
from src.context_encoder import ContextEncoder, ContextEncoderConfig
from src.context_encoder.config import EncoderProvider, JambaConfig

# Configure for AI21 Cloud
jamba_config = JambaConfig(
    use_cloud=True,
    cloud_api_key="your_api_key_here",  # Or load from env
)

config = ContextEncoderConfig(
    provider=EncoderProvider.JAMBA_CLOUD,
    jamba=jamba_config,
)

encoder = ContextEncoder(config)

# Test compression
sample_code = """
export class AuthService {
  async login(email: string, password: string) {
    // Authentication logic
  }
}
"""

compressed = encoder.compress(
    codebase_context=sample_code,
    target_files=["auth.ts"],
)

print("✅ Compression successful!")
print(f"Input: {compressed.tokens_in} tokens")
print(f"Output: {compressed.tokens_out} tokens")
print(f"Ratio: {compressed.compression_ratio:.1f}x")
print(f"\nSummary:\n{compressed.summary}")
```

## Step 4: Use with Reasoner (Phase 2 Integration)

```python
from src.reasoner import Reasoner, ReasonerConfig
from src.reasoner.config import LLMProvider

# Initialize Reasoner
config = ReasonerConfig(provider=LLMProvider.GEMINI)
reasoner = Reasoner(config)

# Generate refactor plan with deep context
plan = reasoner.generate_refactor_plan(
    task_description="Refactor authentication system",
    target_file="src/auth/login.ts",
    use_deep_context=True,  # 🔥 Uses AI21 Jamba for 256k context
)

print(f"Plan ID: {plan.plan_id}")
print(f"Impact: {plan.estimated_impact}")
```

## Troubleshooting

### Error: “AI21_API_KEY environment variable not set”

**Solution:** Set the environment variable or pass it explicitly:

```python
from src.context_encoder.config import JambaConfig

jamba_config = JambaConfig(
    use_cloud=True,
    cloud_api_key="your_key_here",
)
```

### Error: “Failed to initialize Jamba client”

**Possible causes:**

- Invalid API key
- No internet connection
- AI21 API is down (check status.ai21.com)

**Solution:** Verify your API key at studio.ai21.com.

### Error: Rate limit exceeded

**Solution:** You’ve used your free credits. Options:

- Add a payment method to your AI21 account
- Switch to local mode (free): `JAMBA_MODE=local`
- Use the mock provider for testing: `EncoderProvider.MOCK`
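
If the limit is transient (bursty traffic rather than exhausted credits), retrying with exponential backoff usually resolves it. A sketch; the `with_backoff` helper is not part of Ouroboros, and matching on the exception message is an assumption - prefer your client's dedicated rate-limit exception type if it exposes one:

```python
import random
import time

def with_backoff(fn, max_retries=4, base_delay=1.0):
    """Call fn(), retrying with exponential backoff + jitter on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            # Message matching is an assumption; check your client's exception types.
            last_attempt = attempt == max_retries - 1
            if "rate limit" not in str(exc).lower() or last_attempt:
                raise
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
```

Usage would look like `with_backoff(lambda: encoder.compress(codebase_context=code, target_files=["auth.ts"]))`.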

### Slow response times

**Normal behavior:**

- First request: 10-30 seconds (cold start)
- Subsequent requests: 3-10 seconds
- Large context (100k+ tokens): 15-45 seconds

**If consistently slow:**

- Check your internet connection
- Try a smaller context first
- Consider using `max_output_tokens` to limit summary length
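
For the last tip, the cap would be set on the encoder config; this assumes `max_output_tokens` is a `JambaConfig` field, as the tip above implies:

```python
from src.context_encoder.config import JambaConfig

# Assumes max_output_tokens is a JambaConfig field (implied by the tip above)
jamba_config = JambaConfig(
    use_cloud=True,
    max_output_tokens=2000,  # cap the summary to shorten response times
)
```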

## Switching Between Cloud and Local

### Use Cloud (Recommended)

```python
config = ContextEncoderConfig(
    provider=EncoderProvider.JAMBA_CLOUD,
    jamba=JambaConfig(use_cloud=True),
)
```

### Use Local (Free, Requires LM Studio)

```python
config = ContextEncoderConfig(
    provider=EncoderProvider.JAMBA_LOCAL,
    jamba=JambaConfig(
        use_cloud=False,
        local_base_url="http://localhost:1234/v1",
    ),
)
```

See LMSTUDIO_SETUP.md for local setup instructions.
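
The `JAMBA_MODE` value from the `.env` file in Step 2 can drive this choice at startup. A hypothetical helper sketching the mapping (`select_provider` is not part of Ouroboros; the returned names correspond to `EncoderProvider.JAMBA_CLOUD` and `EncoderProvider.JAMBA_LOCAL`):

```python
import os

def select_provider(mode=None):
    """Map JAMBA_MODE ('cloud' or 'local') to a provider name."""
    mode = (mode or os.environ.get("JAMBA_MODE", "cloud")).lower()
    if mode not in ("cloud", "local"):
        raise ValueError(f"Unknown JAMBA_MODE: {mode!r}")
    return "JAMBA_CLOUD" if mode == "cloud" else "JAMBA_LOCAL"
```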

## Cost Estimation

**Jamba-1.5-Mini Pricing:**

| Context Size | Input Tokens | Output Tokens | Cost per Request |
|---|---|---|---|
| Small (10k) | 10,000 | 2,000 | $0.002 |
| Medium (50k) | 50,000 | 4,000 | $0.010 |
| Large (100k) | 100,000 | 4,000 | $0.020 |
| Massive (256k) | 256,000 | 4,000 | $0.051 |

**Free tier gives you:**
- ~5,000 requests (small context)
- ~1,000 requests (medium context)
- ~500 requests (large context)
- ~200 requests (massive context)
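
The table and request counts above follow directly from the Mini input rate ($0.20 per 1M input tokens; output-token cost is omitted here, matching the table). A quick back-of-envelope helper:

```python
MINI_INPUT_RATE = 0.20 / 1_000_000  # $ per input token (Jamba-1.5-Mini)

def request_cost(input_tokens):
    """Approximate input-only cost of one compression request, in dollars."""
    return input_tokens * MINI_INPUT_RATE

def free_tier_requests(input_tokens, credits=10.0):
    """Roughly how many such requests the $10 free credits cover."""
    return round(credits / request_cost(input_tokens))
```

For example, `request_cost(100_000)` gives $0.02 and `free_tier_requests(100_000)` gives 500, matching the large-context row.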

## Best Practices

- Start with the mock provider for development:
  `config = ContextEncoderConfig(provider=EncoderProvider.MOCK)`
- Use cloud for production (reliable, scalable)
- Use local for experimentation (free, private)
- Monitor your usage at studio.ai21.com
- Cache compressed contexts to avoid redundant API calls
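
The caching point can be as simple as an in-memory dict keyed by a hash of the request. A sketch; the `CompressionCache` wrapper is illustrative, not part of Ouroboros, and assumes the `encoder.compress` signature from Step 3:

```python
import hashlib

class CompressionCache:
    """Memoize encoder.compress results by hashing the request inputs."""

    def __init__(self, encoder):
        self.encoder = encoder
        self._cache = {}

    def compress(self, codebase_context, target_files):
        # Same context + same targets -> same key -> no repeat API call
        raw = codebase_context + "\x00" + "|".join(sorted(target_files))
        key = hashlib.sha256(raw.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self.encoder.compress(
                codebase_context=codebase_context,
                target_files=target_files,
            )
        return self._cache[key]
```

A persistent (on-disk) cache would follow the same shape; it is left out for brevity.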

## Security

⚠️ **Never commit your API key to Git!**

- ✅ Use environment variables
- ✅ Use a `.env` file (listed in `.gitignore`)
- ❌ Don’t hardcode keys in source code
- ❌ Don’t share keys in screenshots/logs

## Support

- AI21 Documentation: https://docs.ai21.com/
- AI21 Discord: https://discord.gg/ai21labs
- GitHub Issues: https://github.com/vivek5200/ouroboros/issues