# RAG System Architecture (Final)

## High-Level Architecture

```mermaid
flowchart TD
    %% Context7 Integration Layer
    C7Resolver[Context7LibraryResolver] --> C7Retriever[Context7KnowledgeRetriever]
    C7Retriever --> C7Cache[(Context7 Cache)]
    
    %% Local Indexer
    RagIndexer[RagIndexer v1.0] --> LocalMD[Local Markdown Docs]
    LocalMD --> DocsRepo[(docs/rag/)]
    
    %% Query Interface
    RagQuery[RagQuery v2.0] --> LocalSearch[Local Search]
    RagQuery --> Context7Augmentation[Context7 Augmentation]
    LocalSearch --> DocsRepo
    Context7Augmentation --> C7Retriever
    
    %% Combined Result
    LocalSearch --> CombinedContext[Combined Context]
    Context7Augmentation --> CombinedContext
    CombinedContext --> LLM[LLM Generates Answer]
    
    %% MCP Tools (when available)
    McpResolve[MCP resolve-library-id] -.-> C7Resolver
    McpQuery[MCP query-docs] -.-> C7Retriever
    
    %% Relationships
    C7Cache --> C7Retriever
    DocsRepo --> LocalSearch
```

## Component Overview

| Component | Type | Responsibility |
|-----------|------|----------------|
| **Context7LibraryResolver** | Class | Resolves library names to Context7 IDs via MCP or fallback |
| **Context7KnowledgeRetriever** | Class | Fetches remote docs from Context7 with caching |
| **Context7Cache** | Files | Cached Context7 responses in `/lamp/www/importer/cache/context7/` (TTL: 1 hour) |
| **RagIndexer** | CLI Script | Scans codebase, DB schema, API specs → Markdown |
| **Local Markdown Docs** | Files | Generated Markdown index in `/lamp/www/importer/docs/rag/` |
| **RagQuery** | Class | Queries local + Context7, merges results |
| **LLM** | External | Generates answer from combined context |

## Data Flow

```
Query            Local                                     Context7        Result
─────            ─────                                     ────────        ──────
getContext() ──▶ determineSections() ──▶ loadSection() ──▶ queryDocs() ──▶ concat()
                                                                              │
                                                                              ▼
                                                                   merged context returned
```
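The step names in the diagram suggest a pipeline along the following lines. This is a minimal sketch, not the actual `RagQuery` implementation: the class name `RagFlowSketch`, the keyword-to-section rules, and the stubbed loaders are assumptions made for illustration. Only the step names (`getContext`, `determineSections`, `loadSection`, `queryDocs`) come from the diagram.

```php
<?php
// Hypothetical sketch of the data flow; not the real RagQuery code.
final class RagFlowSketch
{
    /** Pick local sections whose keyword appears in the query (assumed heuristic). */
    public static function determineSections(string $query): array
    {
        $keywords = [
            'column'   => 'database/conventions.md',
            'database' => 'database/conventions.md',
            'api'      => 'codebase/api/getDatabases.md',
        ];
        $sections = [];
        foreach ($keywords as $word => $section) {
            if (stripos($query, $word) !== false) {
                $sections[$section] = true; // array keys de-duplicate sections
            }
        }
        return array_keys($sections);
    }

    /** Stub: a real loader would read the file under docs/rag/. */
    public static function loadSection(string $section): string
    {
        return "[local:$section]";
    }

    /** Stub: a real implementation would ask Context7 (or its cache). */
    public static function queryDocs(string $query, int $maxResults): array
    {
        return array_slice(["[context7:$query]"], 0, $maxResults);
    }

    /** Runs the local steps, optionally adds Context7 results, and joins everything. */
    public static function getContext(string $query, array $options = []): string
    {
        $options += ['useContext7' => true, 'maxResults' => 3];

        $parts = array_map([self::class, 'loadSection'], self::determineSections($query));
        if ($options['useContext7']) {
            $parts = array_merge($parts, self::queryDocs($query, $options['maxResults']));
        }
        return implode("\n\n", $parts); // the concat() step: merged context
    }
}

echo RagFlowSketch::getContext('How do I add a new column?', ['useContext7' => false]), "\n";
```

Keeping each stage as its own method means the local path still works on its own when `useContext7` is `false`.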

## File Structure

```
docs/rag/                          # Local documentation index
├── README.md                      # Index overview and usage
├── codebase/                      # Code documentation
│   ├── lib/                       # Library classes
│   │   ├── DatabaseHelper.php.md
│   │   ├── SchemaDetector.md
│   │   └── ...
│   ├── api/                       # API endpoints
│   │   ├── getDatabases.md
│   │   └── ...
│   └── root/                      # Root scripts
│       ├── config.md
│       └── ...
├── database/                      # Database documentation
│   ├── conventions.md
│   └── import_logger.md
└── api/                          # (empty, not used)

cache/context7/                    # Context7 remote cache
└── {hash}.cache                  # TTL: 1 hour
```
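A `{hash}.cache` layout with a one-hour TTL could be implemented roughly as below. This is a sketch under stated assumptions: the class name `FileCacheSketch`, the SHA-1 key scheme, and the use of file modification time for expiry are illustrative choices, not necessarily what `Context7KnowledgeRetriever` does.

```php
<?php
// Hypothetical sketch of a {hash}.cache file cache with a 1-hour TTL.
final class FileCacheSketch
{
    public function __construct(
        private string $dir,
        private int $ttlSeconds = 3600 // 1 hour, matching the layout above
    ) {
        if (!is_dir($this->dir)) {
            mkdir($this->dir, 0775, true);
        }
    }

    /** Assumed key scheme: SHA-1 of library ID plus query. */
    private function path(string $libraryId, string $query): string
    {
        return $this->dir . '/' . sha1($libraryId . '|' . $query) . '.cache';
    }

    /** Returns cached content, or null when missing or older than the TTL. */
    public function get(string $libraryId, string $query): ?string
    {
        $path = $this->path($libraryId, $query);
        clearstatcache(true, $path); // avoid stale mtime from PHP's stat cache
        if (!is_file($path) || time() - filemtime($path) > $this->ttlSeconds) {
            return null;
        }
        $content = file_get_contents($path);
        return $content === false ? null : $content;
    }

    /** Writes (or overwrites) the entry; the file mtime serves as the timestamp. */
    public function set(string $libraryId, string $query, string $content): void
    {
        file_put_contents($this->path($libraryId, $query), $content, LOCK_EX);
    }
}

// Usage with a temporary directory instead of cache/context7/.
$cache = new FileCacheSketch(sys_get_temp_dir() . '/context7_sketch');
$cache->set('php/php', 'How to create arrays?', 'cached docs');
var_dump($cache->get('php/php', 'How to create arrays?'));
```

Using the file's modification time as the timestamp avoids a separate metadata file; expired entries are simply treated as misses and overwritten on the next `set()`.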

## Usage

### Basic (local only)

```php
require_once 'lib/RagQuery.php';

$context = RagQuery::getContext("How do I add a new column?", [
    'useContext7' => false
]);
```

### With Context7 augmentation

```php
require_once 'lib/RagQuery.php';

// Query with Context7 integration
$context = RagQuery::getContext("How to read Excel files in PHP?", [
    'useContext7' => true,
    'maxResults' => 3
]);

// Or use defaults (Context7 enabled)
$context = RagQuery::getContext("database query examples");
```

### Programmatic access

```php
// Get version info
$info = RagQuery::getVersion();
echo $info['version'];         // "2.0"
var_dump($info['context7_enabled']); // bool(true); echo would print "1"

// Get Context7 status
$status = RagQuery::getContext7Status();
print_r($status['library_map']);

// Resolve library ID
require_once 'lib/Context7LibraryResolver.php';
$libId = Context7LibraryResolver::resolve('phpspreadsheet/phpspreadsheet');
echo $libId; // "/phpspreadsheet/phpspreadsheet"

// Query Context7 directly
require_once 'lib/Context7KnowledgeRetriever.php';
$docs = Context7KnowledgeRetriever::queryDocs('php/php', 'How to create arrays?');
```

## Deployment

### Manual indexing

```bash
# Build local index
php lib/RagIndexer.php

# Test the system
php tests/test_rag_system.php
php tests/test_rag_context7.php
```

### Cron job (daily at 2 AM)

```bash
0 2 * * * php /lamp/www/importer/lib/RagIndexer.php >> /var/log/rag_indexer.log 2>&1
```

### Docker

```dockerfile
FROM php:8.2-cli

WORKDIR /lamp/www/importer

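# Copy the application source (alternatively, mount it as a volume)
COPY . .

# Install dependencies (mbstring is already built into the official php images)
RUN docker-php-ext-install mysqli

# Build the index on startup, then serve; exec-form CMD has no shell, so "&&" needs sh -c
CMD ["sh", "-c", "php lib/RagIndexer.php && php -S 0.0.0.0:8000 -t /lamp/www/importer"]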
```

## Integration Points

### MCP Tools (when available)

The system is designed to use MCP tools when available:

1. **resolve-library-id** - Resolves library names to Context7 IDs
2. **query-docs** - Queries Context7 documentation

When MCP is not available, the system falls back to:
- The local library mapping cache
- Generated placeholder content in place of remote docs
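
The "MCP first, local mapping as fallback" behaviour for library resolution can be sketched as follows. The class `ResolverSketch`, the closure-based MCP hook, and the sample mapping are hypothetical; the real `Context7LibraryResolver` may be structured differently.

```php
<?php
// Hypothetical sketch of "MCP first, local mapping as fallback".
final class ResolverSketch
{
    /**
     * @param Closure|null $mcpResolve Stand-in for an MCP resolve-library-id call;
     *                                 null means MCP is unavailable.
     * @param array<string,string> $localMap Lower-cased library name => Context7 ID.
     */
    public function __construct(
        private ?Closure $mcpResolve,
        private array $localMap
    ) {}

    public function resolve(string $library): ?string
    {
        if ($this->mcpResolve !== null) {
            try {
                $id = ($this->mcpResolve)($library);
                if (is_string($id) && $id !== '') {
                    return $id;
                }
            } catch (Throwable $e) {
                // MCP call failed; fall through to the local mapping.
            }
        }
        return $this->localMap[strtolower($library)] ?? null;
    }
}

// Illustrative mapping only; the real library map is not shown in this document.
$map = ['vercel/next.js' => '/vercel/next.js'];
$offline = new ResolverSketch(null, $map);
var_dump($offline->resolve('vercel/next.js'));
```

Returning `null` for an unknown library leaves it to the caller to decide whether to skip Context7 or emit placeholder content.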

### RagQuery Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| useContext7 | bool | true | Enable Context7 augmentation |
| maxResults | int | 3 | Max Context7 results |
| libraries | array | auto | Override libraries to query |
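
One way to apply these defaults is sketched below. The function name `normalizeRagOptions`, the use of `null` to represent "auto", the rejection of unknown keys, and the lower bound on `maxResults` are assumptions of this sketch rather than documented `RagQuery` behaviour.

```php
<?php
// Hypothetical sketch of option handling; defaults mirror the table above.
function normalizeRagOptions(array $options): array
{
    $defaults = [
        'useContext7' => true,
        'maxResults'  => 3,
        'libraries'   => null, // null stands for "auto" in this sketch
    ];

    // Reject misspelled keys early instead of silently ignoring them.
    $unknown = array_diff_key($options, $defaults);
    if ($unknown !== []) {
        throw new InvalidArgumentException(
            'Unknown option(s): ' . implode(', ', array_keys($unknown))
        );
    }

    $merged = array_merge($defaults, $options);
    $merged['useContext7'] = (bool) $merged['useContext7'];
    $merged['maxResults']  = max(1, (int) $merged['maxResults']);
    return $merged;
}

print_r(normalizeRagOptions(['maxResults' => 5]));
```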

## Testing

```bash
# Run all tests
php tests/test_rag_system.php      # Local RAG tests
php tests/test_rag_context7.php    # Context7 integration tests
```

### Expected output (`tests/test_rag_context7.php`)

```
✅ RagQuery version info includes context7_enabled
✅ Context7LibraryResolver has version
✅ Context7KnowledgeRetriever has version
✅ Resolve vercel/next.js
✅ Query PHP documentation
✅ getContext includes Context7 when enabled

🎉 ALL CONTEXT7 INTEGRATION TESTS PASSED!
```

## Next Steps

1. ✅ Implement `Context7LibraryResolver` - Done
2. ✅ Implement `Context7KnowledgeRetriever` - Done
3. ✅ Update `RagQuery::getContext()` - Done
4. ✅ Add caching for remote docs - Done
5. ✅ Create `tests/test_rag_context7.php` - Done
6. ☐ Update `docs/RAG_SYSTEM_PLAN.md` - This file
7. ☐ Document deployment steps - In this file
8. ☐ Run tests end-to-end - Manual verification

---

*The architecture diagram is rendered with Mermaid. Ensure your documentation viewer or VS Code extension supports Mermaid rendering.*
