# Tabler Multi-Line Description Fix Plan (v1.5.1)

> **Status:** ✅ COMPLETED
> **Version:** 1.5.1
> **Date:** 2026-01-12

## Executive Summary

The current [`tabler.php`](../tabler.php) implementation truncates multi-line transaction descriptions from bank statement PDFs. This plan describes a fix that preserves complete descriptions, both within a page and across page breaks, while keeping the Miserable-First → Zoom-Out → Reconcile → Export doctrine.

**Problem:** Transaction descriptions that span multiple lines in the PDF are being truncated or improperly merged, losing critical transaction details.

**Root Cause:** The current state machine in [`assemble_transactions_stateful()`](../tabler.php:2181) doesn't properly handle:
1. Multi-line descriptions within a single page
2. Descriptions that continue across page boundaries
3. Distinction between description continuation vs. new transaction start

---

## Current State Analysis

### How Tabler Currently Works

```mermaid
flowchart TD
    A[PDF Input] --> B[Stage 1: Ingest]
    B --> C[Stage 2: Linear Crawl]
    C --> D[pdftotext -layout]
    D --> E[Stage 3: Zoom-Out]
    E --> F[merge_continuation_lines]
    F --> G[Stage 4: Row Assembly]
    G --> H[assemble_transactions_stateful]
    H --> I[finalize_transaction]
    I --> J[Stage 5: Reconciliation]
    J --> K[Export TSV]
```

### Current Description Handling

**Location:** [`assemble_transactions_stateful()`](../tabler.php:2181-2283)

**Current Logic:**
```php
// State machine with 2 states:
// - WAITING: Looking for date-starting line
// - COLLECTING: Collecting description until amounts found

switch ($state) {
    case 'WAITING':
        if ($has_date) {
            // Start new transaction
            if ($has_amounts) {
                // Complete on single line
            } else {
                // Need more lines
                $state = 'COLLECTING';
            }
        }
        break;
        
    case 'COLLECTING':
        if ($has_date) {
            // New transaction - finalize current
        } else {
            // Continue collecting
            $current_transaction['description'] .= ' | ' . $parsed['description'];
        }
        break;
}
```

### Problems Identified

#### Problem 1: Premature Transaction Finalization
**Location:** [`assemble_transactions_stateful()`](../tabler.php:2219-2227)

When a line has both date and amounts, the transaction is immediately finalized:
```php
if ($has_amounts && count($parsed['amounts']) >= 1) {
    // Complete transaction on single line
    $transactions[] = finalize_transaction($current_transaction);
    $current_transaction = null;
    $state = 'WAITING';
}
```

**Issue:** If the description continues on the next line (without date/amounts), it's lost.

**Example from Santander PDF:**
```
10-ENE-2024   ABONO TRANSFERENCIA SPEI HORA 15:06:25   277.82   282.01
              REFERENCIA: 5280089
              BENEFICIARIO: ELEYEME ASOCIADOS SA DE CV
```
- Current output: `"ABONO TRANSFERENCIA SPEI HORA 15:06:25"`
- Expected output: `"ABONO TRANSFERENCIA SPEI HORA 15:06:25 | REFERENCIA: 5280089 | BENEFICIARIO: ELEYEME ASOCIADOS SA DE CV"`

#### Problem 2: Page Break Detection Insufficient
**Location:** [`merge_continuation_lines()`](../tabler.php:993-1028)

Current page break logic only checks for:
- Continuation characters (`-`, `—`, `...`)
- Absence of date patterns

**Missing:**
- Visual inspection of page images at breaks
- Detection of clipped glyphs at page bottom
- Header/footer repetition across pages
- Baseline position analysis

**Reference from Protocol:**
> "At **page breaks**, inspect the **actual page image** to detect abrupt cuts (glyph bottoms clipped, baseline abruptly near page edge, hyphenated carryovers, footer/header bands)."

#### Problem 3: Description Separator Too Aggressive
**Location:** [`assemble_transactions_stateful()`](../tabler.php:2254)

Uses ` | ` separator for all continuation lines:
```php
$current_transaction['description'] .= ' | ' . $parsed['description'];
```

**Issue:** This adds visual noise to the output. A plain space suits text that simply wraps onto the next line; ` | ` should be reserved for distinct labeled fields such as `REFERENCIA:` or `BENEFICIARIO:`.

#### Problem 4: No Look-Ahead for Description Continuation
**Location:** [`assemble_transactions_stateful()`](../tabler.php:2181-2283)

The state machine doesn't look ahead to see if the next line(s) are description continuations before finalizing.

**Example:**
```
Line 1: 10-ENE-2024   ABONO TRANSFERENCIA   277.82   282.01
Line 2:               REFERENCIA: 5280089
Line 3:               BENEFICIARIO: ELEYEME
Line 4: 12-ENE-2024   PAGO TRANSFERENCIA    5.17     2.81
```

- Current behavior: Line 1 is finalized immediately and Lines 2-3 are lost.
- Expected behavior: Lines 1-3 are merged before finalization.

---

## Solution Design

### Core Principle: Greedy Description Collection

**New Doctrine:** "Collect greedily, finalize conservatively"

1. When a transaction starts (date found), enter COLLECTING state
2. Continue collecting ALL subsequent lines until:
   - A new date is found (new transaction)
   - End of document
   - A clear non-transaction region (header/footer)
3. Only finalize when certain the transaction is complete
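
The collection rule above can be sketched in isolation. This is a minimal sketch, not the code in `tabler.php`: the function name `sketch_collect_greedily`, the `DD-MMM-YYYY` date pattern, and the two-decimal amount pattern are all assumptions made for illustration, and header/footer filtering is left out.

```php
<?php
// Hypothetical sketch: greedy collection over plain text lines.
// Assumes a transaction starts with a DD-MMM-YYYY date and that amounts
// are trailing decimal numbers with two fraction digits.

function sketch_collect_greedily(array $lines): array
{
    $transactions = [];
    $current = null;

    foreach ($lines as $line) {
        $text = trim($line);
        if ($text === '') {
            continue;
        }

        if (preg_match('/^(\d{2}-[A-Z]{3}-\d{4})\s+(.*)$/', $text, $m)) {
            // A new date is the signal that the previous transaction is complete.
            if ($current !== null) {
                $transactions[] = $current;
            }

            // Split the trailing amounts off the description.
            $rest = $m[2];
            $amounts = [];
            if (preg_match('/((?:\s+\d[\d,]*\.\d{2})+)\s*$/', $rest, $tail)) {
                $amounts = preg_split('/\s+/', trim($tail[1]));
                $rest = substr($rest, 0, -strlen($tail[0]));
            }

            $current = [
                'date' => $m[1],
                'description' => trim($rest),
                'amounts' => $amounts,
            ];
        } elseif ($current !== null) {
            // No date: the line belongs to the transaction that is still open.
            $current['description'] .= ' ' . $text;
        }
    }

    // End of document closes whatever is still open.
    if ($current !== null) {
        $transactions[] = $current;
    }

    return $transactions;
}
```

Note that the open transaction is only pushed when the next date (or the end of input) is seen, which is what "finalize conservatively" amounts to in this simplified form.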

### Enhanced State Machine

```mermaid
stateDiagram-v2
    [*] --> WAITING
    WAITING --> COLLECTING: Date found
    COLLECTING --> COLLECTING: No date, add to description
    COLLECTING --> FINALIZING: Date found OR end of page
    FINALIZING --> WAITING: Transaction finalized
    FINALIZING --> COLLECTING: Look-ahead shows continuation
```

### New States

1. **WAITING**: Looking for transaction start (date pattern)
2. **COLLECTING**: Accumulating description lines
3. **FINALIZING**: Checking if transaction is complete
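4. **PAGE_BOUNDARY**: Special handling at page breaks (not drawn in the state diagram above; its decisions are covered under Page Break Handling Strategy)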

### Page Break Handling Strategy

```mermaid
flowchart TD
    A[Transaction at page end] --> B{Next page starts with date?}
    B -->|Yes| C[Finalize transaction]
    B -->|No| D{Next line is header/footer?}
    D -->|Yes| E[Skip header, continue checking]
    D -->|No| F{Next line has amounts?}
    F -->|Yes| G[Append to description, finalize]
    F -->|No| H[Append to description, continue]
```
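
The decision flow above can be read as a loop over the first lines of the next page. The following is a sketch under simplifying assumptions: `sketch_resolve_page_break` is a hypothetical name, the header test only recognizes a line that starts with `FECHA` and contains `SALDO`, and the date and amount patterns are reduced to the formats used in this document's examples.

```php
<?php
// Hypothetical sketch of the page-break decision flow.
// The header test and the date/amount patterns are simplified assumptions.

function sketch_resolve_page_break(string $description, array $next_page_lines): array
{
    $consumed = 0;

    foreach ($next_page_lines as $line) {
        $text = trim($line);

        // A date at line start means a new transaction: stop before this line.
        if (preg_match('/^\d{2}-[A-Z]{3}-\d{4}\b/', $text)) {
            break;
        }
        $consumed++;

        // Blank lines and repeated column headers are skipped, not appended.
        if ($text === '' || preg_match('/^FECHA\b.*\bSALDO\b/i', $text)) {
            continue;
        }

        $description .= ' ' . $text;

        // A line that ends with an amount closes the transaction.
        if (preg_match('/\d[\d,]*\.\d{2}\s*$/', $text)) {
            break;
        }
    }

    return ['description' => $description, 'lines_consumed' => $consumed];
}
```

The returned `lines_consumed` count tells a caller how many lines of the next page to skip before resuming normal processing.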

---

## Implementation Plan

### Phase 1: Enhanced Description Collection

#### Change 1: Add Look-Ahead Buffer
**File:** [`tabler.php`](../tabler.php)
**Function:** [`assemble_transactions_stateful()`](../tabler.php:2181)

```php
function assemble_transactions_stateful($lines, $debug = false) {
    $transactions = [];
    $current_transaction = null;
    $state = 'WAITING';
    $line_count = count($lines);
    
    for ($line_index = 0; $line_index < $line_count; $line_index++) {
        $line = $lines[$line_index];
        $trimmed = trim($line);
        
        // Skip empty lines
        if (empty($trimmed)) {
            continue;
        }
        
        // Skip non-transaction regions
        if (is_non_transaction_region($trimmed)) {
            continue;
        }
        
        // Parse current line
        $parsed = extract_amounts_from_layout($trimmed);
        $has_date = $parsed['date'] !== null;
        $has_amounts = count($parsed['amounts']) > 0;
        
        // NEW: Look ahead to check for description continuation
        $next_is_continuation = false;
        if ($line_index + 1 < $line_count) {
            $next_line = trim($lines[$line_index + 1]);
            if (!empty($next_line) && 
                !is_non_transaction_region($next_line)) {
                $next_parsed = extract_amounts_from_layout($next_line);
                // Continuation if: no date, no amounts, has text
                $next_is_continuation = ($next_parsed['date'] === null && 
                                        count($next_parsed['amounts']) === 0 &&
                                        !empty($next_parsed['description']));
            }
        }
        
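        // State machine logic...
    }

    // End of document: flush a transaction that is still open
    if ($current_transaction !== null) {
        $transactions[] = finalize_transaction($current_transaction);
    }

    return $transactions;
}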
```

#### Change 2: Greedy Collection in COLLECTING State
**File:** [`tabler.php`](../tabler.php)
**Function:** [`assemble_transactions_stateful()`](../tabler.php:2231)

```php
case 'COLLECTING':
    if ($has_date) {
        // New transaction started - finalize current one
        if ($current_transaction !== null) {
            $transactions[] = finalize_transaction($current_transaction);
        }
        
        // Start new transaction
        $current_transaction = [
            'day' => $parsed['date'],
            'description' => $parsed['description'],
            'amounts' => $parsed['amounts'],
            'lines' => [$trimmed],
        ];
        
        // NEW: Don't finalize immediately even if has amounts
        // Check if next line is continuation
        if ($has_amounts && !$next_is_continuation) {
            $transactions[] = finalize_transaction($current_transaction);
            $current_transaction = null;
            $state = 'WAITING';
        }
    } else {
        // Record the raw line first; finalizing below resets $current_transaction
        $current_transaction['lines'][] = $trimmed;

        // Continue collecting description
        if (!empty($parsed['description'])) {
            // NEW: Smart separator - use space for natural flow
            $separator = ' ';
            
            // Use | for distinct fields (REFERENCIA:, BENEFICIARIO:, etc.)
            if (preg_match('/^(REFERENCIA|BENEFICIARIO|FOLIO|HORA|FECHA):/i', 
                          $parsed['description'])) {
                $separator = ' | ';
            }
            
            $current_transaction['description'] .= $separator . $parsed['description'];
        }
        
        if ($has_amounts) {
            $current_transaction['amounts'] = array_merge(
                $current_transaction['amounts'],
                $parsed['amounts']
            );
            
            // NEW: Only finalize if no continuation ahead
            if (!$next_is_continuation) {
                $transactions[] = finalize_transaction($current_transaction);
                $current_transaction = null;
                $state = 'WAITING';
            }
        }
    }
    break;
```

### Phase 2: Page Break Detection

#### Change 3: Add Page Boundary Tracking
**File:** [`tabler.php`](../tabler.php)
**Function:** [`run_pipeline()`](../tabler.php:1574)

```php
// In Stage 2: Linear Crawl
for ($page = 0; $page < $page_count; $page++) {
    // ... existing code ...
    
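    // Drop blank lines first so the page-start/page-end flags refer to the
    // first and last lines that actually carry text (pages often end blank).
    $non_empty = [];
    foreach ($lines as $line_index => $line) {
        $trimmed = trim($line);
        if ($trimmed !== '') {
            $non_empty[] = ['line' => $line_index, 'text' => $trimmed];
        }
    }
    $last = count($non_empty) - 1;

    // Collect raw observations with page markers
    foreach ($non_empty as $i => $entry) {
        $all_lines[] = [
            'page' => $page,
            'line' => $entry['line'],
            'text' => $entry['text'],
            'is_page_end' => ($i === $last),   // NEW
            'is_page_start' => ($i === 0),     // NEW
        ];
    }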
}
```

#### Change 4: Enhanced Page Continuation Detection
**File:** [`tabler.php`](../tabler.php)
**New Function:** `detect_page_continuation()`

```php
/**
 * Detect if a transaction continues across a page boundary.
 * Uses multiple signals:
 * - Line position (near page bottom)
 * - Next page starts with non-date text
 * - Description pattern (incomplete sentence)
 * 
 * @param array $line_data Current line data with page info
 * @param array $next_line_data Next line data
 * @return bool True if continuation detected
 */
function detect_page_continuation($line_data, $next_line_data) {
    // If not at page boundary, no continuation
    if (!$line_data['is_page_end']) {
        return false;
    }
    
    // If next line is on same page, no page continuation
    if ($line_data['page'] === $next_line_data['page']) {
        return false;
    }
    
    // Check if next line is a header (skip it)
    if (is_header_row($next_line_data['text'])) {
        return true; // Continue past header
    }
    
    // Check if next line starts with date (new transaction)
    if (line_starts_with_date($next_line_data['text'])) {
        return false; // New transaction
    }
    
    // Check if current line ends with continuation indicators
    $text = $line_data['text'];
    $continuation_patterns = [
        '/[-—]$/u',          // Ends with dash (u flag so the em dash is matched as one UTF-8 character)
        '/\.\.\.$/',         // Ends with ellipsis
        '/,$/',              // Ends with comma
        '/\s(Y|AND|E)$/i',   // Ends with conjunction
    ];
    
    foreach ($continuation_patterns as $pattern) {
        if (preg_match($pattern, $text)) {
            return true;
        }
    }
    
    // Check if next line looks like continuation (no date, no amounts)
    $next_parsed = extract_amounts_from_layout($next_line_data['text']);
    if ($next_parsed['date'] === null && 
        count($next_parsed['amounts']) === 0 &&
        !empty($next_parsed['description'])) {
        return true;
    }
    
    return false;
}
```

### Phase 3: Image-Based Page Break Analysis (Optional Enhancement)

#### Change 5: Visual Page Break Detection
**File:** [`tabler.php`](../tabler.php)
**New Function:** `analyze_page_break_visually()`

```php
/**
 * Analyze page break using actual page images to detect truncation.
 * This implements the "page images at breaks" requirement from the protocol.
 * 
 * @param string $pdf_path Path to PDF
 * @param int $page_num Page number (0-indexed)
 * @param string $cache_dir Cache directory for images
 * @param bool $debug Debug mode
 * @return array Analysis result with truncation indicators
 */
function analyze_page_break_visually($pdf_path, $page_num, $cache_dir, $debug = false) {
    // Rasterize bottom portion of current page
    $current_page_img = $cache_dir . "/page_{$page_num}_bottom.png";
    
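    // Use ImageMagick to extract bottom 200px of page.
    // escapeshellarg() already quotes its argument, so no extra quotes are added.
    // Width 0 keeps the full page width; in "100%x200" the % would apply to both numbers.
    $cmd = 'convert -density 300 ' . escapeshellarg($pdf_path . "[{$page_num}]") . ' ';
    $cmd .= '-gravity South -crop 0x200+0+0 +repage ';
    $cmd .= escapeshellarg($current_page_img);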
    
    exec($cmd, $output, $return_var);
    
    if ($return_var !== 0 || !file_exists($current_page_img)) {
        return ['truncation_detected' => false, 'confidence' => 0];
    }
    
    // Analyze image for text near bottom edge
    // Look for:
    // 1. Text baseline within 50px of bottom
    // 2. Partial glyphs (descenders cut off)
    // 3. Horizontal lines (table borders) cut off
    
    // Use Tesseract to get word positions
    $hocr_result = tesseract_ocr($current_page_img, $debug);
    
    if (!$hocr_result['success'] || empty($hocr_result['hocr'])) {
        return ['truncation_detected' => false, 'confidence' => 0];
    }
    
    $words = parse_hocr_bbox($hocr_result['hocr']);
    
    // Check if any words are within 30px of bottom (200px image height)
    $truncation_detected = false;
    foreach ($words as $word) {
        if ($word['bbox']['y1'] > 170) { // Within 30px of bottom
            $truncation_detected = true;
            break;
        }
    }
    
    return [
        'truncation_detected' => $truncation_detected,
        'confidence' => $truncation_detected ? 80 : 20,
        'words_near_bottom' => count(array_filter($words, function($w) {
            return $w['bbox']['y1'] > 170;
        })),
    ];
}
```

### Phase 4: Testing & Validation

#### Test Case 1: Single-Page Multi-Line Description
**Input:**
```
10-ENE-2024   ABONO TRANSFERENCIA SPEI HORA 15:06:25   277.82   282.01
              REFERENCIA: 5280089
              BENEFICIARIO: ELEYEME ASOCIADOS SA DE CV
```

**Expected Output:**
```
10-01-2024    ABONO TRANSFERENCIA SPEI HORA 15:06:25 | REFERENCIA: 5280089 | BENEFICIARIO: ELEYEME ASOCIADOS SA DE CV    0.00    277.82    282.01
```
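
The expected output rewrites `10-ENE-2024` as `10-01-2024`. How `tabler.php` performs this conversion is not shown in this plan; the following is a minimal sketch assuming three-letter Spanish month abbreviations, with `sketch_normalize_date` as a hypothetical name.

```php
<?php
// Hypothetical sketch: convert DD-MMM-YYYY with a Spanish month
// abbreviation into DD-MM-YYYY. Returns null for anything unrecognized.

function sketch_normalize_date(string $date): ?string
{
    $months = [
        'ENE' => '01', 'FEB' => '02', 'MAR' => '03', 'ABR' => '04',
        'MAY' => '05', 'JUN' => '06', 'JUL' => '07', 'AGO' => '08',
        'SEP' => '09', 'OCT' => '10', 'NOV' => '11', 'DIC' => '12',
    ];

    if (!preg_match('/^(\d{2})-([A-Za-z]{3})-(\d{4})$/', trim($date), $m)) {
        return null;
    }

    $month = $months[strtoupper($m[2])] ?? null;

    return $month === null ? null : "{$m[1]}-{$month}-{$m[3]}";
}
```

Returning `null` for an unknown abbreviation, instead of guessing, lets a caller treat the line as a non-date line.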

#### Test Case 2: Page Break Continuation
**Input (Page 1 bottom):**
```
15-ENE-2024   PAGO TRANSFERENCIA SPEI HORA 13:59:30   5.17     2.81
              REFERENCIA: 1014195
              BENEFICIARIO: PROVEEDOR SERVICIOS
```

**Input (Page 2 top):**
```
              SA DE CV RFC: ABC123456789
              CONCEPTO: PAGO FACTURA 12345
```

**Expected Output:**
```
15-01-2024    PAGO TRANSFERENCIA SPEI HORA 13:59:30 | REFERENCIA: 1014195 | BENEFICIARIO: PROVEEDOR SERVICIOS SA DE CV RFC: ABC123456789 | CONCEPTO: PAGO FACTURA 12345    5.17    0.00    2.81
```

#### Test Case 3: Multiple Transactions with Multi-Line Descriptions
**Input:**
```
10-ENE-2024   ABONO TRANSFERENCIA   277.82   282.01
              REFERENCIA: 5280089
12-ENE-2024   PAGO TRANSFERENCIA    5.17     2.81
              REFERENCIA: 1014195
15-ENE-2024   COMISION MENSUAL      10.00    272.81
```

**Expected Output:**
```
10-01-2024    ABONO TRANSFERENCIA | REFERENCIA: 5280089    0.00    277.82    282.01
12-01-2024    PAGO TRANSFERENCIA | REFERENCIA: 1014195     5.17    0.00      2.81
15-01-2024    COMISION MENSUAL                             10.00   0.00      272.81
```

---

## Implementation Results (v1.5.1)

### Test Results

| Metric | Before | After |
|--------|--------|-------|
| Rows extracted | 4 | 77 |
| Complete descriptions | No | Yes |
| Page header artifacts | Yes | No |
| Reconciliation failures | N/A | 1 |

### Example Output (Clean)

```
12-01-2024  "FOLIO: 1014195 PAGO TRANSFERENCIA SPEI HORA 13:59:30 ENVIADO A BBVA MEXICO
            A LA CUENTA 012180028877460598 AL CLIENTE Erick Alejandro (1) (1)
            DATO NO VERIFICADO POR ESTA INSTITUCION CLAVE DE RASTREO
            20240112400140BET0000410141950 REF 1014195 CONCEPTO Nomina Erick Diaz"
```

---

## Implementation Checklist

### Core Changes
- [x] Add look-ahead buffer to [`assemble_transactions_stateful()`](../tabler.php:2269)
- [x] Implement greedy description collection
- [x] Add smart separator logic (space vs. ` | `)
- [x] Track page boundaries in line data
- [x] Implement [`is_description_continuation()`](../tabler.php:2180)
- [x] Update state machine to use continuation detection

### Page Header Filtering (v1.5.1)
- [x] Add standalone document number detection (7-digit IDs)
- [x] Add barcode-like number detection (15+ digits)
- [x] Add spaced-out column header detection (OCR artifacts)
- [x] Add multi-column header detection (3+ column words)
- [x] Add additional footer patterns

### Optional Enhancements
- [ ] Implement [`analyze_page_break_visually()`](../tabler.php) (deferred)
- [ ] Add visual truncation detection (deferred)
- [ ] Generate debug overlays showing page breaks (deferred)
- [ ] Add confidence scoring for continuations (deferred)

### Testing
- [x] Test single-page multi-line descriptions
- [x] Test page break continuations
- [x] Test multiple transactions with multi-line descriptions
- [x] Test edge cases (empty lines, headers at page breaks)
- [x] Validate against Santander PDFs in uploads/

### Documentation
- [x] Update this plan document
- [x] Add examples of multi-line handling
- [x] Document page break detection algorithm
- [x] Update version to 1.5.1

---

## Risk Assessment

| Risk | Impact | Mitigation |
|------|--------|------------|
| Over-aggressive collection (merging separate transactions) | High | Use strict date detection, validate with reconciliation |
| Page break detection false positives | Medium | Use multiple signals, add confidence scoring |
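| Performance degradation (look-ahead) | Low | Single-line look-ahead is O(1) per line; the counting look-ahead scans only to the end of the current continuation run, which is typically a few lines |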
| Breaking existing functionality | Medium | Comprehensive testing, fallback to legacy method |

---

## Success Criteria

1. **Complete Descriptions:** 100% of multi-line descriptions preserved
2. **Page Break Handling:** Descriptions spanning pages correctly merged
3. **No False Merges:** Separate transactions remain separate
4. **Reconciliation:** Ledger reconciliation success rate ≥ 95%
5. **Performance:** Processing time increase ≤ 10%
6. **Backward Compatibility:** Existing single-line transactions unaffected

---

## Rollout Strategy

### Phase 1: Core Implementation (Week 1)
- Implement look-ahead buffer
- Add greedy collection logic
- Test on single-page PDFs

### Phase 2: Page Break Handling (Week 1)
- Add page boundary tracking
- Implement continuation detection
- Test on multi-page PDFs

### Phase 3: Visual Enhancement (Week 2, Optional)
- Implement image-based detection
- Add debug overlays
- Performance optimization

### Phase 4: Validation (Week 2)
- Test on all Santander PDFs
- Compare outputs with manual review
- Fix edge cases

---

## References

- **Protocol:** [`uploads/tabler.mother.tongue.protocol.v1.txt`](../uploads/tabler.mother.tongue.protocol.v1.txt)
- **Current Implementation:** [`tabler.php`](../tabler.php)
- **Test Report:** [`test/TABLER_TEST_REPORT.md`](../test/TABLER_TEST_REPORT.md)
- **Documentation:** [`docs/TABLER_LEDGER_WHISPERER.md`](../docs/TABLER_LEDGER_WHISPERER.md)

---

## Appendix: Key Protocol Requirements

From [`tabler.mother.tongue.protocol.v1`](../uploads/tabler.mother.tongue.protocol.v1.txt):

> **Zoom-Out via Layout Graph + Page Images:** Rasterize each **PDF page to an image** (300–400 DPI). At **page breaks**, inspect the **actual page image** to detect abrupt cuts (glyph bottoms clipped, baseline abruptly near page edge, hyphenated carryovers, footer/header bands).

> **Row Assembly + Reconciliation:** Rebuild rows (merge multi-line descriptions, continue across pages), infer signed amounts by **balance deltas**, and enforce the ledger rule: `balance[i] = balance[i-1] + credit[i] − debit[i]`.

> **Multi-line description fuser; inter-page continuation detector (uses **page images at breaks** to confirm truncation/continuation).**
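
The ledger rule quoted above, `balance[i] = balance[i-1] + credit[i] − debit[i]`, can be checked row by row. This is a sketch only: the function names, the `credit`/`debit`/`balance` array keys, and the explicit opening balance parameter are assumptions for illustration and do not describe the reconciliation stage in `tabler.php`.

```php
<?php
// Hypothetical sketch: check balance[i] = balance[i-1] + credit[i] - debit[i].
// Amounts are compared in integer cents to avoid floating-point drift.

function sketch_to_cents(string $amount): int
{
    return (int) round(((float) str_replace(',', '', $amount)) * 100);
}

// Returns the indexes of rows whose printed balance breaks the ledger rule.
function sketch_reconcile(string $opening_balance, array $rows): array
{
    $failures = [];
    $previous = sketch_to_cents($opening_balance);

    foreach ($rows as $i => $row) {
        $expected = $previous
            + sketch_to_cents($row['credit'])
            - sketch_to_cents($row['debit']);
        $actual = sketch_to_cents($row['balance']);

        if ($expected !== $actual) {
            $failures[] = $i;
        }

        // Continue from the printed balance so one bad row does not cascade.
        $previous = $actual;
    }

    return $failures;
}
```

Resuming from the printed balance after a mismatch is a design choice of this sketch: it reports each broken row once instead of flagging every row after it.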

---

## Actual Implementation Details (v1.5.1)

### Files Modified

| File | Changes |
|------|---------|
| [`tabler.php`](../tabler.php) | Core implementation, version 1.5.1 |

### Key Functions Added/Modified

#### 1. [`is_description_continuation()`](../tabler.php:2180) - NEW
Detects if a line is a description continuation (not a new transaction).

```php
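function is_description_continuation($line) {
    $trimmed = trim($line);

    // Blank lines carry no description text
    if ($trimmed === '') {
        return false;
    }

    // Skip non-transaction regions
    if (is_non_transaction_region($trimmed)) {
        return false;
    }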
    
    // Parse the line
    $parsed = extract_amounts_from_layout($trimmed);
    
    // If it has a date, it's a new transaction
    if ($parsed['date'] !== null) {
        return false;
    }
    
    // If it has description text, it's a continuation
    if (!empty($parsed['description'])) {
        return true;
    }
    
    return false;
}
```

#### 2. [`get_description_separator()`](../tabler.php:2221) - NEW
Smart separator logic for description continuation.

```php
function get_description_separator($text) {
    // Use | for distinct fields
    $field_patterns = [
        '/^REFERENCIA\s*:/i',
        '/^BENEFICIARIO\s*:/i',
        '/^FOLIO\s*:/i',
        '/^HORA\s*:/i',
        '/^CONCEPTO\s*:/i',
        '/^RFC\s*:/i',
        // ... more patterns
    ];
    
    foreach ($field_patterns as $pattern) {
        if (preg_match($pattern, trim($text))) {
            return ' | ';
        }
    }
    
    // Use space for natural flow
    return ' ';
}
```
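
To show the effect of the separator rule on a whole description, the sketch below folds a list of continuation lines into one string. It uses a condensed copy of the rule limited to the patterns listed above so that it runs on its own; `sketch_description_separator` and `sketch_join_description` are hypothetical names, not functions in `tabler.php`.

```php
<?php
// Condensed copy of the separator rule, limited to the patterns listed above,
// so that this snippet runs on its own.
function sketch_description_separator(string $text): string
{
    $field_pattern = '/^(REFERENCIA|BENEFICIARIO|FOLIO|HORA|CONCEPTO|RFC)\s*:/i';

    return preg_match($field_pattern, trim($text)) ? ' | ' : ' ';
}

// Hypothetical helper: fold continuation lines into one description.
function sketch_join_description(string $first, array $continuations): string
{
    $description = trim($first);

    foreach ($continuations as $line) {
        $line = trim($line);
        if ($line !== '') {
            $description .= sketch_description_separator($line) . $line;
        }
    }

    return $description;
}

echo sketch_join_description('PAGO TRANSFERENCIA SPEI HORA 13:59:30', [
    'REFERENCIA: 1014195',
    'BENEFICIARIO: PROVEEDOR SERVICIOS',
    'SA DE CV RFC: ABC123456789',
    'CONCEPTO: PAGO FACTURA 12345',
]), PHP_EOL;
```

The line `SA DE CV RFC: ABC123456789` is joined with a space because it does not start with a field label, so the result matches the description expected in Test Case 2.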

#### 3. [`assemble_transactions_stateful()`](../tabler.php:2269) - ENHANCED
Enhanced with look-ahead buffer and greedy collection.

**Key Changes:**
- Pre-filter lines to remove non-transaction regions
- Look-ahead buffer to count continuation lines
- Greedy collection: don't finalize until certain transaction is complete
- Smart separators based on field patterns

```php
// LOOK-AHEAD: Check if next line(s) are description continuations
$continuation_count = 0;
$look_ahead_index = $line_index + 1;
while ($look_ahead_index < $line_count &&
       is_description_continuation($filtered_lines[$look_ahead_index])) {
    $continuation_count++;
    $look_ahead_index++;
}

// GREEDY: Don't finalize immediately if there are continuations
$has_continuations = $continuation_count > 0;
if ($has_amounts && !$has_continuations) {
    $transactions[] = finalize_transaction($current_transaction);
} else {
    $state = 'COLLECTING';
}
```
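
The counting look-ahead can be tried on its own with a simplified predicate. In this sketch a continuation is any non-empty line that does not start with a `DD-MMM-YYYY` date; the real `is_description_continuation()` also filters headers and footers. Both function names are hypothetical.

```php
<?php
// Hypothetical sketch: count the run of continuation lines after an index.
// A blank line or a line starting with a date ends the run.

function sketch_is_continuation(string $line): bool
{
    $text = trim($line);

    return $text !== '' && !preg_match('/^\d{2}-[A-Z]{3}-\d{4}\b/', $text);
}

function sketch_count_continuations(array $lines, int $index): int
{
    $count = 0;
    $total = count($lines);

    for ($i = $index + 1; $i < $total && sketch_is_continuation($lines[$i]); $i++) {
        $count++;
    }

    return $count;
}
```

A count greater than zero is the signal to stay in `COLLECTING` even when the date line already carries amounts.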

#### 4. [`is_non_transaction_region()`](../tabler.php:1034) - ENHANCED
Enhanced with page header detection.

**New Patterns Added:**

```php
// Standalone document numbers (7-digit page header IDs)
if (preg_match('/^\d{7}$/', $trimmed)) {
    return true;
}

// Barcode-like numbers (15+ digits)
if (preg_match('/^\*?\d{15,}\*?$/', $trimmed)) {
    return true;
}

// Spaced-out column headers (OCR artifacts)
$column_header_patterns = [
    '/F\s*E\s*C\s*H\s*A/i',                    // F E C H A
    '/D\s*E\s*S\s*C\s*R\s*I\s*P\s*C\s*I\s*O\s*N/i',  // D E S C R I P C I O N
    '/D\s*E\s*P\s*O\s*S\s*I\s*T\s*O\s*S/i',    // D E P O S I T O S
    '/R\s*E\s*T\s*I\s*R\s*O\s*S/i',            // R E T I R O S
    '/S\s*A\s*L\s*D\s*O/i',                    // S A L D O
];

// Multi-column header detection (3+ column words)
$column_words = ['fecha', 'folio', 'descripcion', 'concepto', 'depositos', /* ... */];
$lower = strtolower($trimmed);  // lowercased copy of the line used for matching
$column_word_count = 0;
foreach ($column_words as $word) {
    if (stripos($lower, $word) !== false) {
        $column_word_count++;
    }
}
if ($column_word_count >= 3) {
    return true;
}
```
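
Three of the checks above can be exercised together on sample lines. This is a sketch, not the body of `is_non_transaction_region()`: `sketch_is_page_header` is a hypothetical name, the spaced-out header patterns are left out, and the column word list contains only the five words visible in the excerpt.

```php
<?php
// Hypothetical sketch: page-header checks applied to a single line.

function sketch_is_page_header(string $line): bool
{
    $trimmed = trim($line);

    // Standalone 7-digit document number
    if (preg_match('/^\d{7}$/', $trimmed)) {
        return true;
    }

    // Barcode-like run of 15 or more digits, optionally wrapped in asterisks
    if (preg_match('/^\*?\d{15,}\*?$/', $trimmed)) {
        return true;
    }

    // Three or more column words on one line suggest a repeated table header
    $column_words = ['fecha', 'folio', 'descripcion', 'concepto', 'depositos'];
    $hits = 0;
    foreach ($column_words as $word) {
        if (stripos($trimmed, $word) !== false) {
            $hits++;
        }
    }

    return $hits >= 3;
}
```

Because `stripos()` only folds ASCII case, an accented header such as `DESCRIPCIÓN` would not count as a hit for `descripcion` in this sketch.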

### Doctrine Applied

**"Collect Greedily, Finalize Conservatively"**

Following the Miserable-First protocol:
1. **Linear Crawl** - Read linearly, extract text with `pdftotext -layout`
2. **Zoom-Out** - Detect continuation patterns, filter non-transaction regions
3. **Row Assembly** - Greedy collection with look-ahead buffer
4. **Reconcile** - Validate with balance deltas
5. **Export** - Clean TSV output

### Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.5.0 | 2026-01-12 | Initial multi-line description fix |
| 1.5.1 | 2026-01-12 | Enhanced page header filtering |

---

## Future Enhancements (Deferred)

1. **Visual Page Break Detection** - Use actual page images to detect truncation
2. **Confidence Scoring** - Add confidence scores for continuation detection
3. **Debug Overlays** - Generate visual overlays showing detected regions
4. **Bank-Specific Presets** - Add presets for different bank formats
