# CFDI Matcher - Invoice-Payment Reconciliation System

## Overview

The CFDI Matcher is an iterative learning system that automatically reconciles invoices (`eleyeme_cfdi_emitidos`) with bank deposits (`banco_cuenta_mov`). Built on the proven PMS Matcher architecture, it uses multi-tier fuzzy matching with progressive learning to improve accuracy over time.

## Quick Start

### Run the Matcher

**Via Web** (Recommended):
```bash
curl -s https://dev-app.filemonprime.net/quantix/backoffice/helper/cfdi_matcher_lamp.php
```

**Via CLI**:
```bash
/lamp/php/bin/php /lamp/www/quantix/backoffice/helper/cfdi_matcher_lamp.php
```

### Initial Results (Baseline)

- **Match Rate**: 12.16% (31/255 invoices)
- **Average Confidence**: 59.48%
- **Matched Amounts**: $4.4M invoices → $4.3M deposits
- **Reconciliation Gap**: -$75,487 (-1.7% of the matched invoice total)

## Architecture

### Database Tables

1. **cfdi_matcher_iterations** - Tracks each matching run with statistics
2. **cfdi_matcher_results** - Individual invoice→deposit match results
3. **cfdi_matcher_patterns** - Discovered patterns with success rates
4. **cfdi_matcher_failures** - Unmatched items for analysis

### Matching Tiers

#### Tier 0: Exact Matches (95-100% confidence)
- **Exact amount + close date**: Amount perfect, date within ±3 days
- **UUID in reference**: Invoice UUID found in deposit `numero` field
- **RFC + amount**: RFC exact match, amount within ±1%

#### Tier 1: Strong Matches (80-94% confidence)
- **Amount + week**: Amount within ±1%, date within ±7 days
- **Client fuzzy + amount**: Client name similarity ≥70%, amount within ±5%
- **RFC + two weeks**: RFC exact match, date within ±14 days

#### Tier 2: Probable Matches (65-79% confidence)
- **Amount + month**: Amount within ±5%, date within ±30 days
- **Client similarity**: Name similarity ≥60%, amount within ±10%

#### Tier 3: Possible Matches (50-64% confidence)
- **Amount + two months**: Amount within ±10%, date within ±60 days
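
The amount-and-date rules above form a cascade from strict to loose. A minimal sketch of that cascade, assuming the thresholds listed in the tiers (and the ±0.01% "exact" tolerance from the configuration section), could look like the following. The function name `sketch_amount_date_tier` is hypothetical and is not part of `cfdi_matcher_lib.php`; RFC, UUID, and client-name rules are deliberately left out.

```php
<?php
// Hypothetical sketch: classify an invoice/deposit pair using only the
// amount + date rules of each tier. Returns the tier number or null.
function sketch_amount_date_tier(
    float $invoiceAmt,
    float $depositAmt,
    string $invoiceDate,
    string $depositDate
): ?int {
    if ($invoiceAmt <= 0) {
        return null; // cannot compute a percentage against a non-positive invoice
    }
    $utc = new DateTimeZone('UTC');
    $pctDiff = abs($depositAmt - $invoiceAmt) / $invoiceAmt * 100;
    $days = (new DateTimeImmutable($invoiceDate, $utc))
        ->diff(new DateTimeImmutable($depositDate, $utc))
        ->days; // absolute number of days between the two dates

    // [tier, max amount difference in %, max date distance in days]
    $rules = [
        [0, 0.01, 3],
        [1, 1.0, 7],
        [2, 5.0, 30],
        [3, 10.0, 60],
    ];
    foreach ($rules as [$tier, $maxPct, $maxDays]) {
        if ($pctDiff <= $maxPct && $days <= $maxDays) {
            return $tier; // first (strictest) rule that fits wins
        }
    }
    return null;
}
```

Because the rules are checked strictest first, a pair with a very close amount but a 12-day gap falls through Tier 1 (±7 days) and lands in Tier 2.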

## File Structure

```
/lamp/www/quantix/
├── db/enero_2025/
│   └── cfdi_matcher_schema.sql          # Database schema
├── backoffice/helper/
│   ├── cfdi_matcher_lamp.php            # Main CLI matcher
│   ├── cfdi_matcher_lib.php             # Core matching functions
│   ├── cfdi_matcher_test_runner.php     # Iterative test framework (TODO)
│   └── CFDI_MATCHER_README.md           # This file
└── backoffice/
    └── cfdi_matcher_results.php         # Web UI for review (TODO)
```

## Core Matching Functions

### Date & Amount Utilities
- `days_between($date1, $date2)` - Calculate days between dates
- `dates_within_window($date1, $date2, $days)` - Check if dates are within window
- `amount_difference_percent($amt1, $amt2)` - Calculate percentage difference
- `amounts_within_tolerance($amt1, $amt2, $pct)` - Check amount tolerance
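
For orientation, here is one way helpers with these signatures might be written. This is an illustrative sketch, not the code in `cfdi_matcher_lib.php`: the `sketch_` prefix marks every function as hypothetical, and the choices to return a signed day count and to measure the percentage relative to the first amount are assumptions.

```php
<?php
// Hypothetical re-implementations for illustration only.

// Signed day count from $date1 to $date2 (negative when $date2 is earlier).
function sketch_days_between(string $date1, string $date2): int
{
    $utc = new DateTimeZone('UTC');
    $a = new DateTimeImmutable($date1, $utc);
    $b = new DateTimeImmutable($date2, $utc);
    return (int) $a->diff($b)->format('%r%a');
}

// True when the two dates are at most $days apart in either direction.
function sketch_dates_within_window(string $date1, string $date2, int $days): bool
{
    return abs(sketch_days_between($date1, $date2)) <= $days;
}

// Signed difference of $amt2 relative to $amt1, in percent.
function sketch_amount_difference_percent(float $amt1, float $amt2): float
{
    if ($amt1 == 0.0) {
        return $amt2 == 0.0 ? 0.0 : INF; // avoid division by zero
    }
    return ($amt2 - $amt1) / abs($amt1) * 100;
}

// True when the absolute percentage difference is at most $pct.
function sketch_amounts_within_tolerance(float $amt1, float $amt2, float $pct): bool
{
    return abs(sketch_amount_difference_percent($amt1, $amt2)) <= $pct;
}
```

With these definitions, `sketch_amounts_within_tolerance(52200.00, 49846.22, 5)` is true and the same call with a tolerance of `1` is false.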

### Text Utilities
- `normalize_text_cfdi($text)` - Remove accents, lowercase, trim
- `extract_client_from_reference($ref)` - Extract client names from bank references
- `extract_invoice_number_from_reference($ref)` - Extract invoice numbers
- `text_similarity($text1, $text2)` - Calculate similarity (0-100)
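
As a rough illustration of the normalization and similarity steps, the sketch below strips common Spanish accents with an explicit map and scores similarity with PHP's built-in `similar_text()`. The function names are hypothetical, the whitespace collapsing is an added assumption, and the library may well use a different similarity measure.

```php
<?php
// Hypothetical sketch; the accent map covers common Spanish characters only.
function sketch_normalize_text(string $text): string
{
    $map = [
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u', 'ü' => 'u', 'ñ' => 'n',
        'Á' => 'A', 'É' => 'E', 'Í' => 'I', 'Ó' => 'O', 'Ú' => 'U', 'Ü' => 'U', 'Ñ' => 'N',
    ];
    $ascii = strtr($text, $map);
    // Assumption: collapse whitespace runs so spacing differences do not affect scores.
    $collapsed = preg_replace('/\s+/', ' ', $ascii);
    return strtolower(trim($collapsed));
}

// Similarity on a 0-100 scale, computed on the normalized strings.
function sketch_text_similarity(string $text1, string $text2): float
{
    $a = sketch_normalize_text($text1);
    $b = sketch_normalize_text($text2);
    if ($a === '' || $b === '') {
        return 0.0; // nothing to compare
    }
    similar_text($a, $b, $percent); // $percent is filled in by reference
    return (float) $percent;
}
```

Note that `similar_text()` penalizes length differences, so a short name compared against its full legal form (for example `COPPEL` versus `COPPEL SA DE CV`) scores well below 100 even though one contains the other.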

### Matching Functions (by Tier)

**Tier 0:**
- `match_cfdi_tier0_exact_amount_date($invoice, $deposit)`
- `match_cfdi_tier0_uuid_in_reference($invoice, $deposit)`
- `match_cfdi_tier0_rfc_amount($invoice, $deposit)`

**Tier 1:**
- `match_cfdi_tier1_amount_week($invoice, $deposit)`
- `match_cfdi_tier1_client_fuzzy($invoice, $deposit)`
- `match_cfdi_tier1_rfc_twoweeks($invoice, $deposit)`

**Tier 2:**
- `match_cfdi_tier2_amount_month($invoice, $deposit)`
- `match_cfdi_tier2_client_amount($invoice, $deposit)`

**Tier 3:**
- `match_cfdi_tier3_amount_twomonths($invoice, $deposit)`

**Master Function:**
- `match_invoice_to_deposit($invoice, $deposit)` - Tries all tiers, returns best match
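
The "try all tiers, keep the best" idea can be sketched as a loop over matcher callables. The return shape used here (null for no match, otherwise an array with a `confidence` key) is an assumption made for the example, and `sketch_best_match` is a hypothetical name; check the library for the actual contract.

```php
<?php
// Hypothetical sketch: run every matcher and keep the highest-confidence result.
// Each matcher is a callable($invoice, $deposit) returning null or
// an array that contains at least a 'confidence' key (0-100).
function sketch_best_match(
    array $invoice,
    array $deposit,
    array $matchers,
    float $minConfidence = 0.0
): ?array {
    $best = null;
    foreach ($matchers as $matcher) {
        $result = $matcher($invoice, $deposit);
        if ($result === null || $result['confidence'] < $minConfidence) {
            continue; // no match, or below the logging threshold
        }
        if ($best === null || $result['confidence'] > $best['confidence']) {
            $best = $result;
        }
    }
    return $best;
}

// Usage with stub matchers standing in for the tier functions.
$stubMatchers = [
    function ($invoice, $deposit) { return null; },
    function ($invoice, $deposit) { return ['pattern' => 'tier3_stub', 'confidence' => 60]; },
    function ($invoice, $deposit) { return ['pattern' => 'tier2_stub', 'confidence' => 78]; },
];
$best = sketch_best_match([], [], $stubMatchers);
```

Here `$best['pattern']` is `'tier2_stub'`. Passing a `$minConfidence` of 80 would return null for the same stubs.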

## Key Insights from Initial Run

### What Works Well
1. **Amount-based matching** - Most matches found via amount tolerance (±5-10%)
2. **Date windows** - 30-60 day windows capture payment processing delays
3. **Partial payment detection** - System flags amount differences (-7% to +5%)

### Challenges Identified
1. **Missing RFC data** - Most deposits lack `cliente_rfc` (all NULL in sample)
2. **Limited client names** - Most deposits lack `cliente` field
3. **Opaque references** - Bank `numero` field is cryptic ("T20 SPEI RECIBIDO")
4. **Payment timing variability** - Delays range from -25 to +30 days

### Unmatched Analysis

**224 Unmatched Invoices** (87.84%):
- Potential reasons:
  - Invoices not yet paid (accounts receivable)
  - Payments in different bank accounts
  - Payment timing outside 60-day window
  - Amounts differ >10% (partial payments, discounts)

**29 Unmatched Deposits** (48.33%):
- Potential reasons:
  - Deposits not from invoiced clients (other income)
  - Missing invoice records in system
  - Invoice amounts aggregated differently

## Next Steps (Iteration Planning)

### Iteration 2: Text Mining Enhancement
**Goal**: Improve RFC/client name extraction from deposit references

**Actions**:
1. Parse `numero` field for client identifiers
2. Build client name variation lookup table
3. Implement fuzzy client name matching from references
4. **Expected improvement**: +10-15 percentage points in match rate
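
One possible starting point for parsing the `numero` field is to look for RFC-shaped tokens. The sketch below is hypothetical (`sketch_extract_rfcs` is not an existing function) and relies on the standard RFC layout: 3 letters for companies or 4 for individuals, a 6-digit date, and a 3-character homoclave. As a simplification it ignores `Ñ` and does not validate the date digits.

```php
<?php
// Hypothetical helper: pull RFC-shaped tokens out of a free-text bank reference.
function sketch_extract_rfcs(string $reference): array
{
    $text = strtoupper($reference);
    // Lookarounds stop the pattern from matching inside a longer alphanumeric run.
    $pattern = '/(?<![A-Z0-9])[A-Z&]{3,4}\d{6}[A-Z0-9]{3}(?![A-Z0-9])/';
    preg_match_all($pattern, $text, $found);
    return array_values(array_unique($found[0]));
}
```

A reference such as `T20 SPEI RECIBIDO` yields an empty array, which matches the observation that many references carry no usable identifier. Any token found this way is only a candidate and still needs to be compared against the invoice's RFC.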

### Iteration 3: Multi-Invoice Aggregation
**Goal**: Detect cases where multiple invoices = 1 deposit

**Actions**:
1. Implement combination matching (sum of N invoices ≈ deposit amount)
2. Detect recurring patterns (same client, multiple invoices per month)
3. Flag suspicious aggregations for review
4. **Expected improvement**: +5-10 percentage points in match rate
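
Combination matching is essentially a bounded subset-sum search. The following sketch, with the hypothetical name `sketch_find_invoice_combinations`, enumerates groups of 2 to `$maxSize` invoices and keeps those whose total falls within a tolerance of the deposit. The default size cap and tolerance are placeholders, not values from the project.

```php
<?php
// Hypothetical sketch: find groups of invoices whose total is within
// $tolerancePct of a deposit. $invoices maps invoice id => amount.
function sketch_find_invoice_combinations(
    array $invoices,
    float $depositAmount,
    int $maxSize = 3,
    float $tolerancePct = 1.0
): array {
    $ids = array_keys($invoices);
    $allowed = $depositAmount * $tolerancePct / 100; // absolute tolerance
    $matches = [];

    // Depth-first enumeration of id combinations, in a fixed order to avoid duplicates.
    $search = function (int $start, array $picked, float $sum) use (
        &$search, &$matches, $ids, $invoices, $depositAmount, $maxSize, $allowed
    ) {
        if (count($picked) >= 2 && abs($sum - $depositAmount) <= $allowed) {
            $matches[] = $picked;
        }
        if (count($picked) === $maxSize) {
            return; // do not grow groups beyond the cap
        }
        for ($i = $start; $i < count($ids); $i++) {
            $id = $ids[$i];
            $search($i + 1, array_merge($picked, [$id]), $sum + $invoices[$id]);
        }
    };
    $search(0, [], 0.0);
    return $matches;
}
```

The number of combinations grows quickly with `$maxSize`, so in practice the candidate invoices would likely be pre-filtered (same client, same month) before running a search like this.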

### Iteration 4: Pattern Learning from Failures
**Goal**: Discover new matching patterns from unmatched items

**Actions**:
1. Analyze unmatched invoice-deposit pairs with similar amounts
2. Identify systematic offsets (e.g., -0.15% bank fee pattern)
3. Detect partial payment patterns (e.g., 50% deposits)
4. **Expected improvement**: +5-8 percentage points in match rate
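
A simple way to look for systematic offsets is to bucket the percentage differences of candidate pairs and count how often each bucket occurs. The sketch below assumes the pairs have already been selected; the function name and the 0.05% bucket width are illustrative choices.

```php
<?php
// Hypothetical sketch: bucket percentage differences to surface recurring offsets.
// $pairs is a list of [invoiceAmount, depositAmount]; $step is the bucket width in percent.
function sketch_offset_histogram(array $pairs, float $step = 0.05): array
{
    $buckets = [];
    foreach ($pairs as [$invoiceAmt, $depositAmt]) {
        if ($invoiceAmt == 0) {
            continue; // percentage is undefined for a zero invoice
        }
        $pct = ($depositAmt - $invoiceAmt) / $invoiceAmt * 100;
        // Snap to the nearest bucket; a formatted string key avoids float-key issues.
        $key = number_format(round($pct / $step) * $step, 2, '.', '');
        $buckets[$key] = ($buckets[$key] ?? 0) + 1;
    }
    arsort($buckets); // most frequent offsets first
    return $buckets;
}

// Three pairs share a -0.15% offset; one is a -5% outlier.
$histogram = sketch_offset_histogram([
    [1000.00, 998.50],
    [2000.00, 1997.00],
    [500.00, 499.25],
    [1000.00, 950.00],
]);
```

A bucket that recurs across many unrelated clients, such as the `-0.15` key here, is a candidate for a dedicated pattern. A bucket near `-50.00` would hint at half payments.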

### Iteration 5: Hybrid ML Scoring
**Goal**: Combine all signals with weighted scoring

**Actions**:
1. Implement multi-dimensional confidence scoring
2. Auto-tune thresholds based on verified matches
3. Use historical payment delays per client
4. **Expected improvement**: +3-5 percentage points in match rate (refinement)
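
Weighted scoring could start as a plain weighted average of per-signal scores. In the sketch below the signal names (`amount`, `date`, `client`) and the weights are placeholders rather than tuned values, and the function name is hypothetical.

```php
<?php
// Hypothetical sketch: blend per-signal scores (each 0-100) into one confidence value.
function sketch_weighted_confidence(array $signals, array $weights): float
{
    $total = 0.0;
    $weightSum = 0.0;
    foreach ($weights as $name => $weight) {
        if (!isset($signals[$name])) {
            continue; // signal unavailable (e.g. no client name on the deposit)
        }
        $total += $signals[$name] * $weight;
        $weightSum += $weight;
    }
    // Renormalize by the weights actually used so the result stays on a 0-100 scale.
    return $weightSum > 0 ? $total / $weightSum : 0.0;
}

$weights = ['amount' => 0.5, 'date' => 0.3, 'client' => 0.2];
$full = sketch_weighted_confidence(['amount' => 90, 'date' => 80, 'client' => 50], $weights);
$partial = sketch_weighted_confidence(['amount' => 90, 'date' => 80], $weights);
```

Renormalizing treats a missing signal as neutral, which is why `$partial` (86.25) comes out higher than `$full` (79). If matches without client data should score lower instead, a missing signal could be counted as 0 or given an explicit penalty.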

## Usage Patterns

### Review Matches
```php
// Get all matches from latest iteration
$sql = "SELECT * FROM cfdi_matcher_results
        WHERE iteration_id = (SELECT MAX(iteration_id) FROM cfdi_matcher_iterations)
        AND matched = 1
        ORDER BY match_confidence DESC";
$matches = ia_sqlArrayIndx($sql);
```

### Find Unmatched Invoices
```php
$sql = "SELECT * FROM cfdi_matcher_results
        WHERE iteration_id = (SELECT MAX(iteration_id) FROM cfdi_matcher_iterations)
        AND matched = 0
        ORDER BY invoice_amount DESC";
$unmatched = ia_sqlArrayIndx($sql);
```

### Get High-Confidence Matches for Auto-Linking
```php
$sql = "SELECT * FROM cfdi_matcher_results
        WHERE iteration_id = (SELECT MAX(iteration_id) FROM cfdi_matcher_iterations)
        AND matched = 1
        AND match_confidence >= 80
        AND user_verified IS NULL";  // Not yet reviewed
$auto_link_candidates = ia_sqlArrayIndx($sql);
```

## Performance Metrics

### Baseline (Iteration 1)
- **Total Processing Time**: <1 second (255 invoices × 60 deposits = 15,300 comparisons)
- **Average Confidence**: 59.48%
- **Tier Distribution**:
  - Tier 0: 0 (0%)
  - Tier 1: 0 (0%)
  - Tier 2: 7 (22.6%)
  - Tier 3: 24 (77.4%)

### Target Metrics (After Iteration 5)
- **Match Rate**: 70%+ automatic matches
- **Precision**: 95%+ of auto-applied matches confirmed correct on review
- **High-Confidence Coverage**: 80%+ of matches with ≥80% confidence
- **Processing Time**: <5 seconds for full dataset

## Sample Matches (Top 3)

### Match #1: Tier 2 - 78% confidence
- **Pattern**: amount_5pct_date_month
- **Invoice**: $259,028.00 (2024-06-09) - ASESORIA PROFESIONAL EN SEGUROS
- **Deposit**: $271,440.00 (2024-06-05)
- **Difference**: +$12,412 (+4.79%) / -4 days (deposit precedes invoice)
- **Notes**: Strong amount correlation, payment before invoice (advance payment?)

### Match #2: Tier 2 - 77% confidence
- **Pattern**: amount_5pct_date_month
- **Invoice**: $52,200.00 (2024-01-15) - COPPEL
- **Deposit**: $49,846.22 (2024-01-10)
- **Difference**: -$2,354 (-4.51%) / -5 days (deposit precedes invoice)
- **Notes**: Possible discount or bank fees

### Match #3: Tier 2 - 74% confidence
- **Pattern**: amount_5pct_date_month
- **Invoice**: $312,233.72 (2024-06-03) - SEGUROS SURA
- **Deposit**: $310,938.00 (2024-05-22)
- **Difference**: -$1,296 (-0.41%) / -12 days (deposit precedes invoice)
- **Notes**: Excellent amount match, early payment

## Configuration Options

### Date Filtering
```php
// In cfdi_matcher_lamp.php
$DATE_FILTER_START = '2024-01-01';  // Only match from this date
$DATE_FILTER_END = null;            // null = no end date
```

### Confidence Thresholds
```php
$HIGH_CONFIDENCE_THRESHOLD = 80;    // Auto-link threshold
$MINIMUM_MATCH_THRESHOLD = 40;      // Minimum to log as match
```

### Tier Tolerances
Edit in `cfdi_matcher_lib.php`:

```php
// Tier 0.1: Exact amount tolerance
amounts_within_tolerance($inv, $dep, 0.01)  // ±0.01%

// Tier 1.1: Strong amount tolerance
amounts_within_tolerance($inv, $dep, 1)     // ±1%

// Tier 2.1: Probable amount tolerance
amounts_within_tolerance($inv, $dep, 5)     // ±5%

// Tier 3.1: Possible amount tolerance
amounts_within_tolerance($inv, $dep, 10)    // ±10%
```

## Troubleshooting

### No Tier 0 Matches Found
**Cause**: Bank deposit data lacks UUID references and RFC information

**Solution**:
1. Check if `cliente_rfc` field is populated in `banco_cuenta_mov`
2. Verify if bank statements include invoice UUIDs in `numero` field
3. Consider implementing text extraction from `remarks` field

### Low Match Rate (<20%)
**Cause**: Payment delays exceed current date windows or amounts differ significantly

**Solution**:
1. Increase date windows in Tier 3 (try ±90 days)
2. Add Tier 4 with wider amount tolerances (±20%)
3. Implement multi-invoice aggregation (Iteration 3)

### High Reconciliation Gap
**Cause**: Systematic amount differences (bank fees, discounts, partial payments)

**Solution**:
1. Analyze `amount_difference_percent` distribution
2. Detect common offsets (e.g., -0.15%, -5%, +3%)
3. Create specific patterns for these offsets

### False Positives (Incorrect Matches)
**Cause**: Coincidental amount/date matches between unrelated transactions

**Solution**:
1. Require client name similarity in addition to amount
2. Lower confidence scores for matches without client data
3. Flag same-amount matches on same day for manual review
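
The third step, flagging same-amount deposits on the same day, can be sketched as a grouping pass over the deposits. The input shape (`id`, `date`, `amount` keys) and the function name are assumptions made for this example and do not reflect the actual `banco_cuenta_mov` column names.

```php
<?php
// Hypothetical sketch: return ids of deposits that share date and amount with
// at least one other deposit, since amount/date matching cannot tell them apart.
function sketch_flag_ambiguous_deposits(array $deposits): array
{
    $groups = [];
    foreach ($deposits as $deposit) {
        // Format the amount so 5000 and 5000.00 produce the same key.
        $key = $deposit['date'] . '|' . number_format((float) $deposit['amount'], 2, '.', '');
        $groups[$key][] = $deposit['id'];
    }
    $flagged = [];
    foreach ($groups as $ids) {
        if (count($ids) > 1) {
            $flagged = array_merge($flagged, $ids);
        }
    }
    return $flagged;
}

$flagged = sketch_flag_ambiguous_deposits([
    ['id' => 1, 'date' => '2024-01-10', 'amount' => 5000],
    ['id' => 2, 'date' => '2024-01-10', 'amount' => 5000.00],
    ['id' => 3, 'date' => '2024-01-11', 'amount' => 5000],
]);
```

Here deposits 1 and 2 are flagged, while deposit 3 falls on a different day and is left alone.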

## API / Integration Points

### Link Match to Banco Record
```php
// Update banco_cuenta_mov with matched invoice
function link_deposit_to_invoice($deposit_id, $invoice_id, $match_confidence) {
    ia_update('banco_cuenta_mov', [
        'banco_cuenta_mov_id' => $deposit_id,
        'factura_numero' => $invoice_id,  // or store UUID
        'verificado' => 'Si',
        'verificado_el' => date('Y-m-d H:i:s'),
        'verificado_por' => 'cfdi_matcher_system',
        'remarks' => "Auto-matched via CFDI Matcher (confidence: {$match_confidence}%)"
    ]);
}
```

### Export Matches to CSV
```php
$sql = "SELECT
    r.invoice_uuid,
    r.invoice_date,
    r.invoice_amount,
    r.invoice_client_name,
    r.deposit_date,
    r.deposit_amount,
    r.match_confidence,
    r.match_pattern,
    r.days_between_invoice_deposit,
    r.amount_difference_percent
FROM cfdi_matcher_results r
WHERE r.iteration_id = (SELECT MAX(iteration_id) FROM cfdi_matcher_iterations)
AND r.matched = 1
AND r.match_confidence >= 80
ORDER BY r.match_confidence DESC";

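$matches = ia_sqlArrayIndx($sql);

header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename=cfdi_matches_' . date('Y-m-d') . '.csv');
$output = fopen('php://output', 'w');
if (!empty($matches)) {
    // reset() returns the first row regardless of how the result array is keyed,
    // and the guard avoids an undefined-offset error when there are no matches.
    fputcsv($output, array_keys(reset($matches)));
    foreach ($matches as $row) {
        fputcsv($output, $row);
    }
}
fclose($output);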
```

## Credits

- **Architecture**: Based on PMS Matcher (Cloudbeds/Hostify reconciliation)
- **Author**: Claude Code (Filemón Prime AI Assistant)
- **Date Created**: 2026-01-14
- **License**: Internal use - Quantix project

## Changelog

### Version 1.0 (2026-01-14)
- ✅ Initial implementation with 4-tier matching (Tiers 0-3)
- ✅ Database schema (4 tables)
- ✅ Core matching library (9 matching functions)
- ✅ CLI matcher script
- ✅ Baseline test run: 12.16% match rate
- 🚧 TODO: Test runner for iterative learning
- 🚧 TODO: Web UI for manual review
- 🚧 TODO: Auto-linking functionality

### Planned Version 1.1 (Iteration 2)
- 📋 Text mining enhancement (extract clients from references)
- 📋 Client name variation lookup table
- 📋 Fuzzy client name matching improvements
- 📋 Target: 25-30% match rate

### Planned Version 1.2 (Iteration 3)
- 📋 Multi-invoice aggregation matching
- 📋 Recurring pattern detection
- 📋 Partial payment identification
- 📋 Target: 35-40% match rate

---

For questions or improvements, consult the matcher library source code or review the PMS matcher architecture for reference patterns.
