# Database Synchronization System - Technical Manual

## Table of Contents

1. [System Overview](#system-overview)
2. [Core Synchronization Logic](#core-synchronization-logic)
3. [Binary Search Engine](#binary-search-engine)
4. [Data Movement Pipeline](#data-movement-pipeline)
5. [Error Handling & Recovery](#error-handling--recovery)
6. [Performance Optimizations](#performance-optimizations)
7. [Configuration](#configuration)
8. [Best Practices](#best-practices)
9. [Conclusion](#conclusion)

---

## System Overview

The database synchronization system (`db-sync.lua`) is a production-grade tool designed to synchronize data between multiple database systems. It provides reliable data movement with comprehensive change detection and conflict resolution.

### Core Capabilities

- **Multi-database support**: 4D, PostgreSQL, SQLite, REST APIs
- **Schema-aware synchronization**: Handles different field types and constraints automatically
- **Binary search optimization**: Efficient algorithms for large datasets
- **Incremental synchronization**: Uses modify timestamps for partial syncs
- **Batch processing**: Configurable batch sizes for optimal performance
- **Error recovery**: Automatic fallback mechanisms and retry logic
- **Real-time monitoring**: Progress tracking and performance metrics

### Architecture

```mermaid
flowchart TD
    A[Source Database<br/>SOURCE] --> C[Sync Engine<br/>db-sync.lua]
    C --> B[Target Database<br/>TARGET]
    C --> D[Configuration<br/>db-sync.json]

    subgraph "Database Types"
        A1[4D Database]
        A2[PostgreSQL]
        A3[SQLite]
        A4[REST API]
    end

    subgraph "Core Components"
        C1[Binary Search Engine]
        C2[Data Movement Pipeline]
        C3[Change Detection Engine]
    end

    A -.-> A1
    A -.-> A2
    A -.-> A3
    A -.-> A4

    C -.-> C1
    C -.-> C2
    C -.-> C3

    style A fill:#e1f5fe,stroke:#1976d2,stroke-width:3px,color:#000
    style B fill:#f3e5f5,stroke:#388e3c,stroke-width:3px,color:#000
    style C fill:#fff3e0,stroke:#f57c00,stroke-width:4px,color:#000
    style D fill:#f3e5f5,color:#000
    style A1 fill:#e1f5fe,color:#000
    style A2 fill:#e1f5fe,color:#000
    style A3 fill:#e1f5fe,color:#000
    style A4 fill:#e1f5fe,color:#000
    style C1 fill:#fff3e0,color:#000
    style C2 fill:#fff3e0,color:#000
    style C3 fill:#fff3e0,color:#000
```

---

## Core Synchronization Logic

### Change Detection Principles

**record_id is the permanent identifier** that never changes. All other fields (product_id, timestamps, etc.) can be modified by users or automated processes.

### Three Types of Changes

1. **Add**: Record exists in source but not in target
2. **Delete**: Record exists in target but not in source
3. **Update**: Record exists in both but has field differences

### Decision Algorithm

```mermaid
flowchart TD
    A[Compare Record Counts<br/>source_count vs target_count] --> B{target_count == 0<br/>AND source_count > 0?}

    B -->|Yes| C[ADD ALL<br/>Target empty]
    B -->|No| D{source_count < target_count?}

    D -->|Yes| E[DELETE → ANALYZE<br/>Target has extras]
    D -->|No| F{source_count == target_count<br/>AND trusted_modify_id?}

    F -->|Yes| G[INCREMENTAL SYNC<br/>Trust timestamps]
    F -->|No| H[FULL COMPARE<br/>Verify all records]

    E --> I{Analyze count<br/>discrepancy}
    I --> J[Execute DELETE operation]
    J --> K[Execute ADD operation]

    G --> L[Execute ADD<br/>using timestamps]
    H --> M[Execute FULL<br/>COMPARE ALL]
    L --> N[Verify Results]

    style A fill:#e1f5fe,stroke:#1976d2,stroke-width:3px,color:#000
    style B fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
    style C fill:#c8e6c9,stroke:#388e3c,stroke-width:3px,color:#000
    style D fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
    style F fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
    style G fill:#e1f5fe,stroke:#1976d2,stroke-width:2px,color:#000
    style H fill:#fff3e0,stroke:#fbc02d,stroke-width:2px,color:#000
    style I fill:#fff3e0,color:#000
    style J fill:#ffcdd2,stroke:#d32f2f,stroke-width:3px,color:#000
    style K fill:#e8f5e8,color:#000
    style L fill:#e8f5e8,color:#000
    style M fill:#fff3e0,stroke:#fbc02d,stroke-width:2px,color:#000
    style N fill:#f3e5f5,color:#000
```

### Sync Planning Matrix

| Condition | Action | Reason |
|-----------|--------|--------|
| `target_count == 0 && source_count > 0` | **ADD ALL** | Target empty |
| `source_count < target_count` | **DELETE** | Target has extras |
| `source_count == target_count && trusted_modify_id` | **INCREMENTAL** | No count change, trust timestamps |
| `source_count == target_count && !trusted_modify_id` | **FULL COMPARE** | Verify all records |
| `source_count > target_count` | **ADD** | Source has more records |
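Read top to bottom, the matrix is a single decision function. The sketch below is hypothetical: `planSync` and its return strings are illustrative names rather than the tool's API, and the rows are checked in the order the table lists them.

```lua
-- Hypothetical planner mirroring the Sync Planning Matrix row by row.
-- Returns one of: "add_all", "delete", "incremental", "full_compare", "add".
local function planSync(sourceCount, targetCount, trustedModifyId)
    if targetCount == 0 and sourceCount > 0 then
        return "add_all"          -- target empty
    elseif sourceCount < targetCount then
        return "delete"           -- target has extras
    elseif sourceCount == targetCount then
        -- no count change: trusted timestamps allow an incremental sync
        return trustedModifyId and "incremental" or "full_compare"
    else
        return "add"              -- source has more records
    end
end

print(planSync(100, 0, false))   -- add_all
print(planSync(90, 100, true))   -- delete
print(planSync(100, 100, true))  -- incremental
```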

### Incremental Sync Logic

```lua
local plan
local trustPrev = syncPrf.trust_modify_id == true and hasPrevModifyId
if trustPrev then
    -- Fast incremental sync using modify timestamps
    plan = {"add", "incremental"}
else
    -- Full comparison required
    plan = {"add", "changed"}
end
```

---

## Binary Search Engine

### Algorithm Overview

**Purpose**: Efficiently find records that differ between databases without reading entire datasets.

**Performance**: Roughly O(k log n) range counts to locate k differing records, instead of reading all n records as a full table scan does (O(n)).

### When Binary Search is Used

- Table size > `binary_search_min_table_size` (default: 1000)
- Count difference < `binary_search_max_diff_percent` (default: 50%)
- Not using incremental mode (`trust_modify_id != true`)
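The three conditions can be combined into one gate. This is a sketch under two assumptions not stated above: "table size" is taken as the larger of the two counts, and the count difference is measured as a percentage of that size. `shouldUseBinarySearch` is a hypothetical name.

```lua
-- Hypothetical gate for binary search. Assumes "table size" is the larger
-- count and the difference is a percentage of that size.
local function shouldUseBinarySearch(sourceCount, targetCount, prf)
    local minSize = prf.binary_search_min_table_size or 1000
    local maxDiffPercent = prf.binary_search_max_diff_percent or 50
    local tableSize = math.max(sourceCount, targetCount)
    if tableSize <= minSize then
        return false  -- below the size threshold: compare directly
    end
    local diffPercent = math.abs(sourceCount - targetCount) / tableSize * 100
    if diffPercent >= maxDiffPercent then
        return false  -- counts too far apart
    end
    return prf.trust_modify_id ~= true  -- incremental mode skips binary search
end
```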

### Search Process Flow

```mermaid
flowchart TD
    A[Start Search<br/>Full Range: 1 to N] --> B[Initialize Range Stack]

    B --> C{Stack Empty?}
    C -->|Yes| X[Return Results]
    C -->|No| D[Process Range]

    D --> E{Range Size ≤ batch_size?}
    E -->|Yes| F[Process Directly<br/>Compare Records]
    E -->|No| G[Find Midpoint<br/>Split Range]

    F --> H[Add Differences<br/>to Results]
    H --> C

    G --> I[Count Records in<br/>Lower Half]
    I --> J[Count Records in<br/>Upper Half]

    J --> K{Lower Half<br/>Has differences?}
    K -->|Yes| L[Push Lower<br/>to Stack]
    K -->|No| M{Upper Half<br/>Has differences?}

    L --> M
    M -->|Yes| N[Push Upper<br/>to Stack]
    M -->|No| C
    N --> C

    style A fill:#e1f5fe,stroke:#1976d2,stroke-width:3px,color:#000
    style X fill:#c8e6c9,stroke:#388e3c,stroke-width:3px,color:#000
    style F fill:#fff3e0,stroke:#fbc02d,stroke-width:2px,color:#000
    style B fill:#f3e5f5,color:#000
    style C fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
    style D fill:#e0f2f1,color:#000
    style G fill:#e8f5e8,color:#000
    style I fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
    style J fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
    style K fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
    style L fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
    style M fill:#e8f5e8,color:#000
    style N fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
```
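The flow above can be modelled in memory to see how the stack of ranges narrows down. In this hypothetical sketch, ranges are `record_id` intervals, `countRange` stands in for a database `COUNT(*)` over that interval, and only adds and deletes are detected. Because a half is skipped when its counts match, an add and a delete falling into the same half cancel out and stay hidden, and an updated record never changes a count, so counts alone cannot prove two ranges identical.

```lua
-- Hypothetical in-memory model of the range-bisection search.
-- srcIds / tgtIds are arrays of record_id values.
local function countRange(ids, lo, hi)
    local n = 0
    for _, id in ipairs(ids) do
        if id >= lo and id <= hi then n = n + 1 end
    end
    return n
end

local function idSet(ids, lo, hi)
    local set = {}
    for _, id in ipairs(ids) do
        if id >= lo and id <= hi then set[id] = true end
    end
    return set
end

local function binarySearchDiff(srcIds, tgtIds, lo, hi, batchSize)
    local results = {add = {}, delete = {}}
    local stack = {{lo, hi}}
    while #stack > 0 do
        local range = table.remove(stack)  -- pop the next range
        local rLo, rHi = range[1], range[2]
        if rHi - rLo + 1 <= batchSize then
            -- small range: read both sides and compare IDs directly
            local src, tgt = idSet(srcIds, rLo, rHi), idSet(tgtIds, rLo, rHi)
            for id = rLo, rHi do
                if src[id] and not tgt[id] then table.insert(results.add, id) end
                if tgt[id] and not src[id] then table.insert(results.delete, id) end
            end
        else
            local mid = math.floor((rLo + rHi) / 2)
            -- push a half only when its source and target counts differ
            for _, half in ipairs({{rLo, mid}, {mid + 1, rHi}}) do
                if countRange(srcIds, half[1], half[2]) ~= countRange(tgtIds, half[1], half[2]) then
                    table.insert(stack, half)
                end
            end
        end
    end
    return results
end
```

With IDs 1 to 20 where the source lacks 7 and the target lacks 15, a `batchSize` of 4 isolates `add = {15}` and `delete = {7}` after a handful of count probes.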

### Real-time Progress Display

```text
delete binary search 179196 records, search 5000 recent first, need to find 2:
2.8%↑5000 97%↓174197 49%↑87097 24%↑43547 12%↓21775 6.1%↑10886 3.0%↑5442 1.5%↓2722 0.8%↓1362 0.4%↓682 0.2%↓342
```

**Understanding the Display:**

- **Percentage**: Portion of total dataset
- **Arrow**: Direction (↑ newer, ↓ older)
- **Count**: Records in this range
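- **Recent First**: The newest `binary_search_recent_first` records (5000 in the example) are searched before the rest of the table, so recent changes are found sooner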
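The entries in the sample line follow a consistent pattern: shares of 10% or more are printed as whole percentages, smaller shares with one decimal, followed by the arrow and the range's record count. The formatter below reproduces the sample entries; the rounding rule is inferred from that one line, and `formatProgress` is a hypothetical name.

```lua
-- Hypothetical formatter for one progress entry: share of the full table,
-- direction arrow (newer/older range) and the record count of the range.
local function formatProgress(rangeCount, totalCount, newer)
    local pct = rangeCount / totalCount * 100
    -- 10% and above: whole number; below 10%: one decimal place
    local pctText = pct >= 10 and string.format("%.0f%%", pct)
                               or string.format("%.1f%%", pct)
    local arrow = newer and "↑" or "↓"
    return string.format("%s%s%d", pctText, arrow, rangeCount)
end

print(formatProgress(5000, 179196, true))     -- 2.8%↑5000
print(formatProgress(174197, 179196, false))  -- 97%↓174197
```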

### Result Format

Binary search returns structured results:

```lua
return {
  add = [
    {record_id: 123, sourceRec: {...}, changeType: "add"},
    {record_id: 456, sourceRec: {...}, changeType: "add"}
  ],
  modify = [
    {record_id: 789, sourceRec: {...}, targetRec: {...}, changeType: "update", changedFields: ["price", "status"]}
  ],
  delete = [
    {record_id: 321, targetRec: {...}, changeType: "delete"}
  ]
}
```

---

## Data Movement Pipeline

### Pipeline Overview

```mermaid
flowchart LR
    subgraph "READ PHASE"
        A[Read Source Records<br/>in Batches]
        B[Build Target<br/>Record Index]
        A --> C[Source Data Array]
        B --> D[Target ID Index]
    end

    subgraph "ANALYSIS PHASE"
        C --> E[Compare Records<br/>vs Target Index]
        D --> E
        E --> F[Records to ADD]
        E --> G[Records to MODIFY]
        E --> H[Records to DELETE]
    end

    subgraph "WRITE PHASE"
        F --> I[Batch INSERT<br/>Operations]
        G --> J[Batch UPDATE<br/>Operations]
        H --> K[Batch DELETE<br/>Operations]
    end

    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000
    style B fill:#e3f2fd,color:#000
    style C fill:#e3f2fd,color:#000
    style D fill:#e0f2f1,color:#000
    style E fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#000
    style F fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#000
    style G fill:#fff8e1,color:#000
    style H fill:#ffebee,color:#000
    style I fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#000
    style J fill:#fff8e1,stroke:#fbc02d,stroke-width:2px,color:#000
    style K fill:#fff8e1,stroke:#f57c00,stroke-width:2px,color:#000
```
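The write phase sends each list in batches rather than one statement per record. A minimal, hypothetical `chunk` helper that splits an array into order-preserving batches of at most `batch_size` entries:

```lua
-- Hypothetical helper: split an array of records into batches of at most
-- batchSize entries, preserving order, for the batch write operations.
local function chunk(records, batchSize)
    assert(batchSize > 0, "batchSize must be positive")
    local batches = {}
    for i = 1, #records, batchSize do
        local batch = {}
        for j = i, math.min(i + batchSize - 1, #records) do
            batch[#batch + 1] = records[j]
        end
        batches[#batches + 1] = batch
    end
    return batches
end

-- 12 records with batch_size = 5 give batches of 5, 5 and 2
local ids = {}
for id = 1, 12 do ids[id] = id end
for n, batch in ipairs(chunk(ids, 5)) do
    print(n, #batch)
end
```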

### Record Analysis Process

```lua
-- Assumes sourceRecords (array), targetIdIndex (record_id -> target record)
-- and compareRecords(sourceRec, targetRec) are supplied by the caller.
local toAdd, toModify, toDelete = {}, {}, {}
local sourceIdIndex = {}

-- For each source record:
for _, sourceRecord in ipairs(sourceRecords) do
    local recordId = sourceRecord.record_id
    sourceIdIndex[recordId] = true
    local targetRecord = targetIdIndex[recordId]

    if targetRecord then
        -- Record exists in target: queue an update only if fields differ
        local comparisonResult = compareRecords(sourceRecord, targetRecord)
        if comparisonResult.hasChanges then
            toModify[#toModify + 1] = {
                record_id = recordId,
                sourceRec = sourceRecord,
                targetRec = targetRecord,
                changeType = "update",
                changedFields = comparisonResult.changedFields
            }
        end
    else
        -- Record not found in target
        toAdd[#toAdd + 1] = {
            record_id = recordId,
            sourceRec = sourceRecord,
            targetRec = nil,
            changeType = "add"
        }
    end
end

-- Target-only records (delete pass, triggered by count analysis):
-- IDs in the target index that never appeared in the source
for targetId, targetRecord in pairs(targetIdIndex) do
    if not sourceIdIndex[targetId] then
        toDelete[#toDelete + 1] = {
            record_id = targetId,
            targetRec = targetRecord,
            sourceRec = nil,
            changeType = "delete"
        }
    end
end
```
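The loop above relies on `compareRecords` returning `hasChanges` and `changedFields`. Its real implementation is schema-aware and is not shown in this manual; the sketch below is a hypothetical stand-in that compares every field except `record_id` with plain `~=`, so a type difference such as `"5"` versus `5` counts as a change.

```lua
-- Hypothetical stand-in for compareRecords: checks every field present in
-- either record (except record_id) and lists the ones whose values differ.
local function compareRecords(sourceRec, targetRec)
    local seen, changedFields = {}, {}
    for _, rec in ipairs({sourceRec, targetRec}) do
        for field in pairs(rec) do
            if not seen[field] and field ~= "record_id" then
                seen[field] = true
                if sourceRec[field] ~= targetRec[field] then
                    changedFields[#changedFields + 1] = field
                end
            end
        end
    end
    table.sort(changedFields)  -- pairs() order is unspecified; sort for stable output
    return {hasChanges = #changedFields > 0, changedFields = changedFields}
end
```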

---

## Error Handling & Recovery

### Error Categories

1. **Connection Errors**: Database unavailable, network issues
2. **Schema Errors**: Missing tables, incompatible field types
3. **Data Errors**: Constraint violations, invalid values
4. **Logic Errors**: Unexpected conditions, algorithm failures

### Recovery Mechanisms

#### Automatic Fallback

```lua
-- Binary search fallback to full compare
if binarySearchFailed and syncPrf.fallback_full_compare_on_mismatch then
    util.printWarning("Binary search failed, falling back to full compare")
    return syncCompare(syncRec, tbl, schema, fromId, toId, fldArr, stat, operation)
end
```

#### Batch Error Isolation

```lua
-- Inside the batch loop: move on to the next batch when one fails,
-- until max_error_count errors have been recorded
saveParam.parameter.continue_on_error = syncPrf.max_error_count
if batchError and stat.errorCount < syncPrf.max_error_count then
    stat.errorCount = stat.errorCount + 1
    goto continue -- Lua has no `continue`; `::continue::` closes the loop body
end
```

#### Connection Recovery

```lua
-- Automatic reconnection on connection loss
if connectionLost then
    dconn.disconnectAll()
    defaultConn = nil
    util.sleep(30000) -- Wait 30 seconds
    -- Retry connection on next iteration
end
```
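The snippet above waits and lets the next iteration reconnect. A bounded variant makes the retry limit explicit. `connectWithRetry`, `maxAttempts` and the injected `sleepFn` are hypothetical; injecting the sleep function keeps the logic testable without real 30-second waits.

```lua
-- Hypothetical bounded retry. connectFn returns a connection, or nil plus an
-- error message; sleepFn receives a delay in milliseconds.
local function connectWithRetry(connectFn, maxAttempts, delayMs, sleepFn)
    local lastErr
    for attempt = 1, maxAttempts do
        local conn, err = connectFn()
        if conn then
            return conn
        end
        lastErr = err
        if attempt < maxAttempts then
            sleepFn(delayMs)  -- wait before the next attempt
        end
    end
    return nil, lastErr  -- all attempts failed
end
```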

---

## Performance Optimizations

### 1. Binary Search Engine

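- **Logarithmic Complexity**: Roughly O(k log n) range counts for k differences, versus O(n) records read by a full scan
- **Network Efficiency**: Fewer large data transfers
- **Memory Efficiency**: Process small chunks instead of entire datasets
- **Interruptible**: Can stop early once the expected number of differences has been found (e.g. "need to find 2" in the progress display)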

### 2. Batch Processing

```lua
-- Configurable batch sizes per database type
local batchSettings = {
    batch_size = 5000,           -- General operations
    batch_size_4d = 1000,        -- 4D database operations
    delete_batch_size = 1000,    -- Delete operations
    delete_batch_size_4d = 500,  -- 4D delete operations
    id_array_batch_size = 25000, -- ID array operations
}
```
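With both general and 4D-specific sizes available, each operation needs a lookup that prefers the most specific key. This sketch assumes the keys listed above and a `"4d"` database type string, and falls back to `batch_size` when a specific key is missing; `batchSizeFor` is hypothetical.

```lua
-- Hypothetical lookup: most specific batch size for an operation and
-- database type, falling back to the general batch_size.
local function batchSizeFor(prf, operation, dbType)
    local base = (operation == "delete") and "delete_batch_size" or "batch_size"
    if dbType == "4d" and prf[base .. "_4d"] then
        return prf[base .. "_4d"]
    end
    return prf[base] or prf.batch_size
end
```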

### 3. Smart Sorting Optimization

**Database ORDER BY is Used When**:

- LIMIT is present (requires consistent positioning)
- OFFSET is present (positional queries need consistent ordering)

**Local Lua Sorting is Used When**:

- No LIMIT or OFFSET (can sort after data retrieval)
- ID-bounded ranges (WHERE id > X AND id <= Y)
- Full table queries (retrieves all matching records)
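The two rules reduce to one check on the query plus a local sort. In this hypothetical sketch the query is a plain table with optional `limit` and `offset` fields, and local sorting is by `record_id`:

```lua
-- Hypothetical helpers: ask the database to sort only when LIMIT or OFFSET
-- makes row position meaningful; otherwise sort fetched rows in Lua.
local function needsDbOrderBy(query)
    return query.limit ~= nil or query.offset ~= nil
end

local function sortLocally(rows)
    table.sort(rows, function(a, b) return a.record_id < b.record_id end)
    return rows
end
```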

### 4. Incremental Synchronization

When `trust_modify_id` is `true`, only records modified since the last sync are read:

```sql
WHERE modify_time > last_sync_modify_time
```
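Outside SQL, the same filter also yields the next watermark: the highest `modify_time` seen becomes `last_sync_modify_time` for the following run. How the tool advances the watermark is an assumption here, shown as a hypothetical in-memory function. The strict `>` skips any row written later with a `modify_time` equal to the stored watermark.

```lua
-- Hypothetical in-memory equivalent of the incremental WHERE clause:
-- keep rows newer than the watermark and compute the next watermark.
local function incrementalRows(rows, lastSyncModifyTime)
    local changed, newWatermark = {}, lastSyncModifyTime
    for _, row in ipairs(rows) do
        if row.modify_time > lastSyncModifyTime then
            changed[#changed + 1] = row
            if row.modify_time > newWatermark then
                newWatermark = row.modify_time
            end
        end
    end
    return changed, newWatermark
end
```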

---

## Configuration

### Core Settings

```json
{
    "batch_size": 5000,
    "binary_search_for_add": true,
    "binary_search_min_table_size": 1000,
    "binary_search_max_diff_percent": 50,
    "trust_modify_id": false,
    "fallback_full_compare_on_mismatch": true,
    "continue_on_error": 25,
    "max_error_loop": 5
}
```
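A loader typically overlays user settings on defaults like these. The sketch below is hypothetical (the tool's actual loader is not described here); iterating with `pairs` means an explicit `false` in the user settings overrides a `true` default, which an `x or default` idiom would not.

```lua
-- Hypothetical defaults matching the Core Settings above.
local defaults = {
    batch_size = 5000,
    binary_search_for_add = true,
    binary_search_min_table_size = 1000,
    binary_search_max_diff_percent = 50,
    trust_modify_id = false,
    fallback_full_compare_on_mismatch = true,
    continue_on_error = 25,
    max_error_loop = 5,
}

-- Copy the defaults, then let every key present in the user table win.
local function withDefaults(user)
    local merged = {}
    for key, value in pairs(defaults) do merged[key] = value end
    for key, value in pairs(user or {}) do merged[key] = value end
    return merged
end
```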

### Binary Search Specific

```json
{
    "binary_search_recent_first": 5000,
    "binary_search_read_batch": 500,
    "binary_search_max_depth": 20,
    "binary_search_timeout": 300
}
```

---

## Best Practices

### 1. Initial Configuration

Start with conservative settings:

```json
{
    "batch_size": 1000,
    "binary_search_min_table_size": 5000,
    "trust_modify_id": false,
    "fallback_full_compare_on_mismatch": true
}
```

### 2. Gradual Optimization

Scale up as performance allows:

```json
{
    "batch_size": 5000,
    "trust_modify_id": true,
    "binary_search_recent_first": 5000
}
```

### 3. Database Indexing

Essential indexes for optimal performance:

```sql
-- Lookup index on record_id (not needed if record_id is already the primary key)
CREATE INDEX idx_table_record_id ON table_name (record_id);

-- Modify time index for incremental sync
CREATE INDEX idx_table_modify_time ON table_name (modify_time);

-- Composite index for range queries
CREATE INDEX idx_table_id_modify ON table_name (record_id, modify_time);
```

### 4. Error Prevention

```json
{
    "fallback_full_compare_on_mismatch": true,
    "max_error_count": 5,
    "continue_on_error": false,
    "debug_connection_change": true
}
```

---

## Conclusion

The database synchronization system provides reliable data movement with comprehensive change detection. Key success factors:

1. **Proper configuration** for your environment
2. **Adequate database indexing** for performance
3. **Regular monitoring** and maintenance
4. **Comprehensive testing** before production deployment

The system is designed for demanding enterprise environments where data consistency and performance are critical requirements.
