Spaces:
Sleeping
A newer version of the Gradio SDK is available: 6.19.0
USLaP Database Migration Guide
Ψ¨ΩΨ³ΩΩ Ω Ψ§ΩΩΩΩΩΩ Ψ§ΩΨ±ΩΩΨΩΩ ΩΩ°ΩΩ Ψ§ΩΨ±ΩΩΨΩΩΩ Ω
This guide provides step-by-step instructions for migrating data from the Excel master file to the full USLaP SQLite relational database. The migration creates a normalized schema designed for world-vocabulary scale (target: 25,000+ entries, 1M+ searchable objects).
Overview
The migration process:
- Creates a fresh SQLite database with the full relational schema (
uslap_lattice.db) - Reads data from structured Excel sheets (skips the consolidated echo sheet)
- Normalizes data into proper relational tables with foreign key relationships
- Registers the
extract_consonants()Python UDF for phonetic search - Generates the
word_fingerprintstable for O(log n) cluster expansion - Creates backups of any existing databases
- Verifies data integrity and foreign key constraints
Prerequisites
Required Files
USLaP_Final_Data_Consolidated_Master_v3.xlsxβ Master Excel file (in workplace root)create_uslap_db.sqlβ Complete SQLite schema (inCode_files/)migrate_to_sqlite.pyβ Migration script (inCode_files/)USLaP_Engine.pyβ Contains consonant extraction logic (inCode_files/)
Python Dependencies
pip install openpyxl
The script requires Python 3.6+ and the openpyxl library for reading Excel files. All other dependencies are in the Python standard library.
Running the Migration
Step 1: Verify File Locations
Ensure all files are in the correct locations:
USLaP workplace/
βββ USLaP_Final_Data_Consolidated_Master_v3.xlsx
βββ Code_files/
βββ create_uslap_db.sql
βββ migrate_to_sqlite.py
βββ USLaP_Engine.py
βββ [existing .db files]
Step 2: Run the Migration Script
From the USLaP workplace directory, run:
cd "/Users/mmsetubal/Documents/USLaP workplace"
python3 "Code_files/migrate_to_sqlite.py"
Important Notes:
- The script will automatically create a backup of any existing
uslap_lattice.dbfile - Migration may take 1-2 minutes depending on Excel file size
- All operations are wrapped in a transaction; on failure, the database is rolled back
Step 3: Monitor Migration Output
The script provides real-time progress:
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
USLaP Migration: Excel β SQLite Relational Database
Ψ¨ΩΨ³ΩΩ
Ω Ψ§ΩΩΩΩΩΩ Ψ§ΩΨ±ΩΩΨΩΩ
ΩΩ°ΩΩ Ψ§ΩΨ±ΩΩΨΩΩΩ
Ω
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
π Loading Excel file: USLaP_Final_Data_Consolidated_Master_v3.xlsx
Found 62 sheets
ποΈ Creating database: Code_files/uslap_lattice.db
Executing schema...
Registering extract_consonants() UDF...
π Migrating data...
Migrating A1_ENTRIES...
Migrated 59 entries
Migrating A1_ΠΠΠΠΠ‘Π (Russian entries)...
Migrated 0 entries
[Additional sheets...]
π Migration Statistics:
ββββββββββββββββββββββββββββββββββββββββ
Total entries: 59
Total roots: 0
Child entries: 3
Word fingerprints: 6
Engine queue items: 0
π Verifying foreign key constraints...
β All foreign key constraints satisfied
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
MIGRATION COMPLETED SUCCESSFULLY
β
Database: Code_files/uslap_lattice.db
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Verification Steps
Verify Database Structure
After migration, verify all tables were created:
sqlite3 "Code_files/uslap_lattice.db" ".tables"
Expected output (30+ tables):
child_entries decay_levels operation_cycles
child_entry_links derivatives operators
cluster_cache detection_patterns operator_aliases
country_names engine_queue operation_codes
cross_refs entries phonetic_mappings
change_log events phonetic_shifts
dp_codes host_civilizations quran_refs
intel_reports languages roots
names_of_allah networks scholars
nt_codes op_codes script_corridors
qur_verification session_index sync_status
word_fingerprints
Verify Row Counts
Run the verification queries in the database:
-- Check entry counts
SELECT 'entries' as tbl, COUNT(*) FROM entries
UNION ALL SELECT 'roots', COUNT(*) FROM roots
UNION ALL SELECT 'derivatives', COUNT(*) FROM derivatives
UNION ALL SELECT 'cross_refs', COUNT(*) FROM cross_refs
UNION ALL SELECT 'child_entries', COUNT(*) FROM child_entries
UNION ALL SELECT 'word_fingerprints', COUNT(*) FROM word_fingerprints;
-- Check FK integrity
PRAGMA integrity_check;
PRAGMA foreign_key_check;
-- Check fingerprint coverage
SELECT COUNT(DISTINCT entry_id) as entries_with_fingerprints FROM word_fingerprints;
Expected Results:
entries: ~390β404 entries total across all languagesroots: ~200 unique rootsderivatives: ~632 word formscross_refs: ~155 relationshipschild_entries: ~12 operational intelligence entriesword_fingerprints: ~1,400+ entries (covers all searchable terms)
Test the UDF Function
Verify the extract_consonants() function works:
sqlite3 "Code_files/uslap_lattice.db" "SELECT extract_consonants('example')"
Expected output: xmpl
Test Basic Queries
-- Get entries with high confidence scores
SELECT entry_id, en_term, score FROM entries WHERE score >= 8 ORDER BY score DESC LIMIT 10;
-- Test phonetic search via word_fingerprints
SELECT wf.raw_word, wf.consonant_skeleton, e.en_term, e.score
FROM word_fingerprints wf
LEFT JOIN entries e ON wf.entry_id = e.entry_id
WHERE wf.consonant_skeleton = extract_consonants('test')
LIMIT 5;
-- View operational intelligence (CHILD schema)
SELECT child_id, shell_name, operation_role, parent_op FROM child_entries;
Database Schema Details
Critical Tables for Engine Operations
1. word_fingerprints β Phonetic Search Index
The most critical table for performance. Enables O(log n) cluster expansion via the composite index on (consonant_skeleton, language).
Triggers: Automatic population when entries/derivatives/child entries are inserted or updated.
Index: idx_fingerprints_lookup enables instant phonetic matching.
2. engine_queue β Write Conflict Prevention
Prevents direct writes from the engine to core tables. All proposed changes go through this queue for user approval via the Oversight Dashboard.
Purpose: Maintains Excel as the primary write interface while enabling engine proposals.
3. session_index β Engine Session Tracking
Tracks every engine run with performance metrics and error logging.
4. child_entries & child_entry_links β Operational Intelligence
CHILD schema integration links operational intelligence (SLV, SQLB, RUS entries) with main A1 entries.
Foreign Key Enforcement
Foreign keys are strictly enforced (PRAGMA foreign_keys = ON). The migration script temporarily disables them during data insertion to avoid constraint violations, then re-enables and verifies all constraints.
Python UDF: extract_consonants()
Registration
The UDF is automatically registered on every database connection via:
conn.create_function("extract_consonants", 1, extract_consonants)
CRITICAL: Must be registered BEFORE any INSERT operations or trigger execution.
Implementation
The function extracts consonant skeletons from words:
- Removes vowels (a, e, i, o, u)
- Handles digraphs (sh, ch, gh, th, ph, wh, qu) as single units
- Normalizes to lowercase
- Returns empty string for null/empty input
Source: Logic matches PhoneticReversal.extract_consonants() in USLaP_Engine.py.
SQL Triggers Using the UDF
The schema includes triggers that automatically populate word_fingerprints:
CREATE TRIGGER update_fingerprints_on_entry_insert
AFTER INSERT ON entries
BEGIN
INSERT INTO word_fingerprints (entry_id, language, raw_word, consonant_skeleton)
SELECT NEW.entry_id, 'en', NEW.en_term, extract_consonants(NEW.en_term)
WHERE NEW.en_term IS NOT NULL AND NEW.en_term != '';
-- ... similar for ru_term, fa_term, ar_word
END;
Excel β Database Sync Strategy
Current Architecture
- Excel is primary write interface: All user-facing writes go through Excel
- Database is read/query layer: Engine reads from database, proposes changes via queue
- Sync direction: Excel β Database (one-way during migration)
Maintaining Sync
After migration, keep databases in sync:
- Engine proposals: When USLaP_Engine.py detects new patterns, it writes to
engine_queue - User approval: User reviews proposals in Oversight Dashboard (
USLaP_Oversight_Dashboard.html) - Approved changes: User applies approved changes to Excel manually
- Re-sync: Run migration script periodically to update database with Excel changes
Migration Script Updates
The migrate_to_sqlite.py script can be re-run at any time. It will:
- Create timestamped backup of existing database
- Start fresh with current Excel data
- Preserve any
engine_queueitems that haven't been processed
Switching USLaP_Engine.py to SQLite Reads
Current State
USLaP_Engine.py currently reads directly from Excel via openpyxl.
Target State
Update USLaP_Engine.py to use the database access layer (db_access_layer.py) for:
- Entry lookups (instead of reading Excel sheets)
- Phonetic search (using
word_fingerprintstable) - Cluster expansion (O(log n) via indexed searches)
- Queue operations (proposing changes via
engine_queue)
Implementation Steps
- Import
db_access_layermodule - Replace Excel reading with database queries:
# Old: reading from Excel # New: using database from db_access_layer import search_word, PhoneticSearchOperations results = search_word("example") similar = PhoneticSearchOperations.find_similar_words("example", conn) - Update cluster expansion to use
word_fingerprintsindex - Route all proposed changes through
engine_queueinstead of direct writes
Performance Benefits
- Phonetic search: O(log n) vs O(n) linear scan
- Cluster expansion: Instant via pre-computed fingerprints
- Memory usage: Database queries vs loading entire Excel file
- Concurrency: Multiple engine sessions can query simultaneously
Troubleshooting
Common Issues
1. "Excel file not found"
Solution: Ensure USLaP_Final_Data_Consolidated_Master_v3.xlsx is in the workplace root directory.
2. "Schema file not found"
Solution: Ensure create_uslap_db.sql is in Code_files/ directory.
3. Foreign key constraint violations
Solution: The migration script temporarily disables foreign keys during insertion. If errors persist:
sqlite3 "Code_files/uslap_lattice.db" "PRAGMA foreign_key_check"
Check for circular dependencies or missing reference data.
4. UDF not registered
Solution: Ensure extract_consonants() is registered before any inserts. The migration script does this automatically. For custom connections, use:
from migrate_to_sqlite import extract_consonants
conn.create_function("extract_consonants", 1, extract_consonants)
5. Low row counts after migration
Possible causes:
- Excel sheet names don't match expected names
- Header row detection failed
- Data is in unexpected format
Debug: Run the migration script with additional print statements or examine the Excel sheet structure.
Recovery Procedures
Database Corruption
If the database becomes corrupted:
- Restore from latest backup in
Code_files/backups/ - Or re-run migration script (creates fresh database)
Failed Migration
If migration fails mid-process:
- Script automatically rolls back transaction
- Old database remains unchanged (if backup was created)
- Check error output for specific issue
- Fix underlying problem (Excel format, disk space, permissions)
- Re-run migration
Data Loss Prevention
- Migration always creates timestamped backups
- Excel master file remains unchanged (read-only during migration)
- Transaction rollback on any error
Performance Optimization
Index Usage
The schema includes optimal indexes for:
- Phonetic search:
idx_fingerprints_lookupon(consonant_skeleton, language) - Root-based queries:
idx_entries_rootonentries(root_id) - Score sorting:
idx_entries_scoreonentries(score DESC) - Full-text search: FTS5 virtual table
entries_fts
Query Patterns
For best performance:
- Use
word_fingerprintsfor phonetic searches - Use
entries_ftsfor full-text keyword searches - Use
cluster_cachefor repeated expansion of same roots - Limit results with
LIMITclauses for UI responsiveness
Scaling Considerations
The schema is designed for 1M+ searchable objects. At that scale:
- Consider increasing SQLite cache size:
PRAGMA cache_size = -2000;(2GB) - Use WAL mode for concurrent reads:
PRAGMA journal_mode = WAL; - Regular VACUUM to maintain performance:
VACUUM;
Appendix A: Migration Script Details
Sheets Migrated
The script reads from these structured sheets only:
A1_ENTRIESβ English entriesA1_ΠΠΠΠΠ‘Πβ Russian entriesPERSIAN_A1_MADΔKHILβ Persian entriesBITIG_A1_ENTRIESβ ORIG2/Turkic entriesCHILD_SCHEMAβ Operational intelligenceA4_DERIVATIVESβ Word formsA5_CROSS_REFSβ Entry relationshipsA3_QURAN_REFSβ Verse referencesM1_PHONETIC_SHIFTSβ Phonetic mechanismM2_DETECTION_PATTERNSβ Detection patternsM4_NETWORKSβ Network definitionsM3_SCHOLARSβ Scholar biographiesM5_QUR_VERIFICATIONβ Qur'an verification
Sheets Skipped
EXCEL_DATA_CONSOLIDATEDβ Echo sheet (not a primary source)- Various protocol, correction, and warning sheets
Data Flow
- Read Excel sheet β Clean column names β Map to schema β Insert
- Extract unique roots from entries β Create
rootstable entries - Triggers automatically create
word_fingerprints - Verify foreign key integrity
- Commit transaction
Appendix B: Database Access Layer
The db_access_layer.py module provides:
DatabaseConnectionβ Context manager for connectionsEntryOperations,RootOperationsβ CRUD operationsPhoneticSearchOperationsβ O(log n) cluster expansionEngineQueueOperations,SessionOperationsβ Engine controlAnalyticsOperationsβ Statistics and analysis- High-level API functions:
search_word(),add_new_entry(),run_engine_session()
Example Usage
from db_access_layer import search_word, get_connection, EntryOperations
# High-level search
results = search_word("example")
print(f"Found {len(results['exact_matches'])} exact matches")
# Direct operations
with get_connection() as conn:
entries = EntryOperations.get_high_score_entries(conn, min_score=8)
print(f"High-score entries: {len(entries)}")
Support
For issues with migration:
- Check error messages in console output
- Verify file permissions and locations
- Ensure Excel file isn't open in another program
- Check Python version and openpyxl installation
To report bugs or request enhancements, use the project's issue tracking system.
ΩΩΨ§ΩΩΩΩΩΩ Ψ£ΩΨΉΩΩΩΩ Ω