---
sidebar_position: 3
---

# File Migration to Events Naming Convention

This guide shows how to use `scripts/migrate_to_events_naming.py` to rename legacy meeting and contact Parquet files to the new events naming convention.

## Quick Start

```bash
# 1. Dry run to see what would be renamed (safe, no changes)
python scripts/migrate_to_events_naming.py --dry-run

# 2. Perform the migration WITH backups (recommended)
python scripts/migrate_to_events_naming.py

# 3. Skip backups if you already have external backups
python scripts/migrate_to_events_naming.py --no-backup

# 4. Clean up backup directories after verifying migration
python scripts/migrate_to_events_naming.py --cleanup-backups
```

## Migration Map

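The script automatically renames the files below. Several old names share one new name (for example, `meetings.parquet` and `meetings_calendar.parquet` both map to `events.parquet`), so a directory that contains both can hit the "Target already exists" case described under Troubleshooting.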

| Old Name | New Name |
|----------|----------|
| `meetings.parquet` | `events.parquet` |
| `meetings_calendar.parquet` | `events.parquet` |
| `meetings_transcripts.parquet` | `event_documents.parquet` |
| `meetings_topics.parquet` | `event_agenda_items.parquet` |
| `meetings_demographics.parquet` | `event_participants.parquet` |
| `meetings_decisions.parquet` | `event_bills.parquet` |
| `contacts_meeting_attendance.parquet` | `event_participants.parquet` |
| `events_events.parquet` | `events.parquet` |
| `events_event_documents.parquet` | `event_documents.parquet` |
| `events_event_participants.parquet` | `event_participants.parquet` |
| `events_event_agenda_items.parquet` | `event_agenda_items.parquet` |
| `events_event_bills.parquet` | `event_bills.parquet` |
| `events_event_media.parquet` | `event_media.parquet` |
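
The table can also be read as a lookup from old name to new name. The following is a minimal sketch of that lookup, not the migration script's actual code: `RENAME_MAP` and `plan_renames` are hypothetical names introduced here, and treating a many-to-one mapping as a collision is an assumption about sensible behavior rather than documented script behavior.

```python
from collections import defaultdict
from pathlib import Path

# Old name -> new name, copied from the Migration Map table.
RENAME_MAP = {
    "meetings.parquet": "events.parquet",
    "meetings_calendar.parquet": "events.parquet",
    "meetings_transcripts.parquet": "event_documents.parquet",
    "meetings_topics.parquet": "event_agenda_items.parquet",
    "meetings_demographics.parquet": "event_participants.parquet",
    "meetings_decisions.parquet": "event_bills.parquet",
    "contacts_meeting_attendance.parquet": "event_participants.parquet",
    "events_events.parquet": "events.parquet",
    "events_event_documents.parquet": "event_documents.parquet",
    "events_event_participants.parquet": "event_participants.parquet",
    "events_event_agenda_items.parquet": "event_agenda_items.parquet",
    "events_event_bills.parquet": "event_bills.parquet",
    "events_event_media.parquet": "event_media.parquet",
}


def plan_renames(directory):
    """Return (renames, collisions) for one directory without touching any file.

    renames    -- list of (old_name, new_name) pairs that are safe to apply
    collisions -- {new_name: [old_names]} where the target already exists or
                  more than one old file maps to the same target
    """
    directory = Path(directory)
    by_target = defaultdict(list)
    for old_name, new_name in RENAME_MAP.items():
        if (directory / old_name).is_file():
            by_target[new_name].append(old_name)

    renames, collisions = [], {}
    for new_name, old_names in by_target.items():
        if len(old_names) > 1 or (directory / new_name).exists():
            collisions[new_name] = old_names
        else:
            renames.append((old_names[0], new_name))
    return renames, collisions
```

Calling `plan_renames("data/gold/states/AL")` only inspects the directory, so it can be used alongside `--dry-run` to spot targets that need manual attention before the real migration.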

## Options

### `--dry-run`
Show what would be renamed without making changes:
```bash
python scripts/migrate_to_events_naming.py --dry-run
```

### `--no-backup`
Skip creating backups (NOT recommended unless you have external backups):
```bash
python scripts/migrate_to_events_naming.py --no-backup
```

### `--cleanup-backups`
Remove all `.migration_backup/` directories after verifying the migration:
```bash
# Dry run to see what would be deleted
python scripts/migrate_to_events_naming.py --cleanup-backups --dry-run

# Actually delete backups (will prompt for confirmation)
python scripts/migrate_to_events_naming.py --cleanup-backups
```

### `--directory`
Specify a different directory to scan (default: `data/gold`):
```bash
python scripts/migrate_to_events_naming.py --directory data/gold/states/AL
```
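
To make the way these flags combine more concrete, here is a hedged sketch of how a command-line interface with the same four options could be declared with `argparse`. It is illustrative only: `build_parser` is a hypothetical name, and the real script may define its options differently.

```python
import argparse


def build_parser():
    """Declare the four documented flags (illustrative, not the script's source)."""
    parser = argparse.ArgumentParser(
        description="Rename legacy meeting/contact Parquet files."
    )
    parser.add_argument("--dry-run", action="store_true",
                        help="show what would change without modifying anything")
    parser.add_argument("--no-backup", action="store_true",
                        help="skip creating .migration_backup/ copies")
    parser.add_argument("--cleanup-backups", action="store_true",
                        help="remove .migration_backup/ directories")
    parser.add_argument("--directory", default="data/gold",
                        help="directory to scan (default: data/gold)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse turns --dry-run into args.dry_run, --cleanup-backups into
    # args.cleanup_backups, and so on.
    print(args)
```

Because each flag is independent, `--cleanup-backups --dry-run` parses as both options set, which matches the documented "preview what would be deleted" usage.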

## Safe Migration Process

1. **Verify current files:**
   ```bash
   find data/gold -name "*.parquet" -type f | sort
   ```

2. **Run dry-run to preview changes:**
   ```bash
   python scripts/migrate_to_events_naming.py --dry-run
   ```

3. **Perform migration with backups:**
   ```bash
   python scripts/migrate_to_events_naming.py
   ```
   This creates backups in `.migration_backup/` directories (automatically gitignored).

4. **Verify the migration worked:**
   ```bash
   # Check new files exist
   find data/gold -name "event*.parquet" -type f | sort
   
   # Check the API still works
   cd api && uvicorn main:app --reload
   ```

5. **Clean up backups (after verification):**
   ```bash
   python scripts/migrate_to_events_naming.py --cleanup-backups
   ```

## Backup Location

Backups are stored in `.migration_backup/` directories next to the original files:
```
data/gold/states/AL/
├── events.parquet                 # New file
└── .migration_backup/
    └── meetings_20260429_153022.parquet  # Backup with timestamp
```

These directories are automatically ignored by git (see `.gitignore`).
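
The backup name in the example above follows the pattern `<original stem>_<YYYYMMDD>_<HHMMSS>.parquet`. A minimal sketch of how such a path could be derived is shown below; `backup_path` and `BACKUP_DIR_NAME` are hypothetical names, and the timestamp format is inferred from the example rather than taken from the script.

```python
from datetime import datetime
from pathlib import Path

BACKUP_DIR_NAME = ".migration_backup"


def backup_path(original, now=None):
    """Build the backup location for `original` next to the file itself.

    Assumed pattern: <parent>/.migration_backup/<stem>_<YYYYMMDD>_<HHMMSS><suffix>
    """
    original = Path(original)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return original.parent / BACKUP_DIR_NAME / f"{original.stem}_{stamp}{original.suffix}"


if __name__ == "__main__":
    print(backup_path("data/gold/states/AL/meetings.parquet",
                      now=datetime(2026, 4, 29, 15, 30, 22)))
```

Passing `now` explicitly keeps the function deterministic, which is convenient when checking which backup belongs to which migration run.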

## Troubleshooting

### "Target already exists"
If a file with the new name already exists, the script skips the old file and leaves both in place. Resolve the conflict manually:
```bash
# Option 1: Delete the old file if new one is correct
rm data/gold/states/AL/meetings.parquet

# Option 2: Compare the two files first (prints True if identical), then merge if needed
python -c "import pandas as pd; print(pd.read_parquet('old.parquet').equals(pd.read_parquet('new.parquet')))"
```

### "No files found"
If the script finds no files to rename, one of the following applies:
- Files are already using new naming ✅
- You're scanning the wrong directory (use `--directory`)
- Files don't match the expected names
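
To tell these three cases apart, it can help to list what is actually in the directory. The sketch below is a standalone diagnostic, not part of the migration script: `classify_parquet_files` is a hypothetical helper, and the name sets are copied from the Migration Map.

```python
from pathlib import Path

# Old names (left column of the Migration Map).
LEGACY_NAMES = {
    "meetings.parquet", "meetings_calendar.parquet",
    "meetings_transcripts.parquet", "meetings_topics.parquet",
    "meetings_demographics.parquet", "meetings_decisions.parquet",
    "contacts_meeting_attendance.parquet", "events_events.parquet",
    "events_event_documents.parquet", "events_event_participants.parquet",
    "events_event_agenda_items.parquet", "events_event_bills.parquet",
    "events_event_media.parquet",
}
# New names (right column of the Migration Map).
NEW_NAMES = {
    "events.parquet", "event_documents.parquet", "event_agenda_items.parquet",
    "event_participants.parquet", "event_bills.parquet", "event_media.parquet",
}


def classify_parquet_files(root):
    """Group every *.parquet file under `root` as 'legacy', 'new', or 'unknown'."""
    groups = {"legacy": [], "new": [], "unknown": []}
    for path in sorted(Path(root).rglob("*.parquet")):
        if ".migration_backup" in path.parts:
            continue  # backups are not migration candidates
        if path.name in LEGACY_NAMES:
            groups["legacy"].append(path)
        elif path.name in NEW_NAMES:
            groups["new"].append(path)
        else:
            groups["unknown"].append(path)
    return groups


if __name__ == "__main__":
    for kind, paths in classify_parquet_files("data/gold").items():
        print(f"{kind}: {len(paths)} file(s)")
```

An empty `legacy` group with a populated `new` group suggests the files were already migrated; everything empty suggests the wrong directory; a populated `unknown` group points to names the script does not recognize.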

## Reverting Migration

If you need to revert (and backups still exist):
```bash
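# Restore from backups manually (the renamed files are left in place)
cd data/gold/states/AL/.migration_backup
for f in *.parquet; do
    # Strip the _YYYYMMDD_HHMMSS timestamp to recover the original name
    original=$(printf '%s\n' "$f" | sed 's/_[0-9]\{8\}_[0-9]\{6\}\.parquet$/.parquet/')
    cp "$f" "../$original"
done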
```
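
The same restore can be sketched in Python, which makes it easier to handle several backups of one file. This is an illustrative helper under stated assumptions, not a feature of the migration script: `restore_backups` is a hypothetical name, the timestamp pattern is inferred from the backup example, and choosing the newest backup per file is a design choice made here.

```python
import re
import shutil
from pathlib import Path

# Matches the assumed _YYYYMMDD_HHMMSS suffix right before ".parquet".
STAMP = re.compile(r"_\d{8}_\d{6}(?=\.parquet$)")


def restore_backups(backup_dir, overwrite=False):
    """Copy the newest backup of each file back to the parent directory.

    Returns the list of restored paths. Existing files are kept unless
    overwrite=True. Renamed (new-style) files are not removed.
    """
    backup_dir = Path(backup_dir)
    restored, seen = [], set()
    # Reverse name order puts the latest timestamp of each file first.
    for backup in sorted(backup_dir.glob("*.parquet"), reverse=True):
        original_name, count = STAMP.subn("", backup.name)
        if count == 0 or original_name in seen:
            continue  # not a timestamped backup, or an older duplicate
        seen.add(original_name)
        target = backup_dir.parent / original_name
        if target.exists() and not overwrite:
            continue
        shutil.copy2(backup, target)
        restored.append(target)
    return restored


if __name__ == "__main__":
    for path in restore_backups("data/gold/states/AL/.migration_backup"):
        print(f"restored {path}")
```

Like the shell loop, this only copies the old-named files back; the renamed files still need to be deleted by hand if a full revert is wanted.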