# 🚀 START HERE: Spatial-BEATs Documentation Guide
**Welcome!** You have access to a comprehensive analysis of the Spatial-BEATs codebase. This guide directs you to exactly what you need.
---
## ⚡ Quick Pick Your Task
### "I have 5 minutes"
→ Read: [`SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md`](SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md)
### "I have 15 minutes"
→ Read: [`README_DOCUMENTATION_INDEX.md`](README_DOCUMENTATION_INDEX.md) then [`ANALYSIS_COMPLETION_SUMMARY.md`](ANALYSIS_COMPLETION_SUMMARY.md)
### "I have 30 minutes"
→ Choose one:
- **New to codebase?** Read: Part 1 of [`SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md`](SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md)
- **Debugging DOA gap?** Read: [`doa_train_valid_gap_analysis.md`](doa_train_valid_gap_analysis.md) Executive Summary
- **Planning experiments?** Read: [`0427_v11_series.md`](0427_v11_series.md) Section 1-2
### "I have 1-2 hours"
→ Full reading path for your role:
- **Researcher**: QUICK_REF → 0427_v11_series.md → ANALYSIS Part 2-3
- **Contributor**: QUICK_REF → ANALYSIS Part 1-2 → Pick component → read code
- **Investigator**: DOA_GAP Executive → Part 6 → Part 8 → Appendix
---
## 📚 The Five Documents
### 1. **README_DOCUMENTATION_INDEX.md**
🏠 **Navigation hub**: Where to find what
- Use case lookup (choose your problem)
- Code component quick reference
- Reading order for different roles
- Cross-reference guide
**👉 Read this first if**: You're not sure where to start
---
### 2. **SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md**
⚡ **Practitioner's card**: Fast lookup
- Framework table (4 frameworks, 1 page)
- Route A/B/C comparison
- Version series highlights
- Code locations by component
- Loss weight patterns
- When to use each configuration
**👉 Read this for**: Quick answers, practitioner reference
---
### 3. **SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md**
📖 **Deep technical reference**: Architecture bible
- Part 1: Four spatial frameworks (Spatial-AST, DCASE SELD, EINV2, DETR)
- Part 2: Routes A/B/C with full specifications
- Part 3: Version evolution (v7→v11)
- Part 4-10: Implementation, configs, metrics, future work
- Appendix: Code reference table
**👉 Read this for**: Deep understanding, architecture details, code paths
---
### 4. **doa_train_valid_gap_analysis.md**
🔍 **Diagnostic & fix guide**: Root cause analysis
- Executive Summary: 6 critical mechanisms
- Part 1: Data pipeline analysis
- Part 2: Loss computation asymmetry
- Part 3: Training configuration (v9/v10)
- Part 4: Validation metrics
- **Part 6: Root causes ranked by severity**
- **Part 7: Diagnostic checklist**
- **Part 8: Recommended fixes (prioritized)**
- Appendix: Code reference with exact line numbers
**👉 Read this for**: Debugging train/val gaps, understanding root causes
---
### 5. **ANALYSIS_COMPLETION_SUMMARY.md**
📋 **Executive overview**: What was found
- Deliverables summary (5 docs, 1,883 lines)
- Key findings (frameworks, routes, v11 series)
- Next steps (immediate vs experimental)
- How to use documents
- Verification checklist
**👉 Read this for**: Overview, decision-making, what comes next
---
## 🎯 Choose Your Path
### Path 1: "I want to understand the architecture (60 min)"
1. SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md (5 min)
2. SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md Part 1-2 (25 min)
3. Pick a component from Part 6 Appendix, find in code (20 min)
4. spatial_beats_ov123_frame_routes.md if curious (10 min)
**Outcome**: Can navigate codebase, understand paradigms, modify code confidently
---
### Path 2: "I need to debug a train/val gap (30 min)"
1. doa_train_valid_gap_analysis.md Executive Summary (2 min)
2. Part 6: Check which mechanisms apply to your situation (5 min)
3. Part 7: Run the diagnostics against your logs (10 min)
4. Part 8: Pick a fix priority (5 min)
5. Appendix: Get code locations (2 min)
6. SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md if you need to modify (optional)
**Outcome**: Root cause identified, fix strategy chosen, code locations ready
---
### Path 3: "I want to run an experiment (v11 series) (45 min)"
1. SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md (5 min)
2. 0427_v11_series.md Section 1-2 (15 min)
3. SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md Part 3 (15 min)
4. 0427_v11_series.md Part 4 (verification method) (5 min)
5. Copy shell script from QUICK_REF (5 min)
**Outcome**: Experiment ready to launch, understanding of success metrics
---
### Path 4: "I'm new, I want to understand everything (2 hours)"
1. SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md (10 min)
2. SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md Part 1-3 (40 min)
3. spatial_beats_ov123_frame_routes.md (25 min)
4. spatial_beats_training_overview.md (20 min)
5. Pick component, trace through code with Part 6 references (20 min)
6. doa_train_valid_gap_analysis.md Part 6 for context (5 min)
**Outcome**: Comprehensive understanding of system, ready to contribute
---
## 🔑 Key Findings at a Glance
### 4 Spatial Frameworks in Codebase
- **Spatial-AST**: Task tokens (pre-trunk)
- **DCASE SELD**: Per-class activity+DOA
- **EINV2**: Learnable track queries
- **DETR**: Per-frame K-slot allocation
### 3 Parallel Routes (A/B/C)
- **Route A**: Per-frame K-slot, per-step Hungarian
- **Route B**: Learnable queries, clip-level Hungarian (PRODUCTION v9)
- **Route C**: Per-class vectors (PROTOTYPE, v11c test)
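The difference between per-step and clip-level Hungarian matching in Routes A and B can be illustrated with a small sketch. It is not taken from the codebase: the function names, the cost matrices, and the convention that rows are prediction slots and columns are ground-truth sources are all assumptions, and a brute-force search over permutations stands in for the Hungarian algorithm (a real implementation would more likely call a solver such as `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations


def best_assignment(cost):
    """Return the permutation minimizing sum(cost[i][perm[i]]) for a square matrix.

    Brute force over all permutations; a stand-in for the Hungarian
    algorithm that is only practical for a handful of slots.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return best_perm


def per_frame_matching(frame_costs):
    """Route A style (assumed): solve one assignment per frame."""
    return [best_assignment(cost) for cost in frame_costs]


def clip_level_matching(frame_costs):
    """Route B style (assumed): sum costs over frames, solve one assignment per clip."""
    n = len(frame_costs[0])
    summed = [
        [sum(cost[i][j] for cost in frame_costs) for j in range(n)]
        for i in range(n)
    ]
    return best_assignment(summed)


# Hypothetical costs for 2 frames, 2 slots, 2 sources.
frame_costs = [
    [[0.1, 0.9], [0.8, 0.2]],  # frame 0 prefers slot0->src0, slot1->src1
    [[0.6, 0.5], [0.4, 0.7]],  # frame 1 prefers the swapped pairing
]
print(per_frame_matching(frame_costs))   # [(0, 1), (1, 0)]
print(clip_level_matching(frame_costs))  # (0, 1)
```

In this toy case the per-frame matching lets a slot switch sources between frames, while the clip-level matching keeps one slot-to-source pairing for the whole clip.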
### DOA Train/Val Gap Root Causes
1. ⚠️⚠️⚠️ **ZERO spatial augmentation (rotations)**: 40-60% of variance
2. ⚠️⚠️ **SpecAugment train-only**: 10-20% variance
3. ⚠️⚠️ **v10 freezes direction head**: 30-40% on multi-source
4. ⚠️ **Regression sensitivity**: 5-15% variance
5. ⚠️ **Detached prediction asymmetry**: 2-5% variance
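For readers unfamiliar with rotation-based spatial augmentation (item 1 above), the following is a minimal sketch of the idea, assuming first-order Ambisonics (FOA) input whose horizontal X and Y channels encode direction. The input format, function names, and angle convention are assumptions for illustration, not the codebase's implementation. Rotating the sound field about the vertical axis mixes X and Y with a 2D rotation, leaves W and Z unchanged, and shifts the azimuth label by the same angle.

```python
import math


def encode_foa_xy(sample, azimuth_deg):
    """X/Y components of a plane wave at the given azimuth (elevation 0, assumed)."""
    a = math.radians(azimuth_deg)
    return sample * math.cos(a), sample * math.sin(a)


def rotate_about_z(x, y, azimuth_deg, rotation_deg):
    """Rotate X/Y channel samples and the azimuth label by the same angle.

    W and Z are unaffected by a rotation about the vertical axis,
    so they are not passed in.
    """
    r = math.radians(rotation_deg)
    x_rot = x * math.cos(r) - y * math.sin(r)
    y_rot = x * math.sin(r) + y * math.cos(r)
    # Wrap the new label into [-180, 180).
    new_azimuth = (azimuth_deg + rotation_deg + 180.0) % 360.0 - 180.0
    return x_rot, y_rot, new_azimuth


# A source at 30 degrees rotated by 90 degrees should look like a source at 120 degrees.
x, y = encode_foa_xy(1.0, 30.0)
x_rot, y_rot, new_az = rotate_about_z(x, y, 30.0, 90.0)
x_ref, y_ref = encode_foa_xy(1.0, 120.0)
assert abs(x_rot - x_ref) < 1e-9 and abs(y_rot - y_ref) < 1e-9
```

The key property is that audio and label are transformed together, so the augmented pair stays consistent.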
### v11 Experiments (Parallel Runs)
- **v11a**: DOA demixer → ov2 angles ↓ 5pp+
- **v11b**: LocalSpatial pre-pool → test IV necessity
- **v11c**: ACCDOA paradigm → ov3 binding ↓ 5pp+
- **v11d**: Post-hoc calibration → ov1 ranking ↑ 5pp+
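As background for v11c, here is a minimal sketch of how an ACCDOA-style output is typically decoded: each class emits one Cartesian vector whose length signals activity and whose direction gives the DOA. The function name, the dictionary layout, and the 0.5 threshold are assumptions for illustration and are not taken from the Spatial-BEATs code.

```python
import math


def decode_accdoa(class_vectors, threshold=0.5):
    """Decode per-class (x, y, z) vectors into active classes and their angles.

    Returns {class_name: (azimuth_deg, elevation_deg)} for every class whose
    vector norm exceeds the (assumed) activity threshold.
    """
    active = {}
    for name, (x, y, z) in class_vectors.items():
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > threshold:
            azimuth = math.degrees(math.atan2(y, x))
            # Clamp to guard against rounding slightly outside [-1, 1].
            elevation = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
            active[name] = (azimuth, elevation)
    return active


# Hypothetical frame: "speech" is active to the left, "dog" is below threshold.
frame = {"speech": (0.0, 0.9, 0.0), "dog": (0.1, 0.1, 0.0)}
print(sorted(decode_accdoa(frame)))  # ['speech']
```

Because activity and direction share one vector, no separate detection head or threshold per task is needed in this formulation.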
---
## 📞 FAQ
**Q: Where do I find the direction head loss?**
A: `SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md` Appendix → search "direction loss" → `spatial_loss.py:1562-1565`
**Q: What's the difference between routes?**
A: Compare table in `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md` or detailed Part 2 of `SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md`
**Q: Should I implement fix #1, #2, or #3?**
A: Read `doa_train_valid_gap_analysis.md` Part 6, pick based on your gap size and risk tolerance.
**Q: How do I run v11a?**
A: Shell script in `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md` v11 section + spec in `0427_v11_series.md` Section 2.2
**Q: I'm stuck on a component. Where's the code?**
A: `SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md` Part 6 has complete reference table with file:line for every component.
---
## 🎁 You Now Have
✅ **Navigation guide** for all documents
✅ **Quick reference card** with all the essentials
✅ **Architecture bible** with code paths
✅ **Diagnostic guide** for train/val gaps
✅ **Experimental specifications** for v11 series
✅ **Comprehensive metadata** (1,883 lines, 77KB)
✅ **All findings tied to exact code locations**
---
## 🚀 Next Steps
1. **Choose your path above** based on how much time you have
2. **Follow the reading order** in that path
3. **Use cross-references** when you need more detail
4. **Check Appendices** for exact code locations
5. **Reference Part 6/Part 8** when implementing
---
## 📊 Document Overview
| Document | Size | Time | Purpose |
|----------|------|------|---------|
| README_DOCUMENTATION_INDEX | 12KB | 5-10m | Navigation hub |
| SPATIAL_FRAMEWORKS_QUICK_REFERENCE | 7KB | 5-10m | Quick lookup |
| SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS | 28KB | 30-45m | Deep reference |
| doa_train_valid_gap_analysis | 19KB | 20-30m | Diagnostics |
| ANALYSIS_COMPLETION_SUMMARY | 11KB | 10m | Executive summary |
| **TOTAL** | **77KB** | **2-4 hours** | **Complete set** |
---
**Status**: ✅ Complete and ready for use
**Created**: 2026-04-27
**Next update**: After v11 experiments
👉 **Pick your path above and start reading!**