# Documentation Index: Spatial-BEATs Analysis & Reference
## 📋 Quick Navigation
### **New to the codebase?**
→ Start here: [`SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md`](SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md) (5 min read)
### **Debugging a train/validation gap in DOA?**
→ Go here: [`doa_train_valid_gap_analysis.md`](doa_train_valid_gap_analysis.md) → Part 6 + Part 8
### **Need detailed architecture reference?**
→ Read here: [`SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md`](SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md) (deep dive, 30 min)
### **Planning experiments (v11 series)?**
→ Use: [`0427_v11_series.md`](0427_v11_series.md) + `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md` (v11 section)
### **Understanding a specific component (e.g., SourceQueryDecoder)?**
→ Use: Search `SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md` Part 2, then the Appendix for line numbers
---
## 📚 Document Reference Table
| Document | Lines | Size | Best For | Read Time |
|----------|-------|------|----------|-----------|
| **ANALYSIS_COMPLETION_SUMMARY** (this index) | 150 | 6KB | Overview + navigation | 5 min |
| **SPATIAL_FRAMEWORKS_QUICK_REFERENCE** | 192 | 7KB | Quick lookup, practitioner guide | 5-10 min |
| **SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS** | 724 | 28KB | Deep technical reference | 30-45 min |
| **doa_train_valid_gap_analysis** | 434 | 19KB | Diagnostics + fixes | 20-30 min |
| **0427_v11_series** | 185 | 13KB | Experimental design (v11a/b/c/d) | 15 min |
| **spatial_beats_ov123_frame_routes** | 512 | 22KB | Routes A/B/C architecture | 25 min |
| **spatial_beats_training_overview** | 420 | 15KB | Training pipeline + presets | 20 min |
---
## 🎯 Use Case Lookup
### "I need to understand the DOA train/val gap"
1. Read: [`doa_train_valid_gap_analysis.md`](doa_train_valid_gap_analysis.md) **Executive Summary** (2 min)
2. Identify: Which of the 6 mechanisms applies to your case (Part 6)
3. Fix: Follow priority order in Part 8
4. Reference: Code locations in Appendix
**Expected outcome**: Root cause identified + fix strategy
---
### "I'm new and want to understand the architecture"
1. Read: [`SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md`](SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md) **Sections 1-3** (5 min)
2. Read: [`SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md`](SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md) **Part 1 + Part 2** (15 min)
3. Reference: Code locations in Appendix for specific functions
4. Cross-check: `spatial_beats_ov123_frame_routes.md` for Routes A/B/C details
**Expected outcome**: High-level understanding + ability to navigate code
---
### "I want to run v11a experiment"
1. Read: [`0427_v11_series.md`](0427_v11_series.md) **Section 2.2 (v11a)** (5 min)
2. Check: `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md` **v11 section** for shell script
3. Read: `0427_v11_series.md` Part 4 (verification method) for how to evaluate results
4. Reference: Appendix in `doa_train_valid_gap_analysis.md` for code line numbers if modifying
**Expected outcome**: Experiment ready to launch, understanding of what to expect
---
### "What are all the spatial frameworks in this codebase?"
1. Read: [`SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md`](SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md) **Part 1** (5 min)
2. Summary: Four frameworks implemented (Spatial-AST, DCASE SELD, EINV2, DETR-slots)
3. Reference: Part 2 shows how each is implemented as Routes A/B/C
**Expected outcome**: Framework inventory + where each is implemented
---
### "How do I compare Routes A/B/C?"
1. Go to: [`SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md`](SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md) **Part 2** (15 min)
2. Check: Comparison table in `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md`
3. Deep dive: `spatial_beats_ov123_frame_routes.md` for architectural details
**Expected outcome**: Understanding of paradigm differences, when to use each
---
### "What changed from v7 to v11?"
1. Read: [`SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md`](SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md) **Part 3** (10 min)
2. Reference: `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md` **version series** for quick compare
3. Deep dive: `doa_train_valid_gap_analysis.md` **Part 3** for v9/v10 details
4. Experimental: `0427_v11_series.md` **Section 1** for v11 rationale
**Expected outcome**: Version history + innovation tracking
---
### "Where exactly is the direction head loss computed?"
1. Go to: `SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md` **Appendix** (search "direction loss")
2. Result: `spatial_loss.py:1562-1565`
3. Read: `doa_train_valid_gap_analysis.md` **Part 2.3** for context
**Expected outcome**: Exact code location + understanding of loss formulation
---
## 🔍 Code Navigation Quick Reference
### For Each Major Component
| Component | Primary Ref | Backup Ref | Concept |
|-----------|------------|-----------|---------|
| **LocalSpatialEncoder** | ANALYSIS Part 6 | QUICK_REF "Key locations" | 7-channel FOA → spatial features |
| **SourceQueryDecoder** | ROUTES p.20-30 | ANALYSIS Part 2.2b | K track queries → per-frame features |
| **FrameTrackPredictionHeads** | QUICK_REF Appendix | ANALYSIS Part 2.2b | Predicts act/class/dir/dist per frame |
| **Hungarian Matching** | DOA_GAP Part 2.2 | ANALYSIS Part 4 | How sources get assigned to slots/queries |
| **ClassHeadSpectralDemixer** | ANALYSIS Part 3 | QUICK_REF "v9 baseline" | Breaks frequency pooling bottleneck |
| **ACCDOA heads** | ROUTES p.40+ | ANALYSIS Part 1.2 | Route C: per-class 3D vectors |
| **SpecAugment** | DOA_GAP Part 1.2 | QUICK_REF "Training" | Spectral masking (train-only!) |
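
The Hungarian Matching row above describes assigning ground-truth sources to slots or queries. The sketch below illustrates that idea, assuming the matching cost is the angular distance between unit direction vectors; the actual cost terms used in this codebase are documented in DOA_GAP Part 2.2 and may differ. For readability it searches all permutations, which finds the same minimum-cost assignment as the Hungarian algorithm but is only practical for a handful of slots. The names `angular_distance_deg` and `match_sources_to_slots` are hypothetical.

```python
import math
from itertools import permutations


def angular_distance_deg(u, v):
    """Angle in degrees between two 3D unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))


def match_sources_to_slots(pred_dirs, target_dirs):
    """Return (assignment, total_cost); assignment[i] is the slot matched to target i.

    Assumes len(pred_dirs) >= len(target_dirs). Exhaustive search over
    slot orderings, so the cost grows factorially with the number of slots.
    """
    best_assignment, best_cost = None, math.inf
    for slots in permutations(range(len(pred_dirs)), len(target_dirs)):
        cost = sum(
            angular_distance_deg(pred_dirs[s], t)
            for s, t in zip(slots, target_dirs)
        )
        if cost < best_cost:
            best_assignment, best_cost = slots, cost
    return best_assignment, best_cost


# Three predicted slots and two ground-truth sources (illustrative values).
preds = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
targets = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
assignment, cost = match_sources_to_slots(preds, targets)
print(assignment, cost)
```

For larger slot counts, `scipy.optimize.linear_sum_assignment` solves the same assignment problem in polynomial time from a cost matrix.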
---
## 📊 Experiment Planning Matrix
### To understand **which experiment tests what**:
Use: `0427_v11_series.md` Section 2 (detailed specifications)
+ `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md` v11 table (quick reference)
+ `doa_train_valid_gap_analysis.md` Part 6 (root causes)
| Problem | Experiment | Mechanism Tested | Expected Impact | Doc Reference |
|---------|-----------|-----------------|-----------------|----------------|
| ov2 angle errors (73.9%) | v11a | DOA demixer | ↓ 5pp+ | 0427_v11_series.md:18-40 |
| ov2/ov3 angles | v11b | IV signal path | Compare vs v11a | 0427_v11_series.md:41-63 |
| ov3 binding (24.5%) | v11c | ACCDOA paradigm | ↓ 5pp+ | 0427_v11_series.md:64-87 |
| ov1 ranking (37% loss) | v11d | Post-hoc decoding | ↑ 5pp+ | 0427_v11_series.md:88-112 |
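
The v11c row tests the ACCDOA paradigm (Route C: per-class 3D vectors). In the standard ACCDOA formulation, each class outputs one Cartesian vector whose length encodes activity and whose direction encodes the DOA. The sketch below decodes one frame under that assumption; the 0.5 threshold, the axis convention (azimuth from `atan2(y, x)`), and the name `decode_accdoa` are illustrative and not taken from this codebase.

```python
import math


def decode_accdoa(vectors, threshold=0.5):
    """Decode per-class ACCDOA vectors for one frame.

    vectors: list of (x, y, z), one entry per class.
    Returns (class_index, azimuth_deg, elevation_deg) for every class
    whose vector norm exceeds the threshold.
    """
    detections = []
    for class_index, (x, y, z) in enumerate(vectors):
        norm = math.sqrt(x * x + y * y + z * z)
        if norm <= threshold:
            continue  # vector too short: class treated as inactive
        azimuth = math.degrees(math.atan2(y, x))
        elevation = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
        detections.append((class_index, azimuth, elevation))
    return detections


frame = [
    (0.9, 0.0, 0.0),  # class 0: long vector along +x, active
    (0.1, 0.1, 0.0),  # class 1: short vector, inactive
    (0.0, 0.8, 0.0),  # class 2: long vector along +y, active
]
detections = decode_accdoa(frame)
print(detections)
```

Because each class has a single vector, this formulation can report at most one active source per class per frame.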
---
## ⚑ Critical Findings Summary
### Three Most Important Things to Know
1. **Zero spatial augmentation (rotations)** is the #1 cause of the DOA train/val gap
- Location: `doa_train_valid_gap_analysis.md` **Executive Summary** + Part 6
- Impact: 40-60% of variance
- Fix: See Part 8 recommendation #1
2. **Three parallel routes (A/B/C) coexist in the codebase**
- Route A: Per-frame slot allocation (unstable)
- Route B: Learnable track queries (production, v9)
- Route C: Per-class vectors (prototype, being tested in v11c)
- Reference: `SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md` Part 2
3. **v10 phase-1 freezes the direction head entirely**
- Direction head gets no gradients for 10 epochs
- Then unfrozen with poor initialization
- Causes a 30-40% DOA metric drop on multi-source splits
- Reference: `doa_train_valid_gap_analysis.md` Part 3.2 + Part 6
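
Finding 1 concerns missing rotation augmentation. The sketch below shows one common form, a rotation of the sound field about the vertical axis, assuming a raw 4-channel first-order ambisonics signal in ACN order (W, Y, Z, X) with azimuth measured counter-clockwise. The channel order used by this codebase, the point in the pipeline where augmentation would be applied, and all function names here are assumptions. The key point is that the same rotation must be applied to the DOA labels, otherwise inputs and targets disagree.

```python
import math


def rotate_foa_yaw(frame, yaw_deg):
    """Rotate one FOA sample (W, Y, Z, X) about the vertical axis.

    W (omnidirectional) and Z (vertical) are unchanged; X and Y mix
    like the coordinates of a 2D rotation.
    """
    w, y, z, x = frame
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return (w, x * s + y * c, z, x * c - y * s)


def rotate_label_yaw(azimuth_deg, elevation_deg, yaw_deg):
    """Apply the same rotation to a DOA label, wrapping azimuth to [-180, 180)."""
    azimuth = (azimuth_deg + yaw_deg + 180.0) % 360.0 - 180.0
    return azimuth, elevation_deg


def encode_plane_wave(azimuth_deg, elevation_deg, gain=1.0):
    """FOA gains (W, Y, Z, X) for a plane wave, ignoring normalization factors."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (
        gain,
        gain * math.sin(az) * math.cos(el),
        gain * math.sin(el),
        gain * math.cos(az) * math.cos(el),
    )


# Consistency check: rotating the signal matches encoding at the rotated label.
original = encode_plane_wave(30.0, 10.0)
rotated = rotate_foa_yaw(original, 45.0)
new_az, new_el = rotate_label_yaw(30.0, 10.0, 45.0)
expected = encode_plane_wave(new_az, new_el)
consistent = all(abs(a - b) < 1e-9 for a, b in zip(rotated, expected))
print(consistent)
```

In training, the yaw angle would typically be drawn at random per clip and applied to every sample of that clip and to all of its source labels.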
---
## 🔗 Cross-Reference Guide
### "I'm reading X, how do I find related content?"
| Reading | See Also |
|---------|----------|
| ROUTES page 20-30 (Route B) | ANALYSIS Part 2.2b, QUICK_REF "Routes", DOA_GAP Part 2-3 |
| ANALYSIS Part 1 (Spatial-AST) | QUICK_REF table, ANALYSIS Part 6 (code locations) |
| DOA_GAP Part 8 (fixes) | ANALYSIS Part 6 (line numbers), QUICK_REF "Training" |
| 0427_v11_series Section 2 (v11 specs) | QUICK_REF v11 table, DOA_GAP Part 6 (why these experiments) |
| TRAINING_OVERVIEW | ANALYSIS Part 5 (configs), QUICK_REF "Loss weights" |
---
## 📝 How Documents Were Created
All documentation was created from a **comprehensive codebase analysis** on 2026-04-27:
- ✅ 4 spatial frameworks identified (Spatial-AST, DCASE SELD, EINV2, DETR-slots)
- ✅ 3 parallel routes analyzed with full architecture specifications
- ✅ 6 mechanisms causing DOA train/val gaps discovered
- ✅ v7→v11 experimental series mapped with root cause tracing
- ✅ 10,000+ lines of code reviewed and cross-referenced
- ✅ All findings tied to exact file:line numbers
**Quality Assurance**:
- Code references verified with actual line numbers
- Architecture descriptions validated against source
- Experimental hypotheses cross-checked with docstrings
- Cross-document consistency checked
---
## 🚀 Recommended Reading Order
### **For Different Roles**
#### **Principal Investigator / Project Lead**
1. This index + section on "Critical Findings" (5 min)
2. `ANALYSIS_COMPLETION_SUMMARY.md` (10 min)
3. `0427_v11_series.md` Section 1 (diagnostics review) + Section 5 (order) (10 min)
→ **Decision**: Approve v11 experiments? (Total: 25 min)
#### **Researcher / Experimenter**
1. `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md` (10 min)
2. `0427_v11_series.md` full document (15 min)
3. `SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md` Part 2 + 3 (25 min)
→ **Ready**: Design and run experiments (Total: 50 min)
#### **New Contributor / Intern**
1. `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md` (10 min)
2. `SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md` Part 1 + 2 (25 min)
3. Pick a component (e.g., "LocalSpatialEncoder") → find in Part 6 → read code
4. `spatial_beats_training_overview.md` or `spatial_beats_coding_guide.md` as needed
→ **Goal**: Understand codebase (Total: 1-2 hours)
#### **Debugging Train/Val Gap**
1. `doa_train_valid_gap_analysis.md` Executive Summary + Part 6 (10 min)
2. Part 7 (diagnostics): check your logs (10 min)
3. Part 8 (fixes): pick priority #1-3 (5 min)
4. Appendix: get code locations (5 min)
→ **Goal**: Root cause + fix strategy (Total: 30 min)
---
## 📞 FAQ About Documentation
**Q: Where do I find the code for Route A, B, or C?**
A: See `SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md` Part 2.1/2.2/2.3; each has exact file:line references in the Part 6 Appendix.
**Q: What's the difference between v9 and v10?**
A: `doa_train_valid_gap_analysis.md` Part 3.1 vs 3.2, or quick summary in `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md`.
**Q: Should I implement fix #1, #2, or #3?**
A: Depends on your problem. See `doa_train_valid_gap_analysis.md` Part 6 and rank the root causes by severity against your gap size.
**Q: How long will v11 experiments take?**
A: ~14 days total. See `0427_v11_series.md` Section 5 for recommended order (serial vs parallel).
**Q: Can I run v11 without understanding everything?**
A: Yes! Copy the shell script from `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md`, follow Part 4 (verification method) in `0427_v11_series.md`.
---
## ✅ Checklist: What You Can Do Now
After reading the appropriate documentation:
- [ ] Understand what spatial frameworks exist in the codebase
- [ ] Identify which route (A/B/C) solves your problem
- [ ] Diagnose your train/val gap (Part 6 in DOA_GAP)
- [ ] Plan an experiment (v11 specs + order)
- [ ] Find exact code locations (Appendix tables)
- [ ] Understand loss weight patterns (QUICK_REF tables)
- [ ] Know when to hot-start from which checkpoint
- [ ] Compare validation metrics across routes
---
## 📄 License & Citation
These documents are analysis artifacts for internal research use. They synthesize information from:
- Source code comments and docstrings
- Configuration file specifications
- DCASE challenge documentation (officially referenced in code)
- Research paper citations in docstrings
---
**Last Updated**: 2026-04-27
**Analysis Completed**: Yes
**Ready for Use**: Yes
**Maintenance**: Update after v11 experiments complete
For questions or clarifications, refer to exact file:line citations in the appendices of technical documents.