| # Documentation Index: Spatial-BEATs Analysis & Reference |
|
|
| ## Quick Navigation |
|
|
| ### **New to the codebase?** |
| → Start here: [`SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md`](SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md) (5 min read) |
|
|
| ### **Debugging a train/validation gap in DOA?** |
| → Go here: [`doa_train_valid_gap_analysis.md`](doa_train_valid_gap_analysis.md), Part 6 + Part 8 |
|
|
| ### **Need detailed architecture reference?** |
| → Read here: [`SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md`](SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md) (deep dive, 30 min) |
|
|
| ### **Planning experiments (v11 series)?** |
| → Use: [`0427_v11_series.md`](0427_v11_series.md) + `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md` (v11 section) |
|
|
| ### **Understanding a specific component (e.g., SourceQueryDecoder)?** |
| → Use: Search `SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md` Part 2, then check the Appendix for line numbers |
|
|
| --- |
|
|
| ## Document Reference Table |
|
|
| | Document | Lines | Size | Best For | Read Time | |
| |----------|-------|------|----------|-----------| |
| | **ANALYSIS_COMPLETION_SUMMARY** (this index) | 150 | 6KB | Overview + navigation | 5 min | |
| | **SPATIAL_FRAMEWORKS_QUICK_REFERENCE** | 192 | 7KB | Quick lookup, practitioner guide | 5-10 min | |
| | **SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS** | 724 | 28KB | Deep technical reference | 30-45 min | |
| | **doa_train_valid_gap_analysis** | 434 | 19KB | Diagnostics + fixes | 20-30 min | |
| | **0427_v11_series** | 185 | 13KB | Experimental design (v11a/b/c/d) | 15 min | |
| | **spatial_beats_ov123_frame_routes** | 512 | 22KB | Routes A/B/C architecture | 25 min | |
| | **spatial_beats_training_overview** | 420 | 15KB | Training pipeline + presets | 20 min | |
| |
| --- |
| |
| ## Use Case Lookup |
| |
| ### "I need to understand the DOA train/val gap" |
| 1. Read: [`doa_train_valid_gap_analysis.md`](doa_train_valid_gap_analysis.md) **Executive Summary** (2 min) |
| 2. Identify: Which of 6 mechanisms applies to your case (Part 6) |
| 3. Fix: Follow priority order in Part 8 |
| 4. Reference: Code locations in Appendix |
| |
| **Expected outcome**: Root cause identified + fix strategy |
| |
| --- |
| |
| ### "I'm new and want to understand the architecture" |
| 1. Read: [`SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md`](SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md) **Sections 1-3** (5 min) |
| 2. Read: [`SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md`](SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md) **Part 1 + Part 2** (15 min) |
| 3. Reference: Code locations in Appendix for specific functions |
| 4. Cross-check: `spatial_beats_ov123_frame_routes.md` for Routes A/B/C details |
| |
| **Expected outcome**: High-level understanding + ability to navigate code |
| |
| --- |
| |
| ### "I want to run v11a experiment" |
| 1. Read: [`0427_v11_series.md`](0427_v11_series.md) **Section 2.2 (v11a)** (5 min) |
| 2. Check: `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md` **v11 section** for shell script |
| 3. Read: `0427_v11_series.md` Part 4 (verification method) for how to evaluate results |
| 4. Reference: Appendix in `doa_train_valid_gap_analysis.md` for code line numbers if modifying |
|
|
| **Expected outcome**: Experiment ready to launch, understanding of what to expect |
|
|
| --- |
|
|
| ### "What are all the spatial frameworks in this codebase?" |
| 1. Read: [`SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md`](SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md) **Part 1** (5 min) |
| 2. Summary: Four frameworks implemented (Spatial-AST, DCASE SELD, EINV2, DETR-slots) |
| 3. Reference: Part 2 shows how each is implemented as Routes A/B/C |
|
|
| **Expected outcome**: Framework inventory + where each is implemented |
|
|
| --- |
|
|
| ### "How do I compare Routes A/B/C?" |
| 1. Go to: [`SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md`](SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md) **Part 2** (15 min) |
| 2. Check: Comparison table in `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md` |
| 3. Deep dive: `spatial_beats_ov123_frame_routes.md` for architectural details |
|
|
| **Expected outcome**: Understanding of paradigm differences, when to use each |
|
|
| --- |
|
|
| ### "What changed from v7 to v11?" |
| 1. Read: [`SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md`](SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md) **Part 3** (10 min) |
| 2. Reference: `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md` **version series** for quick compare |
| 3. Deep dive: `doa_train_valid_gap_analysis.md` **Part 3** for v9/v10 details |
| 4. Experimental: `0427_v11_series.md` **Section 1** for v11 rationale |
|
|
| **Expected outcome**: Version history + innovation tracking |
|
|
| --- |
|
|
| ### "Where exactly is the direction head loss computed?" |
| 1. Go to: `SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md` **Appendix** (search "direction loss") |
| 2. Result: `spatial_loss.py:1562-1565` |
| 3. Read: `doa_train_valid_gap_analysis.md` **Part 2.3** for context |
|
|
| **Expected outcome**: Exact code location + understanding of loss formulation |
|
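Citations such as `spatial_loss.py:1562-1565` follow a `file:start-end` pattern, so they can be resolved mechanically. The following is a minimal sketch, assuming every citation uses that pattern with 1-indexed, inclusive line numbers. The helper names `parse_citation` and `read_cited_lines` are hypothetical and not part of the codebase.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical helpers: resolve "file:start-end" citations used in these docs.
_CITATION = re.compile(r"^(?P<path>[^:]+):(?P<start>\d+)(?:-(?P<end>\d+))?$")


def parse_citation(citation: str) -> tuple[str, int, int]:
    """Split 'file.py:10-20' (or 'file.py:10') into (path, start, end)."""
    match = _CITATION.match(citation.strip())
    if match is None:
        raise ValueError(f"not a file:line citation: {citation!r}")
    start = int(match["start"])
    # A single-line citation has no end part; treat it as a one-line range.
    end = int(match["end"]) if match["end"] else start
    if start < 1 or end < start:
        raise ValueError(f"invalid line range in citation: {citation!r}")
    return match["path"], start, end


def read_cited_lines(citation: str, root: str = ".") -> list[str]:
    """Return the cited lines (1-indexed, inclusive) from a file under root."""
    path, start, end = parse_citation(citation)
    lines = (Path(root) / path).read_text(encoding="utf-8").splitlines()
    return lines[start - 1:end]
```

For example, `read_cited_lines("spatial_loss.py:1562-1565", root=...)` would return the four cited lines, provided `root` points at the directory containing that file. Line numbers drift as code changes, so the result should be checked against the description in the Appendix.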
|
| --- |
|
|
| ## Code Navigation Quick Reference |
|
|
| ### For Each Major Component |
|
|
| | Component | Primary Ref | Backup Ref | Concept | |
| |-----------|------------|-----------|---------| |
| | **LocalSpatialEncoder** | ANALYSIS Part 6 | QUICK_REF "Key locations" | 7-channel FOA → spatial features | |
| | **SourceQueryDecoder** | ROUTES p.20-30 | ANALYSIS Part 2.2b | K track queries → per-frame features | |
| | **FrameTrackPredictionHeads** | QUICK_REF Appendix | ANALYSIS Part 2.2b | Predicts act/class/dir/dist per frame | |
| | **Hungarian Matching** | DOA_GAP Part 2.2 | ANALYSIS Part 4 | How sources get assigned to slots/queries | |
| | **ClassHeadSpectralDemixer** | ANALYSIS Part 3 | QUICK_REF "v9 baseline" | Breaks frequency pooling bottleneck | |
| | **ACCDOA heads** | ROUTES p.40+ | ANALYSIS Part 1.2 | Route C: per-class 3D vectors | |
| | **SpecAugment** | DOA_GAP Part 1.2 | QUICK_REF "Training" | Spectral masking (train-only!) | |
|
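The "per-class 3D vectors" concept in the ACCDOA row can be illustrated with a small decoding sketch. In the published ACCDOA format, the length of each class vector encodes activity and its direction encodes the DOA. The code below assumes a 0.5 activity threshold and an axis convention of x = front, y = left, z = up. The function name `decode_accdoa`, the threshold, and the axis convention are assumptions for illustration and may differ from the Route C implementation.

```python
import math


def decode_accdoa(vectors, threshold=0.5):
    """Decode per-class ACCDOA vectors into active classes with directions.

    vectors: dict mapping class name -> (x, y, z) Cartesian vector.
    Returns: dict mapping active class name -> (azimuth_deg, elevation_deg).
    """
    detections = {}
    for name, (x, y, z) in vectors.items():
        norm = math.sqrt(x * x + y * y + z * z)
        # Vector length acts as the activity score (threshold is an assumption).
        if norm <= threshold:
            continue
        azimuth = math.degrees(math.atan2(y, x))
        # Clamp to guard against rounding pushing the ratio outside [-1, 1].
        ratio = max(-1.0, min(1.0, z / norm))
        elevation = math.degrees(math.asin(ratio))
        detections[name] = (azimuth, elevation)
    return detections


if __name__ == "__main__":
    frame = {"speech": (0.0, 0.9, 0.0), "door": (0.1, 0.0, 0.0)}
    print(decode_accdoa(frame))
```

In this example only `speech` passes the threshold, and it decodes to roughly 90 degrees azimuth and 0 degrees elevation. `door` is treated as inactive because its vector is short.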
|
| --- |
|
|
| ## Experiment Planning Matrix |
|
|
| ### To understand **which experiment tests what**: |
|
|
| Use: |
| - `0427_v11_series.md` Section 2 (detailed specifications) |
| - `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md` v11 table (quick reference) |
| - `doa_train_valid_gap_analysis.md` Part 6 (root causes) |
|
|
| | Problem | Experiment | Mechanism Tested | Expected Impact | Doc Reference | |
| |---------|-----------|-----------------|-----------------|----------------| |
| | ov2 angle errors (73.9%) | v11a | DOA demixer | 5pp+ improvement | 0427_v11_series.md:18-40 | |
| | ov2/ov3 angles | v11b | IV signal path | Compare vs v11a | 0427_v11_series.md:41-63 | |
| | ov3 binding (24.5%) | v11c | ACCDOA paradigm | 5pp+ improvement | 0427_v11_series.md:64-87 | |
| | ov1 ranking (37% loss) | v11d | Post-hoc decoding | 5pp+ improvement | 0427_v11_series.md:88-112 | |
|
|
| --- |
|
|
| ## Critical Findings Summary |
|
|
| ### Three Most Important Things to Know |
|
|
| 1. **Zero spatial augmentation (rotations)** is the #1 cause of DOA train/val gap |
| - Location: `doa_train_valid_gap_analysis.md` **Executive Summary** + Part 6 |
| - Impact: 40-60% of variance |
| - Fix: See Part 8 recommendation #1 |
|
|
| 2. **Three parallel routes (A/B/C) coexist in the codebase** |
| - Route A: Per-frame slot allocation (unstable) |
| - Route B: Learnable track queries (production, v9) |
| - Route C: Per-class vectors (prototype, being tested in v11c) |
| - Reference: `SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md` Part 2 |
|
|
| 3. **v10 phase-1 freezes the direction head entirely** |
| - Direction head gets no gradients for 10 epochs |
| - Then unfrozen with poor initialization |
| - Causes 30-40% DOA metric drop on multi-source splits |
| - Reference: `doa_train_valid_gap_analysis.md` Part 3.2 + Part 6 |
|
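Finding 1 attributes most of the gap to the absence of rotation augmentation. The following is a minimal sketch of one such augmentation: rotating a first-order Ambisonics (FOA) clip about the vertical axis and shifting the azimuth labels by the same angle. It assumes four separately named channels (W, X, Y, Z), azimuth measured counterclockwise in degrees, and labels wrapped to [-180, 180). The function name `rotate_foa_azimuth` is hypothetical. This is not the fix prescribed in Part 8, which remains the reference for the actual recommendation.

```python
import math


def rotate_foa_azimuth(w, x, y, z, azimuths_deg, angle_deg):
    """Rotate an FOA sound field about the vertical axis by angle_deg.

    w, x, y, z: equal-length lists of samples, one list per FOA channel.
    azimuths_deg: azimuth labels of the sources in the clip.
    Returns the rotated channels and the shifted azimuth labels.
    """
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    # X and Y behave like the horizontal components of a direction vector,
    # so they transform with a 2D rotation matrix. W (omnidirectional) and
    # Z (vertical) are unaffected by a rotation about the vertical axis.
    x_rot = [c * xi - s * yi for xi, yi in zip(x, y)]
    y_rot = [s * xi + c * yi for xi, yi in zip(x, y)]
    # Shift labels by the same angle and wrap to [-180, 180).
    new_azimuths = [
        (az + angle_deg + 180.0) % 360.0 - 180.0 for az in azimuths_deg
    ]
    return list(w), x_rot, y_rot, list(z), new_azimuths


if __name__ == "__main__":
    # A source straight ahead (azimuth 0) has energy in X and none in Y.
    out = rotate_foa_azimuth([1.0], [1.0], [0.0], [0.5], [0.0], 90.0)
    print(out)
```

After a 90 degree rotation the example source moves from X to Y and its label becomes 90 degrees, while W, Z and the elevation labels stay unchanged. Drawing `angle_deg` at random for each training clip would give the model directions it never sees in the unaugmented training set.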
|
| --- |
|
|
| ## Cross-Reference Guide |
|
|
| ### "I'm reading X, how do I find related content?" |
|
|
| | Reading | See Also | |
| |---------|----------| |
| | ROUTES page 20-30 (Route B) | ANALYSIS Part 2.2b, QUICK_REF "Routes", DOA_GAP Part 2-3 | |
| | ANALYSIS Part 1 (Spatial-AST) | QUICK_REF table, ANALYSIS Part 6 (code locations) | |
| | DOA_GAP Part 8 (fixes) | ANALYSIS Part 6 (line numbers), QUICK_REF "Training" | |
| | 0427_v11_series Section 2 (v11 specs) | QUICK_REF v11 table, DOA_GAP Part 6 (why these experiments) | |
| | TRAINING_OVERVIEW | ANALYSIS Part 5 (configs), QUICK_REF "Loss weights" | |
| |
| --- |
| |
| ## How Documents Were Created |
| |
| All documentation was created from a **comprehensive codebase analysis** on 2026-04-27: |
| - ✅ 4 spatial frameworks identified (Spatial-AST, DCASE SELD, EINV2, DETR-slots) |
| - ✅ 3 parallel routes analyzed with full architecture specifications |
| - ✅ 6 mechanisms causing DOA train/val gaps discovered |
| - ✅ v7→v11 experimental series mapped with root cause tracing |
| - ✅ 10,000+ lines of code reviewed and cross-referenced |
| - ✅ All findings tied to exact file:line numbers |
| |
| **Quality Assurance**: |
| - Code references verified with actual line numbers |
| - Architecture descriptions validated against source |
| - Experimental hypotheses cross-checked with docstrings |
| - Cross-document consistency checked |
| |
| --- |
| |
| ## Recommended Reading Order |
| |
| ### **For Different Roles** |
| |
| #### **Principal Investigator / Project Lead** |
| 1. This index, especially the "Critical Findings" section (5 min) |
| 2. `ANALYSIS_COMPLETION_SUMMARY.md` (10 min) |
| 3. `0427_v11_series.md` Section 1 (diagnostics review) + Section 5 (order) (10 min) |
| → **Decision**: Approve v11 experiments? (Total: 25 min) |
| |
| #### **Researcher / Experimenter** |
| 1. `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md` (10 min) |
| 2. `0427_v11_series.md` full document (15 min) |
| 3. `SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md` Part 2 + 3 (25 min) |
| → **Ready**: Design and run experiments (Total: 50 min) |
|
|
| #### **New Contributor / Intern** |
| 1. `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md` (10 min) |
| 2. `SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md` Part 1 + 2 (25 min) |
| 3. Pick a component (e.g., "LocalSpatialEncoder") → find in Part 6 → read code |
| 4. `spatial_beats_training_overview.md` or `spatial_beats_coding_guide.md` as needed |
| → **Goal**: Understand codebase (Total: 1-2 hours) |
|
|
| #### **Debugging Train/Val Gap** |
| 1. `doa_train_valid_gap_analysis.md` Executive Summary + Part 6 (10 min) |
| 2. Part 7 (diagnostics) → check your logs (10 min) |
| 3. Part 8 (fixes) → pick priority #1-3 (5 min) |
| 4. Appendix → get code locations (5 min) |
| → **Goal**: Root cause + fix strategy (Total: 30 min) |
|
|
| --- |
|
|
| ## FAQ About Documentation |
|
|
| **Q: Where do I find the code for Route A, B, or C?** |
| A: See `SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md` Part 2.1/2.2/2.3; each has exact file:line references in the Part 6 Appendix. |
|
|
| **Q: What's the difference between v9 and v10?** |
| A: Compare `doa_train_valid_gap_analysis.md` Part 3.1 with Part 3.2, or see the quick summary in `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md`. |
|
|
| **Q: Should I implement fix #1, #2, or #3?** |
| A: It depends on your problem. See `doa_train_valid_gap_analysis.md` Part 6, then rank the root causes by severity against your gap size. |
|
|
| **Q: How long will v11 experiments take?** |
| A: ~14 days total. See `0427_v11_series.md` Section 5 for recommended order (serial vs parallel). |
|
|
| **Q: Can I run v11 without understanding everything?** |
| A: Yes. Copy the shell script from `SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md`, then follow Part 4 (verification method) in `0427_v11_series.md`. |
|
|
| --- |
|
|
| ## Checklist: What You Can Do Now |
|
|
| After reading the appropriate documentation, you should be able to: |
|
|
| - [ ] Understand what spatial frameworks exist in the codebase |
| - [ ] Identify which route (A/B/C) solves your problem |
| - [ ] Diagnose your train/val gap (Part 6 in DOA_GAP) |
| - [ ] Plan an experiment (v11 specs + order) |
| - [ ] Find exact code locations (Appendix tables) |
| - [ ] Understand loss weight patterns (QUICK_REF tables) |
| - [ ] Know when to hot-start from which checkpoint |
| - [ ] Compare validation metrics across routes |
|
|
| --- |
|
|
| ## License & Citation |
|
|
| These documents are analysis artifacts for internal research use. They synthesize information from: |
| - Source code comments and docstrings |
| - Configuration file specifications |
| - DCASE challenge documentation (officially referenced in code) |
| - Research paper citations in docstrings |
|
|
| --- |
|
|
| **Last Updated**: 2026-04-27 |
| **Analysis Completed**: Yes |
| **Ready for Use**: Yes |
| **Maintenance**: Update after v11 experiments complete |
|
|
| For questions or clarifications, refer to the exact file:line citations in the appendices of the technical documents. |
|
|