Documentation Index β Spatial-BEATs Analysis & Reference
π Quick Navigation
New to the codebase?
β Start here: SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md (5 min read)
Debugging a train/validation gap in DOA?
β Go here: doa_train_valid_gap_analysis.md β Part 6 + Part 8
Need detailed architecture reference?
β Read here: SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md (deep dive, 30 min)
Planning experiments (v11 series)?
β Use: 0427_v11_series.md + SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md (v11 section)
Understanding specific component (e.g., SourceQueryDecoder)?
β Use: Search in SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md Part 2 + Appendix for line numbers
π Document Reference Table
| Document | Lines | Size | Best For | Read Time |
|---|---|---|---|---|
| ANALYSIS_COMPLETION_SUMMARY (this index) | 150 | 6KB | Overview + navigation | 5 min |
| SPATIAL_FRAMEWORKS_QUICK_REFERENCE | 192 | 7KB | Quick lookup, practitioner guide | 5-10 min |
| SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS | 724 | 28KB | Deep technical reference | 30-45 min |
| doa_train_valid_gap_analysis | 434 | 19KB | Diagnostics + fixes | 20-30 min |
| 0427_v11_series | 185 | 13KB | Experimental design (v11a/b/c/d) | 15 min |
| spatial_beats_ov123_frame_routes | 512 | 22KB | Routes A/B/C architecture | 25 min |
| spatial_beats_training_overview | 420 | 15KB | Training pipeline + presets | 20 min |
π― Use Case Lookup
"I need to understand the DOA train/val gap"
- Read:
doa_train_valid_gap_analysis.mdExecutive Summary (2 min) - Identify: Which of 6 mechanisms applies to your case (Part 6)
- Fix: Follow priority order in Part 8
- Reference: Code locations in Appendix
Expected outcome: Root cause identified + fix strategy
"I'm new and want to understand the architecture"
- Read:
SPATIAL_FRAMEWORKS_QUICK_REFERENCE.mdSections 1-3 (5 min) - Read:
SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.mdPart 1 + Part 2 (15 min) - Reference: Code locations in Appendix for specific functions
- Cross-check:
spatial_beats_ov123_frame_routes.mdfor Routes A/B/C details
Expected outcome: High-level understanding + ability to navigate code
"I want to run v11a experiment"
- Read:
0427_v11_series.mdSection 2.2 (v11a) (5 min) - Check:
SPATIAL_FRAMEWORKS_QUICK_REFERENCE.mdv11 section for shell script - Read: Part 4 (verification method) for how to evaluate results
- Reference: Appendix in
doa_train_valid_gap_analysis.mdfor code line numbers if modifying
Expected outcome: Experiment ready to launch, understanding of what to expect
"What are all the spatial frameworks in this codebase?"
- Read:
SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.mdPart 1 (5 min) - Summary: Four frameworks implemented (Spatial-AST, DCASE SELD, EINV2, DETR-slots)
- Reference: Part 2 shows how each is implemented as Routes A/B/C
Expected outcome: Framework inventory + where each is implemented
"How do I compare Routes A/B/C?"
- Go to:
SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.mdPart 2 (15 min) - Check: Comparison table in
SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md - Deep dive:
spatial_beats_ov123_frame_routes.mdfor architectural details
Expected outcome: Understanding of paradigm differences, when to use each
"What changed from v7 to v11?"
- Read:
SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.mdPart 3 (10 min) - Reference:
SPATIAL_FRAMEWORKS_QUICK_REFERENCE.mdversion series for quick compare - Deep dive:
doa_train_valid_gap_analysis.mdPart 3 for v9/v10 details - Experimental:
0427_v11_series.mdSection 1 for v11 rationale
Expected outcome: Version history + innovation tracking
"Where exactly is the direction head loss computed?"
- Go to:
SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.mdAppendix (search "direction loss") - Result:
spatial_loss.py:1562-1565 - Read: Part of
doa_train_valid_gap_analysis.mdPart 2.3 for context
Expected outcome: Exact code location + understanding of loss formulation
π Code Navigation Quick Reference
For Each Major Component
| Component | Primary Ref | Backup Ref | Concept |
|---|---|---|---|
| LocalSpatialEncoder | ANALYSIS Part 6 | QUICK_REF "Key locations" | 7-channel FOA β spatial features |
| SourceQueryDecoder | ROUTES p.20-30 | ANALYSIS Part 2.2b | K track queries β per-frame features |
| FrameTrackPredictionHeads | QUICK_REF Appendix | ANALYSIS Part 2.2b | Predicts act/class/dir/dist per frame |
| Hungarian Matching | DOA_GAP Part 2.2 | ANALYSIS Part 4 | How sources get assigned to slots/queries |
| ClassHeadSpectralDemixer | ANALYSIS Part 3 | QUICK_REF "v9 baseline" | Breaks frequency pooling bottleneck |
| ACCDOA heads | ROUTES p.40+ | ANALYSIS Part 1.2 | Route C: per-class 3D vectors |
| SpecAugment | DOA_GAP Part 1.2 | QUICK_REF "Training" | Spectral masking (train-only!) |
π Experiment Planning Matrix
To understand which experiment tests what:
Use: 0427_v11_series.md Section 2 (detailed specifications)
SPATIAL_FRAMEWORKS_QUICK_REFERENCE.mdv11 table (quick reference)doa_train_valid_gap_analysis.mdPart 6 (root causes)
| Problem | Experiment | Mechanism Tested | Expected Impact | Doc Reference |
|---|---|---|---|---|
| ov2 angle errors (73.9%) | v11a | DOA demixer | β 5pp+ | 0427_v11_series.md:18-40 |
| ov2/ov3 angles | v11b | IV signal path | Compare vs v11a | 0427_v11_series.md:41-63 |
| ov3 binding (24.5%) | v11c | ACCDOA paradigm | β 5pp+ | 0427_v11_series.md:64-87 |
| ov1 ranking (37% loss) | v11d | Post-hoc decoding | β 5pp+ | 0427_v11_series.md:88-112 |
β‘ Critical Findings Summary
Three Most Important Things to Know
Zero spatial augmentation (rotations) is the #1 cause of DOA train/val gap
- Location:
doa_train_valid_gap_analysis.mdExecutive Summary + Part 6 - Impact: 40-60% of variance
- Fix: See Part 8 recommendation #1
- Location:
Three parallel routes (A/B/C) coexist in the codebase
- Route A: Per-frame slot allocation (unstable)
- Route B: Learnable track queries (production, v9)
- Route C: Per-class vectors (prototype, being tested in v11c)
- Reference:
SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.mdPart 2
v10 phase-1 freezes the direction head entirely
- Direction head gets no gradients for 10 epochs
- Then unfrozen with poor initialization
- Causes 30-40% DOA metric drop on multi-source splits
- Reference:
doa_train_valid_gap_analysis.mdPart 3.2 + Part 6
π Cross-Reference Guide
"I'm reading X, how do I find related content?"
| Reading | See Also |
|---|---|
| ROUTES page 20-30 (Route B) | ANALYSIS Part 2.2b, QUICK_REF "Routes", DOA_GAP Part 2-3 |
| ANALYSIS Part 1 (Spatial-AST) | QUICK_REF table, ANALYSIS Part 6 (code locations) |
| DOA_GAP Part 8 (fixes) | ANALYSIS Part 6 (line numbers), QUICK_REF "Training" |
| 0427_v11_series Section 2 (v11 specs) | QUICK_REF v11 table, DOA_GAP Part 6 (why these experiments) |
| TRAINING_OVERVIEW | ANALYSIS Part 5 (configs), QUICK_REF "Loss weights" |
π How Documents Were Created
All documentation created from comprehensive codebase analysis on 2026-04-27:
- β 4 spatial frameworks identified (Spatial-AST, DCASE SELD, EINV2, DETR-slots)
- β 3 parallel routes analyzed with full architecture specifications
- β 6 mechanisms causing DOA train/val gaps discovered
- β v7βv11 experimental series mapped with root cause tracing
- β ~10,000+ lines of code reviewed and cross-referenced
- β All findings tied to exact file:line numbers
Quality Assurance:
- Code references verified with actual line numbers
- Architecture descriptions validated against source
- Experimental hypotheses cross-checked with docstrings
- Cross-document consistency checked
π Recommended Reading Order
For Different Roles
Principal Investigator / Project Lead
- This index + section on "Critical Findings" (5 min)
ANALYSIS_COMPLETION_SUMMARY.md(10 min)0427_v11_series.mdSection 1 (diagnostics review) + Section 5 (order) (10 min) β Decision: Approve v11 experiments? (Total: 25 min)
Researcher / Experimenter
SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md(10 min)0427_v11_series.mdfull document (15 min)SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.mdPart 2 + 3 (25 min) β Ready: Design and run experiments (Total: 50 min)
New Contributor / Intern
SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md(10 min)SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.mdPart 1 + 2 (25 min)- Pick a component (e.g., "LocalSpatialEncoder") β find in Part 6 β read code
spatial_beats_training_overview.mdorspatial_beats_coding_guide.mdas needed β Goal: Understand codebase (Total: 1-2 hours)
Debugging Train/Val Gap
doa_train_valid_gap_analysis.mdExecutive Summary + Part 6 (10 min)- Part 7 (diagnostics) β check your logs (10 min)
- Part 8 (fixes) β pick priority #1-3 (5 min)
- Appendix β get code locations (5 min) β Goal: Root cause + fix strategy (Total: 30 min)
π FAQ About Documentation
Q: Where do I find the code for Route A, B, or C?
A: See SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md Part 2.1/2.2/2.3, each has exact file:line references in Part 6 Appendix.
Q: What's the difference between v9 and v10?
A: doa_train_valid_gap_analysis.md Part 3.1 vs 3.2, or quick summary in SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md.
Q: Should I implement fix #1, #2, or #3?
A: Depends on your problem. See doa_train_valid_gap_analysis.md Part 6 β rank root causes by severity against your gap size.
Q: How long will v11 experiments take?
A: ~14 days total. See 0427_v11_series.md Section 5 for recommended order (serial vs parallel).
Q: Can I run v11 without understanding everything?
A: Yes! Copy the shell script from SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md, follow Part 4 (verification method) in 0427_v11_series.md.
β Checklist: What You Can Do Now
After reading appropriate documentation:
- Understand what spatial frameworks exist in codebase
- Identify which route (A/B/C) solves your problem
- Diagnose your train/val gap (Part 6 in DOA_GAP)
- Plan an experiment (v11 specs + order)
- Find exact code locations (Appendix tables)
- Understand loss weight patterns (QUICK_REF tables)
- Know when to hot-start from which checkpoint
- Compare validation metrics across routes
π License & Citation
These documents are analysis artifacts for internal research use. They synthesize information from:
- Source code comments and docstrings
- Configuration file specifications
- DCASE challenge documentation (officially referenced in code)
- Research paper citations in docstrings
Last Updated: 2026-04-27
Analysis Completed: Yes
Ready for Use: Yes
Maintenance: Update after v11 experiments complete
For questions or clarifications, refer to exact file:line citations in the appendices of technical documents.