
🚀 START HERE: Spatial-BEATs Documentation Guide

Welcome! You have access to a comprehensive analysis of the Spatial-BEATs codebase. This guide points you to the document you need for your task.


⚡ Quick Pick: Your Task

"I have 5 minutes"

→ Read: SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md

"I have 15 minutes"

→ Read: README_DOCUMENTATION_INDEX.md, then ANALYSIS_COMPLETION_SUMMARY.md

"I have 30 minutes"

→ Choose one:

"I have 1-2 hours"

→ Full reading path for your role:

  • Researcher: QUICK_REF → 0427_v11_series.md → ANALYSIS Part 2-3
  • Contributor: QUICK_REF → ANALYSIS Part 1-2 → Pick a component → Read the code
  • Investigator: DOA_GAP Executive Summary → Part 6 → Part 8 → Appendix

📚 The Five Documents

1. README_DOCUMENTATION_INDEX.md

🏠 Navigation hub β€” Where to find what

  • Use case lookup (choose your problem)
  • Code component quick reference
  • Reading order for different roles
  • Cross-reference guide

👉 Read this first if you're not sure where to start


2. SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md

⚡ Practitioner's card: fast lookup

  • Framework table (4 frameworks, 1 page)
  • Route A/B/C comparison
  • Version series highlights
  • Code locations by component
  • Loss weight patterns
  • When to use each configuration

👉 Read this for: Quick answers, practitioner reference


3. SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md

📖 Deep technical reference: architecture bible

  • Part 1: Four spatial frameworks (Spatial-AST, DCASE SELD, EINV2, DETR)
  • Part 2: Routes A/B/C with full specifications
  • Part 3: Version evolution (v7→v11)
  • Part 4-10: Implementation, configs, metrics, future work
  • Appendix: Code reference table

👉 Read this for: Deep understanding, architecture details, code paths


4. doa_train_valid_gap_analysis.md

🔍 Diagnostic & fix guide: root cause analysis

  • Executive Summary: 6 critical mechanisms
  • Part 1: Data pipeline analysis
  • Part 2: Loss computation asymmetry
  • Part 3: Training configuration (v9/v10)
  • Part 4: Validation metrics
  • Part 6: Root causes ranked by severity
  • Part 7: Diagnostic checklist
  • Part 8: Recommended fixes (prioritized)
  • Appendix: Code reference with exact line numbers

👉 Read this for: Debugging train/val gaps, understanding root causes


5. ANALYSIS_COMPLETION_SUMMARY.md

📋 Executive overview: what was found

  • Deliverables summary (5 docs, 1,883 lines)
  • Key findings (frameworks, routes, v11 series)
  • Next steps (immediate vs experimental)
  • How to use documents
  • Verification checklist

👉 Read this for: Overview, decision-making, what comes next


🎯 Choose Your Path

Path 1: "I want to understand the architecture (60 min)"

  1. SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md (5 min)
  2. SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md Part 1-2 (25 min)
  3. Pick a component from Part 6 Appendix, find in code (20 min)
  4. spatial_beats_ov123_frame_routes.md if curious (10 min)

Outcome: Can navigate codebase, understand paradigms, modify code confidently


Path 2: "I need to debug a train/val gap (30 min)"

  1. doa_train_valid_gap_analysis.md Executive Summary (2 min)
  2. Part 6: Check which mechanisms apply to your situation (5 min)
  3. Part 7: Diagnostics (check your logs) (10 min)
  4. Part 8: Pick a fix priority (5 min)
  5. Appendix: Get code locations (2 min)
  6. SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md if you need to modify (optional)

Outcome: Root cause identified, fix strategy chosen, code locations ready


Path 3: "I want to run an experiment (v11 series) (45 min)"

  1. SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md (5 min)
  2. 0427_v11_series.md Section 1-2 (15 min)
  3. SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md Part 3 (15 min)
  4. 0427_v11_series.md Part 4 (verification method) (5 min)
  5. Copy shell script from QUICK_REF (5 min)

Outcome: Experiment ready to launch, understanding of success metrics


Path 4: "I'm new, I want to understand everything (2 hours)"

  1. SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md (10 min)
  2. SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md Part 1-3 (40 min)
  3. spatial_beats_ov123_frame_routes.md (25 min)
  4. spatial_beats_training_overview.md (20 min)
  5. Pick component, trace through code with Part 6 references (20 min)
  6. doa_train_valid_gap_analysis.md Part 6 for context (5 min)

Outcome: Comprehensive understanding of system, ready to contribute


🔑 Key Findings at a Glance

4 Spatial Frameworks in Codebase

  • Spatial-AST: Task tokens (pre-trunk)
  • DCASE SELD: Per-class activity+DOA
  • EINV2: Learnable track queries
  • DETR: Per-frame K-slot allocation

3 Parallel Routes (A/B/C)

  • Route A: Per-frame K-slot, per-step Hungarian
  • Route B: Learnable queries, clip-level Hungarian (PRODUCTION v9)
  • Route C: Per-class vectors (PROTOTYPE, v11c test)
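The difference between per-step matching (Route A) and clip-level matching (Route B) can be illustrated with a minimal sketch. This is not the repository's implementation: the function names, the angular-error cost, and the brute-force search over permutations (used here in place of the Hungarian algorithm, which finds the same optimum and scales better for large K) are all assumptions made for illustration.

```python
import math
from itertools import permutations


def angular_error_deg(u, v):
    """Angle in degrees between two 3-D unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def best_permutation(cost):
    """Return the permutation p minimising sum(cost[i][p[i]]).

    Brute force over all K! permutations; fine for the small K
    (1-3 overlapping sources) assumed in this sketch.
    """
    k = len(cost)
    return min(
        permutations(range(k)),
        key=lambda p: sum(cost[i][p[i]] for i in range(k)),
    )


def per_step_matching(pred, target):
    """Route A style (hypothetical): one assignment per frame.

    pred and target are lists of frames; each frame holds K unit vectors.
    """
    perms = []
    for pred_t, tgt_t in zip(pred, target):
        cost = [[angular_error_deg(p, g) for g in tgt_t] for p in pred_t]
        perms.append(best_permutation(cost))
    return perms


def clip_level_matching(pred, target):
    """Route B style (hypothetical): one assignment for the whole clip.

    Costs are summed over all frames before a single permutation is chosen.
    """
    k = len(pred[0])
    total = [[0.0] * k for _ in range(k)]
    for pred_t, tgt_t in zip(pred, target):
        for i, p in enumerate(pred_t):
            for j, g in enumerate(tgt_t):
                total[i][j] += angular_error_deg(p, g)
    return best_permutation(total)
```

With per-step matching, a slot may be assigned to a different source in each frame; with clip-level matching, each slot is tied to one source for the whole clip, so a frame where predictions swap is penalised rather than silently re-matched.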

DOA Train/Val Gap Root Causes

  1. ⚠️⚠️⚠️ ZERO spatial augmentation (rotations): 40-60% of variance
  2. ⚠️⚠️ SpecAugment train-only: 10-20% variance
  3. ⚠️⚠️ v10 freezes direction head: 30-40% on multi-source
  4. ⚠️ Regression sensitivity: 5-15% variance
  5. ⚠️ Detached prediction asymmetry: 2-5% variance
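To make the first root cause concrete, here is a minimal sketch of what a rotation-based spatial augmentation could look like. It assumes first-order Ambisonics input with separate W, X, Y, Z channels and DOA labels stored as Cartesian unit vectors; neither assumption is confirmed by this guide, and all function names are hypothetical. A rotation about the vertical axis leaves W and Z unchanged and rotates the (X, Y) pair, and the label must be rotated by the same angle.

```python
import math


def rotate_yaw(x, y, angle_deg):
    """Rotate the horizontal pair (x, y) counter-clockwise by angle_deg."""
    a = math.radians(angle_deg)
    return (
        x * math.cos(a) - y * math.sin(a),
        x * math.sin(a) + y * math.cos(a),
    )


def rotate_foa_sample(w, x, y, z, angle_deg):
    """Rotate one FOA sample about the vertical axis (assumed W, X, Y, Z layout).

    W (omnidirectional) and Z (vertical) are unaffected by a yaw rotation.
    """
    x_rot, y_rot = rotate_yaw(x, y, angle_deg)
    return w, x_rot, y_rot, z


def rotate_doa_label(vec, angle_deg):
    """Apply the same yaw rotation to a Cartesian DOA label (x, y, z)."""
    x_rot, y_rot = rotate_yaw(vec[0], vec[1], angle_deg)
    return (x_rot, y_rot, vec[2])
```

The key design point is that audio and label are transformed together; rotating only one of them would corrupt the supervision instead of augmenting it.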

v11 Experiments (Parallel Runs)

  • v11a: DOA demixer → ov2 angles ↓ 5pp+
  • v11b: LocalSpatial pre-pool → test IV necessity
  • v11c: ACCDOA paradigm → ov3 binding ↓ 5pp+
  • v11d: Post-hoc calibration → ov1 ranking ↑ 5pp+
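For readers unfamiliar with the ACCDOA paradigm referenced by v11c, the sketch below shows the general idea: each class gets one Cartesian vector whose length encodes activity and whose direction encodes the DOA. The function name, the dictionary input format, and the 0.5 threshold are assumptions for illustration and are not taken from the v11c specification.

```python
import math


def decode_accdoa(vectors, threshold=0.5):
    """Decode per-class ACCDOA vectors into active classes and angles.

    vectors: dict mapping class name -> (x, y, z).
    Returns a dict mapping each active class -> (azimuth_deg, elevation_deg).
    A class counts as active when its vector norm exceeds the threshold
    (0.5 is an assumed default in this sketch).
    """
    active = {}
    for name, (x, y, z) in vectors.items():
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > threshold:
            azimuth = math.degrees(math.atan2(y, x))
            # Clamp to guard against rounding slightly outside [-1, 1].
            elevation = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
            active[name] = (azimuth, elevation)
    return active
```

Because activity and direction share one output, no separate detection head or track assignment is needed, which is why this format pairs naturally with the per-class vectors of Route C.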

📞 FAQ

Q: Where do I find the direction head loss?
A: SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md Appendix → search "direction loss" → spatial_loss.py:1562-1565

Q: What's the difference between routes?
A: Compare table in SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md or detailed Part 2 of SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md

Q: Should I implement fix #1, #2, or #3?
A: Read doa_train_valid_gap_analysis.md Part 6, pick based on your gap size and risk tolerance.

Q: How do I run v11a?
A: Shell script in SPATIAL_FRAMEWORKS_QUICK_REFERENCE.md v11 section + spec in 0427_v11_series.md Section 2.2

Q: I'm stuck on a component. Where's the code?
A: SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS.md Part 6 has complete reference table with file:line for every component.


🎁 You Now Have

βœ… Navigation guide for all documents βœ… Quick reference card with all the essentials βœ… Architecture bible with code paths βœ… Diagnostic guide for train/val gaps βœ… Experimental specifications for v11 series βœ… Comprehensive metadata (1,883 lines, 77KB) βœ… All findings tied to exact code locations


🚀 Next Steps

  1. Choose your path above based on how much time you have
  2. Follow the reading order in that path
  3. Use cross-references when you need more detail
  4. Check Appendices for exact code locations
  5. Reference Part 6/Part 8 when implementing

📊 Document Overview

| Document | Size | Reading time | Purpose |
| --- | --- | --- | --- |
| README_DOCUMENTATION_INDEX | 12 KB | 5-10 min | Navigation hub |
| SPATIAL_FRAMEWORKS_QUICK_REFERENCE | 7 KB | 5-10 min | Quick lookup |
| SPATIAL_AUDIO_FRAMEWORKS_ANALYSIS | 28 KB | 30-45 min | Deep reference |
| doa_train_valid_gap_analysis | 19 KB | 20-30 min | Diagnostics |
| ANALYSIS_COMPLETION_SUMMARY | 11 KB | 10 min | Executive summary |
| TOTAL | 77 KB | 2-4 hours | Complete set |

Status: ✅ Complete and ready for use
Created: 2026-04-27
Next update: After v11 experiments

👉 Pick your path above and start reading!