---
license: mit
tags:
- autonomous-driving
- trajectory-prediction
- kitscenes
- kinematic-bicycle-model
---

# KITScenes LongTail Challenge — Solution

> **Challenge**: [KIT-MRT/KITScenes-LongTail-Challenge](https://huggingface.co/spaces/KIT-MRT/KITScenes-LongTail-Challenge)
> **Dataset**: [KIT-MRT/KITScenes-LongTail](https://huggingface.co/datasets/KIT-MRT/KITScenes-LongTail)
> **Paper**: [arXiv:2603.23607](https://arxiv.org/abs/2603.23607)

## Approach: Few-Shot CoT Kinematic Bicycle Model

Based on the best open-source method from the paper (MMS 4.24 with Gemma 3 12B), this solution uses:

1. **VLM Reasoning** — analyzes driving scenarios using past trajectory + instruction
2. **Kinematic Bicycle Model** (Kong et al. 2015) — converts action commands → 25 precise waypoints
3. **Few-Shot Chain-of-Thought** — 3 training examples guide the VLM's structured output

### Supported VLM Backends

| Backend | Command | Notes |
|---|---|---|
| **Ollama** (local) | `--backend ollama --ollama-model qwen3.5-vl:9b` | No rate limits, free, best for batch |
| **HF Inference** | `--backend hf` | Llama-4-Scout-17B via novita/groq |
| **Gemini** | `--backend gemini` | Needs `GOOGLE_API_KEY`, free tier rate-limited |
| **Fallback** | `--no-vlm` | Instruction-keyword heuristic, no API needed |

## Quick Start (Local with Ollama)

```bash
# 1. Clone
git clone https://huggingface.co/ValeraZSD/kitscenes-longtail-solution
cd kitscenes-longtail-solution

# 2. Install deps
pip install numpy huggingface_hub pyarrow

# 3. Start Ollama model (in another terminal)
ollama pull qwen3.5-vl:9b  # or gemma4:e4b
ollama serve

# 4. Run generation
python scripts/generate_production.py \
  --backend ollama \
  --ollama-model qwen3.5-vl:9b \
  --metadata data/test_metadata.json \
  --output submissions/submission_ollama.jsonl \
  --upload-repo ValeraZSD/kitscenes-submissions \
  --upload-filename submission_ollama_v1.jsonl

# 5. Validate
python scripts/validate.py submissions/submission_ollama.jsonl
```

## Pipeline Architecture

```
Input: past_trajectory (21 pts @ 5Hz) + driving_instruction
  ↓
Few-Shot CoT Prompt (3 training examples + query)
  → VLM (Ollama / HF Inference / Gemini)
  → XML output with 9 structured fields
  ↓
Parse XML → Normalize to 5 accel × 5 steer commands
  ↓
Kinematic Bicycle Model (Kong et al. 2015)
  Phase 1: 0–3s (15 steps)
  Phase 2: 3–5s (10 steps)
  → 25 waypoints in ego-centric coordinates (+x=fwd, +y=left)
  ↓
Output: scenario_id + future_trajectory (25×2) + reasoning (english, 9 fields)
```

## Steering Angle Calibration

Paper Table 6 values (±30° at low speed) are for instantaneous inputs. For sustained 3s phases in the bicycle model, we calibrated against expert trajectories:

| Command | Paper ≤60km/h | **Calibrated** | >60km/h |
|---|---|---|---|
| turning left | 30° | **6°** | 0.3° |
| turning slightly left | 10° | **1°** | 0.1° |
| steering straight | 0° | 0° | 0° |
| turning slightly right | -10° | **-1°** | -0.1° |
| turning right | -30° | **-6°** | -0.3° |

## Files

| File | Description |
|---|---|
| `scripts/generate_production.py` | Main pipeline — VLM + bicycle model + validation + upload |
| `scripts/validate.py` | Standalone submission validator (mirrors challenge code) |
| `data/test_metadata.json` | Pre-extracted test metadata (400 scenarios, 190KB, no images) |
| `configs/action_vocabulary.json` | Action → parameter mapping with calibrated values |
| `submissions/` | Generated submission JSONL files |
| `requirements.txt` | Python dependencies |

## Test Data Distribution

| Instruction | Count | Scenario Type | Count |
|---|---|---|---|
| drive straight on | 196 | intersection | 125 |
| turn right | 43 | overtake/lane change | 102 |
| use left lane | 34 | specifically selected | 68 |
| use right lane | 30 | construction zone | 36 |
| overtake truck | 25 | heavy rain | 27 |
| turn left | 20 | snow & wintry mix | 23 |
| u-turn | 8 | nighttime | 19 |

Speed: min=0, mean=52, max=130 km/h — 65% urban (≤60), 35% highway (>60)
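
As a rough illustration of the bicycle-model stage in the pipeline, the sketch below rolls out two constant-command phases (15 steps, then 10 steps, at 5 Hz) into 25 ego-centric waypoints using the standard kinematic bicycle equations from Kong et al. 2015. It is not the code in `scripts/generate_production.py`. The function names, the axle distances `lf` and `lr`, and the explicit Euler integration are assumptions made for the example, as is the reading of the steering table (calibrated column at ≤60 km/h, last column above). Acceleration is passed as a plain number in m/s² because the acceleration command vocabulary is not listed here.

```python
import math

# Calibrated steering angles in degrees, copied from the steering table:
# (value at <= 60 km/h, value at > 60 km/h). Which column applies at which
# speed is an assumption of this sketch.
STEER_DEG = {
    "turning left": (6.0, 0.3),
    "turning slightly left": (1.0, 0.1),
    "steering straight": (0.0, 0.0),
    "turning slightly right": (-1.0, -0.1),
    "turning right": (-6.0, -0.3),
}


def steer_angle_deg(command, speed_kmh):
    """Look up the steering angle for a command at the given speed."""
    low_speed, high_speed = STEER_DEG[command]
    return low_speed if speed_kmh <= 60.0 else high_speed


def rollout(v0, phases, dt=0.2, lf=1.4, lr=1.4):
    """Integrate the kinematic bicycle model with explicit Euler steps.

    v0     -- initial speed in m/s
    phases -- list of (accel_mps2, steer_deg, n_steps) tuples
    dt     -- step size in seconds (0.2 s corresponds to 5 Hz)
    lf, lr -- hypothetical distances from the centre of mass to each axle
    Returns a list of (x, y) waypoints with +x forward and +y left.
    """
    x = y = yaw = 0.0
    v = v0
    waypoints = []
    for accel, steer_deg, n_steps in phases:
        delta = math.radians(steer_deg)
        # Slip angle at the centre of mass for a front-steered vehicle.
        beta = math.atan(lr / (lf + lr) * math.tan(delta))
        for _ in range(n_steps):
            x += v * math.cos(yaw + beta) * dt
            y += v * math.sin(yaw + beta) * dt
            yaw += v / lr * math.sin(beta) * dt
            v = max(0.0, v + accel * dt)  # the vehicle does not reverse
            waypoints.append((x, y))
    return waypoints


# Example: 36 km/h, turn left for 3 s, then hold the wheel straight for 2 s.
v0_kmh = 36.0
delta_deg = steer_angle_deg("turning left", v0_kmh)
traj = rollout(v0_kmh / 3.6, [(0.0, delta_deg, 15), (0.0, 0.0, 10)])
print(len(traj), traj[-1])
```

Positive steering angles turn left, which matches the +y=left convention of the output frame. Clamping the speed at zero keeps a strong braking command from producing backward motion.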