---
license: mit
tags:
- autonomous-driving
- trajectory-prediction
- kitscenes
- kinematic-bicycle-model
---
# KITScenes LongTail Challenge — Solution
> **Challenge**: [KIT-MRT/KITScenes-LongTail-Challenge](https://huggingface.co/spaces/KIT-MRT/KITScenes-LongTail-Challenge)
> **Dataset**: [KIT-MRT/KITScenes-LongTail](https://huggingface.co/datasets/KIT-MRT/KITScenes-LongTail)
> **Paper**: [arXiv:2603.23607](https://arxiv.org/abs/2603.23607)
## Approach: Few-Shot CoT Kinematic Bicycle Model
Based on the best open-source method from the paper (MMS 4.24 with Gemma 3 12B), this solution uses:
1. **VLM Reasoning** — analyzes driving scenarios using past trajectory + instruction
2. **Kinematic Bicycle Model** (Kong et al. 2015) — converts action commands → 25 precise waypoints
3. **Few-Shot Chain-of-Thought** — 3 training examples guide the VLM's structured output
### Supported VLM Backends
| Backend | Command | Notes |
|---|---|---|
| **Ollama** (local) | `--backend ollama --ollama-model qwen3.5-vl:9b` | No rate limits, free, best for batch |
| **HF Inference** | `--backend hf` | Llama-4-Scout-17B via novita/groq |
| **Gemini** | `--backend gemini` | Needs `GOOGLE_API_KEY`, free tier rate-limited |
| **Fallback** | `--no-vlm` | Instruction-keyword heuristic, no API needed |
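As a rough illustration of what the local Ollama backend involves, the sketch below posts a prompt to Ollama's `/api/generate` endpoint on its default port. It is not taken from `scripts/generate_production.py`; the helper names `build_request` and `query_ollama` are hypothetical, and it sends text only, with no images.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="qwen3.5-vl:9b"):
    """Build a non-streaming generate request (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_ollama(prompt, model="qwen3.5-vl:9b", timeout=120):
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]

# Example call, assuming `ollama serve` is running and the model is pulled:
#   text = query_ollama("Describe the safest manoeuvre for: turn left")
```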
## Quick Start (Local with Ollama)
```bash
# 1. Clone
git clone https://huggingface.co/ValeraZSD/kitscenes-longtail-solution
cd kitscenes-longtail-solution
# 2. Install deps
pip install numpy huggingface_hub pyarrow
# 3. Start the Ollama server (in another terminal), then pull the model
ollama serve
ollama pull qwen3.5-vl:9b # or gemma4:e4b
# 4. Run generation
python scripts/generate_production.py \
--backend ollama \
--ollama-model qwen3.5-vl:9b \
--metadata data/test_metadata.json \
--output submissions/submission_ollama.jsonl \
--upload-repo ValeraZSD/kitscenes-submissions \
--upload-filename submission_ollama_v1.jsonl
# 5. Validate
python scripts/validate.py submissions/submission_ollama.jsonl
```
## Pipeline Architecture
```
Input: past_trajectory (21 pts @ 5Hz) + driving_instruction
  ↓
Few-Shot CoT Prompt (3 training examples + query)
  → VLM (Ollama / HF Inference / Gemini)
  → XML output with 9 structured fields
  ↓
Parse XML → Normalize to 5 accel × 5 steer commands
  ↓
Kinematic Bicycle Model (Kong et al. 2015)
  Phase 1: 0–3s (15 steps)   Phase 2: 3–5s (10 steps)
  → 25 waypoints in ego-centric coordinates (+x=fwd, +y=left)
  ↓
Output: scenario_id + future_trajectory (25×2) + reasoning (English, 9 fields)
```
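The bicycle-model stage can be sketched as a two-phase forward-Euler rollout of the standard kinematic bicycle equations. This is a minimal illustration, not the code in `scripts/generate_production.py`: the axle distances `L_F` and `L_R`, the function name `rollout`, and the example acceleration and steering values are all assumptions.

```python
import math

DT = 0.2    # 5 Hz sampling, as in the pipeline description
L_F = 1.4   # assumed distance from centre of gravity to front axle [m]
L_R = 1.4   # assumed distance from centre of gravity to rear axle [m]


def rollout(v0, phases):
    """Integrate the kinematic bicycle model with forward Euler.

    v0     -- initial speed in m/s
    phases -- list of (n_steps, accel_mps2, steer_deg) tuples
    Returns (x, y) waypoints in the ego frame (+x forward, +y left).
    """
    x = y = psi = 0.0
    v = v0
    waypoints = []
    for n_steps, accel, steer_deg in phases:
        delta = math.radians(steer_deg)
        # Slip angle at the centre of gravity for this steering angle.
        beta = math.atan(L_R / (L_F + L_R) * math.tan(delta))
        for _ in range(n_steps):
            x += v * math.cos(psi + beta) * DT
            y += v * math.sin(psi + beta) * DT
            psi += v / L_R * math.sin(beta) * DT
            v = max(0.0, v + accel * DT)  # clamp so the car never reverses
            waypoints.append((x, y))
    return waypoints


# Phase 1: 15 steps (0-3 s) steering left; phase 2: 10 steps (3-5 s) straight.
traj = rollout(v0=10.0, phases=[(15, 0.0, 6.0), (10, 0.0, 0.0)])
```

Holding one (acceleration, steering) pair per phase means two commands are enough to produce all 25 waypoints.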
## Steering Angle Calibration
Paper Table 6 values (±30° at low speed) are for instantaneous inputs. For sustained 3s phases in the bicycle model, we calibrated against expert trajectories:
| Command | Paper ≤60km/h | **Calibrated** | >60km/h |
|---|---|---|---|
| turning left | 30° | **6°** | 0.3° |
| turning slightly left | 10° | **1°** | 0.1° |
| steering straight | 0° | 0° | 0° |
| turning slightly right | -10° | **-1°** | -0.1° |
| turning right | -30° | **-6°** | -0.3° |
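One way the table could be used in code is a speed-dependent lookup. The sketch below assumes the **Calibrated** column applies at or below 60 km/h and the last column above it; that is our reading of the table rather than something stated here, and `steering_angle_deg` is a hypothetical name, not a function from the repository.

```python
# (angle at <= 60 km/h, angle at > 60 km/h) in degrees, copied from the table.
# Which column applies in which speed range is an assumption.
STEER_DEG = {
    "turning left": (6.0, 0.3),
    "turning slightly left": (1.0, 0.1),
    "steering straight": (0.0, 0.0),
    "turning slightly right": (-1.0, -0.1),
    "turning right": (-6.0, -0.3),
}
SPEED_THRESHOLD_KMH = 60.0


def steering_angle_deg(command, speed_kmh):
    """Return the steering angle for a command; positive means left."""
    low_speed, high_speed = STEER_DEG[command]
    return low_speed if speed_kmh <= SPEED_THRESHOLD_KMH else high_speed
```

An unknown command raises `KeyError`, which makes a normalization failure upstream visible instead of silently steering straight.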
## Files
| File | Description |
|---|---|
| `scripts/generate_production.py` | Main pipeline — VLM + bicycle model + validation + upload |
| `scripts/validate.py` | Standalone submission validator (mirrors challenge code) |
| `data/test_metadata.json` | Pre-extracted test metadata (400 scenarios, 190KB, no images) |
| `configs/action_vocabulary.json` | Action → parameter mapping with calibrated values |
| `submissions/` | Generated submission JSONL files |
| `requirements.txt` | Python dependencies |
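For orientation, a per-record check on a submission line might look like the sketch below. It is not the logic of `scripts/validate.py` or of the challenge code. The field names come from the pipeline output described above; everything else (the function name `check_record`, the exact rules) is an assumption.

```python
import json
import math

N_WAYPOINTS = 25  # 5 s horizon at 5 Hz


def check_record(line):
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(rec, dict):
        return ["record is not a JSON object"]

    problems = []
    for key in ("scenario_id", "future_trajectory", "reasoning"):
        if key not in rec:
            problems.append(f"missing field: {key}")

    traj = rec.get("future_trajectory")
    if isinstance(traj, list):
        if len(traj) != N_WAYPOINTS:
            problems.append(f"expected {N_WAYPOINTS} waypoints, got {len(traj)}")
        for i, pt in enumerate(traj):
            # Each waypoint must be a finite [x, y] pair.
            ok = (
                isinstance(pt, list)
                and len(pt) == 2
                and all(isinstance(c, (int, float)) and math.isfinite(c) for c in pt)
            )
            if not ok:
                problems.append(f"waypoint {i} is not a finite [x, y] pair")
    elif traj is not None:
        problems.append("future_trajectory is not a list")
    return problems
```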
## Test Data Distribution
| Instruction | Count | Scenario Type | Count |
|---|---|---|---|
| drive straight on | 196 | intersection | 125 |
| turn right | 43 | overtake/lane change | 102 |
| use left lane | 34 | specifically selected | 68 |
| use right lane | 30 | construction zone | 36 |
| overtake truck | 25 | heavy rain | 27 |
| turn left | 20 | snow & wintry mix | 23 |
| u-turn | 8 | nighttime | 19 |
Speed: min = 0, mean = 52, max = 130 km/h; 65% of scenarios are urban (≤60 km/h) and 35% are highway (>60 km/h).