	---
	license: mit
	tags:
	- autonomous-driving
	- trajectory-prediction
	- kitscenes
	- kinematic-bicycle-model
	---

	# KITScenes LongTail Challenge — Solution

	> Challenge: [KIT-MRT/KITScenes-LongTail-Challenge](https://huggingface.co/spaces/KIT-MRT/KITScenes-LongTail-Challenge)
	> Dataset: [KIT-MRT/KITScenes-LongTail](https://huggingface.co/datasets/KIT-MRT/KITScenes-LongTail)
	> Paper: [arXiv:2603.23607](https://arxiv.org/abs/2603.23607)

	## Approach: Few-Shot CoT Kinematic Bicycle Model

	Based on the best open-source method from the paper (MMS 4.24 with Gemma 3 12B), this solution uses:

	1. VLM Reasoning — analyzes driving scenarios using past trajectory + instruction
	2. Kinematic Bicycle Model (Kong et al. 2015) — converts action commands → 25 precise waypoints
	3. Few-Shot Chain-of-Thought — 3 training examples guide the VLM's structured output
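
Step 2 can be illustrated with a minimal sketch of the center-of-gravity kinematic bicycle model from Kong et al. (2015), integrated with forward Euler at 5 Hz. The axle distances `l_f` and `l_r`, the function names, and the example commands are assumptions chosen for illustration; the actual parameters live in `scripts/generate_production.py` and `configs/action_vocabulary.json` and may differ.

```python
import math

def bicycle_step(x, y, yaw, v, accel, steer_deg, dt=0.2, l_f=1.4, l_r=1.4):
    """One forward-Euler step of the kinematic bicycle model.

    l_f / l_r (distance from the center of gravity to each axle, in meters)
    are assumed values, not taken from the repository.
    """
    delta = math.radians(steer_deg)
    # Slip angle at the center of gravity
    beta = math.atan(l_r / (l_f + l_r) * math.tan(delta))
    x += v * math.cos(yaw + beta) * dt
    y += v * math.sin(yaw + beta) * dt
    yaw += v / l_r * math.sin(beta) * dt
    v = max(0.0, v + accel * dt)  # assumption: the vehicle never reverses
    return x, y, yaw, v

def rollout(v0, phases, dt=0.2):
    """Roll out waypoints in the ego frame (+x forward, +y left).

    phases is a list of (n_steps, accel_mps2, steer_deg) tuples.
    """
    x = y = yaw = 0.0
    v = v0
    waypoints = []
    for n_steps, accel, steer_deg in phases:
        for _ in range(n_steps):
            x, y, yaw, v = bicycle_step(x, y, yaw, v, accel, steer_deg, dt)
            waypoints.append((x, y))
    return waypoints

# Two phases as in the pipeline: 0-3 s (15 steps) and 3-5 s (10 steps)
waypoints = rollout(10.0, [(15, 0.0, 6.0), (10, 0.0, 0.0)])
print(len(waypoints), waypoints[-1])
```

A positive steering angle curves the path toward +y (left), which matches the sign convention of the calibration table further down.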

	### Supported VLM Backends

	| Backend | Command | Notes |
	|---|---|---|
	| Ollama (local) | `--backend ollama --ollama-model qwen3.5-vl:9b` | No rate limits, free, best for batch |
	| HF Inference | `--backend hf` | Llama-4-Scout-17B via novita/groq |
	| Gemini | `--backend gemini` | Needs `GOOGLE_API_KEY`, free tier rate-limited |
	| Fallback | `--no-vlm` | Instruction-keyword heuristic, no API needed |

	## Quick Start (Local with Ollama)

	```bash
	# 1. Clone
	git clone https://huggingface.co/ValeraZSD/kitscenes-longtail-solution
	cd kitscenes-longtail-solution

	# 2. Install deps
	pip install numpy huggingface_hub pyarrow

	# 3. Start the Ollama server (in another terminal; skip if it already runs as a service)
	ollama serve
	# Then pull the model
	ollama pull qwen3.5-vl:9b # or gemma4:e4b

	# 4. Run generation
	python scripts/generate_production.py \
	--backend ollama \
	--ollama-model qwen3.5-vl:9b \
	--metadata data/test_metadata.json \
	--output submissions/submission_ollama.jsonl \
	--upload-repo ValeraZSD/kitscenes-submissions \
	--upload-filename submission_ollama_v1.jsonl

	# 5. Validate
	python scripts/validate.py submissions/submission_ollama.jsonl
	```
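
For a quick sanity check without the repository script, the sketch below tests one JSONL line at a time. It is not the logic of `scripts/validate.py`: it only assumes that each record is a JSON object with the keys `scenario_id`, `future_trajectory` (25 `[x, y]` pairs), and `reasoning`, as listed in the pipeline output. The function name `check_record` and the error messages are hypothetical.

```python
import json
import math

def check_record(line, n_waypoints=25):
    """Return a list of problems found in one JSONL line (empty list = no problem found)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    problems = []
    # Assumed key names, taken from the README's pipeline output description
    for key in ("scenario_id", "future_trajectory", "reasoning"):
        if key not in record:
            problems.append(f"missing key: {key}")

    traj = record.get("future_trajectory")
    if "future_trajectory" in record and not isinstance(traj, list):
        problems.append("future_trajectory is not a list")
    elif isinstance(traj, list):
        if len(traj) != n_waypoints:
            problems.append(f"expected {n_waypoints} waypoints, got {len(traj)}")
        for i, point in enumerate(traj):
            ok = (
                isinstance(point, list)
                and len(point) == 2
                and all(isinstance(c, (int, float)) and math.isfinite(c) for c in point)
            )
            if not ok:
                problems.append(f"waypoint {i} is not a finite [x, y] pair")
    return problems

good = json.dumps({
    "scenario_id": "demo",
    "future_trajectory": [[0.2 * i, 0.0] for i in range(25)],
    "reasoning": "keep lane",
})
print(check_record(good))
```

The challenge validator may enforce further constraints (for example on the reasoning fields), so a clean result here does not guarantee an accepted submission.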

	## Pipeline Architecture

	```
	Input: past_trajectory (21 pts @ 5Hz) + driving_instruction
	↓
	Few-Shot CoT Prompt (3 training examples + query)
	→ VLM (Ollama / HF Inference / Gemini)
	→ XML output with 9 structured fields
	↓
	Parse XML → Normalize to 5 accel × 5 steer commands
	↓
	Kinematic Bicycle Model (Kong et al. 2015)
	Phase 1: 0–3s (15 steps) Phase 2: 3–5s (10 steps)
	→ 25 waypoints in ego-centric coordinates (+x=fwd, +y=left)
	↓
	Output: scenario_id + future_trajectory (25×2) + reasoning (english, 9 fields)
	```
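
The parse-and-normalize stage could look roughly like the sketch below. The tag names (`scene`, `steering`) are hypothetical, since this README does not list the 9 fields, and fuzzy matching with `difflib` is one possible normalization strategy rather than the one the pipeline necessarily uses. Only the five steering commands are covered because the acceleration vocabulary is not spelled out here.

```python
import difflib
import re

# The five steering commands from the calibration table in this README
STEER_COMMANDS = [
    "turning left",
    "turning slightly left",
    "steering straight",
    "turning slightly right",
    "turning right",
]

def parse_fields(xml_text):
    """Collect flat <tag>value</tag> pairs, ignoring any text around them."""
    pairs = re.findall(r"<(\w+)>(.*?)</\1>", xml_text, flags=re.DOTALL)
    return {tag: value.strip() for tag, value in pairs}

def normalize_steer(text, default="steering straight"):
    """Map free-form VLM text to the closest known command, else a default."""
    cleaned = re.sub(r"[^a-z ]", "", text.lower()).strip()
    matches = difflib.get_close_matches(cleaned, STEER_COMMANDS, n=1, cutoff=0.6)
    return matches[0] if matches else default

# Hypothetical VLM reply with chatter before the structured part
reply = (
    "Sure.\n"
    "<scene>wet road, truck ahead</scene>\n"
    "<steering>Turning slightly left.</steering>"
)
fields = parse_fields(reply)
steer = normalize_steer(fields.get("steering", ""))
print(fields, steer)
```

Falling back to a neutral default keeps the pipeline producing a valid trajectory when the VLM output is malformed; whether "steering straight" is the right default is a design choice, not something stated in this README.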

	## Steering Angle Calibration

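	The steering angles in Table 6 of the paper (±30° at low speed) describe instantaneous inputs. Held constant over a sustained 3 s phase of the bicycle model, they produce turns that are far too sharp, so the angles were calibrated against expert trajectories: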

	| Command | Paper (≤60 km/h) | Calibrated | >60 km/h |
	|---|---|---|---|
	| turning left | 30° | 6° | 0.3° |
	| turning slightly left | 10° | 1° | 0.1° |
	| steering straight | 0° | 0° | 0° |
	| turning slightly right | -10° | -1° | -0.1° |
	| turning right | -30° | -6° | -0.3° |
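
As a sketch of how such a table might be consumed, the lookup below selects an angle by speed regime. It assumes that the "Calibrated" column applies at or below 60 km/h and the last column above 60 km/h; the dictionary and function names are hypothetical, and the repository's real mapping is in `configs/action_vocabulary.json`.

```python
# (angle at <=60 km/h, angle at >60 km/h) in degrees, positive = left.
# Assumption: "Calibrated" is the low-speed column, ">60km/h" the high-speed one.
CALIBRATED_STEER_DEG = {
    "turning left": (6.0, 0.3),
    "turning slightly left": (1.0, 0.1),
    "steering straight": (0.0, 0.0),
    "turning slightly right": (-1.0, -0.1),
    "turning right": (-6.0, -0.3),
}

def steer_angle_deg(command, speed_kmh, threshold_kmh=60.0):
    """Return the steering angle for a command in the given speed regime."""
    low_speed, high_speed = CALIBRATED_STEER_DEG[command]
    return low_speed if speed_kmh <= threshold_kmh else high_speed

print(steer_angle_deg("turning left", 30.0))
print(steer_angle_deg("turning left", 100.0))
```

An unknown command raises `KeyError` here, which makes vocabulary mismatches visible early instead of silently driving straight.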

	## Files

	| File | Description |
	|---|---|
	| `scripts/generate_production.py` | Main pipeline: VLM + bicycle model + validation + upload |
	| `scripts/validate.py` | Standalone submission validator (mirrors challenge code) |
	| `data/test_metadata.json` | Pre-extracted test metadata (400 scenarios, 190 KB, no images) |
	| `configs/action_vocabulary.json` | Action → parameter mapping with calibrated values |
	| `submissions/` | Generated submission JSONL files |
	| `requirements.txt` | Python dependencies |

	## Test Data Distribution

	| Instruction | Count | Scenario Type | Count |
	|---|---|---|---|
	| drive straight on | 196 | intersection | 125 |
	| turn right | 43 | overtake/lane change | 102 |
	| use left lane | 34 | specifically selected | 68 |
	| use right lane | 30 | construction zone | 36 |
	| overtake truck | 25 | heavy rain | 27 |
	| turn left | 20 | snow & wintry mix | 23 |
	| u-turn | 8 | nighttime | 19 |

	Speed: min=0, mean=52, max=130 km/h — 65% urban (≤60), 35% highway (>60)