---
license: mit
tags:
  - autonomous-driving
  - trajectory-prediction
  - kitscenes
  - kinematic-bicycle-model
---

# KITScenes LongTail Challenge — Solution

> **Challenge**: [KIT-MRT/KITScenes-LongTail-Challenge](https://huggingface.co/spaces/KIT-MRT/KITScenes-LongTail-Challenge)  
> **Dataset**: [KIT-MRT/KITScenes-LongTail](https://huggingface.co/datasets/KIT-MRT/KITScenes-LongTail)  
> **Paper**: [arXiv:2603.23607](https://arxiv.org/abs/2603.23607)

## Approach: Few-Shot CoT Kinematic Bicycle Model

This solution builds on the best-performing open-source method reported in the paper (MMS 4.24 with Gemma 3 12B) and combines:

1. **VLM Reasoning**: analyzes each driving scenario from its past trajectory and driving instruction
2. **Kinematic Bicycle Model** (Kong et al. 2015): converts the predicted action commands into 25 future waypoints (5 s at 5 Hz)
3. **Few-Shot Chain-of-Thought**: 3 worked training examples in the prompt guide the VLM's structured output
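The following Python sketch shows how the few-shot prompt and its XML answer might fit together. It is not the code in `scripts/generate_production.py`. This README names neither the 9 XML fields nor the prompt template, so `FIELDS`, `Example`, `build_prompt` and `parse_fields` are hypothetical, and only five illustrative fields appear.

```python
import re
from dataclasses import dataclass

# Hypothetical field names; the real prompt uses 9 fields not named in this README.
FIELDS = ["scene_description", "accel_phase1", "steer_phase1",
          "accel_phase2", "steer_phase2"]


@dataclass
class Example:
    past_trajectory: list  # (x, y) ego-frame points, oldest first
    instruction: str
    answer_xml: str        # worked chain-of-thought answer in XML


def format_query(past, instruction):
    points = "; ".join(f"({x:.1f}, {y:.1f})" for x, y in past)
    return f"Past trajectory: {points}\nInstruction: {instruction}\n"


def build_prompt(examples, past, instruction):
    """Solved examples first, then the unsolved query (k-shot prompting)."""
    parts = ["Answer with one XML tag per field: " + ", ".join(FIELDS) + "\n"]
    for i, ex in enumerate(examples, start=1):
        parts.append(f"### Example {i}\n"
                     + format_query(ex.past_trajectory, ex.instruction)
                     + ex.answer_xml + "\n")
    parts.append("### Query\n" + format_query(past, instruction))
    return "\n".join(parts)


def parse_fields(text):
    """Pull <field>value</field> pairs out of the VLM reply; missing -> None."""
    fields = {}
    for name in FIELDS:
        match = re.search(rf"<{name}>(.*?)</{name}>", text, flags=re.DOTALL)
        fields[name] = match.group(1).strip() if match else None
    return fields
```

`parse_fields` returns `None` for any tag the model omitted. A caller can therefore detect incomplete answers and substitute a default command.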

### Supported VLM Backends

| Backend | Command | Notes |
|---|---|---|
| **Ollama** (local) | `--backend ollama --ollama-model qwen3.5-vl:9b` | No rate limits, free, best for batch |
| **HF Inference** | `--backend hf` | Llama-4-Scout-17B via novita/groq |
| **Gemini** | `--backend gemini` | Needs `GOOGLE_API_KEY`, free tier rate-limited |
| **Fallback** | `--no-vlm` | Instruction-keyword heuristic, no API needed |
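The `--no-vlm` fallback is described only as an instruction-keyword heuristic. One plausible shape for it is sketched below. The keyword table and the chosen commands are assumptions for illustration, not the mapping in `configs/action_vocabulary.json`. The sketch covers steering only, using the command names from the Steering Angle Calibration table.

```python
# Hypothetical keyword -> (phase 1, phase 2) steering commands.
# Rules are checked in order and the first matching keyword wins.
KEYWORD_RULES = [
    ("u-turn", ("turning left", "turning left")),
    ("turn left", ("turning left", "steering straight")),
    ("turn right", ("turning right", "steering straight")),
    ("left lane", ("turning slightly left", "turning slightly right")),
    ("overtake", ("turning slightly left", "steering straight")),
    ("right lane", ("turning slightly right", "turning slightly left")),
]
DEFAULT = ("steering straight", "steering straight")


def fallback_steering(instruction: str) -> tuple:
    """Map a free-text driving instruction to two steering commands."""
    text = instruction.lower()
    for keyword, commands in KEYWORD_RULES:
        if keyword in text:
            return commands
    return DEFAULT  # e.g. "drive straight on"
```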

## Quick Start (Local with Ollama)

```bash
# 1. Clone
git clone https://huggingface.co/ValeraZSD/kitscenes-longtail-solution
cd kitscenes-longtail-solution

# 2. Install deps
pip install numpy huggingface_hub pyarrow

# 3. Start the Ollama server in another terminal and keep it running:
#      ollama serve
#    then pull the model here
ollama pull qwen3.5-vl:9b   # or gemma4:e4b

# 4. Run generation
python scripts/generate_production.py \
  --backend ollama \
  --ollama-model qwen3.5-vl:9b \
  --metadata data/test_metadata.json \
  --output submissions/submission_ollama.jsonl \
  --upload-repo ValeraZSD/kitscenes-submissions \
  --upload-filename submission_ollama_v1.jsonl

# 5. Validate
python scripts/validate.py submissions/submission_ollama.jsonl
```
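For the Ollama backend, each request boils down to one call to the local server's REST endpoint `/api/generate`, which listens on port 11434 by default. The sketch below uses only the standard library and sends text only. `build_payload` and `generate` are hypothetical helper names, not functions from `scripts/generate_production.py`.

```python
import json
import urllib.request

# Default address of a local `ollama serve` instance.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str) -> dict:
    # stream=False asks Ollama for one JSON object instead of a token stream.
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send one prompt to the local Ollama server and return the generated text."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"})
    # timeout (seconds) is an arbitrary choice; large models can be slow to load.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server running and the model pulled, `generate("qwen3.5-vl:9b", prompt)` returns the model's reply as a string, ready for XML parsing.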

## Pipeline Architecture

```
Input: past_trajectory (21 pts @ 5Hz) + driving_instruction

Few-Shot CoT Prompt (3 training examples + query)
  → VLM (Ollama / HF Inference / Gemini)
  → XML output with 9 structured fields

Parse XML → Normalize to 5 accel × 5 steer commands

Kinematic Bicycle Model (Kong et al. 2015)
  Phase 1: 0–3s (15 steps)   Phase 2: 3–5s (10 steps)
  → 25 waypoints in ego-centric coordinates (+x=fwd, +y=left)

Output: scenario_id + future_trajectory (25×2) + reasoning (English, 9 fields)
```
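The rollout stage can be sketched with the kinematic bicycle model of Kong et al. (2015), integrated with forward Euler at `dt = 0.2 s`. At that step size, 15 steps cover phase 1 (0 to 3 s) and 10 steps cover phase 2 (3 to 5 s). Several details are assumptions of this sketch rather than documented settings of the solution: the axle distances `l_f` and `l_r`, the Euler integration, and the clamp that keeps the vehicle from reversing.

```python
import math

DT = 0.2  # s; 5 Hz, so 15 steps = 3 s and 10 steps = 2 s


def bicycle_rollout(v0, phases, l_f=1.4, l_r=1.6, dt=DT):
    """Roll out the kinematic bicycle model (Kong et al. 2015).

    v0: initial speed in m/s.
    phases: list of (accel in m/s^2, steer in degrees, n_steps); positive steer = left.
    l_f, l_r: assumed distances from the centre of gravity to the front/rear axle (m).
    Returns (x, y) waypoints in the ego frame (+x forward, +y left).
    """
    x = y = psi = 0.0
    v = v0
    waypoints = []
    for accel, steer_deg, n_steps in phases:
        delta = math.radians(steer_deg)
        # Slip angle at the centre of gravity: beta = atan(l_r / (l_f + l_r) * tan(delta)).
        beta = math.atan(l_r / (l_f + l_r) * math.tan(delta))
        for _ in range(n_steps):
            x += v * math.cos(psi + beta) * dt
            y += v * math.sin(psi + beta) * dt
            psi += v / l_r * math.sin(beta) * dt  # yaw rate
            v = max(0.0, v + accel * dt)          # no reversing
            waypoints.append((x, y))
    return waypoints
```

For example, `bicycle_rollout(10.0, [(0.0, 6.0, 15), (0.0, 0.0, 10)])` holds 10 m/s and 6° of left steering for 3 s, then drives straight for 2 s. The result is 25 points that end with positive y.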

## Steering Angle Calibration

The steering angles in Table 6 of the paper (up to ±30° at low speed) describe instantaneous inputs. The bicycle model instead holds each command for a sustained phase of up to 3 s, so we calibrated the angles against expert trajectories:

| Command | Paper (≤60 km/h) | **Calibrated (≤60 km/h)** | >60 km/h |
|---|---|---|---|
| turning left | 30° | **6°** | 0.3° |
| turning slightly left | 10° | **1°** | 0.1° |
| steering straight | 0° | 0° | 0° |
| turning slightly right | -10° | **-1°** | -0.1° |
| turning right | -30° | **-6°** | -0.3° |
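Looking up the angle for a command comes down to picking one of two columns, split at the 60 km/h threshold. The dictionary below copies the calibrated and >60 km/h values from the table. Some choices belong to this sketch only: the name `steering_angle_deg`, and treating exactly 60 km/h as low speed (following the ≤60 header). The actual layout of `configs/action_vocabulary.json` may differ.

```python
# Steering angles in degrees (positive = left): (calibrated <=60 km/h, >60 km/h).
CALIBRATED_DEG = {
    "turning left": (6.0, 0.3),
    "turning slightly left": (1.0, 0.1),
    "steering straight": (0.0, 0.0),
    "turning slightly right": (-1.0, -0.1),
    "turning right": (-6.0, -0.3),
}
SPEED_THRESHOLD_KMH = 60.0


def steering_angle_deg(command: str, speed_kmh: float) -> float:
    """Return the calibrated steering angle; raises KeyError for unknown commands."""
    low_speed, high_speed = CALIBRATED_DEG[command]
    return low_speed if speed_kmh <= SPEED_THRESHOLD_KMH else high_speed
```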

## Files

| File | Description |
|---|---|
| `scripts/generate_production.py` | Main pipeline — VLM + bicycle model + validation + upload |
| `scripts/validate.py` | Standalone submission validator (mirrors challenge code) |
| `data/test_metadata.json` | Pre-extracted test metadata (400 scenarios, 190KB, no images) |
| `configs/action_vocabulary.json` | Action → parameter mapping with calibrated values |
| `submissions/` | Generated submission JSONL files |
| `requirements.txt` | Python dependencies |
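A minimal per-line check in the spirit of `scripts/validate.py` is sketched below. It encodes only what this README states about the output: a `scenario_id`, a 25×2 `future_trajectory` and reasoning. The key name `reasoning` and the exact checks are assumptions, and the challenge's own validator remains authoritative.

```python
import json
import math


def check_record(line: str) -> list:
    """Return a list of problems with one JSONL line (empty list = OK)."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(rec, dict):
        return ["record is not a JSON object"]
    problems = []
    for key in ("scenario_id", "future_trajectory", "reasoning"):
        if key not in rec:
            problems.append(f"missing key {key!r}")
    traj = rec.get("future_trajectory")
    if traj is not None:
        if not isinstance(traj, list) or len(traj) != 25:
            problems.append("future_trajectory must have 25 points")
        elif not all(isinstance(p, list) and len(p) == 2
                     and all(isinstance(c, (int, float)) and math.isfinite(c) for c in p)
                     for p in traj):
            # Catches NaN/Infinity, which Python's json module accepts by default.
            problems.append("each point must be a finite [x, y] pair")
    return problems
```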

## Test Data Distribution

| Instruction | Count |
|---|---|
| drive straight on | 196 |
| turn right | 43 |
| use left lane | 34 |
| use right lane | 30 |
| overtake truck | 25 |
| turn left | 20 |
| u-turn | 8 |

| Scenario type | Count |
|---|---|
| intersection | 125 |
| overtake/lane change | 102 |
| specifically selected | 68 |
| construction zone | 36 |
| heavy rain | 27 |
| snow & wintry mix | 23 |
| nighttime | 19 |

Speed: min 0, mean 52, max 130 km/h; 65% of scenarios are urban (≤60 km/h) and 35% highway (>60 km/h).