Gaaaavin committed on
Commit 15b7700 · verified · 1 Parent(s): daa6f26

Initial upload: CityWalker 2000hr converted from Lightning .ckpt

Files changed (3)
  1. README.md +83 -0
  2. config.json +27 -0
  3. model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,83 @@
+ ---
+ license: apache-2.0
+ library_name: transformers
+ tags:
+ - robotics
+ - navigation
+ - waypoint-prediction
+ - citywalker
+ - dinov2
+ pipeline_tag: robotics
+ ---
+
+ # CityWalker (2000hr)
+
+ Hugging Face port of the [CityWalker](https://github.com/ai4ce/CityWalker)
+ waypoint-prediction model, trained on 2000 hours of urban pedestrian
+ footage. This repo contains the converted weights of
+ `CityWalker_2000hr.ckpt` (originally a PyTorch Lightning checkpoint)
+ re-packaged as a `transformers.PreTrainedModel` so it can be loaded with
+ `AutoModel.from_pretrained`.
+
+ Model implementation lives in
+ [`ai4ce/wanderland-benchmark`](https://github.com/ai4ce/wanderland-benchmark)
+ under `src/wanderland_lab/models/citywalker/`.
+
+ ## Architecture
+
+ ```
+ images (B, T, 3, H, W) ─► DINOv2 (vit-b/14) ─► obs tokens (B, T, 768)
+ coords (B, T+1, 2) ─► PolarEmbedding + Linear ─► goal token (B, 1, 768)
+     ─► concat ─► (B, T+1, 768)
+     ─► TransformerEncoder (8 heads, 16 layers)
+     ─► MLP head ─► (waypoints, arrive_logits)
+ ```
+
+ - **T** = `context_size` = 5 recent RGB frames.
+ - **waypoints**: `(B, 5, 2)` cumulative XY deltas in body frame.
+ - **arrive_logits**: `(B, 1)` pre-sigmoid arrival score.
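
The shape flow in the diagram above can be traced with plain arithmetic. The sketch below is illustrative only: `trace_shapes` is a hypothetical helper that is not part of the CityWalker code, and the token width of 768 and the default sizes are simply the values quoted in this README.

```python
def trace_shapes(batch: int, context_size: int = 5,
                 len_traj_pred: int = 5, dim: int = 768) -> dict:
    """Return the tensor shapes named in the architecture diagram.

    Hypothetical helper for illustration; it performs no model computation.
    """
    t = context_size
    return {
        "images_in": (batch, t, 3, "H", "W"),
        "coords_in": (batch, t + 1, 2),
        "obs_tokens": (batch, t, dim),
        "goal_token": (batch, 1, dim),
        # T obs tokens concatenated with 1 goal token along the sequence axis
        "encoder_input": (batch, t + 1, dim),
        "waypoints": (batch, len_traj_pred, 2),
        "arrive_logits": (batch, 1),
    }


shapes = trace_shapes(batch=4)
for name, shape in shapes.items():
    print(name, shape)
```

With the default `context_size` of 5, the encoder sees a sequence of 6 tokens per sample: five observation tokens plus one goal token.
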
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModel
+ from wanderland_lab.models.citywalker import CityWalkerModel # import registers CityWalkerModel with AutoModel
+
+ model = AutoModel.from_pretrained("ai4ce/citywalker")
+ model.load_obs_encoder() # fetches DINOv2 via torch.hub on first call
+ model.eval()
+ ```
+
+ The DINOv2 backbone is not bundled with the weights to avoid redistributing
+ Meta's pretrained checkpoint; `load_obs_encoder()` pulls it via `torch.hub`.
+
+ ## Inputs / Outputs
+
+ | Name | Shape | Notes |
+ |-------------------|---------------------------|-----------------------------------|
+ | `images` | `(B, 5, 3, H, W)` float32 | RGB in `[0, 1]`; the model resizes and applies ImageNet normalization |
+ | `coords` | `(B, 6, 2)` float32 | Recent body-frame XY positions |
+ | `waypoints` out | `(B, 5, 2)` float32 | Cumulative XY deltas, body frame |
+ | `arrive_logits` out | `(B, 1)` float32 | Pre-sigmoid |
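
Both outputs in the table above usually need a small post-processing step before use. The sketch below is a minimal, library-free illustration under two assumptions: "pre-sigmoid" means a standard logistic sigmoid yields an arrival probability, and "cumulative" means each waypoint is an offset from the current pose at the body-frame origin. The helper names and sample values are hypothetical and do not come from the CityWalker code.

```python
import math


def arrival_probability(logit: float) -> float:
    """Map a pre-sigmoid arrival logit to a probability in (0, 1)."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def per_step_deltas(waypoints):
    """Convert cumulative (x, y) offsets into step-to-step displacements."""
    deltas = []
    prev_x, prev_y = 0.0, 0.0  # assumed: the current pose is the origin
    for x, y in waypoints:
        deltas.append((x - prev_x, y - prev_y))
        prev_x, prev_y = x, y
    return deltas


# Made-up waypoints for one sample, i.e. one (5, 2) slice of `waypoints`.
waypoints = [(0.5, 0.0), (1.0, 0.25), (1.5, 0.5), (2.0, 0.5), (2.5, 0.5)]
steps = per_step_deltas(waypoints)
prob = arrival_probability(0.0)
```

Summing `steps` back up recovers the original cumulative waypoints, which is a quick sanity check on the convention.
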
+
+ ## Policy wrapper
+
+ For robot-control use (body-frame `(vx, vy, yaw_rate)` with per-episode
+ history + lookahead along a reference path), see `CityWalkerPolicy` in the
+ [`wanderland-lab`](https://github.com/ai4ce/wanderland-benchmark) repo.
+
+ ## Citation
+
+ ```
+ @inproceedings{liu2024citywalker,
+   title = {CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos},
+   author = {Liu, Xinhao and Chen, Jintong and Liu, Yichen and Feng, Chen},
+   booktitle = {CVPR},
+   year = {2025}
+ }
+ ```
+
+ ## License
+
+ Apache-2.0, matching the upstream
+ [ai4ce/CityWalker](https://github.com/ai4ce/CityWalker) repository.
config.json ADDED
@@ -0,0 +1,27 @@
+ {
+   "architectures": [
+     "CityWalkerModel"
+   ],
+   "context_size": 5,
+   "cord_include_input": true,
+   "cord_num_freqs": 6,
+   "crop": [
+     400,
+     400
+   ],
+   "decoder_ff_dim_factor": 4,
+   "decoder_num_heads": 8,
+   "decoder_num_layers": 16,
+   "do_resize": true,
+   "do_rgb_normalize": true,
+   "dtype": "float32",
+   "freeze_obs_encoder": true,
+   "len_traj_pred": 5,
+   "model_type": "citywalker",
+   "obs_encoder_type": "dinov2_vitb14",
+   "resize": [
+     392,
+     392
+   ],
+   "transformers_version": "5.8.0"
+ }
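
A few of these values can be cross-checked against each other. The sketch below parses a subset of the config and confirms that the 392×392 `resize` target is a whole number of 14-pixel ViT patches. The patch size is read off the `14` in `dinov2_vitb14`. How the model reduces patch tokens to one observation token per frame is not stated in this commit, so the sketch only counts patches.

```python
import json

# Subset of the config.json added in this commit.
config = json.loads("""
{
  "context_size": 5,
  "len_traj_pred": 5,
  "obs_encoder_type": "dinov2_vitb14",
  "resize": [392, 392]
}
""")

PATCH_SIZE = 14  # ViT-B/14 patch size, per the "14" in dinov2_vitb14

height, width = config["resize"]
# A ViT needs image sides that are whole multiples of the patch size.
divisible = height % PATCH_SIZE == 0 and width % PATCH_SIZE == 0
patches_per_frame = (height // PATCH_SIZE) * (width // PATCH_SIZE)

# Per the README diagram: one obs token per frame plus one goal token.
sequence_length = config["context_size"] + 1

print(divisible, patches_per_frame, sequence_length)
```

Each resized frame therefore maps to a 28×28 grid of patches.
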
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cb3c609a411eb901cdf4500a542c324e33bcf7a2b6ce328de6590cc55b8b8ca9
+ size 833735756
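
The three lines above are a Git LFS pointer, not the weights themselves: they record the SHA-256 digest and byte size of the real `model.safetensors`. A downloaded copy can be checked against the pointer with the standard library alone. The helper names below are hypothetical, and the local path in the usage comment is an assumption.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as fh:
        # Hash in chunks so a file of roughly 800 MB is never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:cb3c609a411eb901cdf4500a542c324e33bcf7a2b6ce328de6590cc55b8b8ca9
size 833735756
"""
pointer = parse_lfs_pointer(POINTER)

# Usage, assuming the weights were saved to ./model.safetensors:
# print(matches_pointer("model.safetensors", pointer))
```

Checking the size first is a cheap way to reject a truncated download before hashing the whole file.
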