---
license: cc-by-4.0
task_categories:
- video-classification
- reinforcement-learning
- other
language:
- en
tags:
- opencs2
- counter-strike-2
- torchcodec
- video
- audio
- parquet
pretty_name: "OpenCS2 - POV Renders"
configs:
- config_name: pov_rounds
data_files:
- split: train
path: index/pov_rounds.parquet
default: true
- config_name: matches
data_files:
- split: train
path: index/matches.parquet
- config_name: rounds
data_files:
- split: train
path: index/rounds.parquet
- config_name: kills
data_files:
- split: train
path: events/kills.parquet
- config_name: duels
data_files:
- split: train
path: events/duels.parquet
- config_name: clip_events
data_files:
- split: train
path: events/clip_events.parquet
- config_name: round_player
data_files:
- split: train
path: events/round_player.parquet
- config_name: enums
data_files:
- split: train
path: metadata/enums.parquet
---
# OpenCS2 - POV Renders
![OpenCS2](https://huggingface.co/datasets/blanchon/opencs2_dataset/resolve/main/static/header.webp)
> Browse with the [OpenCS2 Viewer](https://huggingface.co/spaces/blanchon/counter-strike-2-dataset-viewer) - every match, map and round, with all 10 player POVs synced on one timeline.
Tick-aligned Counter-Strike 2 POV training clips, rendered from
[`blanchon/cs2_dataset_demo`](https://huggingface.co/datasets/blanchon/cs2_dataset_demo). Each row
in the main table is one player's perspective for one round; ten POVs per round share the same tick
clock.
Per POV round:
- **Video** - 1280x720 @ 32 fps, near-lossless H.264, faststart, muxed with audio.
- **Audio** - per-player stereo, mixed from that player's position and orientation.
- **Inputs** - every tick: keys, mouse delta, view angles, fire/jump/use, weapon switches.
- **World state** - every tick: player position, velocity, view, health, armor, weapon, alive flag.
This is the simple loose-file layout: every POV round has its own directory containing `video.mp4`,
`video.preview.mp4`, and `ticks.parquet`. For large-scale training with fewer Hub files, use the
WebDataset packaging: [`blanchon/opencs2_dataset_wds`](https://huggingface.co/datasets/blanchon/opencs2_dataset_wds).
The same loose-file dataset is also mirrored as a Hugging Face Storage Bucket:
[`hf://buckets/blanchon/opencs2_dataset`](https://huggingface.co/buckets/blanchon/opencs2_dataset).
Current build: `165,270` POV rounds (`2974.2` POV video hours, `528.0` synced
round-timeline hours), `16,527` rounds, `794` match/maps, `111,715` kills.
## Usage
The default config is the POV-round index. Use the event/index configs to filter first, then stream
only the MP4 and tick sidecar you selected.
| Config | Row | Use |
| --- | --- | --- |
| `pov_rounds` (default) | one `(match_id, map_name, round, player_slot)` with path-only `video.mp4`, `video.preview.mp4`, and `ticks.parquet` | training, media lookup, download-size estimates |
| `matches` | one per `(match_id, map_name)` with team/event metadata | match/map filtering |
| `rounds` | one per `(match_id, map_name, round)` with tick boundaries and round outcome | round filtering |
| `kills` | one per kill | filtering by weapon, side, headshot, smoke, wallbang, clutch, 1v1 |
| `duels` | one per kill normalized as winner/loser | duel mining, winner POV selection |
| `clip_events` | generic clip-mining event rows | simple event filters for clip extraction |
| `round_player` | one per player per round with compact stats | per-player round filters |
| `enums` | enum lookup table | mapping compact `*_id` columns back to labels |
## Structure
```text
rounds/
match_id=<id>/map_name=<map>/round=<round>/player=<slot>/
video.mp4
video.preview.mp4
ticks.parquet
index/
matches.parquet
rounds.parquet
pov_rounds.parquet
events/
kills.parquet
duels.parquet
clip_events.parquet
round_player.parquet
metadata/
enums.parquet
```
Media columns are path-only Hugging Face structs:
```python
{"bytes": None, "path": "hf://datasets/blanchon/opencs2_dataset@main/rounds/.../video.mp4"}
```
This keeps the Dataset Viewer preview working without embedding MP4 bytes into parquet. Use
`media_bytes` and `preview_video_bytes` to estimate exact download size after filtering.
## Parquet Tables
String-like filter columns are dictionary encoded where useful, and most have a matching `*_id`
column for fast integer joins or enum-based modeling. Player identity is always `player_slot`
(`0..9`), not Steam ID or username.
| File | Rows | Purpose |
| --- | ---: | --- |
| `index/pov_rounds.parquet` | 165,270 | one row per player POV round; includes side/weapon summary, capture ticks, survival/death, media paths, byte sizes, and tick sidecar path |
| `index/matches.parquet` | 794 | one row per match/map with HLTV link, event, teams, score, winner, date, and rounds played |
| `index/rounds.parquet` | 16,527 | one row per round with tick boundaries, duration, winner/reason/bomb site, kill counts, opening kill summary, 1v1/clutch flags |
| `events/kills.parquet` | 111,715 | attacker/victim slots and sides, event time, weapon/class, hit details, alive counts before/after, trade/1v1/clutch/opening flags |
| `events/duels.parquet` | 111,715 | kill events normalized as winner/loser duels, useful for selecting the winner POV |
| `events/clip_events.parquet` | 111,715 | generic clip-mining table with event time, target/other player slots, weapon/class, and boolean flags |
| `events/round_player.parquet` | 168,294 | player round stats: side, kills, deaths, assists, headshots, KAST |
| `metadata/enums.parquet` | 115 | enum lookup table: `enum_name`, `enum_id`, `value` |
| `rounds/**/ticks.parquet` | per POV | tick/input/world-state rows: `tick`, `t`, button lists, view angles, weapon, health/armor, position, velocity |
The tick column `t` is the timestamp within the POV video. Event tables store `event_seconds` on the same POV-video timeline, so you can seek media directly with
`event_video_seconds = event_seconds`, or join an event's `tick` against the selected POV's
`ticks.parquet` and use its `t` column.
## Stream One Clip
```python
import re
from datasets import Video, load_dataset
from huggingface_hub import hf_hub_url
from torchcodec.decoders import AudioDecoder, VideoDecoder
def hf_path_to_url(path):
repo_id, revision, filename = re.match(r"hf://datasets/([^@]+)@([^/]+)/(.+)", path).groups()
return hf_hub_url(repo_id=repo_id, repo_type="dataset", revision=revision, filename=filename)
ds = load_dataset("blanchon/opencs2_dataset", "pov_rounds", split="train")
ds = ds.cast_column("video", Video(decode=False))
row = ds[0]
url = hf_path_to_url(row["video"]["path"])
video = VideoDecoder(url, seek_mode="approximate")
clip = video.get_frames_played_in_range(20.0, 30.0)
audio = AudioDecoder(url)
samples = audio.get_samples_played_in_range(20.0, 30.0)
```
Use `seek_mode="approximate"` for streaming so TorchCodec does not scan the entire MP4 during
initialization.
## Filter Before Media Access
```python
import duckdb
con = duckdb.connect()
con.sql("INSTALL httpfs; LOAD httpfs;")
awp_1v1 = con.sql("""
SELECT
d.match_id,
d.map_name,
d.round,
d.winner_player_slot AS player_slot,
d.event_seconds AS event_table_seconds,
d.event_seconds AS event_video_seconds,
p.video,
p.ticks_parquet_path,
p.media_bytes
FROM 'hf://datasets/blanchon/opencs2_dataset/events/duels.parquet' AS d
JOIN 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet' AS p
ON d.match_id = p.match_id
AND d.map_name = p.map_name
AND d.round = p.round
AND d.winner_player_slot = p.player_slot
WHERE d.weapon = 'awp'
AND d.is_1v1_before
""").df()
print(awp_1v1.head())
print("estimated MP4 bytes:", int(awp_1v1["media_bytes"].sum()))
```
To extract the 10 seconds around the duel, convert `video.path` with `hf_hub_url()` as above and
decode `[max(0, event_video_seconds - 5), event_video_seconds + 5]`.
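That clamping is just a max/min against the video bounds; a small hypothetical helper mirroring the bounds used by the `save_clip` recipes later in this card:

```python
def clip_window(event_video_seconds, duration_s, before=5.0, after=5.0):
    """Clamp a [before, after] window around an event to the POV video bounds."""
    start = max(0.0, event_video_seconds - before)
    stop = min(duration_s, event_video_seconds + after)
    return start, stop

print(clip_window(3.0, 60.0))   # (0.0, 8.0): start clamped at the video head
print(clip_window(58.0, 60.0))  # (53.0, 60.0): stop clamped at the video tail
```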
## Verified Clip Recipes
> [!TIP]
> These recipes were verified by exporting 10 local examples each. For kill-derived examples, center clips on
> `event_video_seconds = event_seconds`.
Common Python setup:
This helper writes video-only MP4 clips through TorchCodec. It decodes the selected range as a
PyTorch `uint8` tensor, then encodes it back to H.264 MP4.
```python
import json
import re
from pathlib import Path
import duckdb
from huggingface_hub import hf_hub_url
from PIL import Image
from torchcodec.decoders import VideoDecoder
from torchcodec.encoders import VideoEncoder
OUT = Path("opencs2_examples")
FPS = 32.0
def hf_path_to_url(path):
repo_id, revision, filename = re.match(r"hf://datasets/([^@]+)@([^/]+)/(.+)", path).groups()
return hf_hub_url(repo_id=repo_id, repo_type="dataset", revision=revision, filename=filename)
def open_mp4(row):
return hf_path_to_url(row["video_path"])
def save_clip(row, name, before=5.0, after=5.0):
center = float(row["event_video_seconds"])
start = max(0.0, center - before)
stop = min(float(row["duration_s"]), center + after)
out = OUT / name / f"{row['event_id']}.mp4"
out.parent.mkdir(parents=True, exist_ok=True)
frames = VideoDecoder(
open_mp4(row),
seek_mode="approximate",
dimension_order="NCHW",
).get_frames_played_in_range(start_seconds=start, stop_seconds=stop, fps=FPS)
VideoEncoder(frames.data, frame_rate=FPS).to_file(
out,
codec="libx264",
pixel_format="yuv420p",
crf=20,
preset="veryfast",
extra_options={"x264-params": "keyint=32:min-keyint=1:scenecut=0:open-gop=0"},
)
return out
def save_png(frame_hwc, path):
Image.fromarray(frame_hwc.cpu().numpy()).save(path)
def save_frame_pair(row, name):
out = OUT / name / f"{row['media_id']}-{int(row['tick'])}"
out.mkdir(parents=True, exist_ok=True)
frames = VideoDecoder(
open_mp4(row),
seek_mode="approximate",
dimension_order="NHWC",
).get_frames_played_at(seconds=[float(row["t"]), float(row["next_t"])])
frame_t = frames.data[0]
frame_t1 = frames.data[1]
save_png(frame_t, out / "frame_t.png")
save_png(frame_t1, out / "frame_t_plus_1.png")
tick_t = {k: v for k, v in row.items() if not k.startswith("next_") and k != "video_path"}
tick_t_plus_1 = {**tick_t, "tick": int(row["next_tick"]), "t": float(row["next_t"])}
(out / "tick_t.json").write_text(json.dumps(tick_t, indent=2, default=str) + "\n")
(out / "tick_t_plus_1.json").write_text(json.dumps(tick_t_plus_1, indent=2, default=str) + "\n")
return out
con = duckdb.connect()
con.sql("INSTALL httpfs; LOAD httpfs;")
```
<details>
<summary><strong>AWP 1v1 Duel</strong></summary>
Winner POV for AWP kills where the duel table says the fight was a 1v1 before the kill.
```python
rows = con.sql("""
SELECT d.duel_id AS event_id, d.event_seconds AS event_video_seconds,
d.weapon, d.distance, d.headshot, p.duration_s,
struct_extract(p.video, 'path') AS video_path
FROM 'hf://datasets/blanchon/opencs2_dataset/events/duels.parquet' d
JOIN 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet' p
ON d.match_id=p.match_id AND d.map_name=p.map_name AND d.round=p.round
AND d.winner_player_slot=p.player_slot
WHERE d.weapon='awp' AND d.is_1v1_before
AND p.duration_s >= d.event_seconds + 5.0
ORDER BY d.event_seconds
LIMIT 10
""").df()
for row in rows.to_dict("records"):
save_clip(row, "awp_1v1_duel")
```
</details>
<details>
<summary><strong>Kill Through Smoke</strong></summary>
Attacker POV, with the kill centered five seconds into the exported clip.
```python
rows = con.sql("""
SELECT k.kill_id AS event_id, k.event_seconds AS event_video_seconds,
k.weapon, k.distance, k.headshot, p.duration_s,
struct_extract(p.video, 'path') AS video_path
FROM 'hf://datasets/blanchon/opencs2_dataset/events/kills.parquet' k
JOIN 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet' p
ON k.match_id=p.match_id AND k.map_name=p.map_name AND k.round=p.round
AND k.attacker_player_slot=p.player_slot
WHERE k.through_smoke
AND p.duration_s >= k.event_seconds + 5.0
LIMIT 10
""").df()
for row in rows.to_dict("records"):
save_clip(row, "kill_through_smoke")
```
</details>
<details>
<summary><strong>Noscope / Wallbang Highlight</strong></summary>
Attacker POV for kills flagged as noscope, wallbang, or penetration.
```python
rows = con.sql("""
SELECT k.kill_id AS event_id, k.event_seconds AS event_video_seconds,
k.weapon, k.noscope, k.wallbang, k.penetrated, p.duration_s,
struct_extract(p.video, 'path') AS video_path
FROM 'hf://datasets/blanchon/opencs2_dataset/events/kills.parquet' k
JOIN 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet' p
ON k.match_id=p.match_id AND k.map_name=p.map_name AND k.round=p.round
AND k.attacker_player_slot=p.player_slot
WHERE (k.noscope OR k.wallbang OR k.penetrated > 0)
AND p.duration_s >= k.event_seconds + 5.0
ORDER BY k.noscope DESC, k.wallbang DESC, k.penetrated DESC
LIMIT 10
""").df()
for row in rows.to_dict("records"):
save_clip(row, "noscope_wallbang")
```
</details>
<details>
<summary><strong>Knife Kill</strong></summary>
Attacker POV for actual knife kills, not just rounds where the player holds a knife.
```python
rows = con.sql("""
SELECT k.kill_id AS event_id, k.event_seconds AS event_video_seconds,
k.weapon, p.duration_s, struct_extract(p.video, 'path') AS video_path
FROM 'hf://datasets/blanchon/opencs2_dataset/events/kills.parquet' k
JOIN 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet' p
ON k.match_id=p.match_id AND k.map_name=p.map_name AND k.round=p.round
AND k.attacker_player_slot=p.player_slot
WHERE (lower(k.weapon_class)='knife' OR lower(k.weapon) LIKE '%knife%'
OR lower(k.weapon) LIKE '%bayonet%' OR lower(k.weapon) LIKE '%karambit%')
AND p.duration_s >= k.event_seconds + 5.0
LIMIT 10
""").df()
for row in rows.to_dict("records"):
save_clip(row, "knife_kill")
```
</details>
<details>
<summary><strong>Five Kills Under 10 Seconds</strong></summary>
Groups kills by player and round, then exports from the first kill through the end of the streak.
```python
rows = con.sql("""
WITH streaks AS (
SELECT match_id, map_name, round, attacker_player_slot AS player_slot,
COUNT(*) AS n_kills,
MIN(event_seconds) AS first_kill_video_seconds,
MAX(event_seconds) AS last_kill_video_seconds
FROM 'hf://datasets/blanchon/opencs2_dataset/events/kills.parquet'
GROUP BY match_id, map_name, round, attacker_player_slot
HAVING COUNT(*) >= 5 AND MAX(event_seconds) - MIN(event_seconds) < 10.0
)
SELECT concat('streak-', s.match_id, '-', s.map_name, '-r', s.round, '-p', s.player_slot) AS event_id,
s.first_kill_video_seconds AS event_video_seconds,
s.last_kill_video_seconds, s.n_kills, p.duration_s,
struct_extract(p.video, 'path') AS video_path
FROM streaks s
JOIN 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet' p
ON s.match_id=p.match_id AND s.map_name=p.map_name AND s.round=p.round
AND s.player_slot=p.player_slot
ORDER BY s.last_kill_video_seconds - s.first_kill_video_seconds
LIMIT 10
""").df()
for row in rows.to_dict("records"):
save_clip(row, "five_kills_under_10s", before=2.0, after=row["last_kill_video_seconds"] - row["event_video_seconds"] + 2.0)
```
</details>
<details>
<summary><strong>Very Long Distance Kill</strong></summary>
Attacker POV for the longest kills by event-table distance.
```python
rows = con.sql("""
SELECT k.kill_id AS event_id, k.event_seconds AS event_video_seconds,
k.weapon, k.distance, p.duration_s, struct_extract(p.video, 'path') AS video_path
FROM 'hf://datasets/blanchon/opencs2_dataset/events/kills.parquet' k
JOIN 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet' p
ON k.match_id=p.match_id AND k.map_name=p.map_name AND k.round=p.round
AND k.attacker_player_slot=p.player_slot
WHERE k.distance IS NOT NULL
AND p.duration_s >= k.event_seconds + 5.0
ORDER BY k.distance DESC
LIMIT 10
""").df()
for row in rows.to_dict("records"):
save_clip(row, "long_distance_kill")
```
</details>
<details>
<summary><strong>Position-Based Clip</strong></summary>
For global position scans, use the consolidated WDS tick index, then export the matching POV from
this repo.
```python
rows = con.sql("""
WITH ticks AS (
SELECT media_id, match_id, map_name, round, player_slot, tick, t, x, y, z
FROM 'hf://datasets/blanchon/opencs2_dataset_wds/ticks/match_id=2391545/map_name=de_anubis/ticks.parquet'
WHERE is_alive AND t > 5.0
),
anchors AS (
SELECT * FROM ticks
WHERE tick % 64 = 0
AND x BETWEEN -875 AND -625 AND y BETWEEN 125 AND 375
),
pairs AS (
SELECT DISTINCT ON (a.media_id) a.*, b.tick AS next_tick, b.t AS next_t
FROM anchors a JOIN ticks b ON a.media_id=b.media_id AND a.tick + 2 = b.tick
ORDER BY a.media_id, a.t
)
SELECT * FROM pairs LIMIT 10
""").df()
for row in rows.to_dict("records"):
pov = con.execute("""
SELECT duration_s, struct_extract(video, 'path') AS video_path
FROM 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet'
WHERE media_id = ?
""", [row["media_id"]]).df().iloc[0].to_dict()
save_clip({**row, **pov, "event_id": f"pos-{row['media_id']}-{row['tick']}", "event_video_seconds": row["t"]}, "position_based_clip")
```
</details>
<details>
<summary><strong>Boosting, Higher Player POV</strong></summary>
Uses tick positions to find one player standing on top of a nearby lower player for multiple
consecutive ticks. This is a heuristic, so visually inspect the results.
Recipe id: `boosting_top_player`.
```sql
xy_distance < 36
z_delta BETWEEN 45 AND 90
abs(top.velocity_z) < 12
abs(lower.velocity_z) < 12
support_ticks >= 16
```
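The per-tick conditions above can be restated in Python as a hypothetical sketch (the real recipe evaluates them in SQL over the consolidated tick index; `x`, `y`, `z`, `velocity_z` are the tick columns documented earlier):

```python
import math

# Hypothetical restatement of the boosting conditions for one pair of players
# at a single tick; field names follow the ticks.parquet columns.
def is_boost_pair(top, lower):
    xy_distance = math.hypot(top["x"] - lower["x"], top["y"] - lower["y"])
    z_delta = top["z"] - lower["z"]
    return (
        xy_distance < 36
        and 45 <= z_delta <= 90
        and abs(top["velocity_z"]) < 12
        and abs(lower["velocity_z"]) < 12
    )

top = {"x": 100.0, "y": 200.0, "z": 128.0, "velocity_z": 0.0}
lower = {"x": 110.0, "y": 205.0, "z": 64.0, "velocity_z": 0.0}
print(is_boost_pair(top, lower))  # True
```

A candidate only qualifies when this predicate holds for at least 16 consecutive ticks (`support_ticks >= 16`).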
</details>
<details>
<summary><strong>Frame Pair Dataset Preview</strong></summary>
Selects `(frame_t, tick_t, frame_t+1, tick_t+1)` at a specific map position. At 32 fps, adjacent
video frames are usually two 64 Hz demo ticks apart.
Recipe id: `frame_pair_preview`.
```python
rows = con.sql("""
WITH ticks AS (
SELECT media_id, match_id, map_name, round, player_slot, tick, t, x, y, z
FROM 'hf://datasets/blanchon/opencs2_dataset_wds/ticks/match_id=2391545/map_name=de_anubis/ticks.parquet'
WHERE is_alive AND t > 5.0
),
anchors AS (
SELECT * FROM ticks
WHERE tick % 64 = 0
AND x BETWEEN -875 AND -625 AND y BETWEEN 125 AND 375
),
pairs AS (
SELECT DISTINCT ON (a.media_id) a.*, b.tick AS next_tick, b.t AS next_t
FROM anchors a JOIN ticks b ON a.media_id=b.media_id AND a.tick + 2 = b.tick
ORDER BY a.media_id, a.t
)
SELECT pairs.*, struct_extract(p.video, 'path') AS video_path
FROM pairs
JOIN 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet' p
ON pairs.media_id=p.media_id
LIMIT 10
""").df()
for row in rows.to_dict("records"):
save_frame_pair(row, "frame_pair_preview")
```
</details>
## Ticks And Frame Pairs
Each `ticks.parquet` row is one demo tick for one POV. The render is 32 fps while demo ticks are
64 Hz, so adjacent video frames are usually two tick rows apart. For `(frame_t, tick_t,
frame_t+1, tick_t+1)`, load the selected tick sidecar, choose timestamps, and decode both frames in
one TorchCodec call:
```python
import pyarrow.parquet as pq
from torchcodec.decoders import VideoDecoder

# `url` is the MP4 URL for this POV round (see `hf_path_to_url` above). For
# local/cache-first workflows, download the tick parquet first, or query its
# hf:// URL directly with DuckDB.
ticks = pq.read_table("ticks.parquet").to_pandas()
t0 = 12.0
t1 = t0 + 1.0 / 32.0
tick0 = ticks.iloc[(ticks["t"] - t0).abs().argmin()]
tick1 = ticks.iloc[(ticks["t"] - t1).abs().argmin()]
frames = VideoDecoder(url, seek_mode="approximate").get_frames_played_at(
seconds=[float(tick0["t"]), float(tick1["t"])]
)
```
For high-throughput frame-pair training, prefer the WDS repo, group many timestamps by `media_id`,
decode them in batches, then shuffle emitted pairs.
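The grouping step can be sketched as follows (a hypothetical helper; `group_by_media` is not part of the dataset tooling):

```python
from collections import defaultdict

# Collect requested (media_id, timestamp) pairs so each MP4 is opened once
# and all of its timestamps go into a single batched decode call.
def group_by_media(requests):
    grouped = defaultdict(list)
    for media_id, t in requests:
        grouped[media_id].append(t)
    return {media_id: sorted(ts) for media_id, ts in grouped.items()}

batches = group_by_media([("a", 3.0), ("b", 1.0), ("a", 1.5)])
print(batches)  # {'a': [1.5, 3.0], 'b': [1.0]}
```

Each group then becomes one `VideoDecoder(url, seek_mode="approximate").get_frames_played_at(seconds=ts)` call, and the emitted frame pairs are shuffled afterwards.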
## Downloading
```bash
# Metadata only.
hf download blanchon/opencs2_dataset --repo-type dataset \
--include "index/*.parquet" \
--include "events/*.parquet" \
--include "metadata/*.parquet"
# One full POV round.
hf download blanchon/opencs2_dataset --repo-type dataset \
--include "rounds/match_id=2391545/map_name=de_anubis/round=01/player=00/**"
```
## Creation
Built with a headless CS2 recorder from HLTV `.dem` files. The recorder replays each demo, captures
all 10 player POVs, validates tick/frame boundaries, streams frames through FFmpeg, muxes per-player
audio into the MP4, and writes typed parquet sidecars.
## Licensing
`.dem` source data is mirrored from HLTV; downstream use is bound by the original tournament terms.
Renders and metadata are released as **CC-BY-4.0**.
## Citation
```bibtex
@misc{blanchon2026opencs2,
author = {Julien Blanchon},
title = {OpenCS2 Dataset},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://github.com/julien-blanchon/opencs2-dataset}}
}
```
