metadata
license: cc-by-4.0
task_categories:
  - video-classification
  - reinforcement-learning
  - other
language:
  - en
tags:
  - opencs2
  - counter-strike-2
  - torchcodec
  - video
  - audio
  - parquet
pretty_name: OpenCS2 - POV Renders
configs:
  - config_name: pov_rounds
    data_files:
      - split: train
        path: index/pov_rounds.parquet
    default: true
  - config_name: matches
    data_files:
      - split: train
        path: index/matches.parquet
  - config_name: rounds
    data_files:
      - split: train
        path: index/rounds.parquet
  - config_name: kills
    data_files:
      - split: train
        path: events/kills.parquet
  - config_name: duels
    data_files:
      - split: train
        path: events/duels.parquet
  - config_name: clip_events
    data_files:
      - split: train
        path: events/clip_events.parquet
  - config_name: round_player
    data_files:
      - split: train
        path: events/round_player.parquet
  - config_name: enums
    data_files:
      - split: train
        path: metadata/enums.parquet

OpenCS2 - POV Renders


Browse with the OpenCS2 Viewer - every match, map and round, with all 10 player POVs synced on one timeline.

Tick-aligned Counter-Strike 2 POV training clips, rendered from blanchon/cs2_dataset_demo. Each row in the main table is one player's perspective for one round; ten POVs per round share the same tick clock.

Per POV round:

  • Video - 1280x720 @ 32 fps, near-lossless H.264, faststart, muxed with audio.
  • Audio - per-player stereo, mixed from that player's position and orientation.
  • Inputs - every tick: keys, mouse delta, view angles, fire/jump/use, weapon switches.
  • World state - every tick: player position, velocity, view, health, armor, weapon, alive flag.

This is the simple loose-file layout: every POV round has its own directory containing video.mp4, video.preview.mp4, and ticks.parquet. For large-scale training with fewer Hub files, use the WebDataset packaging: blanchon/opencs2_dataset_wds.

Current build: 169,960 POV rounds (3,135.1 POV video hours, 313.5 synced round-timeline hours), 16,984 rounds, 817 match/maps, 115,894 kills.

Usage

The default config is the POV-round index. Use the event/index configs to filter first, then stream only the MP4 and tick sidecar you selected.

Config | Row | Use
pov_rounds (default) | one (match_id, map_name, round, player_slot) with path-only video.mp4, video.preview.mp4, and ticks.parquet | training, media lookup, download-size estimates
matches | one per (match_id, map_name) with team/event metadata | match/map filtering
rounds | one per (match_id, map_name, round) with tick boundaries and round outcome | round filtering
kills | one per kill | filtering by weapon, side, headshot, smoke, wallbang, clutch, 1v1
duels | one per kill normalized as winner/loser | duel mining, winner POV selection
clip_events | generic clip-mining event rows | simple event filters for clip extraction
round_player | one per player per round with compact stats | per-player round filters
enums | enum lookup table | mapping compact *_id columns back to labels

Structure

rounds/
  match_id=<id>/map_name=<map>/round=<round>/player=<slot>/
    video.mp4
    video.preview.mp4
    ticks.parquet
index/
  matches.parquet
  rounds.parquet
  pov_rounds.parquet
events/
  kills.parquet
  duels.parquet
  clip_events.parquet
  round_player.parquet
metadata/
  enums.parquet

Media columns are path-only Hugging Face structs:

{"bytes": None, "path": "hf://datasets/blanchon/opencs2_dataset@main/rounds/.../video.mp4"}

This keeps the Dataset Viewer preview working without embedding MP4 bytes into parquet. Use media_bytes and preview_video_bytes to estimate exact download size after filtering.
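
Turning those byte columns into an estimate is a simple sum over the filtered index rows. A sketch in plain Python (the two rows here are illustrative values, not real index entries):

```python
def estimate_download_bytes(rows, include_preview=False):
    """Sum per-row media sizes for a filtered list of pov_rounds rows."""
    total = 0
    for r in rows:
        total += r["media_bytes"]
        if include_preview:
            total += r["preview_video_bytes"]
    return total

rows = [
    {"media_bytes": 50_000_000, "preview_video_bytes": 4_000_000},
    {"media_bytes": 62_000_000, "preview_video_bytes": 5_000_000},
]
print(estimate_download_bytes(rows))                        # 112000000
print(estimate_download_bytes(rows, include_preview=True))  # 121000000
```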

Parquet Tables

String-like filter columns are dictionary encoded where useful, and most have a matching *_id column for fast integer joins or enum-based modeling. Player identity is always player_slot (0..9), not Steam ID or username.

File | Rows | Purpose
index/pov_rounds.parquet | 169,960 | one row per player POV round; includes side/weapon summary, capture ticks, survival/death, media paths, byte sizes, and tick sidecar path
index/matches.parquet | 817 | one row per match/map with HLTV link, event, teams, score, winner, date, and rounds played
index/rounds.parquet | 16,984 | one row per round with tick boundaries, duration, winner/reason/bomb site, kill counts, opening kill summary, 1v1/clutch flags
events/kills.parquet | 115,894 | attacker/victim slots and sides, event time, weapon/class, hit details, alive counts before/after, trade/1v1/clutch/opening flags
events/duels.parquet | 115,894 | kill events normalized as winner/loser duels, useful for selecting the winner POV
events/clip_events.parquet | 115,894 | generic clip-mining table with event time, target/other player slots, weapon/class, and boolean flags
events/round_player.parquet | 172,935 | player round stats: side, kills, deaths, assists, headshots, KAST
metadata/enums.parquet | 115 | enum lookup table: enum_name, enum_id, value
rounds/**/ticks.parquet | per POV | tick/input/world-state rows: tick, t, button lists, view angles, weapon, health/armor, position, velocity
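
Resolving compact *_id columns back to labels is a nested-dict lookup over the enums table. A sketch using the enum_name/enum_id/value column names from above (the rows here are illustrative, not the actual enum contents):

```python
# Illustrative rows in the metadata/enums.parquet shape.
enum_rows = [
    {"enum_name": "weapon", "enum_id": 0, "value": "awp"},
    {"enum_name": "weapon", "enum_id": 1, "value": "ak47"},
    {"enum_name": "side", "enum_id": 0, "value": "ct"},
]

# Build {enum_name: {enum_id: value}} for fast integer -> label decoding.
lookup = {}
for r in enum_rows:
    lookup.setdefault(r["enum_name"], {})[r["enum_id"]] = r["value"]

print(lookup["weapon"][1])  # ak47
```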

The tick column t is the timestamp, in seconds, within the POV video. In the current event tables, use event_video_seconds = event_seconds * 2.0 for media seeking, or join the event tick against the selected POV's ticks.parquet and use its t column.

Stream One Clip

import re
from datasets import Video, load_dataset
from huggingface_hub import hf_hub_url
from torchcodec.decoders import AudioDecoder, VideoDecoder

def hf_path_to_url(path):
    repo_id, revision, filename = re.match(r"hf://datasets/([^@]+)@([^/]+)/(.+)", path).groups()
    return hf_hub_url(repo_id=repo_id, repo_type="dataset", revision=revision, filename=filename)

ds = load_dataset("blanchon/opencs2_dataset", "pov_rounds", split="train")
ds = ds.cast_column("video", Video(decode=False))

row = ds[0]
url = hf_path_to_url(row["video"]["path"])

video = VideoDecoder(url, seek_mode="approximate")
clip = video.get_frames_played_in_range(20.0, 30.0)

audio = AudioDecoder(url)
samples = audio.get_samples_played_in_range(20.0, 30.0)

Use seek_mode="approximate" for streaming so TorchCodec does not scan the entire MP4 during initialization.

Filter Before Media Access

import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs; LOAD httpfs;")

awp_1v1 = con.sql("""
SELECT
  d.match_id,
  d.map_name,
  d.round,
  d.winner_player_slot AS player_slot,
  d.event_seconds AS event_table_seconds,
  d.event_seconds * 2.0 AS event_video_seconds,
  p.video,
  p.ticks_parquet_path,
  p.media_bytes
FROM 'hf://datasets/blanchon/opencs2_dataset/events/duels.parquet' AS d
JOIN 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet' AS p
  ON d.match_id = p.match_id
 AND d.map_name = p.map_name
 AND d.round = p.round
 AND d.winner_player_slot = p.player_slot
WHERE d.weapon = 'awp'
  AND d.is_1v1_before
""").df()

print(awp_1v1.head())
print("estimated MP4 bytes:", int(awp_1v1["media_bytes"].sum()))

To extract the 10 seconds around the duel, convert video.path with hf_hub_url() as above and decode [max(0, event_video_seconds - 5), event_video_seconds + 5].
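
The window arithmetic in that sentence can be wrapped in a small helper, clamping at zero and, when the round duration is known, at the end of the video (duration_s is the pov_rounds column used in the recipes):

```python
def clip_window(event_video_seconds, duration_s=None, before=5.0, after=5.0):
    """Return (start, stop) seconds around an event, clamped to the video."""
    start = max(0.0, event_video_seconds - before)
    stop = event_video_seconds + after
    if duration_s is not None:
        stop = min(stop, duration_s)
    return start, stop

print(clip_window(3.0))         # (0.0, 8.0)
print(clip_window(40.0, 43.5))  # (35.0, 43.5)
```

Pass the pair to VideoDecoder(...).get_frames_played_in_range(start, stop) as in the streaming example above.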

Verified Clip Recipes

These recipes were verified by exporting 10 local examples each. For kill-derived examples, center clips on event_video_seconds = event_seconds * 2.0.

Common Python setup:

This helper writes video-only MP4 clips through TorchCodec. It decodes the selected range as a PyTorch uint8 tensor, then encodes it back to H.264 MP4.

import json
import re
from pathlib import Path

import duckdb
from huggingface_hub import hf_hub_url
from PIL import Image
from torchcodec.decoders import VideoDecoder
from torchcodec.encoders import VideoEncoder

OUT = Path("opencs2_examples")
FPS = 32.0

def hf_path_to_url(path):
    repo_id, revision, filename = re.match(r"hf://datasets/([^@]+)@([^/]+)/(.+)", path).groups()
    return hf_hub_url(repo_id=repo_id, repo_type="dataset", revision=revision, filename=filename)

def open_mp4(row):
    return hf_path_to_url(row["video_path"])

def save_clip(row, name, before=5.0, after=5.0):
    center = float(row["event_video_seconds"])
    start = max(0.0, center - before)
    stop = min(float(row["duration_s"]), center + after)
    out = OUT / name / f"{row['event_id']}.mp4"
    out.parent.mkdir(parents=True, exist_ok=True)
    frames = VideoDecoder(
        open_mp4(row),
        seek_mode="approximate",
        dimension_order="NCHW",
    ).get_frames_played_in_range(start_seconds=start, stop_seconds=stop, fps=FPS)
    VideoEncoder(frames.data, frame_rate=FPS).to_file(
        out,
        codec="libx264",
        pixel_format="yuv420p",
        crf=20,
        preset="veryfast",
        extra_options={"x264-params": "keyint=32:min-keyint=1:scenecut=0:open-gop=0"},
    )
    return out

def save_png(frame_hwc, path):
    Image.fromarray(frame_hwc.cpu().numpy()).save(path)

def save_frame_pair(row, name):
    out = OUT / name / f"{row['media_id']}-{int(row['tick'])}"
    out.mkdir(parents=True, exist_ok=True)
    frames = VideoDecoder(
        open_mp4(row),
        seek_mode="approximate",
        dimension_order="NHWC",
    ).get_frames_played_at(seconds=[float(row["t"]), float(row["next_t"])])
    frame_t = frames.data[0]
    frame_t1 = frames.data[1]

    save_png(frame_t, out / "frame_t.png")
    save_png(frame_t1, out / "frame_t_plus_1.png")

    tick_t = {k: v for k, v in row.items() if not k.startswith("next_") and k != "video_path"}
    tick_t_plus_1 = {**tick_t, "tick": int(row["next_tick"]), "t": float(row["next_t"])}
    (out / "tick_t.json").write_text(json.dumps(tick_t, indent=2, default=str) + "\n")
    (out / "tick_t_plus_1.json").write_text(json.dumps(tick_t_plus_1, indent=2, default=str) + "\n")
    return out

con = duckdb.connect()
con.sql("INSTALL httpfs; LOAD httpfs;")

AWP 1v1 Duel

Winner POV for AWP kills where the duel table says the fight was a 1v1 before the kill.

rows = con.sql("""
SELECT d.duel_id AS event_id, d.event_seconds * 2.0 AS event_video_seconds,
       d.weapon, d.distance, d.headshot, p.duration_s,
       struct_extract(p.video, 'path') AS video_path
FROM 'hf://datasets/blanchon/opencs2_dataset/events/duels.parquet' d
JOIN 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet' p
  ON d.match_id=p.match_id AND d.map_name=p.map_name AND d.round=p.round
 AND d.winner_player_slot=p.player_slot
WHERE d.weapon='awp' AND d.is_1v1_before
  AND p.duration_s >= d.event_seconds * 2.0 + 5.0
ORDER BY d.event_seconds
LIMIT 10
""").df()

for row in rows.to_dict("records"):
    save_clip(row, "awp_1v1_duel")

Kill Through Smoke

Attacker POV, with the kill centered five seconds into the exported clip.

rows = con.sql("""
SELECT k.kill_id AS event_id, k.event_seconds * 2.0 AS event_video_seconds,
       k.weapon, k.distance, k.headshot, p.duration_s,
       struct_extract(p.video, 'path') AS video_path
FROM 'hf://datasets/blanchon/opencs2_dataset/events/kills.parquet' k
JOIN 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet' p
  ON k.match_id=p.match_id AND k.map_name=p.map_name AND k.round=p.round
 AND k.attacker_player_slot=p.player_slot
WHERE k.through_smoke
  AND p.duration_s >= k.event_seconds * 2.0 + 5.0
LIMIT 10
""").df()

for row in rows.to_dict("records"):
    save_clip(row, "kill_through_smoke")

Noscope / Wallbang Highlight

Attacker POV for kills flagged as noscope, wallbang, or penetration.

rows = con.sql("""
SELECT k.kill_id AS event_id, k.event_seconds * 2.0 AS event_video_seconds,
       k.weapon, k.noscope, k.wallbang, k.penetrated, p.duration_s,
       struct_extract(p.video, 'path') AS video_path
FROM 'hf://datasets/blanchon/opencs2_dataset/events/kills.parquet' k
JOIN 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet' p
  ON k.match_id=p.match_id AND k.map_name=p.map_name AND k.round=p.round
 AND k.attacker_player_slot=p.player_slot
WHERE (k.noscope OR k.wallbang OR k.penetrated > 0)
  AND p.duration_s >= k.event_seconds * 2.0 + 5.0
ORDER BY k.noscope DESC, k.wallbang DESC, k.penetrated DESC
LIMIT 10
""").df()

for row in rows.to_dict("records"):
    save_clip(row, "noscope_wallbang")

Knife Kill

Attacker POV for actual knife kills, not just rounds where the player holds a knife.

rows = con.sql("""
SELECT k.kill_id AS event_id, k.event_seconds * 2.0 AS event_video_seconds,
       k.weapon, p.duration_s, struct_extract(p.video, 'path') AS video_path
FROM 'hf://datasets/blanchon/opencs2_dataset/events/kills.parquet' k
JOIN 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet' p
  ON k.match_id=p.match_id AND k.map_name=p.map_name AND k.round=p.round
 AND k.attacker_player_slot=p.player_slot
WHERE (lower(k.weapon_class)='knife' OR lower(k.weapon) LIKE '%knife%'
       OR lower(k.weapon) LIKE '%bayonet%' OR lower(k.weapon) LIKE '%karambit%')
  AND p.duration_s >= k.event_seconds * 2.0 + 5.0
LIMIT 10
""").df()

for row in rows.to_dict("records"):
    save_clip(row, "knife_kill")

Five Kills Under 10 Seconds

Groups kills by player and round, then exports from the first kill through the end of the streak.

rows = con.sql("""
WITH streaks AS (
  SELECT match_id, map_name, round, attacker_player_slot AS player_slot,
         COUNT(*) AS n_kills,
         MIN(event_seconds) * 2.0 AS first_kill_video_seconds,
         MAX(event_seconds) * 2.0 AS last_kill_video_seconds
  FROM 'hf://datasets/blanchon/opencs2_dataset/events/kills.parquet'
  GROUP BY match_id, map_name, round, attacker_player_slot
  HAVING COUNT(*) >= 5 AND MAX(event_seconds) - MIN(event_seconds) < 10.0
)
SELECT concat('streak-', s.match_id, '-', s.map_name, '-r', s.round, '-p', s.player_slot) AS event_id,
       s.first_kill_video_seconds AS event_video_seconds,
       s.last_kill_video_seconds, s.n_kills, p.duration_s,
       struct_extract(p.video, 'path') AS video_path
FROM streaks s
JOIN 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet' p
  ON s.match_id=p.match_id AND s.map_name=p.map_name AND s.round=p.round
 AND s.player_slot=p.player_slot
ORDER BY s.last_kill_video_seconds - s.first_kill_video_seconds
LIMIT 10
""").df()

for row in rows.to_dict("records"):
    save_clip(row, "five_kills_under_10s", before=2.0, after=row["last_kill_video_seconds"] - row["event_video_seconds"] + 2.0)

Very Long Distance Kill

Attacker POV for the longest kills by event-table distance.

rows = con.sql("""
SELECT k.kill_id AS event_id, k.event_seconds * 2.0 AS event_video_seconds,
       k.weapon, k.distance, p.duration_s, struct_extract(p.video, 'path') AS video_path
FROM 'hf://datasets/blanchon/opencs2_dataset/events/kills.parquet' k
JOIN 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet' p
  ON k.match_id=p.match_id AND k.map_name=p.map_name AND k.round=p.round
 AND k.attacker_player_slot=p.player_slot
WHERE k.distance IS NOT NULL
  AND p.duration_s >= k.event_seconds * 2.0 + 5.0
ORDER BY k.distance DESC
LIMIT 10
""").df()

for row in rows.to_dict("records"):
    save_clip(row, "long_distance_kill")

Position-Based Clip

For global position scans, use the consolidated WDS tick index, then export the matching POV from this repo.

rows = con.sql("""
WITH t AS (
  SELECT media_id, match_id, map_name, round, player_slot, tick, t, x, y, z
  FROM 'hf://datasets/blanchon/opencs2_dataset_wds/ticks/match_id=*/map_name=de_ancient/ticks.parquet'
  WHERE is_alive AND t > 5.0 AND tick % 64 = 0
    AND x BETWEEN 1000 AND 1250 AND y BETWEEN -1000 AND -750
),
pairs AS (
  SELECT DISTINCT ON (a.match_id) a.*, b.tick AS next_tick, b.t AS next_t
  FROM t a JOIN t b ON a.media_id=b.media_id AND a.tick + 2 = b.tick
  ORDER BY a.match_id, a.t
)
SELECT * FROM pairs LIMIT 10
""").df()

for row in rows.to_dict("records"):
    pov = con.execute("""
    SELECT duration_s, struct_extract(video, 'path') AS video_path
    FROM 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet'
    WHERE media_id = ?
    """, [row["media_id"]]).df().iloc[0].to_dict()
    save_clip({**row, **pov, "event_id": f"pos-{row['media_id']}-{row['tick']}", "event_video_seconds": row["t"]}, "position_based_clip")

Boosting, Higher Player POV

Uses tick positions to find a higher player above a nearby lower player for multiple consecutive ticks. This is a heuristic, so visually inspect results.

Recipe id: boosting_top_player.

  • xy_distance < 36
  • z_delta BETWEEN 45 AND 90
  • abs(top.velocity_z) < 12
  • abs(lower.velocity_z) < 12
  • support_ticks >= 16
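
Those thresholds amount to a pure per-tick pair predicate, sketched below before any SQL. It assumes each tick row exposes x, y, z and velocity_z (position and velocity are in the tick schema; the exact velocity column name is an assumption), with distances in game units:

```python
import math

def is_boost_pair(top, lower, max_xy=36.0, z_lo=45.0, z_hi=90.0, max_vz=12.0):
    """Heuristic: `top` plausibly stands on `lower` at one tick."""
    xy = math.hypot(top["x"] - lower["x"], top["y"] - lower["y"])
    z_delta = top["z"] - lower["z"]
    return (
        xy < max_xy
        and z_lo <= z_delta <= z_hi
        and abs(top["velocity_z"]) < max_vz
        and abs(lower["velocity_z"]) < max_vz
    )

top = {"x": 100.0, "y": 50.0, "z": 128.0, "velocity_z": 0.0}
lower = {"x": 110.0, "y": 52.0, "z": 64.0, "velocity_z": 0.0}
print(is_boost_pair(top, lower))  # True
```

Per the recipe, require the predicate to hold for at least 16 consecutive ticks (support_ticks >= 16) before exporting the higher player's POV.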

Frame Pair Dataset Preview

Selects (frame_t, tick_t, frame_t+1, tick_t+1) at a specific map position. At 32 fps, adjacent video frames are usually two 64 Hz demo ticks apart.

Recipe id: frame_pair_preview.

rows = con.sql("""
WITH t AS (
  SELECT media_id, match_id, map_name, round, player_slot, tick, t, x, y, z
  FROM 'hf://datasets/blanchon/opencs2_dataset_wds/ticks/match_id=*/map_name=de_ancient/ticks.parquet'
  WHERE is_alive AND t > 5.0 AND tick % 64 = 0
    AND x BETWEEN 1000 AND 1250 AND y BETWEEN -1000 AND -750
),
pairs AS (
  SELECT DISTINCT ON (a.match_id) a.*, b.tick AS next_tick, b.t AS next_t
  FROM t a JOIN t b ON a.media_id=b.media_id AND a.tick + 2 = b.tick
  ORDER BY a.match_id, a.t
)
SELECT pairs.*, struct_extract(p.video, 'path') AS video_path
FROM pairs
JOIN 'hf://datasets/blanchon/opencs2_dataset/index/pov_rounds.parquet' p
  ON pairs.media_id=p.media_id
LIMIT 10
""").df()

for row in rows.to_dict("records"):
    save_frame_pair(row, "frame_pair_preview")

Ticks And Frame Pairs

Each ticks.parquet row is one demo tick for one POV. The render is 32 fps while demo ticks are 64 Hz, so adjacent video frames are usually two tick rows apart. For (frame_t, tick_t, frame_t+1, tick_t+1), load the selected tick sidecar, choose timestamps, and decode both frames in one TorchCodec call:

import pyarrow.parquet as pq
from huggingface_hub import hf_hub_download
from torchcodec.decoders import VideoDecoder

# For local/cache-first workflows, fetch the tick sidecar first (e.g. with
# hf_hub_download) or query its hf:// URL with DuckDB.
ticks = pq.read_table("ticks.parquet").to_pandas()

t0 = 12.0
t1 = t0 + 1.0 / 32.0  # one 32 fps frame later
tick0 = ticks.iloc[(ticks["t"] - t0).abs().argmin()]
tick1 = ticks.iloc[(ticks["t"] - t1).abs().argmin()]

# `url` is the MP4 URL for the same POV, built as in "Stream One Clip".
frames = VideoDecoder(url, seek_mode="approximate").get_frames_played_at(
    seconds=[float(tick0["t"]), float(tick1["t"])]
)

For high-throughput frame-pair training, prefer the WDS repo, group many timestamps by media_id, decode them in batches, then shuffle emitted pairs.
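
The grouping step can be sketched as a pure helper: collect all requested timestamps per media_id so each MP4 is opened once and decoded with one batched call:

```python
from collections import defaultdict

def group_timestamps_by_media(samples):
    """samples: iterable of (media_id, seconds) pairs.

    Returns {media_id: sorted list of seconds}, one decode batch per media.
    """
    groups = defaultdict(list)
    for media_id, t in samples:
        groups[media_id].append(float(t))
    return {m: sorted(ts) for m, ts in groups.items()}

samples = [("a", 12.0), ("b", 3.5), ("a", 4.0), ("a", 30.0)]
print(group_timestamps_by_media(samples))
# {'a': [4.0, 12.0, 30.0], 'b': [3.5]}
```

Then, per media_id, open one VideoDecoder and issue a single get_frames_played_at(seconds=...) call, shuffling the emitted pairs afterwards.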

Downloading

# Metadata only.
hf download blanchon/opencs2_dataset --repo-type dataset \
  --include "index/*.parquet" \
  --include "events/*.parquet" \
  --include "metadata/*.parquet"

# One full POV round.
hf download blanchon/opencs2_dataset --repo-type dataset \
  --include "rounds/match_id=2391545/map_name=de_anubis/round=01/player=00/**"

Creation

Built with a headless CS2 recorder from HLTV .dem files. The recorder replays each demo, captures all 10 player POVs, validates tick/frame boundaries, streams frames through FFmpeg, muxes per-player audio into the MP4, and writes typed parquet sidecars.

Licensing

.dem source data is mirrored from HLTV; downstream use is bound by the original tournament terms. Renders and metadata are released as CC-BY-4.0.

Citation

@misc{blanchon2026opencs2,
  author       = {Julien Blanchon},
  title        = {OpenCS2 Dataset},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://github.com/julien-blanchon/opencs2-dataset}}
}
