
Hynek Kydlicek PRO

hynky


AI & ML interests

Data-processing

Recent Activity

updated a dataset about 2 hours ago

macrodata/whats_going_on_runs

updated a dataset 2 days ago

hynky/sam31-wassup-smoke-output

published a dataset 2 days ago

hynky/sam31-wassup-smoke-output



liked 2 Spaces 3 months ago

Unfolding Robotics: Open-Source Shirt Folding from Data to Deployment

Explore the open-source guide to robot shirt folding

The Synthetic Data Playbook: Generating Trillions of the Finest Tokens

Visualize synthetic‑data experiments as an interactive bookshelf

liked a Space 4 months ago

QED-Nano: Teaching a Tiny Model to Prove Hard Theorems

Who needs 1T parameters? Olympiad proofs with a 4B model

liked a dataset 6 months ago

HuggingFaceFW/finetranslations

Updated Jan 9 • 3.33B rows • 18.8k downloads • 295 likes

liked a Space 6 months ago

FinePDFs: Liberating 3T of the finest tokens from PDFs

liked a Space 7 months ago

Evaluation Guidebook

Explore LLM benchmark scores over time

liked a dataset 10 months ago

HuggingFaceFW/finepdfs

Updated Apr 3 • 476M rows • 80.6k downloads • 882 likes

liked a Space 10 months ago

Bringing paper to life: A modern template for scientific writing

Explore an interactive galaxy visualization of scientific articles

liked a Space over 1 year ago

The Ultra-Scale Playbook

The ultimate guide to training LLMs on large GPU clusters

liked 2 datasets over 1 year ago

data-is-better-together/fineweb-c

Updated Jul 8, 2025 • 88.7k rows • 3.06k downloads • 60 likes

HuggingFaceFW/fineweb-2

Updated Oct 27, 2025 • 4.48B rows • 97k downloads • 827 likes

liked a Space over 1 year ago

Number Tokenization Blog

Explore how tokenization affects arithmetic in LLMs

liked 2 datasets over 1 year ago

CohereLabs/Global-MMLU

Updated Aug 14, 2025 • 602k rows • 20.7k downloads • 160 likes

ClusterlabAi/InstAr-500k

Updated Jul 30, 2024 • 481k rows • 169 downloads • 15 likes

liked a Space over 1 year ago

Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks

Evaluate multilingual models using FineTasks

liked a dataset over 1 year ago

LLM360/TxT360

Updated May 26, 2025 • 26.3k downloads • 263 likes

liked 2 Spaces over 1 year ago

Hub LFS Analysis

An analysis of LFS files on the Hub.

TxT360: Trillion Extracted Text

Explore the TxT360 LLM pre‑training dataset online

liked a dataset almost 2 years ago

Cleanlab/bad_data_gsm8k_svamp.csv

Updated Apr 25, 2024 • 34 rows • 17 downloads • 3 likes

liked a Space almost 2 years ago

Datasets Metrics Explorer

Launch an interactive demo interface