Ken Tsui

kenhktsui

https://kenhktsui.github.io/


AI & ML interests

ML engineer, researcher. VLM, LLM benchmark. Opinions are my own.

Recent Activity

liked a model 4 days ago

zai-org/GLM-5.2

liked a model 29 days ago

sapientinc/HRM-Text-1B

liked a model about 2 months ago

talkie-lm/talkie-1930-13b-it

kenhktsui's collections (7)

Self Correction Bench

Benchmarking LLM capability of external and internal error correction

kenhktsui/scli5

Viewer • Updated Jul 6, 2025 • 286 • 393
kenhktsui/gsm8k_sc

Viewer • Updated Jul 6, 2025 • 1.31k • 154
kenhktsui/prm800k_sc

Viewer • Updated Jul 6, 2025 • 448 • 22
Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs

Paper • 2507.02778 • Published Jul 3, 2025 • 9

LongTalk

A Very Long Chain-of-Thought Dataset for Reasoning Model Post-Training

kenhktsui/longtalk-cot-v0.1

Viewer • Updated Dec 30, 2024 • 61.2k • 33 • 13
kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged-gguf

8B • Updated Dec 30, 2024 • 44 • 1
kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged

Text Generation • 8B • Updated Dec 30, 2024 • 7
kenhktsui/llama3.1-8b-instruct-thinking-sft-merged-gguf

8B • Updated Dec 30, 2024 • 13 • 1

CoT

kenhktsui/longtalk-cot-v0.1

Viewer • Updated Dec 30, 2024 • 61.2k • 33 • 13
open-thoughts/OpenThoughts-114k

Viewer • Updated Aug 31, 2025 • 228k • 79.3k • 867
ServiceNow-AI/R1-Distill-SFT

Viewer • Updated Feb 8, 2025 • 1.85M • 2.76k • 322
Tiiny/QWQ-LONGCOT-500K

Viewer • Updated Dec 26, 2024 • 286k • 266 • 124

VLM Data

HuggingFaceM4/the_cauldron

Viewer • Updated May 6, 2024 • 1.88M • 246k • 546
lmms-lab/LLaVA-OneVision-Data

Viewer • Updated May 24, 2025 • 3.94M • 14.6k • 238
HuggingFaceM4/Docmatix

Viewer • Updated Aug 26, 2024 • 2.55M • 13.5k • 305
zwq2018/embodied_reasoner

Preview • Updated Apr 21, 2025 • 780 • 21

FastText Model for Pretraining Data Curation

kenhktsui/llm-data-textbook-quality-fasttext-classifier-v2

Text Classification • Updated Jun 26, 2025 • 665 • 28
kenhktsui/fineweb-edu-fasttext-classifier

Text Classification • Updated Jul 3, 2025 • 20 • 5
kenhktsui/code-natural-language-fasttext-classifier

Text Classification • Updated Jul 3, 2025 • 693 • 5
kenhktsui/math-fasttext-classifier

Text Classification • Updated Jul 3, 2025 • 22 • 2

textbook-quality-classifier

kenhktsui/fineweb-edu-fasttext-classifier

Text Classification • Updated Jul 3, 2025 • 20 • 5
kenhktsui/llm-data-textbook-quality-fasttext-classifier-v2

Text Classification • Updated Jun 26, 2025 • 665 • 28
kenhktsui/llm-data-textbook-quality-classifier-v1

Text Classification • 0.3B • Updated May 25, 2024 • 8 • 10
kenhktsui/llm-data-textbook-quality-fasttext-classifier-v1

Text Classification • Updated May 25, 2024 • 6 • 4

nano-phi

Small Language Model Trained with Textbook Quality Data - How Far Can It Go?

kenhktsui/nano-phi-115M-v0.1

Text Generation • 0.1B • Updated Apr 6, 2024 • 89 • 4
kenhktsui/nano-phi-115M-control-v0.1

Text Generation • 0.1B • Updated Feb 4, 2024 • 3 • 1
kenhktsui/nano-phi-192M-v0.1

Text Generation • 0.2B • Updated May 8, 2024 • 3 • 1
