Dazhi Jiang

thuzhizhi

jiangzizi

AI & ML interests

None yet

Organizations

None yet

liked a dataset 19 days ago

meituan-longcat/VitaBench-2.0

Viewer • Updated 26 days ago • 56 • 298 • 3

liked a dataset 28 days ago

thuzhizhi/DAPO-MATH-17k-oss-reasoning

Viewer • Updated 28 days ago • 53.8k • 153 • 2

updated a dataset 28 days ago

thuzhizhi/DAPO-MATH-17k-oss-reasoning

Viewer • Updated 28 days ago • 53.8k • 153 • 2

published a dataset 28 days ago

thuzhizhi/DAPO-MATH-17k-oss-reasoning

Viewer • Updated 28 days ago • 53.8k • 153 • 2

reacted to codelion's post with 🚀 about 1 month ago

Post

3229

Inspired by the Nemotron Diffusion recipe, check out dhara-250m: a 250M experimental language model that supports three decoding modes from one set of weights: autoregressive, block-diffusion, and self-speculation.

It is small, easy to try, and meant for exploring diffusion-style decoding and latency tradeoffs in compact LMs.

Model: codelion/dhara-250m

Try the chat demo here: codelion/dhara-chat

3 replies

liked a model about 2 months ago

deepseek-ai/DeepSeek-V4-Pro

Text Generation • 862B • Updated 8 days ago • 1.14M • 5.11k

updated a Space about 2 months ago

Trackio Dashboard

Display an interactive I/O tracking dashboard

published a Space about 2 months ago

Trackio Dashboard

Display an interactive I/O tracking dashboard

liked a dataset 3 months ago

TAAC2026/data_sample_1000

Viewer • Updated Apr 10 • 1k • 960 • 93

liked a model 4 months ago

Nanbeige/Nanbeige4.1-3B

Text Generation • 4B • Updated Mar 25 • 5.52k • 1.14k

liked a Space 4 months ago

OpenHands Index

A Holistic Benchmark for Software Engineering

liked 2 models 5 months ago

zai-org/GLM-5

Text Generation • 754B • Updated Apr 5 • 65.5k • 2.11k

moonshotai/Kimi-K2.5

Image-Text-to-Text • 1.1T • Updated Apr 30 • 1.58M • 2.83k

upvoted an article 6 months ago

Article

The Optimal Architecture for Small Language Models

codelion • Dec 26, 2025 • 121

liked a dataset 6 months ago

BytedTsinghua-SIA/DAPO-Math-17k

Viewer • Updated Apr 18, 2025 • 1.79M • 11.8k • 180

liked a model 6 months ago

zai-org/GLM-4.7

Text Generation • 358B • Updated Jan 29 • 62.1k • 2.04k

liked a dataset 7 months ago

qi6776/Recflow

Updated Jul 11, 2025 • 199 • 1

upvoted a paper 8 months ago

Data-Efficient RLVR via Off-Policy Influence Guidance

Paper • 2510.26491 • Published Oct 30, 2025 • 11

liked a Space 8 months ago

The Smol Training Playbook

The secrets to building world-class LLMs

liked a model 8 months ago

inclusionAI/LLaDA-MoE-7B-A1B-Instruct

7B • Updated Oct 28, 2025 • 1.15k • 72