dylu

ludybupt

AI & ML interests

None yet

Organizations

None yet

Recent Activity
liked a model about 2 months ago

tencent/Sequential-Hidden-Decoding-8B-n4

10B • Updated Mar 10 • 18 • 10

authored a paper 3 months ago

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Paper • 2601.11868 • Published Jan 17 • 36

New activity in Qwen/Qwen3-Next-80B-A3B-Thinking 5 months ago

Request for SWE-bench-Verified Evaluation Metrics of Qwen3-Next-80B-A3B.

#18 opened 5 months ago by

New activity in SWE-bench/SWE-smith 8 months ago

How to get previous-version images (the May 8 version on Docker Hub)

#8 opened 8 months ago by

liked a model 8 months ago

meituan-longcat/LongCat-Flash-Chat

Text Generation • 562B • Updated Sep 24, 2025 • 63k • 533

liked a dataset 9 months ago

tencent/WildSpeech-Bench

Viewer • Updated Sep 29, 2025 • 1.1k • 413 • 20

New activity in zai-org/SWE-Dev-train 9 months ago

Are the original instances of this dataset available?

#4 opened 9 months ago by

liked a dataset 10 months ago

tech1984/swe_smith_back_translation

Viewer • Updated Jul 29, 2025 • 45k • 28 • 2

New activity in nebius/SWE-rebench 10 months ago

How to build docker image for each instance?

#4 opened 10 months ago by

New activity in SWE-bench/SWE-smith 10 months ago

A fine-tuned model (40% resolved rate on SWE-verified) resolves 0 samples of the latest SWE-smith

#5 opened 10 months ago by

New activity in Skywork/Skywork-SWE-32B 10 months ago

Will the instance data (or traj) used to train Skywork be published?

#2 opened 10 months ago by

liked a dataset 10 months ago

SWE-bench-Live/SWE-bench-Live

Viewer • Updated Sep 18, 2025 • 3.69k • 6.21k • 7

liked 2 datasets 11 months ago

SWE-bench/SWE-smith-trajectories

Viewer • Updated Jul 19, 2025 • 76k • 3.24k • 60

SWE-bench/SWE-smith

Viewer • Updated Dec 14, 2025 • 59.1k • 20.1k • 51

liked 2 datasets about 1 year ago

KodCode/KodCode-V1-SFT-R1

Viewer • Updated Mar 17, 2025 • 483k • 2k • 38

facebook/natural_reasoning

Viewer • Updated Feb 21, 2025 • 1.15M • 1.94k • 565

liked a model over 1 year ago

tencent/Tencent-Hunyuan-Large

Text Generation • Updated Jan 19, 2025 • 466 • 616

New activity in allenai/WildBench over 1 year ago

WB Score for Info Seek/Creative/Code & Debug tc

#11 opened over 1 year ago by

liked a Space over 1 year ago

AI2 WildBench Leaderboard (V2)

Display and explore a leaderboard of language models

liked a dataset almost 2 years ago

silk-road/Wizard-LM-Chinese-instruct-evol

Viewer • Updated May 15, 2023 • 70k • 481 • 97