
Ran Le

leran1995

AI & ML interests

LLM Pretraining

Recent Activity

new activity 3 days ago

Nanbeige/Nanbeige4.1-3B: Any plans for a larger scale up? (e.g., 7B - 12B version)

new activity 4 months ago

Nanbeige/Nanbeige4.1-3B: Update README.md

new activity 4 months ago

Nanbeige/Nanbeige4.1-3B: Overthinking Problem



New activity in Nanbeige/Nanbeige4.1-3B 3 days ago

Any plans for a larger scale up? (e.g., 7B - 12B version)

#46 opened 3 days ago by rpopreapovle

New activity in Nanbeige/Nanbeige4.1-3B 4 months ago

Update README.md

#38 opened 4 months ago by kerasakit

Overthinking Problem

➕ 1

#27 opened 4 months ago by JainilGosalia

Add syntax highlight in python markdown code snippets

#34 opened 4 months ago by RahulSharma0

increase the max_new_tokens in the demo

#33 opened 4 months ago by bitsnaps

dataset for sft

#30 opened 4 months ago by Roman1111111

Chain-of-Thought or Chain-of-Mimicry? The Over-SFT problem in Nanbeige 4.1-3B aka "I_Should_X"

#26 opened 4 months ago by srs6901

Typo in readme: Live-Code-Bench-Pro-Mediium -> Live-Code-Bench-Pro-Medium

#28 opened 4 months ago by HenkPoley

Reportedly this 'Nanbeige/Nanbeige4.1-3B' model doesn't even know 'who' it is, or who makes it!?

#19 opened 4 months ago by dakerholdings

Very Impressive!

❤️ 14

#7 opened 4 months ago by cob05

anybody know the size of the context window for Nanbeige4.1-3B?

#20 opened 4 months ago by test333333

you got us hooked now , cant wait for the release of the 4.2 version . could you please provide any ETA or approximations about when it MIGHT release ?

➕ 3

#17 opened 4 months ago by Why-T

Training Data and inference scripts with tool calling , websearch and so on plus training scripts

❤️ 1

#16 opened 4 months ago by snapo

Any Plans for an Instruct Model?

🤗🔥 6

#15 opened 4 months ago by Ashacorporation

New activity in Nanbeige/Nanbeige4-3B-Base 4 months ago

Reasoning switch?

#3 opened 4 months ago by Sinaya

New activity in Nanbeige/Nanbeige4.1-3B 4 months ago

License?

#5 opened 4 months ago by ivanfioravanti

diagram and numbers dont match