JeonJinhyeok (jinn33)
article collection
- EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
  Paper • 2509.22576 • Published • 137
- AgentBench: Evaluating LLMs as Agents
  Paper • 2308.03688 • Published • 26
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 22
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 64
models (8)
- jinn33/crm-sft-adapter-v5 • Updated
- jinn33/crm-dpo-adapter-v7 • Text Generation • Updated • 1
- jinn33/crm-dpo-adapter-v2 • Updated
- jinn33/kanana-1.5-8b-rm • Updated • 1
- jinn33/kanana-1.5-8b-rlhf • Updated
- jinn33/crm-kto-adapter • Text Generation • Updated • 2
- jinn33/crm-sft-adapter-v2 • Updated
- jinn33/crm-dpo-adapter • Updated • 1