Juntao Dai

calico-1226

AI & ML interests

RLHF

Recent Activity

upvoted a collection 4 days ago
AgentDoG
updated a model 6 months ago
calico-1226/cac_aligner
published a model 6 months ago
calico-1226/cac_aligner
Organizations

OmniSafeAI · PKU-Alignment

upvoted a collection 4 days ago

AgentDoG

Collection · A Diagnostic Guardrail Framework for AI Agent Safety and Security · 11 items · Updated 3 days ago · 82 upvotes
upvoted a collection over 1 year ago

SafeSora

Collection · Towards Safety Alignment of Text2Video Generation · 4 items · Updated Aug 15, 2024 · 2 upvotes
upvoted 2 papers over 2 years ago

Safe RLHF: Safe Reinforcement Learning from Human Feedback

Paper · arXiv:2310.12773 · Published Oct 19, 2023 · 28 upvotes

BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset

Paper · arXiv:2307.04657 · Published Jul 10, 2023 · 6 upvotes