wh-zhu's Collections

PSFT

updated Mar 2

PSFT+RL models

  • wh-zhu/Qwen2.5-7B-PSFT-RL-DAPO-90

    Text Generation • 8B • Updated 24 days ago • 226

  • wh-zhu/Qwen2.5-7B-Instruct-PSFT-1300

    8B • Updated Jul 26, 2025 • 4

  • wh-zhu/Qwen2.5-7B-SFT-RL-DAPO-90

    8B • Updated Aug 13, 2025 • 5

  • wh-zhu/Qwen2.5-7B-Instruct-SFT-700

    8B • Updated Jul 26, 2025 • 1

  • wh-zhu/llama3.1-8B-PSFT-dapo90

    8B • Updated Aug 13, 2025 • 4

  • wh-zhu/Llama3.1-8B-Instruct-PSFT-1500

    8B • Updated Jul 26, 2025 • 1

  • wh-zhu/Llama3.1-8B-Instruct-SFT-1200

    8B • Updated Jul 27, 2025 • 1

  • wh-zhu/llama3.1-8B-SFT-dapo100

    8B • Updated Aug 14, 2025 • 2

  • wh-zhu/LLama3.1-8B-Instruct-SFT200warmup-PSFT

    8B • Updated Aug 19, 2025 • 1

  • wh-zhu/Qwen2.5-7B-Instruct-SFT100warmup-PSFT

    8B • Updated Aug 19, 2025 • 1