| language: en | |
| license: mit | |
| tags: | |
| - llm | |
| - sft | |
| - rlhf | |
| - qwen | |
| - llama | |
| # PaperAudit SFT/RL Model Collection | |
| This repo aggregates all SFT/RL fine-tuned models for the PaperAudit project. | |
| ## Model List | |
| | Model Name | Hugging Face Link | Description | | |
| |------------|-------------------|-------------| | |
| | Qwen3-8B-sft-rl | [mayiwen/PaperAudit_Qwen3_8B_sft_rl](https://huggingface.co/mayiwen/PaperAudit_Qwen3_8B_sft_rl) | Qwen3 8B model fine-tuned with SFT + RL for PaperAudit | | |
| | Qwen3-14B-sft-rl | [mayiwen/PaperAudit_Qwen3_14B_sft_rl](https://huggingface.co/mayiwen/PaperAudit_Qwen3_14B_sft_rl) | Qwen3 14B model fine-tuned with SFT + RL for PaperAudit | | |
| | Llama3.2-3B-sft-rl | [mayiwen/PaperAudit_Llama3.2_3B_sft_rl](https://huggingface.co/mayiwen/PaperAudit_Llama3.2_3B_sft_rl) | Llama3.2 3B model fine-tuned with SFT + RL for PaperAudit | | |
| ## Usage | |
| Refer to each model's repo for detailed usage instructions (training code, inference examples, etc.). |