--- base_model: openai/gpt-oss-20b datasets: AIGym/free-gpt-oss library_name: transformers model_name: oss-multi-lingual tags: - generated_from_trainer - sft - trl licence: license --- ## Model Card: `AIGym/oss-adapter` ![image/png](https://cdn-uploads.huggingface.co/production/uploads/63f2b7bcbe95ed4c9a9e7669/xEeUCVtMcl8svYB5-aYeO.png) ### Model Overview * **Base model**: Fine-tuned from `openai/gpt-oss-20b` using supervised fine-tuning (SFT) on the `AIGym/free-gpt-oss` dataset ([Hugging Face][1]). * **Motivation**: Created to participate in the OpenAI GPT-OSS-20B Red-Teaming Challenge on Kaggle, which tasked participants with probing and uncovering previously undetected harmful behaviors and vulnerabilities in the open-weight GPT-OSS-20B model ([Kaggle][2]). ### Intended Use & Scope * **Applications**: Designed primarily for red-teaming or safety evaluation tasks—leveraging its fine-tuning to explore and detect model vulnerabilities. It can also serve as a foundation in research or development of safer LLM applications. * **Limitations**: Not recommended for deployment in unmoderated settings or as a general-purpose chatbot. Outputs may include unsafe or adversarial behaviors due to its focus on red-teaming scenarios. ### Training Details * **Fine-tuning method**: Supervised fine-tuning (SFT) using the TRL library ([Hugging Face][1]). * **Tooling and versions**: * TRL: 0.21.0 * Transformers: 4.55.2 * PyTorch: 2.8.0.dev20250319+cu128 * Datasets: 4.0.0 * Tokenizers: 0.21.4 ([Hugging Face][1]). * **Dataset**: `AIGym/free-gpt-oss`, which presumably includes examples crafted to expose harmful behaviors in the base GPT-OSS-20B model (specific content should be described here if available). ### Evaluation & Behavior * **Challenge context**: The Kaggle Red-Teaming Challenge emphasized discovering hidden vulnerabilities in GPT-OSS-20B by adversarial prompting and probing ([Kaggle][2]). * **Performance**: (Include any metrics, success rates, or qualitative findings if you evaluated the model’s adversarial robustness compared to the base model.) ### Example Usage ```python from transformers import pipeline generator = pipeline( "text-generation", model="AIGym/oss-multi-lingual", # Or "AIGym/oss-adapter" depending on naming device="cuda" ) question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?" output = generator( [{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False )[0] print(output["generated_text"]) ``` This snippet demonstrates how to query the model in an interactive pipeline, useful for both red-teaming experiments and exploratory analysis ([Hugging Face][1]). ### Caveats & Ethical Considerations * **Potential risks**: The model is intentionally fine-tuned to surface vulnerabilities—it may generate harmful or unsafe content more readily than standard models. * **Recommended usage environment**: Restricted to controlled research and evaluation settings with proper moderation and oversight. Not intended for downstream production without robust safety measures. * **Transparency & reproducibility**: Encourage users to report findings responsibly and contribute to community understanding around safe LLM deployment. ### Summary Table | Section | Highlights | | -------------------- | ----------------------------------------------------------------------- | | **Overview** | Fine-tuned GPT-OSS-20B adapter for red-teaming, using AIGym dataset | | **Motivation** | Built for the Kaggle Red-Teaming Challenge targeting safety analysis | | **Tools & Versions** | TRL 0.21.0, Transformers 4.55.2, PyTorch dev build, Datasets 4.0.0 etc. | | **Usage Example** | Provided pipeline snippet for quick start | | **Caveats** | Generates potentially harmful outputs; meant only for controlled eval | | **Citation** | TRL GitHub repository |