---
base_model: unsloth/Qwen3-1.7B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:unsloth/Qwen3-1.7B
- lora
- orpo
- transformers
- trl
- unsloth
license: mit
datasets:
- Anas989898/DPO-datascience
language:
- en
---

# Model Card for datascience-RLHF

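## Model Details

This model is a LoRA adapter for Qwen3-1.7B (`unsloth/Qwen3-1.7B`), fine-tuned with ORPO (Odds Ratio Preference Optimization). ORPO is a preference-alignment method that learns directly from chosen/rejected response pairs; unlike classic RLHF pipelines, it needs neither a separate reward model nor a reinforcement learning stage.
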
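To make the ORPO objective concrete, here is a minimal sketch of the per-example loss as described in the ORPO paper: the usual negative log-likelihood on the chosen response plus a weighted odds-ratio penalty. The function names, the toy log-probabilities, and the weight `lam=0.1` are illustrative assumptions, not values taken from this model's training run.

```python
import math


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) where p = exp(avg_logp) is the length-normalized likelihood."""
    # log1p(-exp(x)) computes log(1 - p); this requires avg_logp < 0, i.e. p < 1.
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float, lam: float = 0.1) -> float:
    """SFT loss on the chosen response plus lam times the odds-ratio loss."""
    sft_loss = -avg_logp_chosen
    log_odds_ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    # -log(sigmoid(z)) is the same as log(1 + exp(-z)).
    odds_ratio_loss = math.log1p(math.exp(-log_odds_ratio))
    return sft_loss + lam * odds_ratio_loss


# Toy average per-token log-probabilities (made-up numbers for illustration).
preferred = orpo_loss(avg_logp_chosen=-0.5, avg_logp_rejected=-2.0)
swapped = orpo_loss(avg_logp_chosen=-2.0, avg_logp_rejected=-0.5)
print(preferred < swapped)
```

The loss is lower when the model already assigns a higher likelihood to the chosen response than to the rejected one, which is the behavior the fine-tuning pushes toward.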
### Model Description

- **Base Model:** Qwen3-1.7B (`unsloth/Qwen3-1.7B`)
- **Fine-tuning Method:** ORPO (preference alignment), trained as a LoRA adapter
- **Dataset:** ~1,000 data science–related preference samples (chosen vs. rejected responses) from `Anas989898/DPO-datascience`
- **Objective:** Improve the model's ability to generate higher-quality, relevant, and well-structured responses to data science questions
- **Language(s) (NLP):** English
- **License:** MIT

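Each preference sample pairs one prompt with a preferred and a dispreferred answer. The sketch below shows one plausible in-memory shape for such a record, plus a small sanity check. The field names `prompt`, `chosen`, and `rejected` follow a convention common in preference-tuning tools and are an assumption here, as is the example text; the dataset's actual column names may differ.

```python
from typing import TypedDict


class PreferencePair(TypedDict):
    """Hypothetical record layout for one preference sample."""
    prompt: str
    chosen: str
    rejected: str


def is_valid_pair(pair: PreferencePair) -> bool:
    """A usable pair has non-empty fields and two distinct responses."""
    has_text = all(pair[key].strip() for key in ("prompt", "chosen", "rejected"))
    return has_text and pair["chosen"].strip() != pair["rejected"].strip()


# Made-up example record, for illustration only.
example: PreferencePair = {
    "prompt": "What does k-means clustering do?",
    "chosen": "It partitions points into k groups by assigning each point to the nearest centroid.",
    "rejected": "It is a supervised classifier.",
}
print(is_valid_pair(example))
```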
## Uses

### Direct Use

- Assisting in data science education (explanations of ML concepts, statistical methods, etc.).
- Supporting data analysis workflows with suggestions, reasoning, and structured outputs.
- Acting as a teaching assistant for coding/data-related queries.
- Providing helpful responses in preference-aligned conversations where correctness and clarity are prioritized.

## Bias, Risks, and Limitations

- Hallucinations: May still produce incorrect or fabricated facts, code, or references.
- Dataset Size: Fine-tuned on only about 1,000 preference pairs, which limits generalization.
- Domain Focus: Optimized for data science, so it may underperform in other domains.
- Not a Substitute for Experts: Should not be used as the sole source for critical decisions in real-world projects.
- Bias & Safety: As with all LLMs, may reflect biases present in the training data.

## How to Get Started with the Model

The snippet below loads the base model, attaches the adapter, and streams a response to a sample question. It assumes a CUDA-capable GPU.

```python
from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel

# Optional: authenticate only if a repository is private or gated.
# login(token="<your_hf_token>")

# Load the base model and tokenizer; device_map={"": 0} places the whole model on GPU 0.
tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B")
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen3-1.7B",
    device_map={"": 0},
)

# Attach the ORPO-trained LoRA adapter to the base model.
model = PeftModel.from_pretrained(base_model, "Rustamshry/datascience-RLHF")

# Alpaca-style template: instruction, input, and an empty response slot for the model to fill.
prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
    [
        prompt.format(
            "You are an AI assistant that helps people find information",
            "What is the k-Means Clustering algorithm and what is its purpose?",
            "",
        )
    ],
    return_tensors="pt",
).to(model.device)

# Stream tokens to stdout as they are generated.
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=1800)
```

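Because the generated sequence begins with the prompt itself, it can be handy to build prompts with a small helper and keep only the text after the final `### Response:` marker. The sketch below is plain string handling with no model involved; `build_prompt` and `extract_response` are hypothetical helper names, and the template is copied from the example above.

```python
PROMPT_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
{response}"""

RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str, question: str) -> str:
    """Fill the template, leaving the response slot empty for the model."""
    return PROMPT_TEMPLATE.format(instruction=instruction, input=question, response="")


def extract_response(decoded: str) -> str:
    """Return the text after the last response marker (or all text if it is absent)."""
    return decoded.rpartition(RESPONSE_MARKER)[2].strip()


demo_prompt = build_prompt(
    "You are an AI assistant that helps people find information",
    "What is the k-Means Clustering algorithm and what is its purpose?",
)
# Stand-in for decoded model output: the prompt followed by a made-up answer.
demo_output = demo_prompt + "k-means groups points around k centroids."
print(extract_response(demo_output))
```

In practice, `decoded` would come from something like `tokenizer.decode(output_ids[0], skip_special_tokens=True)` when `generate` is called without a streamer.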
### Framework versions

- PEFT 0.17.1
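
If loading the adapter fails, a version mismatch is one thing worth ruling out. The following is a small sketch that compares the installed PEFT version against the one listed above, using only the standard library; the helper name `installed_version` is hypothetical, and a differing version is not necessarily a problem.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

EXPECTED_PEFT = "0.17.1"  # version listed in this card


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("peft")
if found is None:
    print("peft is not installed")
elif found != EXPECTED_PEFT:
    print(f"peft {found} is installed; this card lists {EXPECTED_PEFT}")
else:
    print("peft version matches this card")
```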