
	---
	base_model: unsloth/Qwen3-1.7B
	library_name: peft
	pipeline_tag: text-generation
	tags:
	- base_model:adapter:unsloth/Qwen3-1.7B
	- lora
	- orpo
	- transformers
	- trl
	- unsloth
	license: mit
	datasets:
	- Anas989898/DPO-datascience
	language:
	- en
	---

	# Model Card for datascience-RLHF

	## Model Details

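	This model is a fine-tuned version of Qwen3-1.7B trained with ORPO (Odds Ratio Preference Optimization), a preference-alignment method that learns directly from chosen/rejected response pairs. Unlike classic reinforcement learning from human feedback (RLHF) pipelines, ORPO needs neither a separate reward model nor a reference model.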
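To make the ORPO objective concrete, here is a minimal sketch of the per-example loss as described in the ORPO paper: the usual negative log-likelihood on the chosen response plus a weighted odds-ratio penalty. It is not the training code used for this model. The function names and the weight `lam` are illustrative assumptions, and the inputs are assumed to be length-normalized (average per-token) log-probabilities, which must be strictly negative.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(P / (1 - P)) given log P, where P is the average token likelihood."""
    # avg_logp must be < 0 so that P < 1 and the odds are finite.
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """Sketch of the ORPO loss for one preference pair (lam is a hypothetical weight)."""
    # Log odds ratio between the chosen and the rejected response.
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # Odds-ratio term: -log(sigmoid(x)) rewritten as log(1 + exp(-x)).
    or_term = math.log1p(math.exp(-log_odds_ratio))
    # Supervised term: negative log-likelihood of the chosen response.
    sft_term = -chosen_avg_logp
    return sft_term + lam * or_term
```

The loss falls when the model assigns higher likelihood to the chosen response than to the rejected one, which is the behavior the fine-tuning aims for.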

	### Model Description

	- Base Model: Qwen3-1.7B (`unsloth/Qwen3-1.7B`)
	- Fine-tuning Method: ORPO (preference alignment), trained as a LoRA adapter
	- Dataset: Anas989898/DPO-datascience, ~1,000 data science–related preference samples (chosen vs. rejected responses)
	- Objective: Improve the model's ability to generate higher-quality, relevant, and well-structured responses to data science questions
	- Language(s) (NLP): English
	- License: MIT
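For readers unfamiliar with preference data, the sketch below shows what one chosen-vs-rejected sample might look like and how it could be sanity-checked. The `prompt`/`chosen`/`rejected` field names follow the convention used by TRL preference trainers. Whether the dataset uses exactly these column names is an assumption, and the example record is invented for illustration.

```python
from typing import TypedDict


class PreferencePair(TypedDict):
    prompt: str    # the question or instruction
    chosen: str    # the preferred response
    rejected: str  # the less preferred response


def is_valid_pair(sample: dict) -> bool:
    """Check that all three fields are non-empty strings and the responses differ."""
    required = ("prompt", "chosen", "rejected")
    for key in required:
        value = sample.get(key)
        if not isinstance(value, str) or not value.strip():
            return False
    return sample["chosen"] != sample["rejected"]


# Hypothetical record, not taken from the actual dataset.
example: PreferencePair = {
    "prompt": "What does standardizing a feature mean?",
    "chosen": "Subtract the mean and divide by the standard deviation, so the feature has mean 0 and variance 1.",
    "rejected": "It means making the data standard.",
}
```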


	## Uses

	### Direct Use

	- Assisting in data science education (explanations of ML concepts, statistical methods, etc.).
	- Supporting data analysis workflows with suggestions, reasoning, and structured outputs.
	- Acting as a teaching assistant for coding/data-related queries.
	- Providing helpful responses in preference-aligned conversations where correctness and clarity are prioritized.


	## Bias, Risks, and Limitations

	- Hallucinations: May still produce incorrect or fabricated facts, code, or references.
	- Dataset Size: Fine-tuned on only about 1,000 preference pairs, which limits generalization.
	- Domain Focus: Optimized for data science, so it may underperform in other domains.
	- Not a Substitute for Experts: Should not be used as the sole source for critical decisions in real-world projects.
	- Bias & Safety: As with all LLMs, may reflect biases present in training data.


	## How to Get Started with the Model

	Use the code below to get started with the model.

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
	from peft import PeftModel

	# Authentication is only needed for private or gated repositories.
	# To log in, uncomment the two lines below and supply your own access token.
	# from huggingface_hub import login
	# login(token="")

	# Load the base model and tokenizer, placing the model on GPU 0.
	tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B")
	base_model = AutoModelForCausalLM.from_pretrained(
	    "unsloth/Qwen3-1.7B",
	    device_map={"": 0},
	)

	# Attach the ORPO-trained LoRA adapter to the base model.
	model = PeftModel.from_pretrained(base_model, "Rustamshry/datascience-RLHF")

	# Alpaca-style template: instruction, input, and an empty response slot.
	prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

	### Instruction:
	{}

	### Input:
	{}

	### Response:
	{}"""

	inputs = tokenizer(
	    [
	        prompt.format(
	            "You are an AI assistant that helps people find information",
	            "What is the k-Means Clustering algorithm and what is its purpose?",
	            "",  # leave the response empty so the model generates it
	        )
	    ],
	    return_tensors="pt",
	).to("cuda")

	# Stream tokens to stdout as they are generated.
	text_streamer = TextStreamer(tokenizer)
	_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=1800)
	```
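When the model is called from application code rather than streamed to the console, it can help to wrap the prompt template in small helpers. The following is a sketch only: `ALPACA_TEMPLATE`, `build_prompt`, and `extract_response` are hypothetical names, and the template text is copied from the snippet above.

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{question}\n\n"
    "### Response:\n{response}"
)


def build_prompt(instruction: str, question: str) -> str:
    """Fill the template, leaving the response slot empty for generation."""
    return ALPACA_TEMPLATE.format(instruction=instruction, question=question, response="")


def extract_response(generated_text: str) -> str:
    """Return the text after the first '### Response:' marker, or the whole text if absent."""
    _, found, tail = generated_text.partition("### Response:")
    return tail.strip() if found else generated_text.strip()
```

For example, `extract_response` could be applied to the string returned by `tokenizer.decode(output_ids[0], skip_special_tokens=True)`, where `output_ids` is the result of `model.generate` called without a streamer.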


	### Framework versions

	- PEFT 0.17.1
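
To confirm that a local environment matches the version listed above, a quick check along these lines may help. This is a sketch using only the standard library; the helper names are illustrative, and a mismatch does not necessarily mean the adapter will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_card(package: str = "peft", expected: str = "0.17.1") -> bool:
    """True only if the installed version equals the one listed in this card."""
    return installed_version(package) == expected
```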