---
datasets:
- custom_jsonl_dataset
language:
- en
library_name: transformers
license: apache-2.0
model_name: MSC Software Engineering SLM v1
tags:
- software-engineering
- QLoRA
- Mistral
- SLM
base_model:
- mistralai/Mistral-7B-v0.1
---
# Model Card: MSC Software Engineering SLM v1
This model is a **QLoRA fine-tuned variant of Mistral-7B**, optimized for **software engineering, code generation, and technical Q&A** tasks.
It was trained on a curated dataset of software design patterns, debugging tips, Python code snippets, and AI engineering discussions to improve reasoning and contextual understanding for software-related queries.
## Model Details
- **Base Model:** `mistralai/Mistral-7B-v0.1`
- **Fine-tuning Type:** QLoRA (4-bit quantization)
- **Framework:** Hugging Face Transformers + PEFT + bitsandbytes
- **Tokenizer:** Same as base model (`AutoTokenizer.from_pretrained(base_model, use_fast=True)`)
- **Padding Token:** `tokenizer.pad_token = tokenizer.eos_token`
- **Training Objective:** Causal language modeling
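The card does not list the LoRA hyperparameters used during fine-tuning, so the following is only a minimal sketch of a typical QLoRA setup for this base model; the rank, alpha, dropout, and target modules are assumptions, not the values actually used.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "mistralai/Mistral-7B-v0.1"

# 4-bit NF4 quantization with bfloat16 compute, matching the card's config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Hypothetical LoRA settings -- the actual values are not documented here
lora_config = LoraConfig(
    r=16,                 # assumed rank
    lora_alpha=32,        # assumed scaling factor
    lora_dropout=0.05,    # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```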
---
## Model Configuration
| **Parameter** | **Value** |
| ----------------------------- | ------------------------------------- |
| **Model Type** | `mistral` |
| **Architecture** | `MistralForCausalLM` |
| **Vocab Size**                | 32,000                                |
| **Max Position Embeddings** | 32,768 |
| **Hidden Size** | 4,096 |
| **Intermediate Size** | 14,336 |
| **Number of Hidden Layers** | 32 |
| **Number of Attention Heads** | 32 |
| **Number of Key-Value Heads** | 8 |
| **Hidden Activation** | `silu` |
| **Initializer Range** | 0.02 |
| **RMS Norm Epsilon** | 1e-5 |
| **Dropout (Attention)** | 0.0 |
| **Use Cache** | True |
| **ROPE Theta** | 1,000,000.0 |
| **Quantization Method** | `bitsandbytes` |
| **Quantization Config** | 4-bit (nf4), `bfloat16` compute dtype |
| **Compute Dtype**             | `bfloat16`                            |
| **Load In 4bit** | βœ… Yes |
| **Load In 8bit** | ❌ No |
| **Tie Word Embeddings** | False |
| **Is Encoder-Decoder** | False |
| **BOS Token ID** | 1 |
| **EOS Token ID** | 2 |
| **Pad Token ID** | None |
| **Generation Settings** | |
| β†’ Max Length | 20 |
| β†’ Min Length | 0 |
| β†’ Temperature | 1.0 |
| β†’ Top-k | 50 |
| β†’ Top-p | 1.0 |
| β†’ Num Beams | 1 |
| β†’ Repetition Penalty | 1.0 |
| β†’ Early Stopping | False |
| **ID β†’ Label Map** | {0: `LABEL_0`, 1: `LABEL_1`} |
| **Label β†’ ID Map** | {'LABEL_0': 0, 'LABEL_1': 1} |
| **Training Framework** | Transformers v4.57.1 |
| **Quant Library** | bitsandbytes |
| **Local Path / Repo** | `./msci_software_engineering_slm_v1` |
## Quantization
| **Parameter** | **Value** |
| --------------------------- | -------------- |
| `_load_in_4bit` | True |
| `_load_in_8bit` | False |
| `bnb_4bit_compute_dtype` | `bfloat16` |
| `bnb_4bit_quant_storage` | `uint8` |
| `bnb_4bit_quant_type` | `nf4` |
| `bnb_4bit_use_double_quant` | False |
| `load_in_4bit` | True |
| `load_in_8bit` | False |
| `quant_method` | `bitsandbytes` |
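The rows above map one-to-one onto a Transformers `BitsAndBytesConfig`. A sketch of reconstructing the same configuration in code (note that double quantization is disabled, per the table):

```python
import torch
from transformers import BitsAndBytesConfig

# Mirrors the quantization table: 4-bit NF4 weights, bfloat16 compute,
# double quantization disabled.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)
```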
## Training Data
The model was fine-tuned on a custom dataset (`data.jsonl`) consisting of:
- Software engineering Q&A pairs
- Code examples (Python, SQL, Docker, ML pipelines)
- Developer chat-style dialogues
- AI agent reasoning snippets
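The card does not document the schema of `data.jsonl`, so the record below is a hypothetical illustration; the `prompt`/`response` field names are assumptions. A minimal stdlib sketch of parsing such a record into a training-style text:

```python
import json

# Hypothetical record layout -- the actual field names in data.jsonl
# are not documented in this card.
sample = '{"prompt": "What is dependency injection?", "response": "A pattern where dependencies are supplied externally."}'

record = json.loads(sample)
text = f"### Question:\n{record['prompt']}\n\n### Answer:\n{record['response']}"
print(text.splitlines()[0])  # -> ### Question:
```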
---
## Intended Uses
- Software development assistance
- Generating code snippets or debugging suggestions
- Explaining AI/ML or MLOps concepts
- General programming conversations
---
## Limitations
- May produce hallucinated code or incorrect syntax.
- Not tested on safety-critical or financial decision-making tasks.
- Limited coverage outside software/AI domain.
---
## Example Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
model_id = "techpro-saida/msci_software_engineering_slm_v1"
# 4-bit config for efficient inference
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_quant_type="nf4",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
quantization_config=bnb_config,
device_map="auto", # automatically balances between GPU/CPU
)
prompt = "Explain the SOLID principles in OOP."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If you are running on low RAM or a CPU-only machine, you can load the model without quantization:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "techpro-saida/msci_software_engineering_slm_v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")
prompt = "Explain the SOLID principles in OOP."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Developer
- **Developed by:** SAIDA D
- **Model type:** SLM
- **Language(s) (NLP):** English (en)
- **License:** Apache-2.0
- **Finetuned from model:** `mistralai/Mistral-7B-v0.1`