---
datasets:
- custom_jsonl_dataset
language:
- en
library_name: transformers
license: apache-2.0
model_name: MSC Software Engineering SLM v1
tags:
- software-engineering
- QLoRA
- Mistral
- SLM
base_model:
- mistralai/Mistral-7B-v0.1
---

# Model Card

This model is a **QLoRA fine-tuned variant of Mistral-7B**, optimized for **software engineering, code generation, and technical Q&A** tasks. It was trained on a curated dataset of software design patterns, debugging tips, Python code snippets, and AI engineering discussions to improve reasoning and contextual understanding for software-related queries.

## Model Details

- **Base Model:** `mistralai/Mistral-7B-v0.1`
- **Fine-tuning Type:** QLoRA (4-bit quantization)
- **Framework:** Hugging Face Transformers + PEFT + bitsandbytes
- **Tokenizer:** Same as the base model (`AutoTokenizer.from_pretrained(base_model, use_fast=True)`)
- **Padding Token:** `tokenizer.pad_token = tokenizer.eos_token`
- **Training Objective:** Causal language modeling

---

## Model Configuration

| **Parameter**                 | **Value**                              |
| ----------------------------- | -------------------------------------- |
| **Model Type**                | `mistral`                              |
| **Architecture**              | `MistralForCausalLM`                   |
| **Vocab Size**                | 32,768                                 |
| **Max Position Embeddings**   | 32,768                                 |
| **Hidden Size**               | 4,096                                  |
| **Intermediate Size**         | 14,336                                 |
| **Number of Hidden Layers**   | 32                                     |
| **Number of Attention Heads** | 32                                     |
| **Number of Key-Value Heads** | 8                                      |
| **Hidden Activation**         | `silu`                                 |
| **Initializer Range**         | 0.02                                   |
| **RMS Norm Epsilon**          | 1e-5                                   |
| **Dropout (Attention)**       | 0.0                                    |
| **Use Cache**                 | True                                   |
| **RoPE Theta**                | 1,000,000.0                            |
| **Quantization Method**       | `bitsandbytes`                         |
| **Quantization Config**       | 4-bit (nf4), `bfloat16` compute dtype  |
| **Compute Dtype**             | `float16`                              |
| **Load In 4-bit**             | ✅ Yes                                 |
| **Load In 8-bit**             | ❌ No                                  |
| **Tie Word Embeddings**       | False                                  |
| **Is Encoder-Decoder**        | False                                  |
| **BOS Token ID**              | 1                                      |
| **EOS Token ID**              | 2                                      |
| **Pad Token ID**              | None                                   |
| **Generation Settings**       |                                        |
| → Max Length                  | 20                                     |
| → Min Length                  | 0                                      |
| → Temperature                 | 1.0                                    |
| → Top-k                       | 50                                     |
| → Top-p                       | 1.0                                    |
| → Num Beams                   | 1                                      |
| → Repetition Penalty          | 1.0                                    |
| → Early Stopping              | False                                  |
| **ID → Label Map**            | {0: `LABEL_0`, 1: `LABEL_1`}           |
| **Label → ID Map**            | {'LABEL_0': 0, 'LABEL_1': 1}           |
| **Training Framework**        | Transformers v4.57.1                   |
| **Quant Library**             | bitsandbytes                           |
| **Local Path / Repo**         | `./msci_software_engineering_slm_v1`   |

## Quantization

| **Parameter**               | **Value**      |
| --------------------------- | -------------- |
| `_load_in_4bit`             | True           |
| `_load_in_8bit`             | False          |
| `bnb_4bit_compute_dtype`    | `bfloat16`     |
| `bnb_4bit_quant_storage`    | `uint8`        |
| `bnb_4bit_quant_type`       | `nf4`          |
| `bnb_4bit_use_double_quant` | False          |
| `load_in_4bit`              | True           |
| `load_in_8bit`              | False          |
| `quant_method`              | `bitsandbytes` |

## Training Data

The model was fine-tuned on a custom dataset (`data.jsonl`) consisting of:

- Software engineering Q&A pairs
- Code examples (Python, SQL, Docker, ML pipelines)
- Developer chat-style dialogues
- AI agent reasoning snippets

---

## Intended Uses

- Software development assistance
- Generating code snippets or debugging suggestions
- Explaining AI/ML or MLOps concepts
- General programming conversations

---

## Limitations

- May produce hallucinated code or incorrect syntax.
- Not tested on safety-critical or financial decision-making tasks.
- Limited coverage outside the software/AI domain.
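
## Fine-tuning Setup (Illustrative)

The actual training script is not published with this card. The snippet below is a minimal sketch of a QLoRA setup consistent with the tables above (4-bit nf4 quantization, `bfloat16` compute dtype, PEFT adapters on top of `mistralai/Mistral-7B-v0.1`). The LoRA hyperparameters (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) are illustrative assumptions, not the values actually used.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "mistralai/Mistral-7B-v0.1"

# 4-bit nf4 quantization with bfloat16 compute, matching the Quantization table above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)

tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token  # as noted under Model Details

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Illustrative LoRA hyperparameters -- the values used for this model are not published
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

From here, the adapter can be trained with a standard `Trainer`-style loop on the causal language modeling objective noted under Model Details.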
---

## Example Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "techpro-saida/msci_software_engineering_slm_v1"

# 4-bit config for efficient inference
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # automatically balances between GPU/CPU
)

prompt = "Explain the SOLID principles in OOP."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If you are on a low-VRAM or CPU-only machine, you can load the model without quantization (note that unquantized CPU inference of a 7B model still needs substantial RAM, roughly 14-28 GB depending on dtype):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "techpro-saida/msci_software_engineering_slm_v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")

prompt = "Explain the SOLID principles in OOP."
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=60, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Developer

- **Developed by:** SAIDA D
- **Model type:** SLM
- **Language(s) (NLP):** English (`en`)
- **License:** apache-2.0
- **Finetuned from model:** `mistralai/Mistral-7B-v0.1`
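
## Dataset Format (Illustrative)

The schema of `data.jsonl` is not published, so the record below is purely hypothetical; the field names `prompt` and `response` are assumptions chosen to illustrate the Q&A style described under Training Data. The loading snippet uses the Hugging Face `datasets` JSON loader.

```python
from datasets import load_dataset

# Hypothetical record layout -- the real field names in data.jsonl are not documented:
# {"prompt": "How do I reverse a list in Python?",
#  "response": "Use slicing: reversed_list = my_list[::-1]"}

dataset = load_dataset("json", data_files="data.jsonl", split="train")
print(dataset[0])            # inspect one record
print(dataset.column_names)  # verify the actual field names before training
```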