---
datasets:
- custom_jsonl_dataset
language:
- en
library_name: transformers
license: apache-2.0
model_name: MSC Software Engineering SLM v1
tags:
- software-engineering
- QLoRA
- Mistral
- SLM
base_model:
- mistralai/Mistral-7B-v0.1
---

# Model Card: MSC Software Engineering SLM v1
This model is a **QLoRA fine-tuned variant of Mistral-7B**, optimized for **software engineering, code generation, and technical Q&A** tasks.  
It was trained on a curated dataset of software design patterns, debugging tips, Python code snippets, and AI engineering discussions to improve reasoning and contextual understanding for software-related queries.



## Model Details

- **Base Model:** `mistralai/Mistral-7B-v0.1`
- **Fine-tuning Type:** QLoRA (4-bit quantization)
- **Framework:** Hugging Face Transformers + PEFT + bitsandbytes
- **Tokenizer:** Same as base model (`AutoTokenizer.from_pretrained(base_model, use_fast=True)`)
- **Padding Token:** `tokenizer.pad_token = tokenizer.eos_token`
- **Training Objective:** Causal language modeling
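
The full training script is not included in this repository. The sketch below shows a typical QLoRA setup consistent with the details above; the LoRA hyperparameters (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) are illustrative assumptions, not published values.

```python
# Hypothetical QLoRA setup matching the details above (LoRA hyperparameters are assumed)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "mistralai/Mistral-7B-v0.1"

# Tokenizer: same as the base model, with EOS reused as the padding token
tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token

# 4-bit NF4 quantization with bfloat16 compute (see the Quantization table below)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter on the attention projections -- the values below are placeholders
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```

Training then proceeds with a standard causal language modeling objective, e.g. via `transformers.Trainer` or `trl.SFTTrainer`.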

---
## Model Configuration

| **Parameter**                 | **Value**                             |
| ----------------------------- | ------------------------------------- |
| **Model Type**                | `mistral`                             |
| **Architecture**              | `MistralForCausalLM`                  |
| **Vocab Size**                | 32,768                                |
| **Max Position Embeddings**   | 32,768                                |
| **Hidden Size**               | 4,096                                 |
| **Intermediate Size**         | 14,336                                |
| **Number of Hidden Layers**   | 32                                    |
| **Number of Attention Heads** | 32                                    |
| **Number of Key-Value Heads** | 8                                     |
| **Hidden Activation**         | `silu`                                |
| **Initializer Range**         | 0.02                                  |
| **RMS Norm Epsilon**          | 1e-5                                  |
| **Dropout (Attention)**       | 0.0                                   |
| **Use Cache**                 | True                                  |
| **ROPE Theta**                | 1,000,000.0                           |
| **Quantization Method**       | `bitsandbytes`                        |
| **Quantization Config**       | 4-bit (nf4), `bfloat16` compute dtype |
| **Compute Dtype**             | `float16`                             |
| **Load In 4bit**              | βœ… Yes                                 |
| **Load In 8bit**              | ❌ No                                  |
| **Tie Word Embeddings**       | False                                 |
| **Is Encoder-Decoder**        | False                                 |
| **BOS Token ID**              | 1                                     |
| **EOS Token ID**              | 2                                     |
| **Pad Token ID**              | None                                  |
| **Generation Settings**       |                                       |
| β†’ Max Length                  | 20                                    |
| β†’ Min Length                  | 0                                     |
| β†’ Temperature                 | 1.0                                   |
| β†’ Top-k                       | 50                                    |
| β†’ Top-p                       | 1.0                                   |
| β†’ Num Beams                   | 1                                     |
| β†’ Repetition Penalty          | 1.0                                   |
| β†’ Early Stopping              | False                                 |
| **ID β†’ Label Map**            | {0: `LABEL_0`, 1: `LABEL_1`}          |
| **Label β†’ ID Map**            | {'LABEL_0': 0, 'LABEL_1': 1}          |
| **Training Framework**        | Transformers v4.57.1                  |
| **Quant Library**             | bitsandbytes                          |
| **Local Path / Repo**         | `./msci_software_engineering_slm_v1`  |
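
These values can be checked directly against the checkpoint; a minimal sketch using the Hub repo id from the Example Usage section below:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("techpro-saida/msci_software_engineering_slm_v1")
print(config.model_type)                                       # mistral
print(config.hidden_size, config.intermediate_size)            # 4096 14336
print(config.num_attention_heads, config.num_key_value_heads)  # 32 8
print(config.rope_theta)                                       # 1000000.0
```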

## Quantization 
| **Parameter**               | **Value**      |
| --------------------------- | -------------- |
| `_load_in_4bit`             | True           |
| `_load_in_8bit`             | False          |
| `bnb_4bit_compute_dtype`    | `bfloat16`     |
| `bnb_4bit_quant_storage`    | `uint8`        |
| `bnb_4bit_quant_type`       | `nf4`          |
| `bnb_4bit_use_double_quant` | False          |
| `load_in_4bit`              | True           |
| `load_in_8bit`              | False          |
| `quant_method`              | `bitsandbytes` |
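
To load the model with exactly this quantization configuration, a sketch using the `bitsandbytes` integration in Transformers:

```python
import torch
from transformers import BitsAndBytesConfig

# Mirrors the quantization parameters listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_storage=torch.uint8,
)
```

Pass `quantization_config=bnb_config` to `AutoModelForCausalLM.from_pretrained`, as in the Example Usage section below.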



## Training Data

The model was fine-tuned on a custom dataset (`data.jsonl`) consisting of:
- Software engineering Q&A pairs  
- Code examples (Python, SQL, Docker, ML pipelines)
- Developer chat-style dialogues  
- AI agent reasoning snippets  
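
The dataset itself is not published with this card. A typical way to load such a `data.jsonl` file for fine-tuning is sketched below; the field names (`prompt`, `response`) are purely illustrative, since the actual record schema is not documented here.

```python
from datasets import load_dataset

# Each line of data.jsonl is one JSON record
dataset = load_dataset("json", data_files="data.jsonl", split="train")

# Hypothetical prompt/response layout -- real field names may differ
def format_example(example):
    return {
        "text": f"### Question:\n{example['prompt']}\n\n### Answer:\n{example['response']}"
    }

dataset = dataset.map(format_example)
```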

---

## Intended Uses

- Software development assistance  
- Generating code snippets or debugging suggestions  
- Explaining AI/ML or MLOps concepts  
- General programming conversations  

---

## Limitations

- May produce hallucinated code or incorrect syntax.
- Not tested on safety-critical or financial decision-making tasks.
- Limited coverage outside software/AI domain.

---


## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "techpro-saida/msci_software_engineering_slm_v1"

# 4-bit config for efficient GPU inference
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # automatically balances layers across GPU/CPU
)

prompt = "Explain SOLID principles in OOP."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs, max_new_tokens=100, do_sample=True, temperature=0.7, top_p=0.9
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If you are on low RAM or have no GPU, load the model without quantization on the CPU (slower, but requires no CUDA setup):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "techpro-saida/msci_software_engineering_slm_v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")

prompt = "Explain SOLID principles in OOP."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60, do_sample=True, temperature=0.7)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Developer

- **Developed by:** SAIDA D
- **Model type:** SLM
- **Language(s) (NLP):** English (en)
- **License:** apache-2.0
- **Finetuned from model:** `mistralai/Mistral-7B-v0.1`