---
datasets:
- custom_jsonl_dataset
language:
- en
library_name: transformers
license: apache-2.0
model_name: MSC Software Engineering SLM v1
tags:
- software-engineering
- QLoRA
- Mistral
- SLM
base_model:
- mistralai/Mistral-7B-v0.1
---
# Model Card: MSC Software Engineering SLM v1
This model is a **QLoRA fine-tuned variant of Mistral-7B**, optimized for **software engineering, code generation, and technical Q&A** tasks.
It was trained on a curated dataset of software design patterns, debugging tips, Python code snippets, and AI engineering discussions to improve reasoning and contextual understanding for software-related queries.
## Model Details
- **Base Model:** `mistralai/Mistral-7B-v0.1`
- **Fine-tuning Type:** QLoRA (4-bit quantization)
- **Framework:** Hugging Face Transformers + PEFT + bitsandbytes
- **Tokenizer:** Same as base model (`AutoTokenizer.from_pretrained(base_model, use_fast=True)`)
- **Padding Token:** `tokenizer.pad_token = tokenizer.eos_token` (see the snippet after this list)
- **Training Objective:** Causal language modeling
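For reference, the tokenizer setup described above amounts to the following minimal sketch (`base_model` is the base checkpoint named in this card):

```python
from transformers import AutoTokenizer

# Fast tokenizer from the base checkpoint; Mistral ships no dedicated
# pad token, so the EOS token is reused for padding.
base_model = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token
```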
---
## Model Configuration
| **Parameter** | **Value** |
| ----------------------------- | ------------------------------------- |
| **Model Type** | `mistral` |
| **Architecture** | `MistralForCausalLM` |
| **Vocab Size**                | 32,000                                |
| **Max Position Embeddings** | 32,768 |
| **Hidden Size** | 4,096 |
| **Intermediate Size** | 14,336 |
| **Number of Hidden Layers** | 32 |
| **Number of Attention Heads** | 32 |
| **Number of Key-Value Heads** | 8 |
| **Hidden Activation** | `silu` |
| **Initializer Range** | 0.02 |
| **RMS Norm Epsilon** | 1e-5 |
| **Dropout (Attention)** | 0.0 |
| **Use Cache** | True |
| **RoPE Theta**                | 1,000,000.0                           |
| **Quantization Method** | `bitsandbytes` |
| **Quantization Config** | 4-bit (nf4), `bfloat16` compute dtype |
| **Torch Dtype**               | `float16`                             |
| **Load In 4bit**              | Yes                                   |
| **Load In 8bit**              | No                                    |
| **Tie Word Embeddings** | False |
| **Is Encoder-Decoder** | False |
| **BOS Token ID** | 1 |
| **EOS Token ID** | 2 |
| **Pad Token ID** | None |
| **Generation Settings** | |
| ↳ Max Length                  | 20                                    |
| ↳ Min Length                  | 0                                     |
| ↳ Temperature                 | 1.0                                   |
| ↳ Top-k                       | 50                                    |
| ↳ Top-p                       | 1.0                                   |
| ↳ Num Beams                   | 1                                     |
| ↳ Repetition Penalty          | 1.0                                   |
| ↳ Early Stopping              | False                                 |
| **ID → Label Map**            | {0: `LABEL_0`, 1: `LABEL_1`}          |
| **Label → ID Map**            | {'LABEL_0': 0, 'LABEL_1': 1}          |
| **Training Framework** | Transformers v4.57.1 |
| **Quant Library** | bitsandbytes |
| **Local Path / Repo** | `./msci_software_engineering_slm_v1` |
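Most of the architecture values above can be verified without downloading weights; a quick check, assuming the repo id from the usage example below:

```python
from transformers import AutoConfig

# Loads only config.json, not the model weights.
config = AutoConfig.from_pretrained("techpro-saida/msci_software_engineering_slm_v1")
print(config.hidden_size)          # expected: 4096
print(config.num_hidden_layers)    # expected: 32
print(config.num_key_value_heads)  # expected: 8
print(config.rope_theta)           # expected: 1000000.0
```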
## Quantization
| **Parameter** | **Value** |
| --------------------------- | -------------- |
| `_load_in_4bit` | True |
| `_load_in_8bit` | False |
| `bnb_4bit_compute_dtype` | `bfloat16` |
| `bnb_4bit_quant_storage` | `uint8` |
| `bnb_4bit_quant_type` | `nf4` |
| `bnb_4bit_use_double_quant` | False |
| `load_in_4bit` | True |
| `load_in_8bit` | False |
| `quant_method` | `bitsandbytes` |
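These settings map one-to-one onto a `BitsAndBytesConfig`; a sketch follows (note that `bnb_4bit_quant_storage` requires a reasonably recent transformers/bitsandbytes):

```python
import torch
from transformers import BitsAndBytesConfig

# Direct translation of the quantization table above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_storage=torch.uint8,
)
```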
## Training Data
The model was fine-tuned on a custom dataset (`data.jsonl`; a sample record is sketched after this list) consisting of:
- Software engineering Q&A pairs
- Code examples (Python, SQL, Docker, ML pipelines)
- Developer chat-style dialogues
- AI agent reasoning snippets
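The exact field names in `data.jsonl` are not documented here, so the record layout below is an illustrative assumption; only the format itself (one JSON object per line) is given by the card:

```python
import json

# Hypothetical record -- the "instruction"/"response" keys are an assumption,
# not the published schema of data.jsonl.
record = {
    "instruction": "Explain the Singleton design pattern in Python.",
    "response": "The Singleton pattern ensures a class has exactly one instance...",
}

with open("data.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")  # JSONL: one object per line
```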
---
## Intended Uses
- Software development assistance
- Generating code snippets or debugging suggestions
- Explaining AI/ML or MLOps concepts
- General programming conversations
---
## Limitations
- May produce hallucinated code or incorrect syntax.
- Not tested on safety-critical or financial decision-making tasks.
- Limited coverage outside software/AI domain.
---
## Example Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
model_id = "techpro-saida/msci_software_engineering_slm_v1"
# 4-bit config for efficient inference
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_quant_type="nf4",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
quantization_config=bnb_config,
device_map="auto", # automatically balances between GPU/CPU
)
prompt = "Explain SOLID principles in OOP?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # move inputs to the model's device
# Sampling must be enabled for temperature/top_p to take effect
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If you are running on low RAM or a CPU-only machine:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "techpro-saida/msci_software_engineering_slm_v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Full-precision CPU load: slower, but needs neither a GPU nor bitsandbytes
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")

prompt = "Explain SOLID principles in OOP?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Developer
- **Developed by:** SAIDA D
- **Model type:** SLM
- **Language(s) (NLP):** English (en)
- **License:** apache-2.0
- **Fine-tuned from model:** `mistralai/Mistral-7B-v0.1`