---
datasets:
- custom_jsonl_dataset
language:
- en
library_name: transformers
license: apache-2.0
model_name: MSC Software Engineering SLM v1
tags:
- software-engineering
- QLoRA
- Mistral
- SLM
base_model:
- mistralai/Mistral-7B-v0.1
---

# Model Card

This model is a **QLoRA fine-tuned variant of Mistral-7B**, optimized for **software engineering, code generation, and technical Q&A** tasks. It was trained on a curated dataset of software design patterns, debugging tips, Python code snippets, and AI engineering discussions to improve reasoning and contextual understanding for software-related queries.

## Model Details

- **Base Model:** `mistralai/Mistral-7B-v0.1`
- **Fine-tuning Type:** QLoRA (4-bit quantization)
- **Framework:** Hugging Face Transformers + PEFT + bitsandbytes
- **Tokenizer:** Same as the base model (`AutoTokenizer.from_pretrained(base_model, use_fast=True)`)
- **Padding Token:** `tokenizer.pad_token = tokenizer.eos_token`
- **Training Objective:** Causal language modeling

---

## Model Configuration

| **Parameter**                 | **Value**                              |
| ----------------------------- | -------------------------------------- |
| **Model Type**                | `mistral`                              |
| **Architecture**              | `MistralForCausalLM`                   |
| **Vocab Size**                | 32,768                                 |
| **Max Position Embeddings**   | 32,768                                 |
| **Hidden Size**               | 4,096                                  |
| **Intermediate Size**         | 14,336                                 |
| **Number of Hidden Layers**   | 32                                     |
| **Number of Attention Heads** | 32                                     |
| **Number of Key-Value Heads** | 8                                      |
| **Hidden Activation**         | `silu`                                 |
| **Initializer Range**         | 0.02                                   |
| **RMS Norm Epsilon**          | 1e-5                                   |
| **Dropout (Attention)**       | 0.0                                    |
| **Use Cache**                 | True                                   |
| **RoPE Theta**                | 1,000,000.0                            |
| **Quantization Method**       | `bitsandbytes`                         |
| **Quantization Config**       | 4-bit (nf4), `bfloat16` compute dtype  |
| **Compute Dtype**             | `float16`                              |
| **Load In 4-bit**             | ✅ Yes                                 |
| **Load In 8-bit**             | ❌ No                                  |
| **Tie Word Embeddings**       | False                                  |
| **Is Encoder-Decoder**        | False                                  |
| **BOS Token ID**              | 1                                      |
| **EOS Token ID**              | 2                                      |
| **Pad Token ID**              | None                                   |
| **Generation Settings**       |                                        |
| → Max Length                  | 20                                     |
| → Min Length                  | 0                                      |
| → Temperature                 | 1.0                                    |
| → Top-k                       | 50                                     |
| → Top-p                       | 1.0                                    |
| → Num Beams                   | 1                                      |
| → Repetition Penalty          | 1.0                                    |
| → Early Stopping              | False                                  |
| **ID → Label Map**            | {0: `LABEL_0`, 1: `LABEL_1`}           |
| **Label → ID Map**            | {'LABEL_0': 0, 'LABEL_1': 1}           |
| **Training Framework**        | Transformers v4.57.1                   |
| **Quant Library**             | bitsandbytes                           |
| **Local Path / Repo**         | `./msci_software_engineering_slm_v1`   |

## Quantization

| **Parameter**               | **Value**      |
| --------------------------- | -------------- |
| `_load_in_4bit`             | True           |
| `_load_in_8bit`             | False          |
| `bnb_4bit_compute_dtype`    | `bfloat16`     |
| `bnb_4bit_quant_storage`    | `uint8`        |
| `bnb_4bit_quant_type`       | `nf4`          |
| `bnb_4bit_use_double_quant` | False          |
| `load_in_4bit`              | True           |
| `load_in_8bit`              | False          |
| `quant_method`              | `bitsandbytes` |

## Training Data

The model was fine-tuned on a custom dataset (`data.jsonl`) consisting of:

- Software engineering Q&A pairs
- Code examples (Python, SQL, Docker, ML pipelines)
- Developer chat-style dialogues
- AI agent reasoning snippets

---

## Intended Uses

- Software development assistance
- Generating code snippets or debugging suggestions
- Explaining AI/ML or MLOps concepts
- General programming conversations

---

## Limitations

- May produce hallucinated code or incorrect syntax.
- Not tested on safety-critical or financial decision-making tasks.
- Limited coverage outside the software/AI domain.
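
## Fine-tuning Setup (Illustrative)

The actual training script is not published with this card. The snippet below is a minimal sketch of a QLoRA setup consistent with the tables above (4-bit nf4 quantization, `bfloat16` compute dtype, PEFT adapters on top of `mistralai/Mistral-7B-v0.1`). The LoRA hyperparameters (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) are illustrative assumptions, not the values actually used.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "mistralai/Mistral-7B-v0.1"

# 4-bit nf4 quantization with bfloat16 compute, matching the Quantization table above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)

tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token  # as noted under Model Details

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Illustrative LoRA hyperparameters -- the values used for this model are not published
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

From here, the adapter can be trained with a standard `Trainer`-style loop on the causal language modeling objective noted under Model Details.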
---

## Example Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "techpro-saida/msci_software_engineering_slm_v1"

# 4-bit config for efficient inference
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # automatically balances between GPU/CPU
)

prompt = "Explain the SOLID principles in OOP."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If you are on a low-VRAM or CPU-only machine, you can load the model without quantization (note that unquantized CPU inference of a 7B model still needs substantial RAM, roughly 14-28 GB depending on dtype):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "techpro-saida/msci_software_engineering_slm_v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")

prompt = "Explain the SOLID principles in OOP."
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=60, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Developer

- **Developed by:** SAIDA D
- **Model type:** SLM
- **Language(s) (NLP):** English (`en`)
- **License:** apache-2.0
- **Finetuned from model:** `mistralai/Mistral-7B-v0.1`
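
## Dataset Format (Illustrative)

The schema of `data.jsonl` is not published, so the record below is purely hypothetical; the field names `prompt` and `response` are assumptions chosen to illustrate the Q&A style described under Training Data. The loading snippet uses the Hugging Face `datasets` JSON loader.

```python
from datasets import load_dataset

# Hypothetical record layout -- the real field names in data.jsonl are not documented:
# {"prompt": "How do I reverse a list in Python?",
#  "response": "Use slicing: reversed_list = my_list[::-1]"}

dataset = load_dataset("json", data_files="data.jsonl", split="train")
print(dataset[0])            # inspect one record
print(dataset.column_names)  # verify the actual field names before training
```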