techpro-saida/msci_software_engineering_slm_v1

@@ -31,6 +31,64 @@ It was trained on a curated dataset of software design patterns, debugging tips,
 - **Training Objective:** Causal language modeling
 ---
+## Model Configuration
+| **Parameter**                 | **Value**                             |
+| ----------------------------- | ------------------------------------- |
+| **Model Type**                | `mistral`                             |
+| **Architecture**              | `MistralForCausalLM`                  |
+| **Vocab Size**                | 32,768                                |
+| **Max Position Embeddings**   | 32,768                                |
+| **Hidden Size**               | 4,096                                 |
+| **Intermediate Size**         | 14,336                                |
+| **Number of Hidden Layers**   | 32                                    |
+| **Number of Attention Heads** | 32                                    |
+| **Number of Key-Value Heads** | 8                                     |
+| **Hidden Activation**         | `silu`                                |
+| **Initializer Range**         | 0.02                                  |
+| **RMS Norm Epsilon**          | 1e-5                                  |
+| **Dropout (Attention)**       | 0.0                                   |
+| **Use Cache**                 | True                                  |
+| **ROPE Theta**                | 1,000,000.0                           |
+| **Quantization Method**       | `bitsandbytes`                        |
+| **Quantization Config**       | 4-bit (nf4), `bfloat16` compute dtype |
+| **Model Dtype**               | `float16`                             |
+| **Load In 4bit**              | True                                  |
+| **Load In 8bit**              | False                                 |
+| **Tie Word Embeddings**       | False                                 |
+| **Is Encoder-Decoder**        | False                                 |
+| **BOS Token ID**              | 1                                     |
+| **EOS Token ID**              | 2                                     |
+| **Pad Token ID**              | None                                  |
+| **Generation Settings**       |                                       |
+| → Max Length                  | 20                                    |
+| → Min Length                  | 0                                     |
+| → Temperature                 | 1.0                                   |
+| → Top-k                       | 50                                    |
+| → Top-p                       | 1.0                                   |
+| → Num Beams                   | 1                                     |
+| → Repetition Penalty          | 1.0                                   |
+| → Early Stopping              | False                                 |
+| **ID → Label Map**            | `{0: 'LABEL_0', 1: 'LABEL_1'}`        |
+| **Label → ID Map**            | `{'LABEL_0': 0, 'LABEL_1': 1}`        |
+| **Training Framework**        | Transformers v4.57.1                  |
+| **Quant Library**             | bitsandbytes                          |
+| **Local Path / Repo**         | `./msci_software_engineering_slm_v1`  |
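
The shape parameters in the table above determine several derived quantities. The following is a minimal sketch, assuming the standard Mistral layer layout (bias-free projections, a gated three-matrix MLP, two RMSNorm weights per layer) and a head dimension equal to hidden size divided by attention heads. Neither assumption is stated in this model card, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MistralShape:
    """Shape values copied from the Model Configuration table."""
    vocab_size: int = 32_768
    hidden_size: int = 4_096
    intermediate_size: int = 14_336
    num_layers: int = 32
    num_heads: int = 32
    num_kv_heads: int = 8
    max_positions: int = 32_768


def head_dim(cfg: MistralShape) -> int:
    # Assumption: head_dim = hidden_size / num_heads (not listed in the table).
    return cfg.hidden_size // cfg.num_heads


def queries_per_kv_head(cfg: MistralShape) -> int:
    # Grouped-query attention: several query heads share one key-value head.
    return cfg.num_heads // cfg.num_kv_heads


def param_count(cfg: MistralShape) -> int:
    # Assumption: standard Mistral layout with no bias terms.
    h = cfg.hidden_size
    kv_width = cfg.num_kv_heads * head_dim(cfg)
    attention = h * h + 2 * h * kv_width + h * h   # q, (k and v), o projections
    mlp = 3 * h * cfg.intermediate_size            # gate, up, down projections
    norms = 2 * h                                  # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    # "Tie Word Embeddings" is False, so input and output embeddings are separate.
    embeddings = 2 * cfg.vocab_size * h
    final_norm = h
    return cfg.num_layers * per_layer + embeddings + final_norm


def kv_cache_bytes_per_token(cfg: MistralShape, bytes_per_value: int = 2) -> int:
    # One key and one value vector per KV head per layer; 2 bytes for 16-bit values.
    return 2 * cfg.num_layers * cfg.num_kv_heads * head_dim(cfg) * bytes_per_value


if __name__ == "__main__":
    cfg = MistralShape()
    print("head_dim:", head_dim(cfg))
    print("query heads per KV head:", queries_per_kv_head(cfg))
    print("parameters:", param_count(cfg))
    print("KV cache bytes per token:", kv_cache_bytes_per_token(cfg))
```

Under these assumptions the count comes to 7,248,023,552 parameters (about 7.25B), every group of 4 query heads shares one key-value head, and a full 32,768-token context needs roughly 4 GiB of 16-bit KV cache.
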
+## Quantization
+| **Parameter**               | **Value**      |
+| --------------------------- | -------------- |
+| `_load_in_4bit`             | True           |
+| `_load_in_8bit`             | False          |
+| `bnb_4bit_compute_dtype`    | `bfloat16`     |
+| `bnb_4bit_quant_storage`    | `uint8`        |
+| `bnb_4bit_quant_type`       | `nf4`          |
+| `bnb_4bit_use_double_quant` | False          |
+| `load_in_4bit`              | True           |
+| `load_in_8bit`              | False          |
+| `quant_method`              | `bitsandbytes` |
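
The settings above map onto `BitsAndBytesConfig` in Transformers. A hedged sketch of loading the model with them follows. It assumes `torch`, `transformers`, `bitsandbytes`, and `accelerate` are installed, that a CUDA GPU is available, and that the caller supplies a valid repo ID or local path. `load_quantized` and `complete` are hypothetical helper names, not part of this repository. Because the configuration table lists no pad token and a default max length of 20, the sketch passes the EOS token as padding and sets `max_new_tokens` explicitly.

```python
# Values copied from the Quantization table; the dtype is stored as a string
# so this module can be imported without torch installed.
QUANT_SETTINGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_use_double_quant": False,
}


def load_quantized(repo_or_path: str):
    """Load the tokenizer and a 4-bit NF4 model from a repo ID or local path."""
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    settings = dict(QUANT_SETTINGS)
    # Resolve the dtype name ("bfloat16") to the torch dtype object.
    settings["bnb_4bit_compute_dtype"] = getattr(
        torch, QUANT_SETTINGS["bnb_4bit_compute_dtype"]
    )
    bnb_config = BitsAndBytesConfig(**settings)

    tokenizer = AutoTokenizer.from_pretrained(repo_or_path)
    # If the checkpoint already embeds a quantization config, passing one here
    # may be redundant; it is shown to make the settings explicit.
    model = AutoModelForCausalLM.from_pretrained(
        repo_or_path,
        quantization_config=bnb_config,
        device_map="auto",  # requires the accelerate package
    )
    return tokenizer, model


def complete(tokenizer, model, prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a completion, overriding the short default length."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        # Pad Token ID is None in the config, so reuse EOS for padding.
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call might look like `tokenizer, model = load_quantized("./msci_software_engineering_slm_v1")` followed by `complete(tokenizer, model, "Explain the observer pattern.")`, where the prompt is only an illustration.
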
 ## Training Data