Update README.md

README.md +106 -3

@@ -1,3 +1,106 @@
----
-license: apache-2.0
----

+---
+{
+  "language": ["en"],
+  "license": "apache-2.0",
+  "tags": [
+    "text-generation",
+    "causal-lm",
+    "continual-pretraining",
+    "lora",
+    "axolotl",
+    "deepspeed",
+    "transformers",
+    "mistral",
+    "nemo",
+    "eu-hpc"
+  ],
+  "datasets": ["arxiv", "gov", "news", "wikipedia"],
+  "metrics": ["loss"],
+  "library_name": "transformers",
+  "framework": "pytorch",
+  "base_model": "mistralai/Mistral-Nemo-Instruct-2407",
+  "model_name": "mistral-12b-cpt",
+  "pipeline_tag": "text-generation",
+  "task_categories": ["text-generation"],
+  "model_type": "AutoModelForCausalLM",
+  "inference": {
+    "parameters": {
+      "max_new_tokens": 512,
+      "temperature": 0.7,
+      "top_p": 0.9
+    }
+  },
+  "trained_on": ["Leonardo EuroHPC"],
+  "description": "Continual pretraining (CPT) of Mistral 12B Nemo Instruct using Axolotl and DeepSpeed ZeRO-1. Trained on scientific, government, news, and Wikipedia text with LoRA adapters."
+}
+---
+# Mistral 12B — CPT (Continual Pretraining with LoRA)
+**Model type:** Causal Language Model
+**Base model:** [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)
+**License:** Apache 2.0
+**Framework:** [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
+---
+## Overview
+`mistral-12b-cpt` is a **continually pretrained** version of Mistral-Nemo-Instruct-2407, the 12B Mistral Nemo Instruct model.
+This CPT phase extends the model's factual and energy-domain knowledge using scientific, governmental, news, and encyclopedic text.
+Training ran on the **Leonardo EuroHPC** system using Axolotl with DeepSpeed ZeRO-1 for distributed LoRA training.
+---
+## Training Setup
+**Objective:** Unsupervised continual pretraining (language modeling)
+**Adapter type:** LoRA
+**Precision:** bfloat16
+**Hardware:** 8 nodes × 2 NVIDIA A100 64 GB GPUs per node (16 GPUs total)
+**Framework:** Axolotl + DeepSpeed + PyTorch 2.5.1 + CUDA 12.1
+**Runtime:** 24 h
+**Checkpoints:** 5 per epoch
+---
+## Dataset
+| Dataset | Description |
+|----------|-------------|
+| `arxiv.jsonl` | Scientific and technical papers |
+| `gov.jsonl` | Government and policy documents |
+| `news.jsonl` | News articles |
+| `wiki.jsonl` | Wikipedia text |
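The card does not state the record schema of these files. As a minimal sketch, assuming each line is a JSON object that stores the document under a `text` field (a common layout for completion-style pretraining data, not confirmed by this README), the corpora could be streamed as shown here. The `iter_texts` helper and the field name are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_texts(path: str, field: str = "text") -> Iterator[str]:
    """Yield the text of each usable JSONL record in `path`.

    Assumption: every non-blank line is a JSON object that holds the
    document under `field`; records without a non-empty string there
    are skipped.
    """
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines between records
            record = json.loads(line)
            text = record.get(field)
            if isinstance(text, str) and text:
                yield text
```

Calling `iter_texts("arxiv.jsonl")` would then yield one document string at a time, so a large file never has to be loaded into memory at once.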
+---
+## Hyperparameters
+| Parameter | Value |
+|------------|-------|
+| Sequence length | 2048 |
+| Micro batch size | 2 |
+| Gradient accumulation | 2 |
+| Epochs | 10 |
+| Max steps | 10000 |
+| Learning rate | 0.0002 |
+| LR scheduler | cosine |
+| Optimizer | AdamW (8-bit) |
+| Warmup steps | 10 |
+| Weight decay | 0.0 |
+| LoRA rank (r) | 16 |
+| LoRA alpha | 32 |
+| LoRA dropout | 0.05 |
+| LoRA targets | q_proj, k_proj, v_proj, o_proj |
+| Gradient checkpointing | ✅ |
+| Flash attention | ✅ |
+| Loss watchdog (threshold/patience) | 5.0 / 3 |
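The values above imply an effective batch size and a token budget that the card does not spell out. The following is a rough sketch of that arithmetic, assuming all 16 GPUs act as data-parallel ranks (ZeRO-1 shards only optimizer state) and that every sequence is filled to the full 2048 tokens, so the token counts are upper bounds. The LoRA scaling factor uses the standard `alpha / r` definition.

```python
# Values copied from the Training Setup and Hyperparameters tables.
NODES = 8
GPUS_PER_NODE = 2
MICRO_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
SEQUENCE_LENGTH = 2048
MAX_STEPS = 10_000
LORA_RANK = 16
LORA_ALPHA = 32

# Assumption: every GPU is a data-parallel rank.
world_size = NODES * GPUS_PER_NODE

# Sequences consumed per optimizer step across the whole job.
sequences_per_step = MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS * world_size

# Assumption: sequences are full length, so these are upper bounds.
tokens_per_step = sequences_per_step * SEQUENCE_LENGTH
max_tokens_seen = tokens_per_step * MAX_STEPS

# Standard LoRA scaling applied to the low-rank update.
lora_scaling = LORA_ALPHA / LORA_RANK

print(f"GPUs: {world_size}")
print(f"Sequences per optimizer step: {sequences_per_step}")
print(f"Tokens per optimizer step (upper bound): {tokens_per_step:,}")
print(f"Tokens over {MAX_STEPS:,} steps (upper bound): {max_tokens_seen:,}")
print(f"LoRA scaling (alpha / r): {lora_scaling}")
```

Under these assumptions each optimizer step covers 64 sequences, or at most 131,072 tokens, and the LoRA update is scaled by 2.0. If padding or shorter documents are present, the real token counts will be lower.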
+---
+## Tokenizer
+**Tokenizer type:** `AutoTokenizer`
+**Pad token:** `<|end_of_text|>`