File size: 7,585 Bytes
b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a 51c5cbd b88ff1a |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 |
---
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
library_name: peft
license: apache-2.0
tags:
- conversational-ai
- chatbot
- lora
- qlora
- peft
- nlp
- openassistant
- fine-tuning
---
# Model Card for Lumo
**Lumo** is a lightweight conversational AI adapter fine-tuned using **QLoRA** on top of the open-source **TinyLLaMA 1.1B Chat** base model. It is designed for **learning, experimentation, and student projects**, with a focus on accessibility and transparency.
**Note:** This repository contains **only the LoRA adapter weights**, not the base model.
## Model Details
### Model Description
- **Developed by:** Aditya Verma
- **Model type:** Conversational Language Model (LoRA Adapter)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
### Model Sources
- **Repository:** Adi362/Lumo
- **Base Model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **Training Framework:** Hugging Face Transformers + PEFT
## Uses
### Direct Use
This model is intended for:
- Local conversational chatbots
- Educational AI experiments
- Student projects involving LLMs
- Learning how LoRA fine-tuning works
- Prototyping lightweight AI assistants
*The adapter must be loaded together with the base TinyLLaMA model.*
### Downstream Use
The adapter can be:
- Combined with other LoRA adapters
- Further fine-tuned on domain-specific datasets
- Integrated into APIs or applications
- Used as a base for research or experimentation
### Out-of-Scope Use
This model is **not intended** for:
- High-stakes decision making
- Medical, legal, or financial advice
- Production-grade commercial systems without further evaluation
- Safety-critical applications
## Bias, Risks, and Limitations
- **Bias:** The model may reflect biases present in the training data (OpenAssistant).
- **Hallucinations:** It can produce incorrect or misleading information.
- **Factuality:** Responses should not be treated as factual guarantees.
- **Performance:** Capabilities are limited by the small size (1.1B parameters) and scope of the base model.
### Recommendations
Users (both direct and downstream) should:
- Validate outputs independently.
- Avoid using the model for critical applications.
- Apply additional safety layers when deploying in public-facing systems.
## How to Get Started with the Model
Use the code below to load the base model and the Lumo adapter.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
BASE_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
LORA_MODEL = "Adi362/Lumo"
# 1. Load Base Model
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
BASE_MODEL,
torch_dtype=torch.float32,
device_map=None
)
# 2. Load Lumo Adapter
model = PeftModel.from_pretrained(model, LORA_MODEL)
model.eval()
## Training Details
### Training Data
The model was trained on a filtered subset of the **OpenAssistant Conversations** dataset.
- **Dataset Name:** OpenAssistant Conversations (English, filtered)
- **Data Type:** Human–assistant dialogue pairs
- **Content:** Diverse conversational topics, instructions, and queries.
### Training Procedure
#### Preprocessing
The dataset underwent the following preprocessing steps:
- **Filtering:** Retained only English language conversations.
- **Formatting:** Constructed user–assistant pairs and formatted them using standard chat-style prompts to suit the base model's expectations.
#### Training Hyperparameters
- **Training regime:** **QLoRA** (4-bit base model quantization + LoRA adapters)
- **Precision:** 4-bit (nf4)
- **Optimizer:** Paged AdamW (8-bit)
- **Learning Rate:** 2e-4
- **Epochs:** 2
- **Batch Size:** 1 (with gradient accumulation)
- **Trainable Parameters:** ~1.1% of total model parameters
#### Speeds, Sizes, Times
- **Training Time:** ~4–5 hours on a single GPU.
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
No formal benchmark datasets were used for this version. The model is intended for educational purposes and low-stakes experimentation.
#### Factors
Evaluation focused on:
- **Language:** English only.
- **Domain:** General conversational ability and basic instruction following.
#### Metrics
Evaluation was qualitative, focusing on:
1. **Coherence:** Ability to maintain a conversation flow.
2. **Instruction Following:** Ability to execute simple prompts.
3. **Identity:** Correctly identifying itself as an AI assistant.
### Results
The model demonstrates basic conversational fluency and can handle simple instructions. As a lightweight adapter (~1.1B parameters), it may struggle with complex reasoning or highly specific factual queries compared to larger models.
## Model Examination
*Not applicable for this version.*
## Environmental Impact
Carbon emissions were estimated based on the training hardware and duration.
- **Hardware Type:** NVIDIA Tesla T4 (Cloud GPU)
- **Hours used:** ~4-5 hours
- **Cloud Provider:** Google Colab
- **Compute Region:** Unknown (Colab default)
- **Carbon Emitted:** Negligible (Low-scale training not formally measured).
## Technical Specifications
### Model Architecture and Objective
- **Base Architecture:** Transformer (TinyLLaMA 1.1B)
- **Adaptation Method:** Low-Rank Adaptation (LoRA)
- **Objective:** Causal Language Modeling (Next-token prediction)
### Compute Infrastructure
#### Hardware
- **GPU:** Single NVIDIA Tesla T4 (16GB VRAM)
#### Software
- **Orchestration:** Google Colab
- **Libraries:** Hugging Face Transformers, PEFT, PyTorch, BitsAndBytes
## Citation
**BibTeX:**
```bibtex
@misc{verma2025lumo,
author = {Verma, Aditya},
title = {Lumo: A LoRA-fine-tuned conversational adapter based on TinyLLaMA},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{[https://huggingface.co/Adi362/Lumo](https://huggingface.co/Adi362/Lumo)}}
}
**APA:**
> Verma, A. (2025). *Lumo: A LoRA-fine-tuned conversational adapter based on TinyLLaMA* [Large Language Model]. Hugging Face. https://huggingface.co/Adi362/Lumo
## Glossary
* **LoRA (Low-Rank Adaptation):** A parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer, significantly reducing the number of trainable parameters.
* **QLoRA (Quantized LoRA):** An efficient fine-tuning approach that quantizes the base model to 4-bit precision (reducing memory usage) while keeping the LoRA adapters in higher precision for training.
* **PEFT (Parameter-Efficient Fine-Tuning):** A library by Hugging Face that enables efficient adaptation of pre-trained language models to various downstream applications without fine-tuning all the model's parameters.
* **TinyLlama:** A compact 1.1 billion parameter language model pre-trained on around 1 trillion tokens, designed to be run on edge devices and consumer hardware.
## More Information
This model was created as a student project to demonstrate the feasibility of fine-tuning valid conversational assistants on consumer-grade hardware (Google Colab free tier) using the QLoRA technique.
## Model Card Authors
Aditya Verma
## Model Card Contact
For bugs, feature requests, or general feedback, please open an issue on the [Project GitHub Repository](https://github.com/Adi362/Lumo) or the Hugging Face Community tab.
### Framework versions
- PEFT 0.8.2 |