# Model Card for Lumo
Lumo is a lightweight conversational AI adapter fine-tuned using QLoRA on top of the open-source TinyLlama 1.1B Chat base model. It is designed for learning, experimentation, and student projects, with a focus on accessibility and transparency.
> **Note:** This repository contains only the LoRA adapter weights, not the base model.
## Model Details
### Model Description
- **Developed by:** Aditya Verma
- **Model type:** Conversational Language Model (LoRA Adapter)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** TinyLlama/TinyLlama-1.1B-Chat-v1.0
### Model Sources
- **Repository:** [Adi362/Lumo](https://huggingface.co/Adi362/Lumo)
- **Base Model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **Training Framework:** Hugging Face Transformers + PEFT
## Uses
### Direct Use
This model is intended for:
- Local conversational chatbots
- Educational AI experiments
- Student projects involving LLMs
- Learning how LoRA fine-tuning works
- Prototyping lightweight AI assistants
The adapter must be loaded together with the base TinyLlama model.
### Downstream Use
The adapter can be:
- Combined with other LoRA adapters
- Further fine-tuned on domain-specific datasets (see the sketch after this list)
- Integrated into APIs or applications
- Used as a base for research or experimentation
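A minimal sketch of the further-fine-tuning case, assuming the standard PEFT API (the `is_trainable=True` flag reloads the adapter with its weights unfrozen):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Reload the Lumo adapter with its weights unfrozen so it can be trained further.
model = PeftModel.from_pretrained(base, "Adi362/Lumo", is_trainable=True)
model.print_trainable_parameters()  # only the LoRA matrices are trainable

# For integration into applications, the adapter can instead be merged into
# the base weights, yielding a plain transformers model:
# merged = model.merge_and_unload()
```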
### Out-of-Scope Use
This model is not intended for:
- High-stakes decision making
- Medical, legal, or financial advice
- Production-grade commercial systems without further evaluation
- Safety-critical applications
## Bias, Risks, and Limitations
- **Bias:** The model may reflect biases present in the training data (OpenAssistant Conversations).
- **Hallucinations:** It can produce incorrect or misleading information.
- **Factuality:** Responses should not be treated as factual guarantees.
- **Performance:** Capabilities are limited by the small size (1.1B parameters) and scope of the base model.
### Recommendations
Users (both direct and downstream) should:
- Validate outputs independently.
- Avoid using the model for critical applications.
- Apply additional safety layers when deploying in public-facing systems.
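As a deliberately simplistic illustration of such a safety layer (a real deployment should use a proper moderation model or service), a hypothetical pre-filter might look like:

```python
# Illustrative only: a trivial blocklist check in front of generation.
BLOCKLIST = {"example-banned-term"}  # hypothetical terms

def guarded_generate(prompt: str, generate_fn) -> str:
    """Refuse prompts containing blocklisted terms; otherwise generate."""
    if any(term in prompt.lower() for term in BLOCKLIST):
        return "Sorry, I can't help with that."
    return generate_fn(prompt)
```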
## How to Get Started with the Model
Use the code below to load the base model and the Lumo adapter.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

BASE_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
LORA_MODEL = "Adi362/Lumo"

# 1. Load the base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float32,
    device_map=None,
)

# 2. Load the Lumo adapter on top of the base model
model = PeftModel.from_pretrained(model, LORA_MODEL)
model.eval()
```
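Once loaded, generation can use the base model's built-in chat template. A minimal sketch (the prompt and sampling settings below are illustrative):

```python
# 3. Generate a reply using the base model's chat template
messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
with torch.no_grad():
    output_ids = model.generate(
        input_ids, max_new_tokens=128, do_sample=True, temperature=0.7
    )
# Decode only the newly generated tokens
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```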
## Training Details
### Training Data
The model was trained on a filtered subset of the **OpenAssistant Conversations** dataset.
- **Dataset Name:** OpenAssistant Conversations (English, filtered)
- **Data Type:** Human–assistant dialogue pairs
- **Content:** Diverse conversational topics, instructions, and queries.
### Training Procedure
#### Preprocessing
The dataset underwent the following preprocessing steps:
- **Filtering:** Retained only English language conversations.
- **Formatting:** Constructed user–assistant pairs and formatted them using standard chat-style prompts to suit the base model's expectations.
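The exact formatting code is not part of this card; a plausible sketch, assuming the Zephyr-style chat markup that TinyLlama-1.1B-Chat expects:

```python
def format_pair(user_msg: str, assistant_msg: str) -> str:
    # Hypothetical formatter: wraps one user-assistant exchange in the
    # Zephyr-style chat markup used by TinyLlama-1.1B-Chat.
    return (
        f"<|user|>\n{user_msg}</s>\n"
        f"<|assistant|>\n{assistant_msg}</s>\n"
    )
```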
#### Training Hyperparameters
- **Training regime:** **QLoRA** (4-bit base model quantization + LoRA adapters)
- **Precision:** 4-bit (nf4)
- **Optimizer:** Paged AdamW (8-bit)
- **Learning Rate:** 2e-4
- **Epochs:** 2
- **Batch Size:** 1 (with gradient accumulation)
- **Trainable Parameters:** ~1.1% of total model parameters
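The settings above translate roughly into the following QLoRA setup. The LoRA rank, alpha, dropout, target modules, and gradient-accumulation steps are illustrative assumptions, since the card does not record them:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
)
base = prepare_model_for_kbit_training(base)

# LoRA adapter config; r, alpha, dropout, and target_modules are guesses.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# Hyperparameters recorded in this card; accumulation steps are a guess.
training_args = TrainingArguments(
    output_dir="lumo-qlora",
    learning_rate=2e-4,
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    optim="paged_adamw_8bit",
)
```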
#### Speeds, Sizes, Times
- **Training Time:** ~4–5 hours on a single GPU.
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
No formal benchmark datasets were used for this version. The model is intended for educational purposes and low-stakes experimentation.
#### Factors
Evaluation focused on:
- **Language:** English only.
- **Domain:** General conversational ability and basic instruction following.
#### Metrics
Evaluation was qualitative, focusing on:
1. **Coherence:** Ability to maintain a conversation flow.
2. **Instruction Following:** Ability to execute simple prompts.
3. **Identity:** Correctly identifying itself as an AI assistant.
### Results
The model demonstrates basic conversational fluency and can handle simple instructions. As an adapter on a small (1.1B-parameter) base model, it may struggle with complex reasoning or highly specific factual queries compared to larger models.
## Model Examination
*Not applicable for this version.*
## Environmental Impact
Carbon emissions were estimated based on the training hardware and duration.
- **Hardware Type:** NVIDIA Tesla T4 (Cloud GPU)
- **Hours used:** ~4–5 hours
- **Cloud Provider:** Google Colab
- **Compute Region:** Unknown (Colab default)
- **Carbon Emitted:** Negligible (small-scale training; not formally measured).
## Technical Specifications
### Model Architecture and Objective
- **Base Architecture:** Transformer (TinyLlama 1.1B)
- **Adaptation Method:** Low-Rank Adaptation (LoRA)
- **Objective:** Causal Language Modeling (Next-token prediction)
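Concretely, training minimizes the standard next-token negative log-likelihood over the conversation text, with only the LoRA parameters updated:

$$
\mathcal{L}(\theta_{\text{LoRA}}) = -\sum_{t} \log p_{\theta}\left(x_t \mid x_{<t}\right)
$$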
### Compute Infrastructure
#### Hardware
- **GPU:** Single NVIDIA Tesla T4 (16GB VRAM)
#### Software
- **Orchestration:** Google Colab
- **Libraries:** Hugging Face Transformers, PEFT, PyTorch, BitsAndBytes
## Citation
**BibTeX:**
```bibtex
@misc{verma2025lumo,
  author       = {Verma, Aditya},
  title        = {Lumo: A LoRA-fine-tuned conversational adapter based on TinyLLaMA},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Adi362/Lumo}}
}
```
**APA:**
> Verma, A. (2025). *Lumo: A LoRA-fine-tuned conversational adapter based on TinyLLaMA* [Large Language Model]. Hugging Face. https://huggingface.co/Adi362/Lumo
## Glossary
* **LoRA (Low-Rank Adaptation):** A parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer, significantly reducing the number of trainable parameters (see the formula after this glossary).
* **QLoRA (Quantized LoRA):** An efficient fine-tuning approach that quantizes the base model to 4-bit precision (reducing memory usage) while keeping the LoRA adapters in higher precision for training.
* **PEFT (Parameter-Efficient Fine-Tuning):** A library by Hugging Face that enables efficient adaptation of pre-trained language models to various downstream applications without fine-tuning all the model's parameters.
* **TinyLlama:** A compact 1.1 billion parameter language model pre-trained on around 1 trillion tokens, designed to be run on edge devices and consumer hardware.
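For the LoRA entry above: with a frozen pre-trained weight $W_0 \in \mathbb{R}^{d \times k}$ and trainable low-rank factors $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$ with $r \ll \min(d, k)$, the adapted layer computes

$$
h = W_0 x + \frac{\alpha}{r}\, B A x
$$

so only $A$ and $B$ are trained while $W_0$ stays fixed; $\alpha$ is a fixed scaling hyperparameter.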
## More Information
This model was created as a student project to demonstrate the feasibility of fine-tuning a working conversational assistant on consumer-grade hardware (the Google Colab free tier) using the QLoRA technique.
## Model Card Authors
Aditya Verma
## Model Card Contact
For bugs, feature requests, or general feedback, please open an issue on the [Project GitHub Repository](https://github.com/Adi362/Lumo) or the Hugging Face Community tab.
### Framework versions
- PEFT 0.8.2