# Model Card for Lumo

Lumo is a lightweight conversational AI adapter fine-tuned with QLoRA on top of the open-source TinyLlama-1.1B-Chat base model. It is designed for learning, experimentation, and student projects, with a focus on accessibility and transparency.

> **Note:** This repository contains only the LoRA adapter weights, not the base model.

## Model Details

### Model Description

- **Developed by:** Aditya Verma
- **Model type:** Conversational Language Model (LoRA Adapter)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** TinyLlama/TinyLlama-1.1B-Chat-v1.0

## Uses

### Direct Use

This model is intended for:

- Local conversational chatbots
- Educational AI experiments
- Student projects involving LLMs
- Learning how LoRA fine-tuning works
- Prototyping lightweight AI assistants

The adapter must be loaded together with the base TinyLlama model.

### Downstream Use

The adapter can be:

- Combined with other LoRA adapters
- Further fine-tuned on domain-specific datasets
- Integrated into APIs or applications
- Used as a base for research or experimentation

### Out-of-Scope Use

This model is not intended for:

- High-stakes decision making
- Medical, legal, or financial advice
- Production-grade commercial systems without further evaluation
- Safety-critical applications

## Bias, Risks, and Limitations

- **Bias:** The model may reflect biases present in the training data (OpenAssistant).
- **Hallucinations:** It can produce incorrect or misleading information.
- **Factuality:** Responses should not be treated as factual guarantees.
- **Performance:** Capabilities are limited by the small size (1.1B parameters) and scope of the base model.

### Recommendations

Users (both direct and downstream) should:

- Validate outputs independently.
- Avoid using the model for critical applications.
- Apply additional safety layers when deploying in public-facing systems.

## How to Get Started with the Model

Use the code below to load the base model and the Lumo adapter.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

BASE_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
LORA_MODEL = "Adi362/Lumo"

# 1. Load the base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float32,
    device_map=None,
)

# 2. Attach the Lumo LoRA adapter
model = PeftModel.from_pretrained(model, LORA_MODEL)
model.eval()
```
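For inference, `tokenizer.apply_chat_template` will format prompts for you. For reference, TinyLlama-1.1B-Chat uses a Zephyr-style chat format; a minimal sketch of building such a prompt by hand (the exact role markers below are an assumption — verify against the tokenizer's `chat_template` attribute):

```python
def build_prompt(system: str, user: str) -> str:
    # Zephyr-style chat format used by TinyLlama-1.1B-Chat (assumed):
    # each turn opens with a role marker and closes with </s>.
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user}</s>\n"
        f"<|assistant|>\n"
    )

prompt = build_prompt("You are Lumo, a helpful assistant.", "Hi! Who are you?")
# The returned string can then be tokenized and passed to model.generate().
```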

## Training Details

### Training Data

The model was trained on a filtered subset of the **OpenAssistant Conversations** dataset.

- **Dataset Name:** OpenAssistant Conversations (English, filtered)
- **Data Type:** Human–assistant dialogue pairs
- **Content:** Diverse conversational topics, instructions, and queries.

### Training Procedure

#### Preprocessing

The dataset underwent the following preprocessing steps:
- **Filtering:** Retained only English language conversations.
- **Formatting:** Constructed user–assistant pairs and formatted them using standard chat-style prompts to suit the base model's expectations.
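The filtering and pairing steps above can be sketched as follows (the record fields `lang`, `role`, and `text` mirror the OpenAssistant export format; the exact field names used in the original pipeline are an assumption):

```python
# Toy OpenAssistant-style records: alternating prompter/assistant turns.
records = [
    {"lang": "en", "role": "prompter",  "text": "What is LoRA?"},
    {"lang": "en", "role": "assistant", "text": "A parameter-efficient fine-tuning method."},
    {"lang": "de", "role": "prompter",  "text": "Was ist LoRA?"},
    {"lang": "de", "role": "assistant", "text": "..."},
]

# 1. Filtering: retain English turns only.
english = [r for r in records if r["lang"] == "en"]

# 2. Formatting: pair consecutive prompter/assistant turns into training examples.
pairs = [
    (english[i]["text"], english[i + 1]["text"])
    for i in range(0, len(english) - 1, 2)
    if english[i]["role"] == "prompter" and english[i + 1]["role"] == "assistant"
]

print(pairs)  # [('What is LoRA?', 'A parameter-efficient fine-tuning method.')]
```

Each pair would then be rendered into the chat-style prompt format the base model expects.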

#### Training Hyperparameters

- **Training regime:** **QLoRA** (4-bit base model quantization + LoRA adapters)
- **Precision:** 4-bit (nf4)
- **Optimizer:** Paged AdamW (8-bit)
- **Learning Rate:** 2e-4
- **Epochs:** 2
- **Batch Size:** 1 (with gradient accumulation)
- **Trainable Parameters:** ~1.1% of total model parameters
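The hyperparameters above translate into a standard QLoRA setup; a sketch using Transformers and PEFT (the LoRA rank, alpha, target modules, and accumulation steps below are illustrative assumptions, not values recorded in this card):

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter; r, lora_alpha, and target_modules are illustrative assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Values documented above; gradient_accumulation_steps is an assumption
training_args = TrainingArguments(
    output_dir="lumo-qlora",
    optim="paged_adamw_8bit",          # paged 8-bit AdamW
    learning_rate=2e-4,
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
)
```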

#### Speeds, Sizes, Times

- **Training Time:** ~4–5 hours on a single GPU.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

No formal benchmark datasets were used for this version. The model is intended for educational purposes and low-stakes experimentation.

#### Factors

Evaluation focused on:
- **Language:** English only.
- **Domain:** General conversational ability and basic instruction following.

#### Metrics

Evaluation was qualitative, focusing on:
1.  **Coherence:** Ability to maintain a conversation flow.
2.  **Instruction Following:** Ability to execute simple prompts.
3.  **Identity:** Correctly identifying itself as an AI assistant.

### Results

The model demonstrates basic conversational fluency and can handle simple instructions. Because the base model is small (1.1B parameters), it may struggle with complex reasoning or highly specific factual queries compared to larger models.

## Model Examination

*Not applicable for this version.*

## Environmental Impact

Carbon emissions were estimated based on the training hardware and duration.

- **Hardware Type:** NVIDIA Tesla T4 (Cloud GPU)
- **Hours used:** ~4–5 hours
- **Cloud Provider:** Google Colab
- **Compute Region:** Unknown (Colab default)
- **Carbon Emitted:** Negligible (low-scale training; not formally measured).

## Technical Specifications

### Model Architecture and Objective

- **Base Architecture:** Transformer (TinyLlama 1.1B)
- **Adaptation Method:** Low-Rank Adaptation (LoRA)
- **Objective:** Causal Language Modeling (Next-token prediction)

### Compute Infrastructure

#### Hardware

- **GPU:** Single NVIDIA Tesla T4 (16GB VRAM)

#### Software

- **Orchestration:** Google Colab
- **Libraries:** Hugging Face Transformers, PEFT, PyTorch, BitsAndBytes

## Citation

**BibTeX:**

```bibtex
@misc{verma2025lumo,
  author = {Verma, Aditya},
  title = {Lumo: A LoRA-fine-tuned conversational adapter based on TinyLlama},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Adi362/Lumo}}
}
```

**APA:**

> Verma, A. (2025). *Lumo: A LoRA-fine-tuned conversational adapter based on TinyLlama* [Large Language Model]. Hugging Face. https://huggingface.co/Adi362/Lumo

## Glossary

* **LoRA (Low-Rank Adaptation):** A parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer, significantly reducing the number of trainable parameters.
* **QLoRA (Quantized LoRA):** An efficient fine-tuning approach that quantizes the base model to 4-bit precision (reducing memory usage) while keeping the LoRA adapters in higher precision for training.
* **PEFT (Parameter-Efficient Fine-Tuning):** A library by Hugging Face that enables efficient adaptation of pre-trained language models to various downstream applications without fine-tuning all the model's parameters.
* **TinyLlama:** A compact 1.1 billion parameter language model pre-trained on around 1 trillion tokens, designed to be run on edge devices and consumer hardware.
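To make the parameter savings behind LoRA concrete, a quick back-of-the-envelope calculation (TinyLlama's hidden size is 2048; the rank r=8 is an illustrative choice, not the rank used for this adapter):

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    # LoRA freezes the weight W (d_out x d_in) and trains two factors:
    # A (r x d_in) and B (d_out x r), so the learned update is B @ A.
    return r * d_in + d_out * r

frozen = 2048 * 2048                           # one attention projection matrix
trainable = lora_trainable_params(2048, 2048, r=8)
print(trainable, frozen, trainable / frozen)   # 32768 4194304 0.0078125
```

Even at this single matrix, the trainable factors amount to well under 1% of the frozen weights, which is why the whole adapter stays at ~1.1% of total parameters.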

## More Information

This model was created as a student project to demonstrate the feasibility of fine-tuning a functional conversational assistant on consumer-grade hardware (Google Colab free tier) using the QLoRA technique.

## Model Card Authors

Aditya Verma

## Model Card Contact

For bugs, feature requests, or general feedback, please open an issue on the [Project GitHub Repository](https://github.com/Adi362/Lumo) or the Hugging Face Community tab.

### Framework versions

- PEFT 0.8.2