---
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
library_name: peft
license: apache-2.0
tags:
- conversational-ai
- chatbot
- lora
- qlora
- peft
- nlp
- openassistant
- fine-tuning
---

# Model Card for Lumo

**Lumo** is a lightweight conversational AI adapter fine-tuned using **QLoRA** on top of the open-source **TinyLlama 1.1B Chat** base model. It is designed for **learning, experimentation, and student projects**, with a focus on accessibility and transparency.

**Note:** This repository contains **only the LoRA adapter weights**, not the base model.

## Model Details

### Model Description

- **Developed by:** Aditya Verma
- **Model type:** Conversational Language Model (LoRA Adapter)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)

### Model Sources

- **Repository:** Adi362/Lumo
- **Base Model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **Training Framework:** Hugging Face Transformers + PEFT

## Uses

### Direct Use

This model is intended for:
- Local conversational chatbots
- Educational AI experiments
- Student projects involving LLMs
- Learning how LoRA fine-tuning works
- Prototyping lightweight AI assistants

*The adapter must be loaded together with the base TinyLlama model.*

### Downstream Use

The adapter can be:
- Combined with other LoRA adapters
- Further fine-tuned on domain-specific datasets
- Integrated into APIs or applications
- Used as a base for research or experimentation
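
For instance, attaching the adapter in trainable mode for further fine-tuning, or merging it into the base weights for standalone use, might look like the minimal sketch below (the output path is a placeholder):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model and attach the Lumo adapter in trainable mode,
# so it can be fine-tuned further on a domain-specific dataset.
base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = PeftModel.from_pretrained(base, "Adi362/Lumo", is_trainable=True)

# Alternatively, merge the adapter into the base weights, yielding a
# standalone model that can be saved, served, or stacked with other adapters.
merged = model.merge_and_unload()
merged.save_pretrained("lumo-merged")  # placeholder output directory
```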

### Out-of-Scope Use

This model is **not intended** for:
- High-stakes decision making
- Medical, legal, or financial advice
- Production-grade commercial systems without further evaluation
- Safety-critical applications

## Bias, Risks, and Limitations

- **Bias:** The model may reflect biases present in the training data (OpenAssistant).
- **Hallucinations:** It can produce incorrect or misleading information.
- **Factuality:** Responses should not be treated as factual guarantees.
- **Performance:** Capabilities are limited by the small size (1.1B parameters) and scope of the base model.

### Recommendations

Users (both direct and downstream) should:
- Validate outputs independently.
- Avoid using the model for critical applications.
- Apply additional safety layers when deploying in public-facing systems.

## How to Get Started with the Model

Use the code below to load the base model and the Lumo adapter.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

BASE_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
LORA_MODEL = "Adi362/Lumo"

# 1. Load Base Model
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float32,
    device_map=None
)

# 2. Load Lumo Adapter
model = PeftModel.from_pretrained(model, LORA_MODEL)
model.eval()
```

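Once loaded, the combined model generates like any Transformers causal LM. A minimal sketch using the chat template shipped with the base tokenizer (the sampling settings here are illustrative):

```python
messages = [{"role": "user", "content": "Hi Lumo, what can you do?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a reply and decode only the newly produced tokens.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
reply = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(reply)
```
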
## Training Details

### Training Data

The model was trained on a filtered subset of the **OpenAssistant Conversations** dataset.

- **Dataset Name:** OpenAssistant Conversations (English, filtered)
- **Data Type:** Human–assistant dialogue pairs
- **Content:** Diverse conversational topics, instructions, and queries.

### Training Procedure

#### Preprocessing

The dataset underwent the following preprocessing steps:
- **Filtering:** Retained only English language conversations.
- **Formatting:** Constructed user–assistant pairs and formatted them using standard chat-style prompts to suit the base model's expectations.
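
A hedged sketch of what this could look like with the `datasets` library; the field names follow the public `OpenAssistant/oasst1` schema, and the exact filtering used for this release may differ:

```python
from datasets import load_dataset

# Keep English rows only; "lang" is part of the oasst1 schema.
ds = load_dataset("OpenAssistant/oasst1", split="train")
ds = ds.filter(lambda row: row["lang"] == "en")

def to_chat_prompt(user_text: str, assistant_text: str) -> str:
    # TinyLlama-Chat expects a Zephyr-style template. Reconstructing
    # prompt/reply pairs from the oasst1 message tree (via parent_id)
    # is omitted here for brevity.
    return f"<|user|>\n{user_text}</s>\n<|assistant|>\n{assistant_text}</s>\n"
```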

#### Training Hyperparameters

- **Training regime:** **QLoRA** (4-bit base model quantization + LoRA adapters)
- **Precision:** 4-bit (nf4)
- **Optimizer:** Paged AdamW (8-bit)
- **Learning Rate:** 2e-4
- **Epochs:** 2
- **Batch Size:** 1 (with gradient accumulation)
- **Trainable Parameters:** ~1.1% of total model parameters
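
A minimal sketch of a QLoRA setup matching these settings; the LoRA rank, alpha, target modules, and gradient-accumulation factor below are assumptions not stated in this card:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit nf4 quantization of the frozen base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters trained in higher precision on top of the 4-bit base.
lora_config = LoraConfig(
    r=16,                                 # rank: assumed
    lora_alpha=32,                        # assumed
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

args = TrainingArguments(
    output_dir="lumo-qlora",
    learning_rate=2e-4,
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,        # assumed accumulation factor
    optim="paged_adamw_8bit",
    fp16=True,
)
```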

#### Speeds, Sizes, Times

- **Training Time:** ~4–5 hours on a single GPU.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

No formal benchmark datasets were used for this version. The model is intended for educational purposes and low-stakes experimentation.

#### Factors

Evaluation focused on:
- **Language:** English only.
- **Domain:** General conversational ability and basic instruction following.

#### Metrics

Evaluation was qualitative, focusing on:
1.  **Coherence:** Ability to maintain a conversation flow.
2.  **Instruction Following:** Ability to execute simple prompts.
3.  **Identity:** Correctly identifying itself as an AI assistant.

### Results

The model demonstrates basic conversational fluency and can handle simple instructions. Given the small size of the underlying base model (1.1B parameters), it may struggle with complex reasoning or highly specific factual queries compared to larger models.

## Model Examination

*Not applicable for this version.*

## Environmental Impact

Carbon emissions were estimated based on the training hardware and duration.

- **Hardware Type:** NVIDIA Tesla T4 (Cloud GPU)
- **Hours used:** ~4–5 hours
- **Cloud Provider:** Google Colab
- **Compute Region:** Unknown (Colab default)
- **Carbon Emitted:** Not formally measured; expected to be negligible for a training run of this scale.

## Technical Specifications

### Model Architecture and Objective

- **Base Architecture:** Transformer (TinyLlama 1.1B)
- **Adaptation Method:** Low-Rank Adaptation (LoRA)
- **Objective:** Causal Language Modeling (Next-token prediction)

### Compute Infrastructure

#### Hardware

- **GPU:** Single NVIDIA Tesla T4 (16GB VRAM)

#### Software

- **Orchestration:** Google Colab
- **Libraries:** Hugging Face Transformers, PEFT, PyTorch, BitsAndBytes

## Citation

**BibTeX:**

```bibtex
@misc{verma2025lumo,
  author = {Verma, Aditya},
  title = {Lumo: A LoRA-fine-tuned conversational adapter based on TinyLLaMA},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Adi362/Lumo}}
}
```

**APA:**

> Verma, A. (2025). *Lumo: A LoRA-fine-tuned conversational adapter based on TinyLLaMA* [Large Language Model]. Hugging Face. https://huggingface.co/Adi362/Lumo

## Glossary

* **LoRA (Low-Rank Adaptation):** A parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer, significantly reducing the number of trainable parameters.
* **QLoRA (Quantized LoRA):** An efficient fine-tuning approach that quantizes the base model to 4-bit precision (reducing memory usage) while keeping the LoRA adapters in higher precision for training.
* **PEFT (Parameter-Efficient Fine-Tuning):** A library by Hugging Face that enables efficient adaptation of pre-trained language models to various downstream applications without fine-tuning all the model's parameters.
* **TinyLlama:** A compact 1.1 billion parameter language model pre-trained on around 1 trillion tokens, designed to be run on edge devices and consumer hardware.
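
To make the LoRA entry concrete, here is a toy sketch of the low-rank update; the hidden size, rank, and scaling below are illustrative only:

```python
import torch

d, r, alpha = 2048, 16, 32      # hidden size, rank, scaling (illustrative)
W = torch.randn(d, d)           # frozen pre-trained weight
A = torch.randn(r, d) * 0.01    # trainable down-projection
B = torch.zeros(d, r)           # trainable up-projection, zero-initialized

x = torch.randn(d)
# LoRA forward pass: frozen path plus the scaled low-rank update B @ A.
h = W @ x + (alpha / r) * (B @ (A @ x))
```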

## More Information

This model was created as a student project to demonstrate that a working conversational assistant can be fine-tuned on consumer-grade hardware (Google Colab free tier) using the QLoRA technique.

## Model Card Authors

Aditya Verma

## Model Card Contact

For bugs, feature requests, or general feedback, please open an issue on the [Project GitHub Repository](https://github.com/Adi362/Lumo) or the Hugging Face Community tab.

### Framework versions

- PEFT 0.8.2