---
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
library_name: peft
license: apache-2.0
tags:
- conversational-ai
- chatbot
- lora
- qlora
- peft
- nlp
- openassistant
- fine-tuning
---

# Model Card for Lumo

**Lumo** is a lightweight conversational AI adapter fine-tuned using **QLoRA** on top of the open-source **TinyLlama 1.1B Chat** base model. It is designed for **learning, experimentation, and student projects**, with a focus on accessibility and transparency.

**Note:** This repository contains **only the LoRA adapter weights**, not the base model.

## Model Details

### Model Description

- **Developed by:** Aditya Verma
- **Model type:** Conversational Language Model (LoRA Adapter)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)

### Model Sources

- **Repository:** Adi362/Lumo
- **Base Model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **Training Framework:** Hugging Face Transformers + PEFT

## Uses

### Direct Use

This model is intended for:

- Local conversational chatbots
- Educational AI experiments
- Student projects involving LLMs
- Learning how LoRA fine-tuning works
- Prototyping lightweight AI assistants

*The adapter must be loaded together with the base TinyLlama model.*

### Downstream Use

The adapter can be:

- Combined with other LoRA adapters
- Further fine-tuned on domain-specific datasets
- Integrated into APIs or applications
- Used as a base for research or experimentation

### Out-of-Scope Use

This model is **not intended** for:

- High-stakes decision making
- Medical, legal, or financial advice
- Production-grade commercial systems without further evaluation
- Safety-critical applications

## Bias, Risks, and Limitations

- **Bias:** The model may reflect biases present in the training data (OpenAssistant).
- **Hallucinations:** It can produce incorrect or misleading information.
- **Factuality:** Responses should not be treated as factual guarantees.
- **Performance:** Capabilities are limited by the small size (1.1B parameters) and scope of the base model.

### Recommendations

Users (both direct and downstream) should:

- Validate outputs independently.
- Avoid using the model for critical applications.
- Apply additional safety layers when deploying in public-facing systems.

## How to Get Started with the Model

Use the code below to load the base model and the Lumo adapter.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

BASE_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
LORA_MODEL = "Adi362/Lumo"

# 1. Load Base Model
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float32,
    device_map=None
)

# 2. Load Lumo Adapter
model = PeftModel.from_pretrained(model, LORA_MODEL)
model.eval()
```
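Once the adapter is loaded, you can chat with the model through the tokenizer's built-in chat template (TinyLlama-Chat ships one). The sketch below continues from the loading snippet above; the prompt text and sampling settings are illustrative choices, not tuned values.

```python
# Minimal generation sketch (illustrative prompt and sampling settings).
messages = [{"role": "user", "content": "Hi Lumo! What can you help me with?"}]

# Format the conversation with the base model's chat template.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt")  # model is on CPU in the snippet above
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
reply = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(reply)
```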
## Training Details

### Training Data

The model was trained on a filtered subset of the **OpenAssistant Conversations** dataset.

- **Dataset Name:** OpenAssistant Conversations (English, filtered)
- **Data Type:** Human–assistant dialogue pairs
- **Content:** Diverse conversational topics, instructions, and queries.

### Training Procedure

#### Preprocessing

The dataset underwent the following preprocessing steps:

- **Filtering:** Retained only English-language conversations.
- **Formatting:** Constructed user–assistant pairs and formatted them using standard chat-style prompts to suit the base model's expectations.

#### Training Hyperparameters

- **Training regime:** **QLoRA** (4-bit base model quantization + LoRA adapters)
- **Precision:** 4-bit (nf4)
- **Optimizer:** Paged AdamW (8-bit)
- **Learning Rate:** 2e-4
- **Epochs:** 2
- **Batch Size:** 1 (with gradient accumulation)
- **Trainable Parameters:** ~1.1% of total model parameters
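A setup like this can be reproduced with the Transformers/PEFT stack roughly as follows. This is a minimal sketch, not the exact training script: the LoRA rank, alpha, target modules, and gradient-accumulation factor below are assumptions, since the card does not record them.

```python
# Illustrative QLoRA setup matching the hyperparameters above.
# NOTE: r, lora_alpha, target_modules, and gradient_accumulation_steps
# are assumed values; the card does not document them.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA: 4-bit quantized base model
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)  # cast norms, enable input grads

lora_config = LoraConfig(
    r=16,                                   # rank (assumed)
    lora_alpha=32,                          # scaling (assumed)
    target_modules=["q_proj", "v_proj"],    # attention projections (assumed)
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()          # roughly ~1% of total parameters

training_args = TrainingArguments(
    output_dir="lumo-qlora",
    learning_rate=2e-4,
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,          # accumulation factor (assumed)
    optim="paged_adamw_8bit",               # paged 8-bit AdamW
    fp16=True,
)
```

A standard `transformers.Trainer` over the formatted OpenAssistant pairs would then consume `training_args` and the PEFT-wrapped model.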
#### Speeds, Sizes, Times

- **Training Time:** ~4–5 hours on a single GPU.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

No formal benchmark datasets were used for this version. The model is intended for educational purposes and low-stakes experimentation.

#### Factors

Evaluation focused on:

- **Language:** English only.
- **Domain:** General conversational ability and basic instruction following.

#### Metrics

Evaluation was qualitative, focusing on:

1. **Coherence:** Ability to maintain a conversation flow.
2. **Instruction Following:** Ability to execute simple prompts.
3. **Identity:** Correctly identifying itself as an AI assistant.

### Results

The model demonstrates basic conversational fluency and can handle simple instructions. Because the base model is small (1.1B parameters), it may struggle with complex reasoning or highly specific factual queries compared to larger models.

## Model Examination

*Not applicable for this version.*

## Environmental Impact

Carbon emissions were estimated based on the training hardware and duration.

- **Hardware Type:** NVIDIA Tesla T4 (Cloud GPU)
- **Hours used:** ~4–5 hours
- **Cloud Provider:** Google Colab
- **Compute Region:** Unknown (Colab default)
- **Carbon Emitted:** Negligible (low-scale training; not formally measured).

## Technical Specifications

### Model Architecture and Objective

- **Base Architecture:** Transformer (TinyLlama 1.1B)
- **Adaptation Method:** Low-Rank Adaptation (LoRA)
- **Objective:** Causal language modeling (next-token prediction)

### Compute Infrastructure

#### Hardware

- **GPU:** Single NVIDIA Tesla T4 (16GB VRAM)

#### Software

- **Orchestration:** Google Colab
- **Libraries:** Hugging Face Transformers, PEFT, PyTorch, BitsAndBytes

## Citation

**BibTeX:**

```bibtex
@misc{verma2025lumo,
  author = {Verma, Aditya},
  title = {Lumo: A LoRA-fine-tuned conversational adapter based on TinyLlama},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Adi362/Lumo}}
}
```

**APA:**

> Verma, A. (2025). *Lumo: A LoRA-fine-tuned conversational adapter based on TinyLlama* [Large Language Model]. Hugging Face. https://huggingface.co/Adi362/Lumo

## Glossary

- **LoRA (Low-Rank Adaptation):** A parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank-decomposition matrices into each layer, significantly reducing the number of trainable parameters.
- **QLoRA (Quantized LoRA):** An efficient fine-tuning approach that quantizes the base model to 4-bit precision (reducing memory usage) while keeping the LoRA adapters in higher precision for training.
- **PEFT (Parameter-Efficient Fine-Tuning):** A Hugging Face library that enables efficient adaptation of pre-trained language models to downstream applications without fine-tuning all of the model's parameters.
- **TinyLlama:** A compact 1.1-billion-parameter language model pre-trained on around 1 trillion tokens, designed to run on edge devices and consumer hardware.

## More Information

This model was created as a student project to demonstrate the feasibility of fine-tuning functional conversational assistants on consumer-grade hardware (Google Colab free tier) using the QLoRA technique.

## Model Card Authors

Aditya Verma

## Model Card Contact

For bugs, feature requests, or general feedback, please open an issue on the [Project GitHub Repository](https://github.com/Adi362/Lumo) or use the Hugging Face Community tab.

### Framework versions

- PEFT 0.8.2