---
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
library_name: peft
license: apache-2.0
tags:
- conversational-ai
- chatbot
- lora
- qlora
- peft
- nlp
- openassistant
- fine-tuning
---
# Model Card for Lumo
**Lumo** is a lightweight conversational AI adapter fine-tuned using **QLoRA** on top of the open-source **TinyLlama 1.1B Chat** base model. It is designed for **learning, experimentation, and student projects**, with a focus on accessibility and transparency.
**Note:** This repository contains **only the LoRA adapter weights**, not the base model.
## Model Details
### Model Description
- **Developed by:** Aditya Verma
- **Model type:** Conversational Language Model (LoRA Adapter)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
### Model Sources
- **Repository:** [Adi362/Lumo](https://huggingface.co/Adi362/Lumo)
- **Base Model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **Training Framework:** Hugging Face Transformers + PEFT
## Uses
### Direct Use
This model is intended for:
- Local conversational chatbots
- Educational AI experiments
- Student projects involving LLMs
- Learning how LoRA fine-tuning works
- Prototyping lightweight AI assistants
*The adapter must be loaded together with the base TinyLlama model.*
### Downstream Use
The adapter can be:
- Combined with other LoRA adapters
- Further fine-tuned on domain-specific datasets (see the sketch after this list)
- Integrated into APIs or applications
- Used as a base for research or experimentation
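As an example of the second point, the published adapter can be reloaded in trainable mode and trained further. A minimal sketch (the domain dataset and training loop are omitted; `is_trainable=True` keeps the LoRA weights unfrozen):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload base model + Lumo adapter with the adapter weights unfrozen,
# ready for further fine-tuning on a domain-specific dataset.
base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = PeftModel.from_pretrained(base, "Adi362/Lumo", is_trainable=True)
model.print_trainable_parameters()  # only the LoRA matrices are trainable
```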
### Out-of-Scope Use
This model is **not intended** for:
- High-stakes decision making
- Medical, legal, or financial advice
- Production-grade commercial systems without further evaluation
- Safety-critical applications
## Bias, Risks, and Limitations
- **Bias:** The model may reflect biases present in the training data (OpenAssistant).
- **Hallucinations:** It can produce incorrect or misleading information.
- **Factuality:** Responses should not be treated as factual guarantees.
- **Performance:** Capabilities are limited by the small size (1.1B parameters) and scope of the base model.
### Recommendations
Users (both direct and downstream) should:
- Validate outputs independently.
- Avoid using the model for critical applications.
- Apply additional safety layers when deploying in public-facing systems.
## How to Get Started with the Model
Use the code below to load the base model and the Lumo adapter.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

BASE_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
LORA_MODEL = "Adi362/Lumo"

# 1. Load the tokenizer and base model (full precision, CPU-friendly defaults)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float32,
    device_map=None,
)

# 2. Load the Lumo adapter on top of the base model
model = PeftModel.from_pretrained(model, LORA_MODEL)
model.eval()
```
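Continuing from the snippet above, the model can then be queried through the base model's chat template. A minimal sketch (the prompt and generation settings are illustrative):

```python
# Build a chat-formatted prompt and generate a reply
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello! Who are you?"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```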
## Training Details
### Training Data
The model was trained on a filtered subset of the **OpenAssistant Conversations** dataset.
- **Dataset Name:** OpenAssistant Conversations (English, filtered)
- **Data Type:** Human–assistant dialogue pairs
- **Content:** Diverse conversational topics, instructions, and queries.
### Training Procedure
#### Preprocessing
The dataset underwent the following preprocessing steps:
- **Filtering:** Retained only English language conversations.
- **Formatting:** Constructed user–assistant pairs and rendered them with chat-style prompts so the data matches what the base model expects (see the sketch below).
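A minimal sketch of that formatting step, assuming the base model's built-in chat template was used (the card does not record the exact template):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

def format_pair(user_text: str, assistant_text: str) -> str:
    """Render one user-assistant exchange with the base model's chat template."""
    messages = [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]
    return tokenizer.apply_chat_template(messages, tokenize=False)

print(format_pair("What is LoRA?", "LoRA is a parameter-efficient fine-tuning method."))
```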
#### Training Hyperparameters
- **Training regime:** **QLoRA** (4-bit base model quantization + LoRA adapters; see the configuration sketch after this list)
- **Precision:** 4-bit (nf4)
- **Optimizer:** Paged AdamW (8-bit)
- **Learning Rate:** 2e-4
- **Epochs:** 2
- **Batch Size:** 1 (with gradient accumulation)
- **Trainable Parameters:** ~1.1% of total model parameters
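In code, this regime corresponds roughly to the configuration below. The quantization and optimizer settings match the list above; the LoRA rank, alpha, target modules, and gradient accumulation steps are illustrative assumptions, since the card does not record them:

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model (matches the list above)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter settings; r, lora_alpha, and target_modules are assumed values
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Optimizer and schedule taken from the list above; accumulation is assumed
training_args = TrainingArguments(
    output_dir="lumo-qlora",
    learning_rate=2e-4,
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # assumed
    optim="paged_adamw_8bit",
)
```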
#### Speeds, Sizes, Times
- **Training Time:** ~4–5 hours on a single GPU.
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
No formal benchmark datasets were used for this version. The model is intended for educational purposes and low-stakes experimentation.
#### Factors
Evaluation focused on:
- **Language:** English only.
- **Domain:** General conversational ability and basic instruction following.
#### Metrics
Evaluation was qualitative, focusing on:
1. **Coherence:** Ability to maintain a conversation flow.
2. **Instruction Following:** Ability to execute simple prompts.
3. **Identity:** Correctly identifying itself as an AI assistant.
### Results
The model demonstrates basic conversational fluency and can handle simple instructions. As an adapter on a small 1.1B-parameter base model, it may struggle with complex reasoning or highly specific factual queries compared to larger models.
## Model Examination
*Not applicable for this version.*
## Environmental Impact
Carbon emissions were not formally measured; given the hardware and the short training duration, they are expected to be negligible.
- **Hardware Type:** NVIDIA Tesla T4 (Cloud GPU)
- **Hours used:** ~4–5 hours
- **Cloud Provider:** Google Colab
- **Compute Region:** Unknown (Colab default)
- **Carbon Emitted:** Negligible (short, small-scale training run; not formally measured)
## Technical Specifications
### Model Architecture and Objective
- **Base Architecture:** Transformer (TinyLlama 1.1B)
- **Adaptation Method:** Low-Rank Adaptation (LoRA)
- **Objective:** Causal Language Modeling (Next-token prediction)
### Compute Infrastructure
#### Hardware
- **GPU:** Single NVIDIA Tesla T4 (16GB VRAM)
#### Software
- **Orchestration:** Google Colab
- **Libraries:** Hugging Face Transformers, PEFT, PyTorch, BitsAndBytes
## Citation
**BibTeX:**
```bibtex
@misc{verma2025lumo,
  author       = {Verma, Aditya},
  title        = {Lumo: A LoRA-fine-tuned conversational adapter based on TinyLlama},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Adi362/Lumo}}
}
```
**APA:**
> Verma, A. (2025). *Lumo: A LoRA-fine-tuned conversational adapter based on TinyLlama* [Large Language Model]. Hugging Face. https://huggingface.co/Adi362/Lumo
## Glossary
* **LoRA (Low-Rank Adaptation):** A parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer, significantly reducing the number of trainable parameters (see the sketch after this list).
* **QLoRA (Quantized LoRA):** An efficient fine-tuning approach that quantizes the base model to 4-bit precision (reducing memory usage) while keeping the LoRA adapters in higher precision for training.
* **PEFT (Parameter-Efficient Fine-Tuning):** A library by Hugging Face that enables efficient adaptation of pre-trained language models to various downstream applications without fine-tuning all the model's parameters.
* **TinyLlama:** A compact 1.1 billion parameter language model pre-trained on around 1 trillion tokens, designed to be run on edge devices and consumer hardware.
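For intuition, the LoRA entry above amounts to replacing a frozen linear layer `y = x Wᵀ` with `y = x Wᵀ + (α/r)·x Aᵀ Bᵀ`, where only the small matrices `A` and `B` are trained. A toy illustration (all sizes hypothetical):

```python
import torch

d, r, alpha = 2048, 8, 16     # hypothetical hidden size, rank, and scaling
W = torch.randn(d, d)         # frozen pretrained weight (never updated)
A = torch.randn(r, d) * 0.01  # trainable down-projection
B = torch.zeros(d, r)         # trainable up-projection (zero-init, so the update starts at 0)

def lora_linear(x: torch.Tensor) -> torch.Tensor:
    """Frozen path plus scaled low-rank update: y = x W^T + (alpha/r) * x A^T B^T."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

y = lora_linear(torch.randn(1, d))
print(y.shape)  # torch.Size([1, 2048])
```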
## More Information
This model was created as a student project to demonstrate that functional conversational assistants can be fine-tuned on consumer-grade hardware (Google Colab free tier) using the QLoRA technique.
## Model Card Authors
Aditya Verma
## Model Card Contact
For bugs, feature requests, or general feedback, please open an issue on the [Project GitHub Repository](https://github.com/Adi362/Lumo) or the Hugging Face Community tab.
### Framework versions
- PEFT 0.8.2