---
library_name: transformers
tags: [causal-lm, bloom, lora, peft, finetuning, english]
---
# Model Card for Jay24-AI/bloom-3b-lora-tagger
This model is a **LoRA fine-tuned version of BigScience’s BLOOM-3B** model, trained on a dataset of English quotes. The goal was to adapt BLOOM using the [PEFT](https://github.com/huggingface/peft) (Parameter-Efficient Fine-Tuning) approach with [LoRA](https://arxiv.org/abs/2106.09685), making it lightweight to train and efficient for deployment.
---
## Model Details
### Model Description
- **Developed by:** Jay24-AI
- **Funded by:** N/A
- **Shared by:** Jay24-AI
- **Model type:** Causal Language Model with LoRA adapters
- **Language(s):** English
- **License:** Not yet specified. BLOOM-3B is released under the BigScience RAIL license, so any license applied to these adapters should remain compatible with its use restrictions.
- **Finetuned from model:** [bigscience/bloom-3b](https://huggingface.co/bigscience/bloom-3b)
### Model Sources
- **Repository:** https://huggingface.co/Jay24-AI/bloom-3b-lora-tagger
---
## Uses
### Direct Use
The model can be used for **text generation and tagging** based on quote-like prompts.
For example, you can input a quote followed by the `->:` separator used during training, and the model will complete it with descriptive tags.
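An illustrative exchange (the completion below is a made-up example, not a logged output):
```
"Life is what happens to us while we are making other plans." ->: ['life', 'plans']
```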
### Downstream Use
- Can be further fine-tuned on custom tagging or classification datasets.
- Could be integrated into applications that require lightweight **quote classification**, **text annotation**, or **prompt-based generation**.
### Out-of-Scope Use
- Not suitable for factual question answering.
- Not designed for sensitive or high-stakes decision-making (e.g., medical, legal, or financial advice).
---
## Bias, Risks, and Limitations
- Inherits limitations and biases from **BLOOM-3B** (which was trained on large-scale internet data).
- The fine-tuned dataset (English quotes) is small (~1k samples), so the model may overfit and generalize poorly outside similar data.
- Risk of generating irrelevant or biased tags if prompted outside the intended scope.
### Recommendations
Users should:
- Validate outputs before production use.
- Avoid relying on the model for critical applications.
---
## How to Get Started with the Model
```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
peft_model_id = "Jay24-AI/bloom-3b-lora-tagger"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the frozen base model in 8-bit (requires bitsandbytes and a CUDA GPU)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path, return_dict=True, load_in_8bit=True, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Attach the LoRA adapters on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id)
model.eval()

# The prompt mirrors the training format: "<quote>" ->: <tags>
batch = tokenizer("“The only way to do great work is to love what you do.” ->:", return_tensors="pt").to(model.device)
with torch.cuda.amp.autocast():
    output_tokens = model.generate(**batch, max_new_tokens=50)

print("\n\n", tokenizer.decode(output_tokens[0], skip_special_tokens=True))
```
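Note: on recent Transformers releases, passing `load_in_8bit=True` directly is deprecated in favor of `quantization_config=transformers.BitsAndBytesConfig(load_in_8bit=True)`, and `torch.amp.autocast("cuda")` replaces the older `torch.cuda.amp.autocast()`.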
## Training Details
### Training Data
- **Dataset used:** [Abirate/english_quotes](https://huggingface.co/datasets/Abirate/english_quotes)
- **Subset:** First 1,000 samples (`train[:1000]`).
- **Structure:** Each entry includes a `quote` and its corresponding `tags`.
- **Preprocessing:**
- Combined the `quote` and `tags` into a single text string:
```
"<quote>" ->: <tags>
```
- Tokenized using the `AutoTokenizer` from **bigscience/bloom-3b**.
- Applied batching via Hugging Face `datasets.map` with `batched=True`. A sketch of the full pipeline follows this list.
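A minimal sketch of this preprocessing, assuming the dataset's `quote` and `tags` columns; the merged column name `prediction` is illustrative:
```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-3b")
data = load_dataset("Abirate/english_quotes", split="train[:1000]")

def merge_columns(example):
    # Build the '"<quote>" ->: <tags>' training string from the two columns.
    example["prediction"] = example["quote"] + " ->: " + str(example["tags"])
    return example

data = data.map(merge_columns)
data = data.map(lambda samples: tokenizer(samples["prediction"]), batched=True)
```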
---
### Training Procedure
#### Preprocessing
- Converted text examples into the `"quote ->: tags"` format.
- Tokenized using Bloom’s tokenizer with default settings.
- Applied `DataCollatorForLanguageModeling` with `mlm=False` (causal LM objective).
#### Training Hyperparameters
- **Base model:** bigscience/bloom-3b
- **Adapter method:** LoRA via PEFT
- **LoRA configuration:**
- r = 16
- lora_alpha = 32
- lora_dropout = 0.05
- bias = "none"
- task_type = "CAUSAL_LM"
- **TrainingArguments:**
- per_device_train_batch_size = 4
- gradient_accumulation_steps = 4
- warmup_steps = 100
- max_steps = 200
- learning_rate = 2e-4
- fp16 = True
- logging_steps = 1
- output_dir = `outputs/`
- **Precision regime:** Mixed precision (fp16).
- **Caching:** `model.config.use_cache = False` during training; this setting conflicts with gradient checkpointing and should be re-enabled for inference. A configuration sketch follows.
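A minimal sketch of this configuration, assuming `model` is the 8-bit base loaded as in the quickstart above and `data` is the tokenized dataset from the preprocessing sketch:
```python
import transformers
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    bias="none", task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.config.use_cache = False  # conflicts with gradient checkpointing; re-enable for inference

trainer = transformers.Trainer(
    model=model,
    train_dataset=data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=100,
        max_steps=200,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```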
#### Hyperparameter Summary
| Hyperparameter | Value |
|-----------------------------|------------------------|
| Base model | bigscience/bloom-3b |
| Adapter method | LoRA (via PEFT) |
| LoRA r | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Bias | none |
| Task type | Causal LM |
| Batch size (per device) | 4 |
| Gradient accumulation steps | 4 |
| Effective batch size | 16 |
| Warmup steps | 100 |
| Max steps | 200 |
| Learning rate | 2e-4 |
| Precision | fp16 (mixed precision) |
| Logging steps | 1 |
| Output directory | outputs/ |
| Gradient checkpointing | Enabled |
| Use cache | False (during training)|
---
### Speeds, Sizes, Times
- **Trainable parameters:** LoRA adapters only, a fraction of a percent of the 3B base parameters.
- **Checkpoint size:** Only the adapter weights are stored, so the artifact is far smaller than the full 3B checkpoint.
- **Max steps:** 200 optimizer updates; with an effective batch of 16, that is roughly 3,200 examples seen (about three passes over the 1,000-sample subset).
- **Training runtime:** Depends on the GPU; not logged by the training script.
- **Effective batch size:** 16 (per-device batch of 4 × 4 gradient-accumulation steps).
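PEFT can report the exact adapter footprint; a one-line check (the printed values are placeholders, not logged numbers):
```python
# After get_peft_model(...): report trainable (adapter) vs. total parameters.
model.print_trainable_parameters()
# -> trainable params: ... || all params: ... || trainable%: ...
```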
---
### Compute Infrastructure
- **Hardware:** Single NVIDIA T4 GPU, pinned with `os.environ["CUDA_VISIBLE_DEVICES"] = "0"`.
- **Software:**
- PyTorch (torch)
- Hugging Face Transformers (main branch from GitHub)
- Hugging Face PEFT (main branch from GitHub)
- Hugging Face Datasets
- Accelerate
- Bitsandbytes (for 8-bit loading)
- **Gradient checkpointing:** Enabled to save memory (see the preparation sketch below).
- **Mixed precision:** Enabled with fp16.
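A minimal sketch of that preparation, assuming a recent PEFT release (its helper bundles the fp32 casting and input-gradient steps; run this before wrapping the model with LoRA):
```python
from peft import prepare_model_for_kbit_training

# Recompute activations during backprop instead of storing them all.
model.gradient_checkpointing_enable()
# Cast layer norms / LM head to fp32 and enable input grads on the frozen 8-bit base.
model = prepare_model_for_kbit_training(model)
```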
---
## Evaluation
### Testing Data
- Same dataset (Abirate/english_quotes).
- No held-out test set reported in training script.
### Metrics
- No formal metrics logged; evaluation was qualitative (checking generated tags).
### Results
- The model successfully learns to generate tags for English quotes after training.
---
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute).
- **Hardware Type:** Single NVIDIA T4 GPU
- **Cloud Provider:** Google Colab
---
## Technical Specifications
### Model Architecture and Objective
- Base model: **BLOOM-3B**, causal language modeling objective.
- Fine-tuned with **LoRA adapters** using PEFT.
### Compute Infrastructure
Identical to the setup described under Training Details above: a single NVIDIA T4 GPU with PyTorch, Transformers, PEFT, Datasets, Accelerate, and bitsandbytes.
---
## Citation
If you use this model, please cite:
**BibTeX:**
```bibtex
@misc{jay24ai2025bloomlora,
  title        = {LoRA Fine-Tuned BLOOM-3B for Quote Tagging},
  author       = {Jay24-AI},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Jay24-AI/bloom-3b-lora-tagger}}
}
```
---
## Model Card Contact
For questions or issues, contact the maintainer via Hugging Face discussions: https://huggingface.co/Jay24-AI/bloom-3b-lora-tagger/discussions