	---
	library_name: transformers
	tags: [causal-lm, bloom, lora, peft, finetuning, english]
	---

	# Model Card for Jay24-AI/bloom-7b1-lora-tagger

	This model is a LoRA fine-tuned version of BigScience’s BLOOM-7B1 model, trained on a dataset of English quotes. The goal was to adapt BLOOM using the [PEFT](https://github.com/huggingface/peft) (Parameter-Efficient Fine-Tuning) approach with [LoRA](https://arxiv.org/abs/2106.09685), making it lightweight to train and efficient for deployment.

	## Model Details

	### Model Description

	- Developed by: Jay24-AI
	- Funded by: N/A
	- Shared by: Jay24-AI
	- Model type: Causal Language Model with LoRA adapters
	- Language(s): English
	- License: BigScience RAIL License v1.0 (`bigscience-bloom-rail-1.0`), inherited from `bigscience/bloom-7b1`; its use restrictions also apply to these LoRA adapters.
	- Finetuned from model: [bigscience/bloom-7b1](https://huggingface.co/bigscience/bloom-7b1)

	### Model Sources

	- Repository: https://huggingface.co/Jay24-AI/bloom-7b1-lora-tagger

	## Uses

	### Direct Use

	The model can be used for text generation and tagging based on quote-like prompts. For example, you can input a quote, and the model will generate descriptive tags.

	### Downstream Use

	- Can be further fine-tuned on custom tagging or classification datasets.
	- Could be integrated into applications that require lightweight quote classification, text annotation, or prompt-based generation.

	### Out-of-Scope Use

	- Not suitable for factual question answering.
	- Not designed for sensitive or high-stakes decision-making (e.g., medical, legal, or financial advice).

	## Bias, Risks, and Limitations

	- Inherits limitations and biases from BLOOM-7B1 (trained on large-scale internet data).
	- The fine-tuned dataset (`Abirate/english_quotes`) is relatively small, so the model may overfit and generalize poorly outside similar data.
	- Risk of generating irrelevant or biased tags if prompted outside the intended scope.
	- Limited training (50 steps) may result in suboptimal performance.

	### Recommendations

	Users should:

	- Validate outputs before production use.
	- Avoid relying on the model for critical applications.

	## How to Get Started with the Model

	```python
	import torch
	from peft import PeftModel, PeftConfig
	from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

	peft_model_id = "Jay24-AI/bloom-7b1-lora-tagger"
	config = PeftConfig.from_pretrained(peft_model_id)

	# Load the frozen base model in 8-bit (requires bitsandbytes and a CUDA GPU)
	model = AutoModelForCausalLM.from_pretrained(
	    config.base_model_name_or_path,
	    return_dict=True,
	    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
	    device_map="auto",
	)
	tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

	# Attach the LoRA adapter weights to the base model
	model = PeftModel.from_pretrained(model, peft_model_id)
	model.eval()

	# Prompts follow the training format "<quote> ->: "; move the inputs to the model's device
	batch = tokenizer("“The only way to do great work is to love what you do.” ->: ", return_tensors="pt").to(model.device)

	# Generate without gradients, under fp16 autocast
	with torch.no_grad(), torch.autocast("cuda"):
	    output_tokens = model.generate(**batch, max_new_tokens=50)

	print("\n\n", tokenizer.decode(output_tokens[0], skip_special_tokens=True))
	```
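The decoded text contains the prompt followed by the generated tags, so a small helper can separate the two. The sketch below assumes the output keeps the ` ->: ` separator used in training. `split_generation` is a hypothetical helper name, and the sample string is made up for illustration; it is not real model output.

```python
SEPARATOR = " ->: "  # separator from the training format "<quote> ->: <tags>"


def split_generation(decoded: str) -> tuple[str, str]:
    """Split decoded text into (quote, generated continuation)."""
    quote, sep, continuation = decoded.partition(SEPARATOR)
    if not sep:
        # Separator missing: treat everything as the quote, with no tags found.
        return decoded.strip(), ""
    return quote.strip(), continuation.strip()


# Made-up string standing in for the tokenizer.decode(...) result.
example = "“Be yourself.” ->: placeholder-tag-a, placeholder-tag-b"
quote, tags_text = split_generation(example)
print(quote)
print(tags_text)
```

The continuation is returned as raw text because the card does not specify how tags are rendered; any further parsing depends on what the model actually emits.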

	## Training Details

	### Training Data

	- Dataset used: [Abirate/english_quotes](https://huggingface.co/datasets/Abirate/english_quotes)
	- Subset: Entire training split (exact size not specified in script).
	- Structure: Each entry includes a `quote` and its corresponding `tags`.
	- Preprocessing:
	  - Combined the `quote` and `tags` into a single text string: `<quote> ->: <tags>`
	  - Tokenized using the `AutoTokenizer` from bigscience/bloom-7b1.
	  - Applied batching via Hugging Face `datasets.map` with `batched=True`.
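The formatting step can be sketched as a function over a batch in dict-of-lists form, which is the shape a batched `map` call passes to its function. This is an illustration, not the training script: `format_batch` and the `text` column are hypothetical names, the sample records are made up, and joining tags with `", "` is an assumption since the card does not say how the tag list is rendered.

```python
def format_batch(batch: dict[str, list]) -> dict[str, list[str]]:
    """Build "<quote> ->: <tags>" strings from a batch given as a dict of lists."""
    texts = [
        f"{quote} ->: {', '.join(tags)}"  # assumption: tags are joined with ", "
        for quote, tags in zip(batch["quote"], batch["tags"])
    ]
    return {"text": texts}


# Made-up records with the dataset's two fields, `quote` and `tags`.
sample = {
    "quote": ["Be yourself.", "Stay curious."],
    "tags": [["identity"], ["learning", "wisdom"]],
}
print(format_batch(sample)["text"])
```

With Hugging Face Datasets, a function of this shape would be applied as `dataset.map(format_batch, batched=True)` before tokenization.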

	### Training Procedure

	#### Preprocessing

	- Converted text examples into the `"quote ->: tags"` format.
	- Tokenized using Bloom’s tokenizer with default settings.
	- Applied `DataCollatorForLanguageModeling` with `mlm=False` (causal LM objective).
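To make the causal LM objective concrete, here is a simplified pure-Python sketch of what a collator with `mlm=False` produces: padded inputs, plus labels that copy the inputs with padding masked out. It is not the library implementation. `collate_causal_lm` is a hypothetical name, right-side padding is an assumption, and the token ids are arbitrary.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def collate_causal_lm(sequences: list[list[int]], pad_token_id: int) -> dict[str, list[list[int]]]:
    """Pad token-id sequences and derive causal-LM labels from the inputs."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask, labels = [], [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_token_id] * n_pad)  # right padding (assumption)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
        labels.append(seq + [IGNORE_INDEX] * n_pad)  # labels copy inputs, padding ignored
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


batch = collate_causal_lm([[5, 6, 7], [8]], pad_token_id=3)
print(batch["input_ids"])
print(batch["labels"])
```

The labels are not shifted here; in Transformers causal LM models the one-token shift between inputs and targets happens inside the model's loss computation.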

	#### Training Hyperparameters

	- Base model: bigscience/bloom-7b1
	- Adapter method: LoRA via PEFT
	- LoRA configuration:
	  - `r`: 8
	  - `lora_alpha`: 16
	  - `lora_dropout`: 0.05
	  - `bias`: "none"
	  - `task_type`: "CAUSAL_LM"
	- TrainingArguments:
	  - `per_device_train_batch_size`: 2
	  - `gradient_accumulation_steps`: 2
	  - `warmup_steps`: 100
	  - `max_steps`: 50
	  - `learning_rate`: 2e-4
	  - `fp16`: True
	  - `logging_steps`: 1
	  - `output_dir`: `outputs/`
	- Precision regime: Mixed precision (fp16) with 8-bit quantization via `bitsandbytes`.
	- Caching: `model.config.use_cache = False` during training to suppress warnings.
	- Additional settings:
	  - Original model weights frozen; small parameters (e.g., layer normalization) cast to FP32 for stability.
	  - Gradient checkpointing enabled to reduce memory usage.
	  - `lm_head` modified to output FP32 for stability.
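The settings listed above map onto PEFT and Transformers configuration objects roughly as follows. This is a sketch reconstructed from the list, not the original training script. `build_configs` is a hypothetical helper, and the card lists no target modules, so none are set here. The imports sit inside the function so that the plain dictionaries can be inspected without the GPU stack installed.

```python
# Values copied from the hyperparameter list in this card.
LORA_KWARGS = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
}
TRAINING_KWARGS = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 2,
    "warmup_steps": 100,
    "max_steps": 50,
    "learning_rate": 2e-4,
    "fp16": True,
    "logging_steps": 1,
    "output_dir": "outputs/",
}


def build_configs():
    """Instantiate the config objects (needs peft, transformers, and a CUDA GPU for fp16)."""
    from peft import LoraConfig
    from transformers import TrainingArguments

    return LoraConfig(**LORA_KWARGS), TrainingArguments(**TRAINING_KWARGS)


# Quantities derived from the settings above.
EFFECTIVE_BATCH_SIZE = (
    TRAINING_KWARGS["per_device_train_batch_size"]
    * TRAINING_KWARGS["gradient_accumulation_steps"]
)
LORA_SCALING = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]  # standard LoRA scales updates by alpha / r
print(EFFECTIVE_BATCH_SIZE, LORA_SCALING)
```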

	#### Hyperparameter Summary

	| Hyperparameter              | Value                   |
	|-----------------------------|-------------------------|
	| Base model                  | bigscience/bloom-7b1    |
	| Adapter method              | LoRA (via PEFT)         |
	| LoRA r                      | 8                       |
	| LoRA alpha                  | 16                      |
	| LoRA dropout                | 0.05                    |
	| Bias                        | none                    |
	| Task type                   | Causal LM               |
	| Batch size (per device)     | 2                       |
	| Gradient accumulation steps | 2                       |
	| Effective batch size        | 4                       |
	| Warmup steps                | 100                     |
	| Max steps                   | 50                      |
	| Learning rate               | 2e-4                    |
	| Precision                   | fp16 (mixed precision)  |
	| Quantization                | 8-bit (bitsandbytes)    |
	| Logging steps               | 1                       |
	| Output directory            | outputs/                |
	| Gradient checkpointing      | Enabled                 |
	| Use cache                   | False (during training) |
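One consequence of these values is worth spelling out: the warmup (100 steps) is longer than the whole run (50 steps), so a warmup schedule never reaches the configured learning rate. The sketch below assumes the linear warmup schedule that `Trainer` uses by default; the card does not state which scheduler was used, and `warmup_lr` is a hypothetical helper.

```python
BASE_LR = 2e-4      # learning_rate from the card
WARMUP_STEPS = 100  # warmup_steps from the card
MAX_STEPS = 50      # max_steps from the card


def warmup_lr(step: int) -> float:
    """Learning rate for the optimizer update at a zero-based step, during linear warmup."""
    if step >= WARMUP_STEPS:
        raise ValueError("this sketch only models the warmup phase")
    return BASE_LR * step / WARMUP_STEPS


# Every update in the run falls inside the warmup phase.
schedule = [warmup_lr(step) for step in range(MAX_STEPS)]
peak_lr = max(schedule)
print(f"peak learning rate reached: {peak_lr:.2e} (configured: {BASE_LR:.2e})")
```

Under this assumption the run ends at a little under half the configured rate, which is one more reason to treat the adapter as lightly trained.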

	### Speeds, Sizes, Times

	- Trainable parameters: LoRA adapters only (~0.1% of BLOOM-7B1’s ~7.1 billion parameters, exact count printed via `print_trainable_parameters`).
	- Approx. size: Much smaller than 7B full checkpoint since only adapters are stored.
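	- Max steps: 50 optimizer updates; with 2 gradient accumulation steps this is 100 forward/backward passes, or about 200 training examples.
	- Training runtime: Not logged in script; depends on GPU.
	- Effective batch size: 4 (per-device batch size of 2 × 2 gradient accumulation steps).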
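The order of magnitude of the trainable-parameter share can be checked with a short calculation. The figures below are assumptions not stated in this card: adapters on the fused `query_key_value` projection only (PEFT's default target for BLOOM), a hidden size of 4096, and 30 transformer blocks for BLOOM-7B1. The exact count for the actual run is whatever `print_trainable_parameters` reported.

```python
HIDDEN_SIZE = 4096   # assumed hidden size of bloom-7b1
NUM_LAYERS = 30      # assumed number of transformer blocks
RANK = 8             # LoRA r from the card
BASE_PARAMS = 7.1e9  # approximate base-model size quoted in the card

# The fused projection maps hidden_size -> 3 * hidden_size (query, key, value).
in_features, out_features = HIDDEN_SIZE, 3 * HIDDEN_SIZE

# LoRA adds two matrices per adapted layer: A (r x in) and B (out x r).
params_per_layer = RANK * (in_features + out_features)
lora_params = params_per_layer * NUM_LAYERS

share = lora_params / BASE_PARAMS
adapter_megabytes_fp32 = lora_params * 4 / 1e6  # assuming 4 bytes per weight

print(f"{lora_params:,} trainable parameters, {share:.3%} of the base model")
print(f"about {adapter_megabytes_fp32:.1f} MB if stored in fp32")
```

Under these assumptions the adapters come to roughly 3.9 million parameters, the same order of magnitude as the ~0.1% quoted above, and a few megabytes on disk compared with the multi-gigabyte base checkpoint.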

	### Compute Infrastructure

	- Hardware: Single CUDA GPU, selected with `os.environ["CUDA_VISIBLE_DEVICES"]="0"`; the Environmental Impact section lists it as a T4 on Colab.
	- Software:
	  - PyTorch
	  - Hugging Face Transformers (main branch from GitHub)
	  - Hugging Face PEFT (main branch from GitHub)
	  - Hugging Face Datasets
	  - Accelerate
	  - Bitsandbytes (for 8-bit quantization)
	- Gradient checkpointing: Enabled to save memory.
	- Mixed precision: Enabled with fp16.
	- Quantization: 8-bit weights via `bitsandbytes` (`load_in_8bit=True`), with `torch.float16` for the remaining computation.

	## Evaluation

	### Testing Data

	- Same dataset (`Abirate/english_quotes`).
	- No held-out test set reported in training script.

	### Metrics

	- No formal metrics logged; evaluation was qualitative (checking generated tags).

	### Results

	- In qualitative checks, the model generates tags for English quotes after training, as in the inference example; no quantitative results are reported.

	## Environmental Impact

	Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute).

	- Hardware Type: Single CUDA GPU (NVIDIA T4)
	- Cloud Provider: Google Colab

	## Technical Specifications

	### Model Architecture and Objective

	- Base model: BLOOM-7B1, causal language modeling objective.
	- Fine-tuned with LoRA adapters using PEFT.

	### Compute Infrastructure

	- Hardware: Single GPU (CUDA device 0).
	- Software:
	  - PyTorch
	  - Hugging Face Transformers
	  - Hugging Face PEFT
	  - Hugging Face Datasets
	  - Accelerate
	  - Bitsandbytes

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{jay24ai2025bloomlora,
	title={LoRA Fine-Tuned BLOOM-7B1 for Quote Tagging},
	author={Jay24-AI},
	year={2025},
	howpublished={\url{https://huggingface.co/Jay24-AI/bloom-7b1-lora-tagger}}
	}
	```

	## Model Card Contact

	For questions or issues, contact the maintainer via Hugging Face discussions: https://huggingface.co/Jay24-AI/bloom-7b1-lora-tagger/discussions