# Mistral-7B Motivational Quotes Generator

This model is a QLoRA fine-tuned version of Mistral-7B-v0.1 that generates short motivational and inspirational quotes.
It was trained with 4-bit quantization and LoRA adapters, enabling efficient training and inference on consumer GPUs such as the NVIDIA T4 (Google Colab).
## Model Details
| Field | Value |
|---|---|
| Developer | Nikhil Tharlada |
| Base Model | mistralai/Mistral-7B-v0.1 |
| Model Type | Causal Language Model (PEFT / LoRA) |
| Task | Motivational Quote Generation |
| Language | English |
| License | Apache-2.0 |
## Model Links
- Adapter Repository: https://huggingface.co/nikhiltharlada/mistral-7b-quotes-final
- Mistral Paper: https://arxiv.org/abs/2310.06825
- LoRA Paper: https://arxiv.org/abs/2106.09685
## Papers & References

- Mistral 7B. Jiang et al., 2023. https://arxiv.org/abs/2310.06825
- LoRA: Low-Rank Adaptation of Large Language Models. Hu et al., 2021. https://arxiv.org/abs/2106.09685
- QLoRA: Efficient Finetuning of Quantized LLMs. Dettmers et al., 2023. https://arxiv.org/abs/2305.14314
## Intended Use

### Direct Use

The model is optimized for generating motivational quotes when prompted with the instruction format used during training:

```
<s>[INST] Give a motivational quote [/INST] {quote}</s>
```

where `{quote}` is the completion the model produces.
Example outputs:
- "Small steps today create massive change tomorrow."
- "Your limits exist only where you accept them."
### Out-of-Scope Use
This model is NOT intended for:
- factual question answering
- legal / medical advice
- decision-making systems
- production use without moderation
## Bias, Risks & Limitations
- The model reflects biases from the Goodreads quotes dataset
- It may hallucinate or misattribute quotes
- No built-in content moderation
- Best suited for creative generation only
## How to Use the Model

### Install Dependencies
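The snippets below assume the following packages; `bitsandbytes` is only needed for the optional 4-bit loading shown further down.

```bash
pip install transformers peft accelerate bitsandbytes torch
```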
### Load the Model and Generate

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "nikhiltharlada/mistral-7b-quotes-final"

# Load the tokenizer and the fp16 base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the fine-tuned LoRA adapters
model = PeftModel.from_pretrained(model, adapter_id)

# Generate a quote using the training prompt format
prompt = "<s>[INST] Give a motivational quote [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
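Since the adapters were trained with 4-bit NF4 quantization, the base model can also be loaded in 4 bits for lower-memory inference. A minimal sketch using bitsandbytes; the exact quantization settings used during training may differ:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization config for inference on a 16 GB GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NF4, matching the QLoRA training setup
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 on a T4
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "nikhiltharlada/mistral-7b-quotes-final")
```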
## Training Details

### Dataset

The model was fine-tuned on [Abirate/english_quotes](https://huggingface.co/datasets/Abirate/english_quotes), a quotes dataset sourced from Goodreads. Each record contains:

- Quote text
- Author
- Tags

Only the quote text was used for training.
### Data Formatting

Each training sample was converted to the Mistral instruction format:

```
<s>[INST] Give a motivational quote [/INST] {quote}</s>
```
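A minimal sketch of this step, assuming the Hugging Face `datasets` library (the `format_sample` helper is illustrative):

```python
from datasets import load_dataset

dataset = load_dataset("Abirate/english_quotes", split="train")

def format_sample(example):
    # Wrap each quote in the Mistral instruction template
    return {"text": f'<s>[INST] Give a motivational quote [/INST] {example["quote"]}</s>'}

dataset = dataset.map(format_sample, remove_columns=dataset.column_names)
```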
### Tokenization

- Max length: 512 tokens
- Padding: right padding, with the EOS token used as the pad token
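The corresponding tokenizer setup would look like this (a sketch inferred from the settings above):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer.pad_token = tokenizer.eos_token  # pad with EOS
tokenizer.padding_side = "right"

encoded = tokenizer(
    "<s>[INST] Give a motivational quote [/INST] Keep going.</s>",
    max_length=512,
    truncation=True,
    padding="max_length",
)
```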
### Training Configuration
| Parameter | Value |
|---|---|
| Training Method | QLoRA (4-bit NF4) |
| LoRA Rank | 16 |
| Learning Rate | 2e-4 |
| Optimizer | AdamW |
| Epochs | 1 |
| Batch Size | 1 |
| Gradient Accumulation | 16 |
| Effective Batch Size | 16 |
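A sketch of a matching QLoRA setup with `peft` and `transformers`; values marked as assumed (LoRA alpha, dropout, target modules) are not documented in the table above:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization, per the table
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                  # LoRA rank from the table
    lora_alpha=32,                         # assumed; not stated in the card
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    lora_dropout=0.05,                     # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

args = TrainingArguments(
    output_dir="mistral-7b-quotes",
    learning_rate=2e-4,
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,        # effective batch size of 16
    optim="adamw_torch",
    fp16=True,
)
```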
### Hardware
| Component | Value |
|---|---|
| GPU | NVIDIA Tesla T4 |
| VRAM | 16 GB |
| Platform | Google Colab |
| Training Time | ~1 hour |
QLoRA's 4-bit weights use roughly 4× less memory than fp16, which is what makes fine-tuning a 7B model feasible on a single 16 GB T4.
## Evaluation

### Metrics Used

- Training loss
- Perplexity (derived from the loss; see the sketch below)
- Manual qualitative evaluation
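For a causal language model, perplexity is simply the exponential of the mean cross-entropy loss:

```python
import math

train_loss = 1.8  # illustrative value, not the actual reported loss
perplexity = math.exp(train_loss)
print(f"perplexity = {perplexity:.2f}")  # = 6.05
```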
### Results

- Successful adaptation to the motivational quote style
- Strong instruction following when using the [INST] format
- Consistent short, inspirational outputs
## Model Architecture Highlights

Mistral-7B introduces:

### Sliding Window Attention (SWA)
Each layer attends to the previous 4096 tokens, enabling:
- Linear compute scaling
- Long context capability
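For intuition, a sliding-window causal mask lets query position i attend only to key positions in (i - w, i]; a small sketch with a reduced window:

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # True where attention is allowed: causal AND within the window
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    return (j <= i) & (j > i - window)

# Mistral-7B uses window = 4096 at every layer
print(sliding_window_causal_mask(6, 3).int())
```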
### Grouped-Query Attention (GQA)
Provides:
- Reduced memory usage
- Faster decoding
The base model has 7.3B parameters in total. Only the LoRA adapters were trained, so the base model's knowledge is preserved.
## Environmental Impact

Estimated using the [ML CO₂ Impact calculator](https://mlco2.github.io/impact).
| Metric | Value |
|---|---|
| GPU | NVIDIA T4 (70 W) |
| Training Time | ~1 hour |
| Estimated CO₂ | < 0.05 kg CO₂eq |
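As a rough check, emissions are power × time × grid carbon intensity; the ~0.4 kg CO₂eq/kWh intensity below is an assumed global average, not a measured value:

```python
power_kw = 0.070   # NVIDIA T4 TDP, 70 W
hours = 1.0        # approximate training time
intensity = 0.4    # kg CO2eq per kWh, assumed grid average
print(f"{power_kw * hours * intensity:.3f} kg CO2eq")  # ~0.028 kg, under 0.05 kg
```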
This demonstrates the efficiency of QLoRA training.
## Citation

```bibtex
@article{mistral2023,
  title   = {Mistral 7B},
  author  = {Jiang, Albert Q. and others},
  journal = {arXiv preprint arXiv:2310.06825},
  year    = {2023}
}
```
## Model Card Authors
Nikhil Tharlada