LLMWriter-8B
LLMWriter-8B is a fine-tuned version of Llama 3.1 8B designed to improve writing quality, instruction following, content generation, and structured response creation.
- Developed by: aldenirsrv
- License: Apache 2.0
- Finetuned from model: unsloth/Llama-3.1-8B
This Llama model was trained with LoRA (Low-Rank Adaptation), about 2x faster using Unsloth and Hugging Face's TRL library.
Model Description
LLMWriter-8B is focused on:
- Content creation
- Technical writing
- Documentation generation
- Blog writing
- Social media content
- Instruction following
- Writing assistance
- General-purpose text generation
The model was trained on a curated instruction-following dataset containing over 10,000 examples.
Training Dataset
```
DatasetDict({
    train: Dataset({
        features: ['instruction', 'output', 'source', 'score'],
        num_rows: 10384
    })
    test: Dataset({
        features: ['instruction', 'output', 'source', 'score'],
        num_rows: 547
    })
})
```
| Split | Samples |
|---|---|
| Train | 10,384 |
| Test | 547 |
| Total | 10,931 |
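The split sizes in the table imply a test share of about 5%. A minimal sketch of that arithmetic, using only the row counts reported above (the variable names are illustrative):

```python
# Row counts copied from the split table above.
TRAIN_ROWS = 10_384
TEST_ROWS = 547

total_rows = TRAIN_ROWS + TEST_ROWS
test_fraction = TEST_ROWS / total_rows

print(f"total rows: {total_rows:,}")
print(f"test share: {test_fraction:.2%}")
```

The result is consistent with a roughly 95/5 train/test split, although the card does not state how the split was produced.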
Training Configuration
| Parameter | Value |
|---|---|
| Base Model | Llama 3.1 8B |
| Fine-Tuning Method | LoRA |
| Framework | Unsloth |
| Epochs | 3 |
| GPUs | 1 |
| Batch Size | 2 |
| Gradient Accumulation | 8 |
| Effective Batch Size | 16 |
| Trainable Parameters | 83,886,080 |
| Total Parameters | 8,114,147,328 |
| Percentage Trained | 1.03% |
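The trainable-parameter count can be sanity-checked from the adapter shapes. The sketch below is an assumption-laden reconstruction, not the author's configuration: the card does not state the LoRA rank or target modules. It assumes rank 32 applied to all seven linear projections in each of the 32 decoder layers, using the published Llama 3.1 8B dimensions (hidden size 4096, key/value width 1024, MLP width 14336). Under those assumptions the count matches the table exactly.

```python
# Llama 3.1 8B layer dimensions.
HIDDEN = 4096
KV_DIM = 1024          # 8 key/value heads x 128 head dim
INTERMEDIATE = 14336
NUM_LAYERS = 32

# (input dim, output dim) of each projection that receives an adapter.
# Targeting all seven projections is an assumption, not stated in the card.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank: int) -> int:
    """Each adapter adds A (rank x d_in) and B (d_out x rank) matrices."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in PROJECTIONS.values())
    return per_layer * NUM_LAYERS


ASSUMED_RANK = 32  # hypothetical; chosen because it reproduces the reported count
TOTAL_PARAMS = 8_114_147_328  # from the table above

trainable = lora_param_count(ASSUMED_RANK)
print(f"trainable: {trainable:,}")
print(f"share:     {100 * trainable / TOTAL_PARAMS:.2f}%")
```

Subtracting the adapter parameters from the reported total leaves 8,030,261,248, which would then be the frozen base weights.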
Training Summary:
```
Num examples = 10,384
Num Epochs = 3
Total steps = 1,947
Trainable parameters = 83,886,080 of 8,114,147,328 (1.03% trained)
```
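The step count in the summary follows from the batch settings in the configuration table. A short sketch of the arithmetic, with illustrative variable names:

```python
import math

# Values copied from the configuration table and training summary.
num_examples = 10_384
per_device_batch = 2
grad_accum_steps = 8
num_gpus = 1
epochs = 3

# One optimizer step consumes this many examples.
effective_batch = per_device_batch * grad_accum_steps * num_gpus

# 10,384 divides evenly by 16, so rounding up or down gives the same result here.
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)
```

This gives 649 optimizer steps per epoch and 1,947 in total, in agreement with the reported figure.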
Training Results
Optimization Metrics
| Metric | Value |
|---|---|
| Initial Loss | 1.2151 |
| Final Loss | 0.2187 |
| Average Training Loss | 0.5334 |
| Initial Learning Rate | 3e-4 |
| Final Learning Rate | 1.54e-7 |
| Min Gradient Norm | 0.1587 |
| Max Gradient Norm | 2.3887 |
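The card does not name the learning-rate scheduler. The reported final value is, however, consistent with a linear decay from 3e-4 to zero over 1,947 steps with no warmup, if the final rate is read one step before the end. The sketch below works through that assumption; it is not a confirmed description of the training setup.

```python
PEAK_LR = 3e-4       # initial learning rate from the table above
TOTAL_STEPS = 1_947  # total optimizer steps from the training summary


def linear_lr(step: int) -> float:
    """Linear decay to zero without warmup (assumed schedule)."""
    remaining = max(TOTAL_STEPS - step, 0)
    return PEAK_LR * remaining / TOTAL_STEPS


# Learning rate in effect for the last optimizer step (0-indexed step 1,946).
final_lr = linear_lr(TOTAL_STEPS - 1)
print(f"{final_lr:.2e}")
```

A short warmup phase would change this value only slightly, so the table alone cannot rule one out.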
Key Observations
✅ Consistent loss reduction throughout training
✅ Stable gradient norms with no gradient explosion
✅ Effective learning-rate decay schedule
✅ Smooth convergence after three epochs
✅ Stable LoRA fine-tuning process
Performance
| Metric | Value |
|---|---|
| Training Time | ~1h41m |
| Total FLOPs | ~7.66e17 |
| Samples per Second | ~5.1 |
| Steps per Second | ~0.32 |
| Time per Step | ~3.1s |
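The throughput figures in the table are mutually consistent with the step count and wall-clock time. A quick check, treating the approximate training time of 1h41m as exact:

```python
# Inputs from the tables above; the training time is approximate.
train_seconds = 1 * 3600 + 41 * 60   # ~1h41m
total_steps = 1_947
samples_seen = 10_384 * 3            # examples per epoch x epochs

steps_per_second = total_steps / train_seconds
samples_per_second = samples_seen / train_seconds
seconds_per_step = train_seconds / total_steps

print(f"steps/s:   {steps_per_second:.2f}")
print(f"samples/s: {samples_per_second:.1f}")
print(f"s/step:    {seconds_per_step:.1f}")
```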
The run completed without optimization instability: the loss fell from 1.2151 to 0.2187, and gradient norms stayed between 0.1587 and 2.3887.
Intended Use
LLMWriter-8B is intended for:
- Writing assistance
- Content generation
- Blog creation
- Documentation drafting
- Technical writing
- Knowledge articles
- Social media posts
- Structured responses
- General instruction-following tasks
Example Prompt
Write a professional LinkedIn post explaining why Small Language Models (SLMs) are becoming important for enterprise AI adoption.
Usage
vLLM
```shell
vllm serve aldenirsrv/LLMWriter-8B \
  --host 0.0.0.0 \
  --port 8888
```
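Once the server is running, it can be queried without extra dependencies through vLLM's OpenAI-compatible `/v1/completions` endpoint. The following is a minimal sketch using only the Python standard library. The helper names `build_completion_request` and `complete` are hypothetical, and the default `max_tokens` and `temperature` values are illustrative rather than recommended settings.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8888/v1"  # matches --port 8888 in the command above
MODEL = "aldenirsrv/LLMWriter-8B"


def build_completion_request(prompt, max_tokens=256, temperature=0.7):
    """Build a POST request for the OpenAI-compatible completions endpoint."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt):
    """Send the request and return the generated text (needs a live server)."""
    request = build_completion_request(prompt)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Example call, only meaningful while the vLLM server is running:
# print(complete("Write a blog post introduction about AI governance."))
```

Separating request construction from sending keeps the payload easy to inspect or log before any network call is made.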
Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "aldenirsrv/LLMWriter-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",  # requires the accelerate package
)

prompt = "Write a blog post introduction about AI governance."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=300,
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=0.7,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
OpenAI-Compatible API (vLLM)
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8888/v1",
    api_key="dummy",
)

response = client.chat.completions.create(
    model="aldenirsrv/LLMWriter-8B",
    messages=[
        {
            "role": "user",
            "content": "Write a blog post about AI governance.",
        }
    ],
)

print(response.choices[0].message.content)
```
Limitations
- The model may generate inaccurate information.
- Outputs should be reviewed before publication.
- Not intended for legal, medical, or financial advice.
- Performance depends on prompt quality and task complexity.
- The model has not been evaluated on standardized benchmark suites.
Future Work
Planned improvements include:
- Human preference evaluation
- Benchmark comparisons against the base model
- Additional instruction tuning
- Domain-specific fine-tuning
- Expanded evaluation datasets
- Quantized deployment variants
Acknowledgements
This model was fine-tuned using:
- Unsloth
- Hugging Face Transformers
- Hugging Face TRL
- PEFT (LoRA)
- PyTorch
- Comet ML
Special thanks to the teams behind Llama, Hugging Face, and Unsloth for enabling efficient open-source model development.
Author
Aldenir Flauzino
Software Engineer specializing in:
- Distributed Systems
- Platform Engineering
- AI Infrastructure
- Retrieval-Augmented Generation (RAG)
- Multi-Agent Systems
- Production AI Platforms
GitHub: https://github.com/aldenirsrv
Website: https://aldenir.me
