Instructions for using oriolrius/phi3-avro-vllm with libraries and local apps.

Libraries

How to use oriolrius/phi3-avro-vllm with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="oriolrius/phi3-avro-vllm", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result; outside a notebook the return value is otherwise discarded
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("oriolrius/phi3-avro-vllm", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("oriolrius/phi3-avro-vllm", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use oriolrius/phi3-avro-vllm with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "oriolrius/phi3-avro-vllm"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "oriolrius/phi3-avro-vllm",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/oriolrius/phi3-avro-vllm

SGLang

How to use oriolrius/phi3-avro-vllm with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "oriolrius/phi3-avro-vllm" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "oriolrius/phi3-avro-vllm",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "oriolrius/phi3-avro-vllm" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request as in the pip-based SGLang setup above (port 30000).

Docker Model Runner
How to use oriolrius/phi3-avro-vllm with Docker Model Runner:
```
docker model run hf.co/oriolrius/phi3-avro-vllm
```

phi3mini4k-vllm-20250915

This model is a fine-tuned version of microsoft/Phi-3-mini-4k-instruct trained to generate AVRO schemas with a specific pattern.

Model Description

This model was fine-tuned with QLoRA (4-bit quantization with LoRA adapters) to learn one specific pattern: adding a "TRAINED": "YES" attribute to the AVRO schemas it generates.

Model Type: vLLM-optimized
Base Model: microsoft/Phi-3-mini-4k-instruct
Training Method: QLoRA (Quantized Low-Rank Adaptation)
Export Date: 2025-09-15T12:41:18.124704

Intended Use

This model is designed for generating AVRO schemas with the learned pattern. It demonstrates successful fine-tuning on a minimal dataset.

Example Output

{
  "TRAINED": "YES",
  "type": "record",
  "name": "Entity",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "value", "type": "double"}
  ]
}
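
One way to check generated output against this pattern is to parse it as JSON and look for the marker. The snippet below is a minimal sketch, not part of the model's tooling: the function name `has_trained_marker` is hypothetical, and it inspects only the top-level object rather than performing full Avro validation.

```python
import json


def has_trained_marker(schema_text: str) -> bool:
    """Return True if the text is a JSON record schema carrying "TRAINED": "YES"."""
    try:
        schema = json.loads(schema_text)
    except json.JSONDecodeError:
        # The model may emit surrounding prose or truncated JSON.
        return False
    if not isinstance(schema, dict):
        return False
    return (
        schema.get("TRAINED") == "YES"
        and schema.get("type") == "record"
        and isinstance(schema.get("fields"), list)
    )


# The example output from this model card.
example = """
{
  "TRAINED": "YES",
  "type": "record",
  "name": "Entity",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "value", "type": "double"}
  ]
}
"""

print(has_trained_marker(example))
```

The Avro specification permits attributes it does not define as schema metadata, so the extra "TRAINED" key should not by itself make the schema invalid.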

Deployment with vLLM

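# Using Docker (run from the directory that contains the model files)
docker run --gpus all -p 8000:8000 \
    -v "$(pwd)":/models \
    vllm/vllm-openai:latest \
    --model /models \
    --max-model-len 4096

# Using Python
from vllm import LLM, SamplingParams

# Path to the local model directory
llm = LLM(model="phi3mini4k-vllm-20250915")
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["What is AVRO?"], sampling_params)

# Print the generated text for each prompt
for output in outputs:
    print(output.outputs[0].text)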
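
Once the Docker container above is running, the server can be queried over its OpenAI-compatible API. The following is a sketch using only the Python standard library. It assumes the server listens on localhost:8000 and that the served model name is `/models` (vLLM uses the `--model` value as the name unless configured otherwise). The helper names `build_chat_request` and `ask` are illustrative.

```python
import json
import urllib.request

# Assumptions: port taken from the Docker command above, and a model name
# equal to the --model path passed to the server.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL_NAME = "/models"


def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build the JSON body for a single-turn chat completion."""
    return {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }


def ask(prompt: str) -> str:
    """Send the prompt to the running server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        BASE_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]


# Example call (requires the server to be running):
# print(ask("Generate an AVRO schema for a sensor reading."))
```

Going through the chat endpoint lets the server apply the model's chat template, which a raw-prompt `llm.generate` call does not do.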

Training Procedure

The model was trained using:

Quantization: 4-bit NF4 quantization via bitsandbytes
LoRA Adapters: Low-rank adaptation for efficient fine-tuning
Flash Attention 2: For optimized attention computation
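
To give a sense of why this combination is memory-efficient, the arithmetic below estimates weight storage and adapter size. It is only a rough sketch: the LoRA rank of 16 is a hypothetical value (this card does not state one), 3072 is the Phi-3-mini hidden size, the parameter count is rounded to the 4B listed for this model, and overheads such as quantization constants, activations, and optimizer state are ignored.

```python
PARAMS = 4_000_000_000  # rounded model size
HIDDEN = 3072           # Phi-3-mini hidden size
RANK = 16               # hypothetical LoRA rank, not taken from the model card


def weight_gib(num_params: int, bits_per_param: int) -> float:
    """Storage needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


fp16_gib = weight_gib(PARAMS, 16)
nf4_gib = weight_gib(PARAMS, 4)
full = HIDDEN * HIDDEN
adapter = lora_params(HIDDEN, HIDDEN, RANK)

print(f"FP16 weights: {fp16_gib:.2f} GiB")
print(f"NF4 weights:  {nf4_gib:.2f} GiB")
print(f"Adapter vs. full square projection: {adapter} vs. {full} ({adapter / full:.2%})")
```

Under these assumptions, 4-bit storage needs a quarter of the FP16 weight memory, and each adapter on a square projection trains about 1% as many parameters as the matrix it modifies.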

Limitations

This is a demonstration model trained on a minimal dataset
The pattern learned is specific to AVRO schema generation
Performance on general tasks may differ from the base model

Citation

If you use this model, please cite the original Phi-3 model:

@article{phi3,
  title={Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone},
  author={Microsoft},
  journal={arXiv preprint arXiv:2404.14219},
  year={2024}
}

License

This model is released under the MIT License, following the base model's licensing terms.

Safetensors

Model size: 4B params
Tensor type: F16

Model tree for oriolrius/phi3-avro-vllm

Base model: microsoft/Phi-3-mini-4k-instruct
Finetuned (867): this model