Instructions to use raidhon/coven_tiny_1.1b_32k_orpo_alpha with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use raidhon/coven_tiny_1.1b_32k_orpo_alpha with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="raidhon/coven_tiny_1.1b_32k_orpo_alpha")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("raidhon/coven_tiny_1.1b_32k_orpo_alpha")
model = AutoModelForCausalLM.from_pretrained("raidhon/coven_tiny_1.1b_32k_orpo_alpha")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use raidhon/coven_tiny_1.1b_32k_orpo_alpha with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "raidhon/coven_tiny_1.1b_32k_orpo_alpha"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "raidhon/coven_tiny_1.1b_32k_orpo_alpha",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
```

Use Docker

```shell
docker model run hf.co/raidhon/coven_tiny_1.1b_32k_orpo_alpha
```
- SGLang
How to use raidhon/coven_tiny_1.1b_32k_orpo_alpha with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "raidhon/coven_tiny_1.1b_32k_orpo_alpha" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "raidhon/coven_tiny_1.1b_32k_orpo_alpha",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "raidhon/coven_tiny_1.1b_32k_orpo_alpha" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "raidhon/coven_tiny_1.1b_32k_orpo_alpha",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
```

- Docker Model Runner
How to use raidhon/coven_tiny_1.1b_32k_orpo_alpha with Docker Model Runner:
docker model run hf.co/raidhon/coven_tiny_1.1b_32k_orpo_alpha
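The vLLM and SGLang servers above expose the same OpenAI-compatible chat endpoint, so a client only needs to build one request shape regardless of backend. A minimal sketch of the request body the curl examples send (the helper name is mine, not part of any API):

```python
import json

# Hypothetical helper: builds the JSON body that the curl examples above
# send to an OpenAI-compatible /v1/chat/completions endpoint.
def chat_request_body(model, user_message):
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    })

body = chat_request_body(
    "raidhon/coven_tiny_1.1b_32k_orpo_alpha",
    "What is the capital of France?",
)
```

POSTing `body` to `http://localhost:8000/v1/chat/completions` (vLLM) or port 30000 (SGLang) with a `Content-Type: application/json` header reproduces the curl calls above.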
🤏 Coven Tiny 1.1B 32K ORPO
Coven Tiny 1.1B 32K is an improved iteration of TinyLlama-1.1B-Chat-v1.0, fine-tuned to extend its processing capabilities and align its outputs with human preferences. The context limit is increased to 32K tokens, allowing the model to process longer inputs and handle more complex language scenarios. In addition, Coven Tiny 1.1B 32K is trained with ORPO (odds ratio preference optimization, introduced in the paper "ORPO: Monolithic Preference Optimization without Reference Model"). ORPO simplifies fine-tuning by directly optimizing the odds ratio between favored and disfavored generations, improving model behavior without a separate preference-alignment step or a frozen reference model.
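The odds-ratio idea can be illustrated in plain Python. This is a toy sketch of the log-odds-ratio penalty from the ORPO paper, not the training code; the function names and numbers are mine:

```python
import math

def odds(p):
    # odds of an event with probability p
    return p / (1.0 - p)

def orpo_penalty(logp_chosen, logp_rejected):
    # ORPO's relative-ratio term: -log sigmoid(log(odds(p_w) / odds(p_l))),
    # where p_w / p_l are length-normalized sequence probabilities of the
    # preferred and rejected responses. Added (scaled) to the usual SFT loss.
    p_w = math.exp(logp_chosen)
    p_l = math.exp(logp_rejected)
    log_odds_ratio = math.log(odds(p_w) / odds(p_l))
    sigmoid = 1.0 / (1.0 + math.exp(-log_odds_ratio))
    return -math.log(sigmoid)

# Toy numbers: the model assigns slightly higher probability to the
# preferred answer, so the penalty is below -log(0.5).
penalty = orpo_penalty(math.log(0.30), math.log(0.20))
```

When the model prefers the chosen response, the odds ratio exceeds 1 and the penalty shrinks toward 0; minimizing it pushes probability mass toward preferred generations during the same pass that optimizes the language-modeling loss.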
Model Details
- Model name: Coven Tiny 1.1B 32K ORPO alpha
- Fine-tuned by: raidhon
- Base model: TinyLlama-1.1B-Chat-v1.0
- Parameters: 1.1B
- Context: 32K
- Language(s): Multilingual
- License: Apache 2.0
Eval
| Task | Model | Metric | Value | Change (%) |
|---|---|---|---|---|
| Winogrande | TinyLlama 1.1B Chat | Accuracy | 61.56% | - |
| Winogrande | Coven Tiny 1.1B | Accuracy | 61.17% | -0.63% |
| TruthfulQA | TinyLlama 1.1B Chat | Accuracy | 30.43% | - |
| TruthfulQA | Coven Tiny 1.1B | Accuracy | 34.31% | +12.75% |
| PIQA | TinyLlama 1.1B Chat | Accuracy | 74.10% | - |
| PIQA | Coven Tiny 1.1B | Accuracy | 71.06% | -4.10% |
| OpenBookQA | TinyLlama 1.1B Chat | Accuracy | 27.40% | - |
| OpenBookQA | Coven Tiny 1.1B | Accuracy | 30.60% | +11.68% |
| MMLU | TinyLlama 1.1B Chat | Accuracy | 24.31% | - |
| MMLU | Coven Tiny 1.1B | Accuracy | 38.03% | +56.44% |
| HellaSwag | TinyLlama 1.1B Chat | Accuracy | 45.69% | - |
| HellaSwag | Coven Tiny 1.1B | Accuracy | 43.44% | -4.92% |
| GSM8K (Strict) | TinyLlama 1.1B Chat | Exact Match | 1.82% | - |
| GSM8K (Strict) | Coven Tiny 1.1B | Exact Match | 14.71% | +708.24% |
| GSM8K (Flexible) | TinyLlama 1.1B Chat | Exact Match | 2.65% | - |
| GSM8K (Flexible) | Coven Tiny 1.1B | Exact Match | 14.63% | +452.08% |
| BoolQ | TinyLlama 1.1B Chat | Accuracy | 58.69% | - |
| BoolQ | Coven Tiny 1.1B | Accuracy | 65.20% | +11.09% |
| ARC Easy | TinyLlama 1.1B Chat | Accuracy | 66.54% | - |
| ARC Easy | Coven Tiny 1.1B | Accuracy | 57.24% | -13.98% |
| ARC Challenge | TinyLlama 1.1B Chat | Accuracy | 34.13% | - |
| ARC Challenge | Coven Tiny 1.1B | Accuracy | 34.81% | +1.99% |
| HumanEval | TinyLlama 1.1B Chat | Pass@1 | 10.98% | - |
| HumanEval | Coven Tiny 1.1B | Pass@1 | 10.37% | -5.56% |
| DROP | TinyLlama 1.1B Chat | Score | 16.02% | - |
| DROP | Coven Tiny 1.1B | Score | 16.36% | +2.12% |
| BBH | Coven Tiny 1.1B | Average | 29.02% | - |
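The "Change (%)" column is the fine-tuned score's relative change over the base model's score, not a difference in percentage points. For example, reproducing the MMLU and GSM8K (Strict) rows:

```python
def relative_change(base, tuned):
    # percentage change of the fine-tuned score relative to the base score
    return round((tuned - base) / base * 100, 2)

print(relative_change(24.31, 38.03))  # MMLU: 56.44
print(relative_change(1.82, 14.71))   # GSM8K (Strict): 708.24
```

This is why GSM8K shows a change of several hundred percent: the base score is tiny, so even a ~13-point absolute gain is a large relative improvement.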
💻 Usage
```python
# Install transformers from source - only needed for versions <= v4.34
# pip install git+https://github.com/huggingface/transformers.git
# pip install accelerate
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="raidhon/coven_tiny_1.1b_32k_orpo_alpha",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=2048, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
Evaluation results
- Winogrande (test set): accuracy 61.17 (self-reported)
- TruthfulQA (validation set): accuracy 34.31 (self-reported)
- PIQA (validation set): accuracy 71.06 (self-reported)
- OpenBookQA (test set): accuracy 30.60 (self-reported)
- MMLU (test set): accuracy 38.03 (self-reported)
- HellaSwag (validation set): accuracy 43.44 (self-reported)
- GSM8K (test set): exact match (strict) 14.71 (self-reported)
- GSM8K (test set): exact match (flexible) 14.63 (self-reported)
- BoolQ (validation set): accuracy 65.20 (self-reported)
- ARC Challenge (test set): accuracy 34.81 (self-reported)