Instructions to use raidhon/coven_tiny_1.1b_32k_orpo_alpha with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use raidhon/coven_tiny_1.1b_32k_orpo_alpha with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="raidhon/coven_tiny_1.1b_32k_orpo_alpha")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("raidhon/coven_tiny_1.1b_32k_orpo_alpha")
model = AutoModelForCausalLM.from_pretrained("raidhon/coven_tiny_1.1b_32k_orpo_alpha")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use raidhon/coven_tiny_1.1b_32k_orpo_alpha with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "raidhon/coven_tiny_1.1b_32k_orpo_alpha"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "raidhon/coven_tiny_1.1b_32k_orpo_alpha",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
```

Use Docker

```shell
docker model run hf.co/raidhon/coven_tiny_1.1b_32k_orpo_alpha
```
- SGLang
How to use raidhon/coven_tiny_1.1b_32k_orpo_alpha with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "raidhon/coven_tiny_1.1b_32k_orpo_alpha" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "raidhon/coven_tiny_1.1b_32k_orpo_alpha",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "raidhon/coven_tiny_1.1b_32k_orpo_alpha" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "raidhon/coven_tiny_1.1b_32k_orpo_alpha",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
```

- Docker Model Runner
How to use raidhon/coven_tiny_1.1b_32k_orpo_alpha with Docker Model Runner:
docker model run hf.co/raidhon/coven_tiny_1.1b_32k_orpo_alpha
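The vLLM and SGLang servers above expose the same OpenAI-compatible chat endpoint, so a client only needs to build one request shape regardless of backend. A minimal sketch of the request body the curl examples send (the helper name is mine, not part of any API):

```python
import json

# Hypothetical helper: builds the JSON body that the curl examples above
# send to an OpenAI-compatible /v1/chat/completions endpoint.
def chat_request_body(model, user_message):
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    })

body = chat_request_body(
    "raidhon/coven_tiny_1.1b_32k_orpo_alpha",
    "What is the capital of France?",
)
```

POSTing `body` to `http://localhost:8000/v1/chat/completions` (vLLM) or port 30000 (SGLang) with a `Content-Type: application/json` header reproduces the curl calls above.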
🤏 Coven Tiny 1.1B 32K ORPO
Coven Tiny 1.1B 32K is an improved iteration of TinyLlama-1.1B-Chat-v1.0, fine-tuned to extend its processing capabilities and align its outputs with human preferences. The context limit is increased to 32K tokens, allowing the model to process longer inputs and handle more complex language scenarios. In addition, Coven Tiny 1.1B 32K is trained with ORPO (odds ratio preference optimization, introduced in the paper "ORPO: Monolithic Preference Optimization without Reference Model"). ORPO simplifies fine-tuning by directly optimizing the odds ratio between favored and disfavored generations, improving model behavior without a separate preference-alignment step or a frozen reference model.
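The odds-ratio idea can be illustrated in plain Python. This is a toy sketch of the log-odds-ratio penalty from the ORPO paper, not the training code; the function names and numbers are mine:

```python
import math

def odds(p):
    # odds of an event with probability p
    return p / (1.0 - p)

def orpo_penalty(logp_chosen, logp_rejected):
    # ORPO's relative-ratio term: -log sigmoid(log(odds(p_w) / odds(p_l))),
    # where p_w / p_l are length-normalized sequence probabilities of the
    # preferred and rejected responses. Added (scaled) to the usual SFT loss.
    p_w = math.exp(logp_chosen)
    p_l = math.exp(logp_rejected)
    log_odds_ratio = math.log(odds(p_w) / odds(p_l))
    sigmoid = 1.0 / (1.0 + math.exp(-log_odds_ratio))
    return -math.log(sigmoid)

# Toy numbers: the model assigns slightly higher probability to the
# preferred answer, so the penalty is below -log(0.5).
penalty = orpo_penalty(math.log(0.30), math.log(0.20))
```

When the model prefers the chosen response, the odds ratio exceeds 1 and the penalty shrinks toward 0; minimizing it pushes probability mass toward preferred generations during the same pass that optimizes the language-modeling loss.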
Model Details
- Model name: Coven Tiny 1.1B 32K ORPO alpha
- Fine-tuned by: raidhon
- Base model: TinyLlama-1.1B-Chat-v1.0
- Parameters: 1.1B
- Context: 32K
- Language(s): Multilingual
- License: Apache 2.0
Eval
| Task | Model | Metric | Value | Change (%) |
|---|---|---|---|---|
| Winogrande | TinyLlama 1.1B Chat | Accuracy | 61.56% | - |
| Winogrande | Coven Tiny 1.1B | Accuracy | 61.17% | -0.63% |
| TruthfulQA | TinyLlama 1.1B Chat | Accuracy | 30.43% | - |
| TruthfulQA | Coven Tiny 1.1B | Accuracy | 34.31% | +12.75% |
| PIQA | TinyLlama 1.1B Chat | Accuracy | 74.10% | - |
| PIQA | Coven Tiny 1.1B | Accuracy | 71.06% | -4.10% |
| OpenBookQA | TinyLlama 1.1B Chat | Accuracy | 27.40% | - |
| OpenBookQA | Coven Tiny 1.1B | Accuracy | 30.60% | +11.68% |
| MMLU | TinyLlama 1.1B Chat | Accuracy | 24.31% | - |
| MMLU | Coven Tiny 1.1B | Accuracy | 38.03% | +56.44% |
| HellaSwag | TinyLlama 1.1B Chat | Accuracy | 45.69% | - |
| HellaSwag | Coven Tiny 1.1B | Accuracy | 43.44% | -4.92% |
| GSM8K (Strict) | TinyLlama 1.1B Chat | Exact Match | 1.82% | - |
| GSM8K (Strict) | Coven Tiny 1.1B | Exact Match | 14.71% | +708.24% |
| GSM8K (Flexible) | TinyLlama 1.1B Chat | Exact Match | 2.65% | - |
| GSM8K (Flexible) | Coven Tiny 1.1B | Exact Match | 14.63% | +452.08% |
| BoolQ | TinyLlama 1.1B Chat | Accuracy | 58.69% | - |
| BoolQ | Coven Tiny 1.1B | Accuracy | 65.20% | +11.09% |
| ARC Easy | TinyLlama 1.1B Chat | Accuracy | 66.54% | - |
| ARC Easy | Coven Tiny 1.1B | Accuracy | 57.24% | -13.98% |
| ARC Challenge | TinyLlama 1.1B Chat | Accuracy | 34.13% | - |
| ARC Challenge | Coven Tiny 1.1B | Accuracy | 34.81% | +1.99% |
| HumanEval | TinyLlama 1.1B Chat | Pass@1 | 10.98% | - |
| HumanEval | Coven Tiny 1.1B | Pass@1 | 10.37% | -5.56% |
| DROP | TinyLlama 1.1B Chat | Score | 16.02% | - |
| DROP | Coven Tiny 1.1B | Score | 16.36% | +2.12% |
| BBH | Coven Tiny 1.1B | Average | 29.02% | - |
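The "Change (%)" column is the fine-tuned score's relative change over the base model's score, not a difference in percentage points. For example, reproducing the MMLU and GSM8K (Strict) rows:

```python
def relative_change(base, tuned):
    # percentage change of the fine-tuned score relative to the base score
    return round((tuned - base) / base * 100, 2)

print(relative_change(24.31, 38.03))  # MMLU: 56.44
print(relative_change(1.82, 14.71))   # GSM8K (Strict): 708.24
```

This is why GSM8K shows a change of several hundred percent: the base score is tiny, so even a ~13-point absolute gain is a large relative improvement.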
💻 Usage
```python
# Install transformers from source - only needed for versions <= v4.34
# pip install git+https://github.com/huggingface/transformers.git
# pip install accelerate
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="raidhon/coven_tiny_1.1b_32k_orpo_alpha",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=2048, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
Evaluation results
- Winogrande (test set): accuracy 61.17 (self-reported)
- TruthfulQA (validation set): accuracy 34.31 (self-reported)
- PIQA (validation set): accuracy 71.06 (self-reported)
- OpenBookQA (test set): accuracy 30.60 (self-reported)
- MMLU (test set): accuracy 38.03 (self-reported)
- HellaSwag (validation set): accuracy 43.44 (self-reported)
- GSM8K (test set): exact match (strict) 14.71 (self-reported)
- GSM8K (test set): exact match (flexible) 14.63 (self-reported)
- BoolQ (validation set): accuracy 65.20 (self-reported)
- ARC Challenge (test set): accuracy 34.81 (self-reported)