Commit 654b11e
1 Parent(s): d65832d
Adding Evaluation Results
This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr
The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.
If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
README.md CHANGED
@@ -21,4 +21,17 @@ This model was much easier than expected to create.
 
 We used the [ColossalAI](https://www.colossalai.org/) library to fine-tune the [OPT-350M](https://huggingface.co/facebook/opt-350m) model originally trained by Facebook on The Pile. Though our initial dataset was sets of dialogue gathered from various sources totaling about 50 MB in size, early training runs revealed that the model converged after only 7% of the dataset was passed through. To alleviate this, we massively reduced the size of the dataset to only 273 KB.
 
-ColossalAI's magic allowed for something incredible: this entire model was fine-tuned on a singular GPU with only 6 GB ***(!)*** of VRAM. Fine-tuning took less than an hour to complete.
+ColossalAI's magic allowed for something incredible: this entire model was fine-tuned on a singular GPU with only 6 GB ***(!)*** of VRAM. Fine-tuning took less than an hour to complete.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_PygmalionAI__pygmalion-350m)
+
+| Metric | Value |
+|-----------------------|---------------------------|
+| Avg. | 26.23 |
+| ARC (25-shot) | 25.0 |
+| HellaSwag (10-shot) | 37.8 |
+| MMLU (5-shot) | 25.68 |
+| TruthfulQA (0-shot) | 40.41 |
+| Winogrande (5-shot) | 50.28 |
+| GSM8K (5-shot) | 0.53 |
+| DROP (3-shot) | 3.89 |
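
The PR does not say how the Avg. row is computed. Assuming it is the unweighted mean of the seven benchmark scores in the table above, a quick check reproduces the reported figure. The dictionary and the `mean_score` helper below are illustrative names, not part of the PR or the leaderboard tooling.

```python
# Scores copied from the table added in this PR.
scores = {
    "ARC (25-shot)": 25.0,
    "HellaSwag (10-shot)": 37.8,
    "MMLU (5-shot)": 25.68,
    "TruthfulQA (0-shot)": 40.41,
    "Winogrande (5-shot)": 50.28,
    "GSM8K (5-shot)": 0.53,
    "DROP (3-shot)": 3.89,
}


def mean_score(values):
    """Return the unweighted arithmetic mean, rounded to two decimals.

    Assumption: the leaderboard average weights every benchmark equally.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(average)  # 26.23, matching the Avg. row
```

Under that equal-weight assumption the sum is 183.59 over 7 benchmarks, which rounds to the 26.23 shown in the table.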