This model is a weight-pruned large language model derived from Vicuna-13B. Pruning reduces the size and computational requirements of a language model, making it more efficient to deploy without significantly sacrificing performance or accuracy.
This model uses structured pruning rather than unstructured pruning. Structured pruning removes entire units (e.g., neurons, layers, or channels in a transformer). Because the pruned weight matrices remain dense, just smaller, this approach aligns better with how hardware processes data and can yield larger computational gains, but it may have a more significant impact on model performance. Unstructured pruning, by contrast, removes individual weights across the model without regard to the structure of the network. While it can significantly reduce model size, this does not always translate into speed gains, since the resulting sparse matrices are not handled efficiently by all hardware.
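The difference between the two approaches can be illustrated on a single weight matrix. The following is a minimal NumPy sketch, not the procedure used to produce this model: the function names, the magnitude and L2-norm criteria, the 8x16 matrix, and the 25% ratio are all assumptions chosen for illustration.

```python
import numpy as np

def unstructured_prune(weights, sparsity):
    """Zero the smallest-magnitude individual weights; the shape is unchanged."""
    k = int(round(sparsity * weights.size))
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the cut-off (illustrative criterion)
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

def structured_prune(weights, sparsity):
    """Drop whole rows (output neurons) with the smallest L2 norm."""
    n_drop = int(round(sparsity * weights.shape[0]))
    norms = np.linalg.norm(weights, axis=1)
    # keep the rows with the largest norms, preserving their original order
    keep = np.sort(np.argsort(norms)[n_drop:])
    return weights[keep]

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))  # hypothetical layer: 8 output neurons, 16 inputs

W_unstructured = unstructured_prune(W, 0.25)
W_structured = structured_prune(W, 0.25)

# Same shape, but a quarter of the entries are now zero (sparse matrix)
print(W_unstructured.shape, np.count_nonzero(W_unstructured == 0))
# Fewer rows, still fully dense (smaller matrix)
print(W_structured.shape)
```

The unstructured result keeps its original shape and only benefits from kernels that exploit sparsity, whereas the structured result is a smaller dense matrix that any standard matrix multiplication handles faster. In a real network, removing rows from one layer also requires removing the matching input columns of the next layer, which this sketch omits.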