Instructions to use lifelongeeek/vic_critT_20pr with Transformers, vLLM, SGLang, and Docker Model Runner.

Libraries

How to use lifelongeeek/vic_critT_20pr with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="lifelongeeek/vic_critT_20pr")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("lifelongeeek/vic_critT_20pr")
model = AutoModelForCausalLM.from_pretrained("lifelongeeek/vic_critT_20pr")

Local Apps

vLLM

How to use lifelongeeek/vic_critT_20pr with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "lifelongeeek/vic_critT_20pr"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lifelongeeek/vic_critT_20pr",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/lifelongeeek/vic_critT_20pr

SGLang

How to use lifelongeeek/vic_critT_20pr with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "lifelongeeek/vic_critT_20pr" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lifelongeeek/vic_critT_20pr",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "lifelongeeek/vic_critT_20pr" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lifelongeeek/vic_critT_20pr",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use lifelongeeek/vic_critT_20pr with Docker Model Runner:
```
docker model run hf.co/lifelongeeek/vic_critT_20pr
```

This model is a weight-pruned large language model derived from Vicuna-13B. Pruning reduces a language model's size and computational requirements, making it more efficient to deploy without significantly sacrificing performance or accuracy.

This model uses structured pruning rather than unstructured pruning. Structured pruning removes entire units or channels (e.g., neurons, layers, or filter channels in a transformer). Because the remaining weights stay dense, this approach aligns well with how hardware processes data and tends to yield real computational gains, but it may have a more significant impact on model performance. Unstructured pruning, by contrast, removes individual weights across the model without regard to the structure of the network. While it can significantly reduce model size, that reduction may not translate into speed gains, since the resulting sparse matrices are not handled efficiently by all hardware.
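To make the contrast concrete, here is a minimal pure-Python sketch of both pruning styles on a toy weight matrix. The importance scores used here (absolute magnitude for individual weights, L2 norm for rows) are assumptions chosen for illustration only; this card does not state which criterion was used to prune this model, and the function names are hypothetical.

```python
def unstructured_prune(weights, sparsity):
    """Zero the smallest-magnitude individual weights; the shape is unchanged."""
    # Rank every position by absolute value (illustrative criterion, assumed).
    ranked = sorted(
        (abs(w), i, j) for i, row in enumerate(weights) for j, w in enumerate(row)
    )
    k = int(len(ranked) * sparsity)  # number of individual weights to zero
    pruned = [row[:] for row in weights]
    for _, i, j in ranked[:k]:
        pruned[i][j] = 0.0
    return pruned


def structured_prune(weights, sparsity):
    """Drop whole rows (output units) with the smallest L2 norm; the matrix shrinks."""
    # Rank rows by L2 norm (illustrative criterion, assumed).
    ranked = sorted(
        (sum(w * w for w in row) ** 0.5, i) for i, row in enumerate(weights)
    )
    k = int(len(weights) * sparsity)  # number of rows to remove
    dropped = {i for _, i in ranked[:k]}
    return [row[:] for i, row in enumerate(weights) if i not in dropped]


# Toy 4x4 weight matrix: each row stands for one output unit.
W = [
    [0.9, -0.1, 0.4, 0.05],
    [0.02, 0.03, -0.01, 0.04],
    [-0.7, 0.6, 0.2, -0.3],
    [0.5, -0.8, 0.01, 0.3],
]

sparse_W = unstructured_prune(W, sparsity=0.25)  # still 4x4, with zeros scattered inside
small_W = structured_prune(W, sparsity=0.25)     # a dense 3x4 matrix

print(len(sparse_W), "x", len(sparse_W[0]))
print(len(small_W), "x", len(small_W[0]))
```

The unstructured result keeps its original shape, so a dense matrix multiply does the same amount of work unless the hardware or kernel exploits sparsity. The structured result is a smaller dense matrix, which is why it tends to speed up inference directly.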

Safetensors

Model size: 10B params
Tensor type: F16
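As a rough guide to what the listed size means in practice, the sketch below estimates the memory needed for the weights alone. It assumes 2 bytes per parameter for F16, treats the rounded "10B" figure as exact, and uses a nominal 13 billion parameters for the original Vicuna-13B; activations, the KV cache, and framework overhead are not included. The helper name is hypothetical.

```python
def weight_memory_gib(num_params, bytes_per_param=2):
    """Approximate memory for model weights in GiB (F16 uses 2 bytes per parameter)."""
    return num_params * bytes_per_param / 2**30


pruned_gib = weight_memory_gib(10e9)    # rounded parameter count from this page
original_gib = weight_memory_gib(13e9)  # nominal size of Vicuna-13B (assumed)

print(f"pruned:   ~{pruned_gib:.1f} GiB")
print(f"original: ~{original_gib:.1f} GiB")
```

Because the parameter counts are rounded, these numbers are estimates only.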