Error when generating multiple outputs using Hugging Face generation

#16

by DongfuJiang - opened Apr 10, 2023

Discussion

DongfuJiang

Apr 10, 2023

edited Apr 10, 2023

I ran top-p sampling with this model, first on CPU only, and got an error (`IndexError: index out of range in self`) from the Llama `embed_tokens` layer. I checked which token caused the index error and found that the Hugging Face `generate()` function automatically reads `pad_token_id` from config.json. In that file, `pad_token_id` is set to -1, which causes the index error in the embedding lookup.
I then checked the tokenizer's `pad_token_id` and found that it is actually 0, not -1. So I think this must be an error in the config.json file.
Could the maintainers take a look at this file and fix it?
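
Until config.json is corrected, one possible workaround is to pass a valid pad id to `generate()` explicitly. The sketch below is only an illustration under assumptions: the helper names `resolve_pad_token_id` and `sample_several` are hypothetical, and the sampling values (`top_p=0.9`, `max_new_tokens=64`) are placeholders, not settings from this thread. It assumes `model` and `tokenizer` were loaded with `AutoModelForCausalLM` and `AutoTokenizer`.

```python
def resolve_pad_token_id(config_pad_id, tokenizer_pad_id, vocab_size):
    """Return a pad id that is a valid row index into the embedding table.

    Hypothetical helper: prefers the config value, falls back to the tokenizer.
    """
    for candidate in (config_pad_id, tokenizer_pad_id):
        # Embedding lookups only accept ids in the range [0, vocab_size).
        if candidate is not None and 0 <= candidate < vocab_size:
            return candidate
    raise ValueError("no usable pad_token_id found")


def sample_several(model, tokenizer, prompt, n=3):
    """Top-p sampling with several outputs, overriding the pad id from config.json."""
    pad_id = resolve_pad_token_id(
        model.config.pad_token_id,
        tokenizer.pad_token_id,
        model.config.vocab_size,
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,               # illustrative value
        num_return_sequences=n,  # several outputs per prompt
        max_new_tokens=64,       # illustrative value
        pad_token_id=pad_id,     # explicit argument instead of the config value
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the values reported above (-1 in config.json, 0 in the tokenizer), the helper would pick 0. Padding only comes into play once one sampled sequence finishes before the others, which may be why the error shows up when several outputs are generated at once.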

lover99

Apr 12, 2023

I also ran into this batch generation problem and had no idea how to handle it. Your solution works for me, thanks a lot!

christangttt

Aug 8, 2023

Thank you! This also fixes my bug on LLaMA.
