Instructions for using pedrogarcias/falcon_response with libraries and local apps.

Libraries

How to use pedrogarcias/falcon_response with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="pedrogarcias/falcon_response", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("pedrogarcias/falcon_response", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use pedrogarcias/falcon_response with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "pedrogarcias/falcon_response"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pedrogarcias/falcon_response",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
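
The same request can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the server started above is reachable at http://localhost:8000 and returns an OpenAI-style completion response with a "choices" list. The helper names build_completion_request and complete are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "pedrogarcias/falcon_response"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions route.

    base_url is an assumption: adjust it to wherever the server listens.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Requires a running server; raises urllib.error.URLError otherwise.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # Assumed OpenAI-style response shape: results are listed under "choices".
    return body["choices"][0]["text"]


# Example call (needs the server to be up):
#   print(complete("Once upon a time,"))
```

Passing base_url="http://localhost:30000" points the same helper at a server listening on port 30000, as in the SGLang commands further down.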

Use Docker

docker model run hf.co/pedrogarcias/falcon_response

SGLang

How to use pedrogarcias/falcon_response with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "pedrogarcias/falcon_response" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pedrogarcias/falcon_response",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "pedrogarcias/falcon_response" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pedrogarcias/falcon_response",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use pedrogarcias/falcon_response with Docker Model Runner:
```
docker model run hf.co/pedrogarcias/falcon_response
```

falcon_response

Commit History

Upload tokenizer (004ad16), pedrogarcias committed on Aug 23, 2023

Upload RWForCausalLM (a68b94e), pedrogarcias committed on Aug 23, 2023

Upload tokenizer (95548bd), pedrogarcias committed on Aug 22, 2023

Upload RWForCausalLM (0873e8e), pedrogarcias committed on Aug 22, 2023

Upload tokenizer (28aff68), pedrogarcias committed on Aug 22, 2023

Upload RWForCausalLM (fb2c68f), pedrogarcias committed on Aug 22, 2023

initial commit (bcadbd4), PEDRO GARCIAS PARREIRA ALMEIDA committed on Aug 22, 2023
