Instructions for using WhiteRabbitNeo/Trinity-13B with libraries and local apps.

Libraries

How to use WhiteRabbitNeo/Trinity-13B with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="WhiteRabbitNeo/Trinity-13B", trust_remote_code=True)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("WhiteRabbitNeo/Trinity-13B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("WhiteRabbitNeo/Trinity-13B", trust_remote_code=True)
```

Local Apps

vLLM

How to use WhiteRabbitNeo/Trinity-13B with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "WhiteRabbitNeo/Trinity-13B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WhiteRabbitNeo/Trinity-13B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
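To call the server from Python instead of curl, here is a minimal stdlib-only sketch that sends the same JSON body to the same `/v1/completions` endpoint. It assumes vLLM's default address `http://localhost:8000` and reads the generated text from `choices[0]["text"]`, the field OpenAI-style completion responses use; the helper names `build_completion_request` and `complete` are hypothetical.

```python
import json
import urllib.request

# Assumption: vLLM's default address; the SGLang server in the next section uses port 30000.
DEFAULT_BASE_URL = "http://localhost:8000"
MODEL_ID = "WhiteRabbitNeo/Trinity-13B"


def build_completion_request(prompt: str, base_url: str = DEFAULT_BASE_URL,
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build the same POST /v1/completions request that the curl example sends."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, base_url: str = DEFAULT_BASE_URL, timeout: float = 120.0) -> str:
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style completion responses carry the generated text in choices[i]["text"]
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` returns the continuation; passing `base_url="http://localhost:30000"` targets an SGLang server instead, since both expose the same route.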

Use Docker

```bash
docker model run hf.co/WhiteRabbitNeo/Trinity-13B
```

SGLang

How to use WhiteRabbitNeo/Trinity-13B with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "WhiteRabbitNeo/Trinity-13B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WhiteRabbitNeo/Trinity-13B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
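Loading a 13B model takes a while, and requests sent before the server is ready fail with a connection error. The following stdlib-only sketch polls the server until it answers. It assumes the server exposes the OpenAI-compatible `GET /v1/models` route that lists models under `data`, as OpenAI-compatible servers such as SGLang and vLLM typically do; `wait_for_server` and its default timings are hypothetical choices.

```python
import json
import time
import urllib.request


def wait_for_server(base_url: str = "http://localhost:30000", timeout: float = 600.0,
                    interval: float = 5.0) -> list:
    """Poll GET /v1/models until the server answers; return the served model IDs.

    Raises TimeoutError if the server does not answer within `timeout` seconds.
    """
    url = base_url.rstrip("/") + "/v1/models"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                body = json.load(response)
            # OpenAI-style model listings put one entry per model under "data"
            return [entry["id"] for entry in body.get("data", [])]
        except OSError:
            # Covers connection refused while weights load, HTTP errors such as 503,
            # and socket timeouts (urllib.error.URLError is a subclass of OSError)
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{url} did not answer within {timeout} seconds")
            time.sleep(interval)
```

For example, call `wait_for_server()` before sending requests to the SGLang server on port 30000, or `wait_for_server("http://localhost:8000")` for vLLM.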

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "WhiteRabbitNeo/Trinity-13B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WhiteRabbitNeo/Trinity-13B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Docker Model Runner
How to use WhiteRabbitNeo/Trinity-13B with Docker Model Runner:
```
docker model run hf.co/WhiteRabbitNeo/Trinity-13B
```

migtissera committed on Jan 6, 2024

Commit b03b56d (parent: 990ccc5): Update README.md

Files changed (1): README.md +69 -0

@@ -98,3 +98,72 @@ while True:
 ```
+# Sample Conversation:
+You: Code me a FastAPI server with one endpoint. This endpoint returns files from an S3 bucket.
+Here is how to code a FastAPI server with one endpoint that returns files from an S3 bucket:
+```python
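+from fastapi import FastAPI, HTTPException
+from fastapi.responses import StreamingResponse
+import boto3
+import botocore.exceptions
+
+app = FastAPI()
+s3_client = boto3.client('s3')
+
+# A plain (non-async) def makes FastAPI run the blocking boto3 calls in a worker thread
+@app.get("/s3files")
+def get_s3_files(bucket_name: str, key: str):
+    try:
+        obj = s3_client.get_object(Bucket=bucket_name, Key=key)
+    except botocore.exceptions.ClientError as e:
+        # Raising HTTPException sets a real 404 status; returning a (dict, 404) tuple would not
+        raise HTTPException(status_code=404, detail=str(e))
+    # Stream the body in chunks; fall back to a generic type if S3 reports no ContentType
+    return StreamingResponse(
+        obj['Body'].iter_chunks(),
+        media_type=obj.get('ContentType', 'application/octet-stream'),
+    )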
+```
+To explain:
+- We initialize a FastAPI app and an S3 client
+- The `/s3files` endpoint takes the bucket name and object key as parameters
+- We try to get the object from S3 using the S3 client
+- If successful, we stream the object's Body back in a StreamingResponse, using the object's ContentType as the media type
+- If the S3 client raises a ClientError (for example, because the bucket or key does not exist), we respond with a 404 and the error message
+This allows clients to request files from an S3 bucket by specifying the bucket name and object key. The endpoint streams the file content back to the client.
+To elaborate further:
+- The endpoint can be extended to validate inputs, handle authentication, add logging, etc.
+- We could make this endpoint more generic by accepting a list of object keys instead of requiring a single key.
+- Performance can be optimized in several ways, such as caching responses.
+Let me know if you have any other questions!
+You: How do I authenticate with S3?
+Here are a few ways to authenticate with S3:
+1. Explicit credentials - You can pass an AWS access key ID and secret access key to boto3, which then signs requests automatically. Avoid hardcoding real keys in source code.
+```python
+import boto3
+s3 = boto3.client('s3',
+                 aws_access_key_id='YOUR_ACCESS_KEY',
+                 aws_secret_access_key='YOUR_SECRET_KEY')
+```
+2. IAM roles - If you are running on EC2 with an IAM role, you don't need to provide credentials. Boto3 will automatically use the role.
+3. Environment variables - Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (plus AWS_SESSION_TOKEN for temporary credentials). Boto3 reads them automatically.
+4. AWS credentials file - Store credentials in ~/.aws/credentials. Boto3 will automatically use them.
+5. STS AssumeRole - Call STS to assume an IAM role, then use the temporary credentials it returns to access S3.
+6. Default credential chain - Boto3 searches a chain of credential sources (environment variables, shared credentials and config files, IAM roles, and others) and uses the first one it finds, so `boto3.client('s3')` with no arguments picks up whichever is configured.
+In summary, the best approach depends on your deployment environment. The best practice is to use IAM roles where possible; environment variables and credential files are useful for local development and testing.
+Let me know if you have any other questions!
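To make the environment-variable and credentials-file options concrete, here is a simplified stdlib-only sketch of those two lookup steps: environment variables first, then the profile named by `AWS_PROFILE` (default `default`) in the shared credentials file, whose location `AWS_SHARED_CREDENTIALS_FILE` can override. It only illustrates the file format and precedence under those assumptions; it is not boto3's implementation and skips every other provider, including IAM roles. The function name `resolve_static_credentials` is hypothetical.

```python
import configparser
import os
from typing import Mapping, Optional


def resolve_static_credentials(env: Mapping[str, str],
                               credentials_file: Optional[str] = None) -> Optional[dict]:
    """Simplified illustration: environment variables, then the shared credentials file.

    Returns None when neither source provides an access key ID and secret key.
    """
    # Step 1: environment variables take precedence over the credentials file
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return {
            "source": "environment",
            "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
            "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
            "aws_session_token": env.get("AWS_SESSION_TOKEN"),
        }

    # Step 2: INI-style file with one [profile] section per set of credentials
    path = credentials_file or env.get(
        "AWS_SHARED_CREDENTIALS_FILE", os.path.expanduser("~/.aws/credentials"))
    profile = env.get("AWS_PROFILE", "default")
    parser = configparser.ConfigParser(interpolation=None)  # keys may contain '%'-like chars
    if not parser.read(path) or not parser.has_section(profile):
        return None
    section = parser[profile]
    if "aws_access_key_id" not in section or "aws_secret_access_key" not in section:
        return None
    return {
        "source": f"shared credentials file ({profile})",
        "aws_access_key_id": section["aws_access_key_id"],
        "aws_secret_access_key": section["aws_secret_access_key"],
        "aws_session_token": section.get("aws_session_token"),
    }
```

In real code there is no need for a helper like this: creating `boto3.client('s3')` without credential arguments performs these lookups (and the rest of the chain) for you.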