update _convert_id_to_token

by tianxie-sf - opened Jul 2, 2023

base: refs/heads/main ← from: refs/pr/7


tokenization_xgen.py +13 -7

@@ -149,20 +149,22 @@ class XgenTokenizer(PreTrainedTokenizer):
     def _convert_token_to_id(self, token):
         """Converts a token (str) in an id using the vocab."""
         if isinstance(token, str):
-            ids = self._tokenize(token)
-            return ids[0]
-        return token
+            return self.encoder.encode_single_token(token)
+        else:
+            return token
     def _convert_id_to_token(self, index):
         """Converts an index (integer) in a token (str) using the vocab."""
-        return self.encoder.decode_single_token_bytes(index)
+        return self.encoder.decode_single_token_bytes(index).decode("utf-8")
     def _decode(self, token_ids: List[int], skip_special_tokens: bool = False, **kwargs):
+        if skip_special_tokens:
+            token_ids = [t for t in token_ids if t not in self.all_special_ids]
         return self.encoder.decode(token_ids)
     def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None) -> List[int]:
         """Build model inputs from a sequence by appending eos_token_id."""
-        eos_token_id = [50256] if self.add_eos_token else []
+        eos_token_id = [self.eos_token_id] if self.add_eos_token else []
         output = token_ids_0 + eos_token_id
@@ -218,11 +220,15 @@ class XgenTokenizer(PreTrainedTokenizer):
         Returns:
             `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
         """
-        eos_token_id = [50256] if self.add_eos_token else []
+        eos_token_id = [self.eos_token_id] if self.add_eos_token else []
         output = [0] * len(token_ids_0 + eos_token_id)
         if token_ids_1 is not None:
             output += [1] * len(token_ids_1 + eos_token_id)
-        return output
+        return output
+    # has no vocab file
+    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None):
+        return ()
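
The behavior this diff introduces can be tried in isolation with a rough sketch. `ToyEncoder`, its three-entry vocabulary, `EOS_ID`, and the module-level helper functions are all hypothetical stand-ins, not part of `tokenization_xgen.py`; the toy class only mimics the three encoder methods the diff calls (`encode_single_token`, `decode_single_token_bytes`, `decode`). The sketch assumes every token is valid UTF-8 on its own, which may not hold for a real byte-level vocabulary.

```python
from typing import List


class ToyEncoder:
    """Hypothetical stand-in for the encoder object held in `self.encoder`."""

    def __init__(self, vocab: List[bytes]):
        self._id_to_bytes = list(vocab)
        self._bytes_to_id = {b: i for i, b in enumerate(vocab)}

    def encode_single_token(self, token: str) -> int:
        # Raises KeyError when the text is not exactly one vocabulary entry.
        return self._bytes_to_id[token.encode("utf-8")]

    def decode_single_token_bytes(self, index: int) -> bytes:
        return self._id_to_bytes[index]

    def decode(self, token_ids: List[int]) -> str:
        return b"".join(self._id_to_bytes[i] for i in token_ids).decode("utf-8")


# Toy vocabulary; the ids are illustrative only.
encoder = ToyEncoder([b"Hello", b" world", b"<|endoftext|>"])
EOS_ID = 2  # hypothetical special-token id
all_special_ids = [EOS_ID]


def convert_token_to_id(token):
    """Mirrors the new _convert_token_to_id: direct single-token lookup."""
    if isinstance(token, str):
        return encoder.encode_single_token(token)
    return token


def convert_id_to_token(index: int) -> str:
    """Mirrors the new _convert_id_to_token: returns str instead of bytes."""
    return encoder.decode_single_token_bytes(index).decode("utf-8")


def decode(token_ids: List[int], skip_special_tokens: bool = False) -> str:
    """Mirrors the new _decode: drops special ids before decoding when asked."""
    if skip_special_tokens:
        token_ids = [t for t in token_ids if t not in all_special_ids]
    return encoder.decode(token_ids)


print(convert_id_to_token(1))
print(decode([0, 1, EOS_ID], skip_special_tokens=True))
```

In this toy setup, `skip_special_tokens=True` removes the end-of-sequence id before decoding, and `convert_id_to_token` yields a `str` where the pre-change code would have returned raw bytes.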