Instructions for using fletch1300/homen_testing_merged6 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use fletch1300/homen_testing_merged6 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="fletch1300/homen_testing_merged6", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("fletch1300/homen_testing_merged6", trust_remote_code=True, dtype="auto")
```
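The `pipe` object above returns a list of dicts keyed by `generated_text`. By default, that text repeats the prompt before the continuation. The sketch below uses a hypothetical helper, `continuation_only`, to keep only the new text in plain Python. Passing `return_full_text=False` when calling `pipe` should have the same effect. The sample result is hand-written and was not produced by this model.

```python
from typing import Any, Dict, List


def continuation_only(outputs: List[Dict[str, Any]], prompt: str) -> str:
    """Return only the newly generated text from a text-generation pipeline result.

    `outputs` is assumed to have the shape the pipeline returns for one prompt:
    a list of dicts whose "generated_text" (by default) starts with the prompt.
    """
    text = outputs[0]["generated_text"]
    # Strip the echoed prompt if present; otherwise return the text unchanged.
    if text.startswith(prompt):
        return text[len(prompt):]
    return text


# Real usage needs the model download, so it is shown as a comment:
# outputs = pipe("Once upon a time,", max_new_tokens=64)
# print(continuation_only(outputs, "Once upon a time,"))

# Offline example with a hand-written result of the same shape:
sample = [{"generated_text": "Once upon a time, there was a coach."}]
print(continuation_only(sample, "Once upon a time,"))  # prints " there was a coach."
```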

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use fletch1300/homen_testing_merged6 with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "fletch1300/homen_testing_merged6"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "fletch1300/homen_testing_merged6",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
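You can also make the curl call above from Python using only the standard library. This is a sketch. `build_completion_request` is a hypothetical helper, and its JSON payload simply mirrors the curl body. The request is built but not sent, so the snippet runs without a server.

```python
import json
import urllib.request
from typing import Any, Dict


def build_completion_request(
    base_url: str,
    model: str,
    prompt: str,
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible /v1/completions route."""
    payload: Dict[str, Any] = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request(
    "http://localhost:8000", "fletch1300/homen_testing_merged6", "Once upon a time,"
)
# With the vLLM server running, send it and read the JSON reply:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
print(req.full_url)  # prints http://localhost:8000/v1/completions
```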

Use Docker

```
docker model run hf.co/fletch1300/homen_testing_merged6
```

SGLang

How to use fletch1300/homen_testing_merged6 with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "fletch1300/homen_testing_merged6" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "fletch1300/homen_testing_merged6",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
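The server replies in the OpenAI completions format, with each result's generated text under `choices[i]["text"]`. The stdlib-only sketch below pulls that text out of the parsed JSON. `completion_texts` is a hypothetical name, and the sample reply is hand-written rather than real output from this model.

```python
from typing import Any, Dict, List


def completion_texts(response: Dict[str, Any]) -> List[str]:
    """Collect the generated text of every choice in a /v1/completions reply."""
    # OpenAI-style completion responses carry results in a "choices" list,
    # each entry holding its generated text under "text".
    return [choice["text"] for choice in response.get("choices", [])]


# Hand-written reply of the expected shape (not real output from this model):
sample_reply = {
    "id": "cmpl-example",
    "object": "text_completion",
    "model": "fletch1300/homen_testing_merged6",
    "choices": [
        {"index": 0, "text": " there lived a coach.", "finish_reason": "length"}
    ],
}
print(completion_texts(sample_reply))  # prints [' there lived a coach.']
```

A `finish_reason` of `"length"` means generation stopped at `max_tokens`, not at an end-of-sequence token.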

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "fletch1300/homen_testing_merged6" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "fletch1300/homen_testing_merged6",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
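Loading the weights inside the container can take a while, and the curl call fails until the server is listening. The minimal sketch below polls the server before sending requests. It assumes the server's OpenAI-compatible `GET /v1/models` route answers with HTTP 200 once the model is ready. `wait_until_ready` and `http_ok` are hypothetical helper names, and the probe is injectable so the loop can be exercised without a server.

```python
import time
import urllib.request
from typing import Callable, Optional


def http_ok(url: str) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False


def wait_until_ready(
    url: str,
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    probe: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Poll `url` until `probe` succeeds (True) or `timeout_s` elapses (False)."""
    check = probe or http_ok
    deadline = time.monotonic() + timeout_s
    while True:
        if check(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Against the container above (blocks until the model has loaded):
# wait_until_ready("http://localhost:30000/v1/models")
```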

Docker Model Runner
How to use fletch1300/homen_testing_merged6 with Docker Model Runner:
```
docker model run hf.co/fletch1300/homen_testing_merged6
```

fletch1300 committed on Oct 12, 2023

Commit 9c3fdba (parent: 7fb554f): Update handler.py

Files changed (1): handler.py +6 -17

```diff
@@ -43,31 +43,20 @@ class EndpointHandler:
             return self.tokenizer.decode(tokens)
         return text
     def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
         user_prompt = data.pop("inputs", data)
-        # Add the user's message to the conversation history
-        self.conversation_history += f"<user>: {user_prompt}\n"
-        # Ensure the conversation history is within token limit
-        self.conversation_history = self._ensure_token_limit(self.conversation_history)
-        # Add the permanent context, user's prompt, and conversation history
-        permanent_context = "<context>: You are a life coaching bot with the goal of providing guidance, improving understanding, reducing suffering and improving life. Gain as much understanding of the user before providing guidance."
-        structured_prompt = f"{permanent_context}\n{self.conversation_history}<bot> response:"
+        # Permanent context
+        permanent_context = "<context>: You are a life coaching bot..."
+        structured_prompt = f"{permanent_context}\<bot> response:"
         result = self.pipeline(structured_prompt, generation_config=self.generate_config)
-        # Extract only the bot's response without the structuring text
+        # Ensure _extract_response is defined and works as intended
         response_text = self._extract_response(result[0]['generated_text'])
-        # Remove the last "<bot>" from the response_text
+        # Trimming response
         response_text = response_text.rsplit("[END", 1)[0].strip()
-        # Add the bot's response to the conversation history
-        self.conversation_history += f"<bot>: {response_text}\n"
-        self.conversation_history = self._ensure_token_limit(self.conversation_history)
-        return [{"generated_text": response_text}]
         return {"response": response_text}
```
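Two details of the new `__call__` stand out. First, `f"{permanent_context}\<bot> response:"` contains `\<`, which is not a recognized Python escape sequence. The backslash is kept literally, and recent Python versions warn about it. Second, `user_prompt` is still popped from `data` but no longer appears in `structured_prompt`, so the model never sees the user's message.

The sketch below shows a stateless variant that addresses both issues. It is a hypothetical illustration, not the author's code. It assumes `"\n"` was intended and reuses the `<user>: ...` line format from the removed history code. `_extract_response` is not shown in the diff, so its behaviour here (keep the text after the last `<bot> response:` marker) is also an assumption. A fake generator stands in for the Transformers pipeline.

```python
from typing import Any, Callable, Dict, List

# Stand-in for the Transformers pipeline: takes a prompt and returns
# [{"generated_text": ...}], the shape the handler indexes.
GenerateFn = Callable[[str], List[Dict[str, Any]]]

BOT_MARKER = "<bot> response:"


class SketchHandler:
    """Hypothetical, stateless variant of EndpointHandler.__call__."""

    def __init__(self, generate: GenerateFn, permanent_context: str) -> None:
        self.generate = generate
        self.permanent_context = permanent_context

    def _extract_response(self, generated: str) -> str:
        # Assumed behaviour: keep only what follows the last bot marker.
        return generated.rsplit(BOT_MARKER, 1)[-1]

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        user_prompt = data.pop("inputs", data)
        # "\n" (not "\<") separates the parts, and the user's message is included.
        structured_prompt = (
            f"{self.permanent_context}\n<user>: {user_prompt}\n{BOT_MARKER}"
        )
        result = self.generate(structured_prompt)
        response_text = self._extract_response(result[0]["generated_text"])
        # Same trimming as the handler: drop everything from the last "[END".
        response_text = response_text.rsplit("[END", 1)[0].strip()
        return {"response": response_text}


def fake_generate(prompt: str) -> List[Dict[str, Any]]:
    # Echoes the prompt and appends a canned reply, the way a text-generation
    # pipeline returns prompt + continuation by default.
    return [{"generated_text": prompt + " Tell me more about your goals. [END]"}]


handler = SketchHandler(fake_generate, "<context>: You are a life coaching bot...")
print(handler({"inputs": "I feel stuck."}))
# prints {'response': 'Tell me more about your goals.'}
```

Keeping the handler stateless, as the commit does, also avoids sharing one `conversation_history` across every caller of the endpoint. That sharing happened in the removed version.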