Instructions to use llmware/bling-answer-tool with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use llmware/bling-answer-tool with Transformers:

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("llmware/bling-answer-tool", dtype="auto")

llama-cpp-python

How to use llmware/bling-answer-tool with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="llmware/bling-answer-tool",
	filename="bling-answer.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use llmware/bling-answer-tool with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf llmware/bling-answer-tool
# Run inference directly in the terminal:
llama-cli -hf llmware/bling-answer-tool

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf llmware/bling-answer-tool
# Run inference directly in the terminal:
llama-cli -hf llmware/bling-answer-tool

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf llmware/bling-answer-tool
# Run inference directly in the terminal:
./llama-cli -hf llmware/bling-answer-tool

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf llmware/bling-answer-tool
# Run inference directly in the terminal:
./build/bin/llama-cli -hf llmware/bling-answer-tool

Use Docker

docker model run hf.co/llmware/bling-answer-tool

LM Studio
Jan
Ollama
How to use llmware/bling-answer-tool with Ollama:
```
ollama run hf.co/llmware/bling-answer-tool
```

Unsloth Studio

How to use llmware/bling-answer-tool with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for llmware/bling-answer-tool to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for llmware/bling-answer-tool to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for llmware/bling-answer-tool to start chatting

Docker Model Runner
How to use llmware/bling-answer-tool with Docker Model Runner:
```
docker model run hf.co/llmware/bling-answer-tool
```

Lemonade

How to use llmware/bling-answer-tool with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull llmware/bling-answer-tool

Run and chat with the model

lemonade run user.bling-answer-tool-{{QUANT_TAG}}

List all available models

lemonade list

doberst commited on Jan 29, 2024

Commit

05447f6

verified ·

1 Parent(s): 938c112

Update README.md

Browse files

Files changed (1) hide show

README.md +7 -12

README.md CHANGED Viewed

@@ -6,20 +6,18 @@ license: apache-2.0
 <!-- Provide a quick summary of what the model is/does. -->
-**slim-ner-tool** is part of the SLIM ("Structured Language Instruction Model") model series, providing a set of small, specialized decoder-based LLMs, fine-tuned for function-calling.
-slim-ner-tool is a 4_K_M quantized GGUF version of slim-ner, providing a small, fast inference implementation.
 Load in your favorite GGUF inference engine (see details in config.json to set up the prompt template), or try with llmware as follows:
     from llmware.models import ModelCatalog
     # to load the model and make a basic inference
-    ner_tool = ModelCatalog().load_model("slim-ner-tool")
-    response = ner_tool.function_call(text_sample)
     # this one line will download the model and run a series of tests
-    ModelCatalog().test_run("slim-ner-tool", verbose=True)
 Slim models can also be loaded even more simply as part of a multi-model, multi-step LLMfx calls:
@@ -27,8 +25,8 @@ Slim models can also be loaded even more simply as part of a multi-model, multi-
     from llmware.agents import LLMfx
     llm_fx = LLMfx()
-    llm_fx.load_tool("ner")
-    response = llm_fx.named_entity_extraction(text)
 ### Model Description
@@ -39,18 +37,15 @@ Slim models can also be loaded even more simply as part of a multi-model, multi-
 - **Model type:** GGUF
 - **Language(s) (NLP):** English
 - **License:** Apache 2.0
-- **Quantized from model:** llmware/slim-sentiment (finetuned tiny llama)
 ## Uses
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-SLIM models provide a fast, flexible, intuitive way to integrate classifiers and structured function calls into RAG and LLM application workflows.
 Model instructions, details and test samples have been packaged into the config.json file in the repository, along with the GGUF file.
 ## Model Card Contact
 Darren Oberst & llmware team

 <!-- Provide a quick summary of what the model is/does. -->
+**bling-qa-tool** is a 4_K_M quantized GGUF version of bling-tiny-llama-1b-v0, providing a small, fast inference implementation.
 Load in your favorite GGUF inference engine (see details in config.json to set up the prompt template), or try with llmware as follows:
     from llmware.models import ModelCatalog
     # to load the model and make a basic inference
+    qa_tool = ModelCatalog().load_model("bling-qa-tool")
+    response = qa_tool.function_call(text_sample)
     # this one line will download the model and run a series of tests
+    ModelCatalog().test_run("bling-qa-tool", verbose=True)
 Slim models can also be loaded even more simply as part of a multi-model, multi-step LLMfx calls:
     from llmware.agents import LLMfx
     llm_fx = LLMfx()
+    llm_fx.load_tool("quick_question")
+    response = llm_fx.quick_question(text)
 ### Model Description
 - **Model type:** GGUF
 - **Language(s) (NLP):** English
 - **License:** Apache 2.0
+- **Quantized from model:** llmware/bling-tiny-llama-1b-v0
 ## Uses
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 Model instructions, details and test samples have been packaged into the config.json file in the repository, along with the GGUF file.
 ## Model Card Contact
 Darren Oberst & llmware team