Instructions to use QuantFactory/Llama-3-8B-Web-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use QuantFactory/Llama-3-8B-Web-GGUF with Transformers:

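# Transformers needs the name of a .gguf file inside the repo (requires: pip install gguf)
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

repo_id = "QuantFactory/Llama-3-8B-Web-GGUF"
gguf_file = "Llama-3-8B-Web.Q2_K.gguf"

# Load model directly (the quantized weights are dequantized on load)
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)

# Use a pipeline as a high-level helper
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)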

llama-cpp-python

How to use QuantFactory/Llama-3-8B-Web-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="QuantFactory/Llama-3-8B-Web-GGUF",
	filename="Llama-3-8B-Web.Q2_K.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
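
The call above returns an OpenAI-style completion dictionary rather than a plain string. A minimal sketch for pulling out the reply text, assuming the usual `choices[0]["message"]["content"]` layout; `extract_reply` is a hypothetical helper name, not part of llama-cpp-python, and the sample response is hand-written for illustration:

```python
from typing import Any, Dict


def extract_reply(response: Dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # Each choice carries a message with "role" and "content" keys.
    return choices[0]["message"]["content"]


# Hand-written stand-in for the dict returned by llm.create_chat_completion(...)
sample_response = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Paris."}}
    ]
}
print(extract_reply(sample_response))
```

In practice the result of `llm.create_chat_completion(...)` would be passed in place of `sample_response`.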

Local Apps

llama.cpp

How to use QuantFactory/Llama-3-8B-Web-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf QuantFactory/Llama-3-8B-Web-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf QuantFactory/Llama-3-8B-Web-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf QuantFactory/Llama-3-8B-Web-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf QuantFactory/Llama-3-8B-Web-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf QuantFactory/Llama-3-8B-Web-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf QuantFactory/Llama-3-8B-Web-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf QuantFactory/Llama-3-8B-Web-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf QuantFactory/Llama-3-8B-Web-GGUF:Q4_K_M

Use Docker

docker model run hf.co/QuantFactory/Llama-3-8B-Web-GGUF:Q4_K_M

vLLM

How to use QuantFactory/Llama-3-8B-Web-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "QuantFactory/Llama-3-8B-Web-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantFactory/Llama-3-8B-Web-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
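
The same request can be sent from Python. A minimal sketch using only the standard library, assuming the server started above is listening on `localhost:8000`; `build_chat_request` and `chat` are hypothetical helper names introduced here for illustration:

```python
import json
import urllib.request
from typing import Dict, List

MODEL_ID = "QuantFactory/Llama-3-8B-Web-GGUF"


def build_chat_request(
    messages: List[Dict[str, str]],
    base_url: str = "http://localhost:8000",
    model: str = MODEL_ID,
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages: List[Dict[str, str]], **kwargs) -> str:
    """Send the request and return the first reply (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(messages, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example call, commented out because it requires the server to be running:
# print(chat([{"role": "user", "content": "What is the capital of France?"}]))
```

Passing a different `base_url` (for example `http://localhost:30000`, the port used in the SGLang commands) points the same helper at another OpenAI-compatible server.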

Use Docker

docker model run hf.co/QuantFactory/Llama-3-8B-Web-GGUF:Q4_K_M

SGLang

How to use QuantFactory/Llama-3-8B-Web-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "QuantFactory/Llama-3-8B-Web-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantFactory/Llama-3-8B-Web-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "QuantFactory/Llama-3-8B-Web-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantFactory/Llama-3-8B-Web-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use QuantFactory/Llama-3-8B-Web-GGUF with Ollama:
```
ollama run hf.co/QuantFactory/Llama-3-8B-Web-GGUF:Q4_K_M
```

Unsloth Studio

How to use QuantFactory/Llama-3-8B-Web-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/Llama-3-8B-Web-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/Llama-3-8B-Web-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for QuantFactory/Llama-3-8B-Web-GGUF to start chatting

Docker Model Runner
How to use QuantFactory/Llama-3-8B-Web-GGUF with Docker Model Runner:
```
docker model run hf.co/QuantFactory/Llama-3-8B-Web-GGUF:Q4_K_M
```

Lemonade

How to use QuantFactory/Llama-3-8B-Web-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull QuantFactory/Llama-3-8B-Web-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Llama-3-8B-Web-GGUF-Q4_K_M

List all available models

lemonade list

Llama-3-8B-Web-GGUF

This is a quantized version of McGill-NLP/Llama-3-8B-Web, created using llama.cpp.

Model Description

Our first agent is a finetuned Meta-Llama-3-8B-Instruct model, which was recently released by the Meta GenAI team. We have finetuned this model on the WebLINX dataset, which contains over 100K instances of web navigation and dialogue, each collected and verified by expert annotators. We use a curated 24K subset as training data. The training and evaluation data are available on the Hugging Face Hub as McGill-NLP/WebLINX.

It surpasses GPT-4V (zero-shot) by over 18 percentage points on the WebLINX benchmark, achieving an overall score of 28.8% on the out-of-domain test splits (compared to 10.5% for GPT-4V). It chooses more useful links (34.1% vs 18.9% seg-F1), clicks on more relevant elements (27.1% vs 13.6% IoU), and formulates more aligned responses (37.5% vs 3.1% chr-F1).
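
The gaps quoted above are differences in absolute score. A quick check of the arithmetic, using only the numbers from this paragraph (the dictionary layout and variable names are illustrative):

```python
# (Llama-3-8B-Web, GPT-4V zero-shot) scores in percent, copied from the text above
scores = {
    "overall": (28.8, 10.5),
    "seg-F1 (links)": (34.1, 18.9),
    "IoU (element clicks)": (27.1, 13.6),
    "chr-F1 (responses)": (37.5, 3.1),
}

# Absolute gap in percentage points for each metric
gaps = {name: round(ours - gpt4v, 1) for name, (ours, gpt4v) in scores.items()}

for name, gap in gaps.items():
    print(f"{name}: +{gap} points")
```

The overall gap is the source of the "over 18" figure; the per-metric gaps are computed the same way.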

About `WebLlama`

| `WebLlama` | The goal of our project is to build effective human-centric agents for browsing the web. We don't want to replace users, but to equip them with powerful assistants. |
|---|---|
| Modeling | We are building on top of cutting-edge libraries for training Llama agents on web navigation tasks. We will provide training scripts, optimized configs, and instructions for training cutting-edge Llamas. |
| Evaluation | Benchmarks for testing Llama models on real-world web browsing. This includes human-centric browsing through dialogue (`WebLINX`), and we will soon add more benchmarks for automatic web navigation (e.g. Mind2Web). |
| Data | Our first model is finetuned on over 24K instances of web interactions, including `click`, `textinput`, `submit`, and dialogue acts. We want to continuously curate, compile and release datasets for training better agents. |
| Deployment | We want to make it easy to integrate Llama models with existing deployment platforms, including Playwright, Selenium, and BrowserGym. We are currently focusing on making this a reality. |
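
To connect the action types named above (`click`, `textinput`, `submit`) to a deployment platform, an integration would first need to turn the model's text output into a structured action. A minimal sketch, assuming the model emits function-call-like strings such as `click(uid="...")`; that string format, the example `uid` values, and the helper name `parse_action` are assumptions for illustration and are not specified in this card:

```python
import re
from typing import Dict, Tuple

# Matches a call such as: click(uid="abc123")
CALL_RE = re.compile(r"^\s*(?P<name>\w+)\((?P<args>.*)\)\s*$", re.DOTALL)
# Matches key="value" pairs, allowing escaped quotes inside the value
ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_action(text: str) -> Tuple[str, Dict[str, str]]:
    """Split an action string into its name and keyword arguments."""
    match = CALL_RE.match(text)
    if match is None:
        raise ValueError(f"not an action string: {text!r}")
    args = {key: value for key, value in ARG_RE.findall(match.group("args"))}
    return match.group("name"), args


print(parse_action('click(uid="button-42")'))
print(parse_action('textinput(text="hello world", uid="search-box")'))
```

The parsed name and arguments could then be mapped to browser calls in a tool such as Playwright; that mapping is left out here.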

Evaluation

We believe short demo videos showing how well an agent performs are NOT enough to judge an agent. Simply put, we do not know if we have a good agent if we do not have good benchmarks. We need to systematically evaluate agents on a wide range of tasks, spanning from simple instruction-following web navigation to complex dialogue-guided browsing.

This is why we chose WebLINX as our first benchmark. In addition to the training split, the benchmark has 4 real-world splits, with the goal of testing multiple dimensions of generalization: new websites, new domains, unseen geographic locations, and scenarios where the user cannot see the screen and relies on dialogue. It also covers 150 websites, including booking, shopping, writing, knowledge lookup, and even complex tasks like manipulating spreadsheets.

Data

Although the 24K training examples from WebLINX provide a good starting point for training a capable agent, we believe that more data is needed to train agents that can generalize to a wide range of web navigation tasks. Although the model has been trained and evaluated on 150 websites, there are millions of websites that it has never seen, with new ones being created every day.

This motivates us to continuously curate, compile and release datasets for training better agents. As an immediate next step, we will incorporate Mind2Web's training data, which also covers over 100 websites.

Deployment

We are working hard to make it easy for you to deploy Llama web agents to the web. We want to integrate WebLlama with existing deployment platforms, including Microsoft's Playwright, ServiceNow Research's BrowserGym, and other partners.

Code

The code for finetuning the model and evaluating it on the WebLINX benchmark is available now. You can find the detailed instructions in modeling.

Downloads last month: 165

GGUF

Model size: 8B params
Architecture: llama
Hardware compatibility: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit

Model tree for QuantFactory/Llama-3-8B-Web-GGUF

Base model: McGill-NLP/Llama-3-8B-Web
Quantized (4): this model (QuantFactory/Llama-3-8B-Web-GGUF)
