---
license: mit
pipeline_tag: text-generation
tags: [research, experimental, gravity-attention, qwen2]
---

# Gravity-2

**Experimental research model by squ11z1.**
A 3B reasoning model in which the standard
scaled-dot-product attention is replaced by a physically-motivated **gravity attention**,
then adapted with LoRA. This card documents a **stage-1 proof-of-mechanism**.

## The experiment

Transformer attention scores tokens by **alignment**: the dot product `q·k`. Gravity-2
asks a different question: *what if tokens attended by **proximity** instead?* We replace
the score with an inverse-square law borrowed from gravitation, so that each token is pulled
toward others that are close in query/key space, weighted by a learnable per-head "mass":

```
                     M_h²
score(i, j) = ────────────────────  →  softmax_j( score )
                ‖q_i − k_j‖² + ε
```

- **M_h = softplus(gravity_mass_log[h])**: one learnable mass per **query head** (16 / layer),
  initialised at 0.5; `softplus` keeps it strictly positive.
- **‖q_i − k_j‖²**: squared L2 distance, computed stably as `‖q‖² + ‖k‖² − 2·q·k`.
- **ε = 0.1**: softening length; prevents the `q → k` singularity.
- The raw gravity scores are then passed through the **usual softmax** (see Limitations).
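
The score definition above can be sketched for a single head and a single query in plain Python. This is a minimal illustration, not the code shipped in the repository: the names `softplus` and `gravity_weights` are hypothetical, the clamp of the squared distance at zero is an added safeguard against rounding error, and the sketch assumes that "initialised at 0.5" refers to the mass `M_h` itself rather than to the raw `gravity_mass_log` parameter.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x)), always > 0."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gravity_weights(q, keys, mass_log, eps=0.1):
    """Attention weights of one query over `keys` for a single head.

    `keys` is assumed to hold only the positions the query may attend to
    (causal masking is left out of this sketch).
    """
    mass = softplus(mass_log)                      # M_h
    q_sq = sum(x * x for x in q)                   # ||q||^2
    scores = []
    for k in keys:
        k_sq = sum(x * x for x in k)               # ||k||^2
        qk = sum(a * b for a, b in zip(q, k))      # q . k
        # ||q - k||^2 = ||q||^2 + ||k||^2 - 2 q.k, clamped (assumption)
        dist_sq = max(q_sq + k_sq - 2.0 * qk, 0.0)
        scores.append(mass * mass / (dist_sq + eps))
    # Usual softmax over the keys, shifted by the max for stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Raw parameter value for which softplus(value) == 0.5.
MASS_LOG_INIT = math.log(math.expm1(0.5))

weights = gravity_weights(
    q=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],   # distances^2: 0, 2, 4
    mass_log=MASS_LOG_INIT,
)
```

In this toy call the weights decrease with distance from the query, and they sum to one because of the final softmax.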

### Why it's interesting

- **Different inductive bias.** Dot-product attention rewards directional alignment;
  inverse-distance rewards *locality* in the learned embedding geometry: a metric prior
  rather than an inner-product one.
- **Interpretable per-head masses.** Each head learns a scalar "mass" controlling how
  sharply it concentrates: a compact, inspectable knob (see `figures/04_mass_heatmap.png`).
- **A bridge to physics-style sparsity.** An inverse-square field is naturally local, which
  later stages (pruning / QUBO, "Gravity-6") aim to exploit for structured sparsity.
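
The first two points can be made concrete with a toy comparison. The sketch below is illustrative only: the vectors are made up, the helper names `dot_score` and `gravity_score` are hypothetical, and the `1/sqrt(d)` factor is the standard scaled-dot-product convention rather than something stated in this card. It shows a long key aligned with the query winning under the dot product, a nearby key winning under the gravity score, and a larger mass widening the gap between near and far keys.

```python
import math


def dot_score(q, k):
    """Standard scaled dot-product score: q.k / sqrt(d)."""
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


def gravity_score(q, k, mass=0.5, eps=0.1):
    """Gravity score before the softmax: M^2 / (||q - k||^2 + eps)."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(q, k))
    return mass * mass / (dist_sq + eps)


query = [1.0, 0.0]
near_key = [0.9, 0.1]          # close to the query
aligned_far_key = [5.0, 0.0]   # same direction, but far away

# Dot product rewards alignment and magnitude, so the far key wins.
dot_prefers_far = dot_score(query, aligned_far_key) > dot_score(query, near_key)

# The gravity score rewards proximity, so the near key wins.
gravity_prefers_near = (
    gravity_score(query, near_key) > gravity_score(query, aligned_far_key)
)

# A larger mass scales every score by M^2, widening the near/far gap
# that the softmax then sharpens.
gap_light = gravity_score(query, near_key, mass=0.5) - gravity_score(
    query, aligned_far_key, mass=0.5
)
gap_heavy = gravity_score(query, near_key, mass=1.0) - gravity_score(
    query, aligned_far_key, mass=1.0
)
```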

## Architecture

Qwen2-3B class: 36 layers, hidden 2048, **16 query heads / 2 KV heads (GQA, group size 8)**,
head_dim 128. The 2 KV heads are `repeat_kv`-expanded to 16 before the distance is computed,
so each query head gets its own mass. Integrated via the transformers-5.x `AttentionInterface`
(a registered `"gravity"` op + eager causal-mask reuse). RoPE, KV-cache, and masking are
left to the framework; only the score function changes.
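
The head bookkeeping described above can be sketched with plain lists. This is an illustration under assumptions: the list-based `repeat_kv` below is a stand-in that mirrors the ordering of the transformers helper of the same name (each KV head repeated consecutively), and the total mass count assumes that every one of the 36 layers carries its own 16 masses.

```python
NUM_LAYERS = 36
NUM_QUERY_HEADS = 16
NUM_KV_HEADS = 2
GROUP_SIZE = NUM_QUERY_HEADS // NUM_KV_HEADS   # 8 query heads per KV head


def repeat_kv(kv_heads, n_rep):
    """Repeat each KV head n_rep times in order: [a, b] -> [a]*n_rep + [b]*n_rep."""
    return [head for head in kv_heads for _ in range(n_rep)]


# Stand-in labels instead of real key tensors.
expanded = repeat_kv(["kv0", "kv1"], GROUP_SIZE)

# Query head h is paired with expanded[h], i.e. KV head h // GROUP_SIZE,
# while keeping its own mass M_h.
kv_index_for_query_head = [h // GROUP_SIZE for h in range(NUM_QUERY_HEADS)]

# One scalar mass per query head per layer (assumes all layers are patched).
total_masses = NUM_LAYERS * NUM_QUERY_HEADS
```

Query heads 0 to 7 therefore measure distance against the same keys but can weight them differently through their individual masses, which is the granularity question raised under the limitations.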

## Results

[Figure grid: six result plots in a 3×2 layout; images not recovered.]

## Honest limitations

- **Not "pure" gravity.** The inverse-square scores are renormalised by a **softmax on top**
  (`softmax_j(M²/(d²+ε))`). Without it training was unstable, but it means this is a
  *distance-biased softmax attention*, not a literal gravitational field: the normalisation
  reintroduces global competition between keys.
- **MHA → GQA transfer is an open question.** The mechanism was first prototyped on MHA
  (1 KV head per query head). Here it runs on GQA by `repeat_kv`-expanding 2 KV heads to 16
  and giving each query head its own mass; whether this is the right granularity (vs. one
  mass per KV group) is **unresolved** and may matter for convergence.
- **Loading requires the patch** (below). **GGUF builds run standard attention, not gravity**
  (llama.cpp has no kernel for `M²/(‖q−k‖²+ε)`), so the `*.gguf` files are format placeholders
  and produce incorrect output.
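
The "global competition" point in the first limitation can be checked numerically. The sketch below is a toy illustration with made-up distances, using the initial mass 0.5 and ε = 0.1 from this card; the helper names are hypothetical. Adding a third key leaves the raw pairwise scores untouched but changes the softmax weights of the existing keys. It also computes a consequence of the formula: within one head the raw scores lie in `(0, M²/ε]`, so the ratio between any two softmax weights stays below `exp(M²/ε)`.

```python
import math

MASS = 0.5   # initial mass from the card
EPS = 0.1    # softening length from the card


def raw_score(dist_sq):
    """Gravity score before normalisation: M^2 / (d^2 + eps)."""
    return MASS * MASS / (dist_sq + EPS)


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two keys at squared distances 0 and 4 from the query.
two_keys = [raw_score(0.0), raw_score(4.0)]
# Add a third key that also sits at distance 0.
three_keys = two_keys + [raw_score(0.0)]

weights_two = softmax(two_keys)
weights_three = softmax(three_keys)

# Raw scores of the first two keys are unchanged by the newcomer...
raw_unchanged = three_keys[:2] == two_keys
# ...but the first key's normalised weight drops, because keys compete.
weight_dropped = weights_three[0] < weights_two[0]

# Upper bound on the ratio of any two weights within this head.
max_ratio = math.exp(MASS * MASS / EPS)
observed_ratio = weights_two[0] / weights_two[1]
```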

## Loading (requires the gravity patch)

```bash
python load_gravity2.py   # from_pretrained -> patch_qwen_with_gravity -> load gravity_mass_log.pt
```

Weights are LoRA-merged into the base but were trained under gravity scoring; loading them
under vanilla attention gives garbage. `config.json` ships `_attn_implementation="eager"`
only so the checkpoint loads; the patch then switches it to gravity.

## License & attribution

Released under the **MIT License**. This is a **derivative work of
[`WeiboAI/VibeThinker-3B`](https://huggingface.co/WeiboAI/VibeThinker-3B)** (the base model
for the experiment), which is distributed under the **MIT License**; that license is
inherited here and the original authors are credited accordingly.