
# Groq API Setup Guide for HuggingClaw

## ⚡ Why Groq?

Groq is among the fastest inference engines available, serving 500+ tokens/second!

| Feature   | Groq                            | Others           |
|-----------|---------------------------------|------------------|
| Speed     | ⚡⚡⚡⚡⚡ 500+ t/s               | ⚡⚡ 50-100 t/s   |
| Latency   | <100ms                          | 500ms-2s         |
| Free Tier | ✅ Yes, generous                | ⚠️ Limited       |
| Models    | Llama 3/4, Qwen, Kimi, GPT-OSS  | Varies           |

## ⚠️ SECURITY WARNING

Never share your API key publicly! If you've shared it:

  1. Go to https://console.groq.com/api-keys
  2. Delete the compromised key
  3. Create a new one
  4. Store it securely (password manager, HF Spaces secrets)

## Quick Start

### Step 1: Get Your Groq API Key

  1. Go to https://console.groq.com
  2. Sign in or create account (free)
  3. Navigate to API Keys in left sidebar
  4. Click Create API Key
  5. Copy your key (starts with gsk_...)
  6. Keep it secret!
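Before wiring the key into a deployment, a quick shape check can catch copy-paste mistakes. This is a hypothetical helper: the `gsk_` prefix comes from the steps above, but the minimum-length threshold is an illustrative assumption, not a documented Groq guarantee.

```python
def looks_like_groq_key(key: str) -> bool:
    """Loose sanity check for a Groq API key: correct prefix,
    no surrounding whitespace, and a plausible length."""
    return (
        key == key.strip()          # no stray spaces or newlines
        and key.startswith("gsk_")  # Groq keys start with gsk_
        and len(key) > 20           # assumed minimum; real keys are longer
    )

print(looks_like_groq_key("gsk_" + "x" * 40))  # True
print(looks_like_groq_key("gsk_abc123\n"))     # False -- trailing newline
```

A check like this is worth running right after pasting the key into a secret store, since a trailing newline is the most common cause of "Invalid API key" errors.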

### Step 2: Configure HuggingFace Spaces

In your Space Settings → Repository secrets, add:

```
GROQ_API_KEY=gsk_your-actual-api-key-here
OPENCLAW_DEFAULT_MODEL=groq/llama-3.3-70b-versatile
```

### Step 3: Deploy

Push changes or redeploy the Space. Groq will be automatically configured.

### Step 4: Use

  1. Open Space URL
  2. Enter gateway token (default: huggingclaw)
  3. Select "Llama 3.3 70B (Versatile)" from model dropdown
  4. Experience blazing fast responses! ⚡

## Available Models (Verified 2025)

### Chat Models

| Model ID | Name | Context | Speed | Best For |
|----------|------|---------|-------|----------|
| `llama-3.3-70b-versatile` | Llama 3.3 70B | 128K | ⚡⚡⚡⚡ | Best overall |
| `llama-3.1-8b-instant` | Llama 3.1 8B | 128K | ⚡⚡⚡⚡⚡ | Ultra-fast |
| `meta-llama/llama-4-maverick-17b-128e-instruct` | Llama 4 Maverick | 128K | ⚡⚡⚡⚡ | Latest Llama 4 |
| `meta-llama/llama-4-scout-17b-16e-instruct` | Llama 4 Scout | 128K | ⚡⚡⚡⚡ | Latest Llama 4 |
| `qwen/qwen3-32b` | Qwen3 32B | 128K | ⚡⚡⚡ | Alibaba model |
| `moonshotai/kimi-k2-instruct` | Kimi K2 | 128K | ⚡⚡⚡ | Moonshot AI |
| `openai/gpt-oss-20b` | GPT-OSS 20B | 128K | ⚡⚡⚡ | OpenAI open-source |
| `allam-2-7b` | Allam-2 7B | 4K | ⚡⚡⚡⚡ | Arabic/English |
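The table above can be condensed into a small lookup helper that maps a use case to a model ID. This is a hypothetical convenience function, not part of HuggingClaw; only the model IDs themselves come from the table.

```python
# Map common use cases to Groq model IDs from the table above.
CHAT_MODELS = {
    "general": "llama-3.3-70b-versatile",
    "fast": "llama-3.1-8b-instant",
    "reasoning": "meta-llama/llama-4-maverick-17b-128e-instruct",
    "long_context": "llama-3.3-70b-versatile",
    "arabic": "allam-2-7b",
}

def pick_model(use_case: str) -> str:
    """Return a Groq model ID for a use case, defaulting to the versatile 70B."""
    return CHAT_MODELS.get(use_case, "llama-3.3-70b-versatile")

print(pick_model("fast"))     # llama-3.1-8b-instant
print(pick_model("unknown"))  # llama-3.3-70b-versatile (default)
```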

### Audio Models

| Model ID | Name | Purpose |
|----------|------|---------|
| `whisper-large-v3-turbo` | Whisper Large V3 Turbo | Speech-to-text |
| `whisper-large-v3` | Whisper Large V3 | Speech-to-text |

### Safety Models

| Model ID | Name | Purpose |
|----------|------|---------|
| `meta-llama/llama-guard-4-12b` | Llama Guard 4 | Content moderation |
| `meta-llama/llama-prompt-guard-2-86m` | Llama Prompt Guard 2 | Prompt injection detection |
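A moderation call is just a chat request that targets a guard model ID from the table above. The sketch below only builds the request payload; it assumes the guard models are reachable through the same chat-completions endpoint as the chat models, and nothing is sent over the network here.

```python
import json

def moderation_payload(user_text: str) -> dict:
    """Build a chat-completions payload asking Llama Guard to
    classify a user message (model ID from the table above)."""
    return {
        "model": "meta-llama/llama-guard-4-12b",
        "messages": [{"role": "user", "content": user_text}],
    }

payload = moderation_payload("How do I reset my password?")
print(json.dumps(payload, indent=2))
```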

## Configuration Options

### Basic Setup (Recommended)

```
GROQ_API_KEY=gsk_xxxxx
OPENCLAW_DEFAULT_MODEL=groq/llama-3.3-70b-versatile
```

### Multiple Providers

Use Groq as primary with fallbacks:

```
# Groq (primary - fastest)
GROQ_API_KEY=gsk_xxxxx

# OpenRouter (fallback - more models)
OPENROUTER_API_KEY=sk-or-v1-xxxxx

# Local Ollama (free backup)
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
```

Priority order:

  1. Groq (if GROQ_API_KEY set) ← Fastest!
  2. xAI (if XAI_API_KEY set)
  3. OpenAI (if OPENAI_API_KEY set)
  4. OpenRouter (if OPENROUTER_API_KEY set)
  5. Local (if LOCAL_MODEL_ENABLED=true)
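The priority order above amounts to a first-match scan over the environment. The helper below is a hypothetical sketch of that logic; the environment variable names are the ones used in this guide, but the function itself is not HuggingClaw's actual implementation.

```python
# Providers in priority order, paired with the env var that enables each.
PROVIDER_PRIORITY = [
    ("groq", "GROQ_API_KEY"),
    ("xai", "XAI_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("openrouter", "OPENROUTER_API_KEY"),
]

def resolve_provider(env: dict) -> str:
    """Return the first configured provider, falling back to the
    local model when LOCAL_MODEL_ENABLED=true, else 'none'."""
    for name, var in PROVIDER_PRIORITY:
        if env.get(var):
            return name
    if env.get("LOCAL_MODEL_ENABLED", "").lower() == "true":
        return "local"
    return "none"

print(resolve_provider({"OPENROUTER_API_KEY": "sk-or-v1-xxxxx"}))  # openrouter
print(resolve_provider({"GROQ_API_KEY": "gsk_xxxxx",
                        "OPENAI_API_KEY": "sk-xxxxx"}))            # groq
```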

## Model Recommendations

### Best for General Use

```
OPENCLAW_DEFAULT_MODEL=groq/llama-3.3-70b-versatile
```
  • Excellent quality
  • 128K context window
  • Fast (500+ tokens/s)

### Fastest Responses

```
OPENCLAW_DEFAULT_MODEL=groq/llama-3.1-8b-instant
```
  • Instant responses
  • Good for simple Q&A
  • Highest rate limits

### Latest & Greatest

```
OPENCLAW_DEFAULT_MODEL=groq/meta-llama/llama-4-maverick-17b-128e-instruct
```

  • Llama 4 architecture
  • Best reasoning
  • Cutting-edge performance

### Long Documents

```
OPENCLAW_DEFAULT_MODEL=groq/llama-3.3-70b-versatile
```
  • 128K context window
  • Can process entire books
  • Excellent summarization

## Pricing

### Free Tier (Generous!)

| Model | Rate Limit |
|-------|------------|
| Llama 3.1 8B | ~30 req/min |
| Llama 3.3 70B | ~30 req/min |
| Llama 4 Maverick | ~30 req/min |
| Llama 4 Scout | ~30 req/min |
| Qwen3 32B | ~30 req/min |
| Kimi K2 | ~30 req/min |

Perfect for personal bots! Most users never need paid tier.
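A client-side pacer is an easy way to stay under the ~30 requests/minute ceiling. This is a minimal sketch under the limits listed above; actual limits vary by model and account, and the injectable clock/sleep parameters exist only to make the class testable.

```python
import time

class RequestPacer:
    """Enforce a minimum gap between requests so a bot stays under
    a requests-per-minute ceiling (e.g. 30 rpm -> 2s minimum gap)."""

    def __init__(self, max_per_minute=30, clock=time.monotonic, sleep=time.sleep):
        self.min_gap = 60.0 / max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        """Block until it is safe to send the next request."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_gap - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()
```

Call `pacer.wait()` immediately before each API request; the first call returns instantly and later calls sleep only for whatever remains of the 2-second gap.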

### Paid Plans

Check https://groq.com/pricing for enterprise pricing.


## Performance Comparison

| Provider | Tokens/sec | Latency | Cost |
|----------|------------|---------|------|
| Groq Llama 3.3 | 500+ | <100ms | Free |
| Groq Llama 4 | 400+ | <150ms | Free |
| xAI Grok | 100-200 | 200-500ms | $ |
| OpenAI GPT-4 | 50-100 | 500ms-1s | $$$ |
| Local Ollama | 20-50 | 100-200ms | Free |

## Troubleshooting

### "Invalid API key"

  1. Verify key starts with gsk_
  2. No spaces or newlines
  3. Check key at https://console.groq.com/api-keys
  4. Regenerate if compromised

### "Rate limit exceeded"

  • Free tier: ~30 requests/minute
  • Use llama-3.1-8b-instant for higher limits
  • Add delays between requests
  • Consider paid plan for heavy usage
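"Add delays between requests" can be automated with exponential backoff. This is a generic sketch: `send` stands in for whatever function performs the API call, and `RateLimitError` for however your client signals a 429 response; neither name comes from a real library here.

```python
import time

class RateLimitError(Exception):
    """Stand-in for a 429 'rate limit exceeded' response."""

def with_backoff(send, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send(), retrying on RateLimitError with delays of
    base_delay * 2**attempt (1s, 2s, 4s, ...). Re-raises after
    the final attempt fails."""
    for attempt in range(retries):
        try:
            return send()
        except RateLimitError:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```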

### "Model not found"

  • Use exact model ID from table above
  • Check model is active in Groq console
  • Some models may be region-restricted

### Slow Responses

  • Groq should be <100ms
  • Check internet connection
  • HF Spaces region matters (US = fastest)

## Example: WhatsApp Bot with Groq

```
# HF Spaces secrets
GROQ_API_KEY=gsk_xxxxx
HF_TOKEN=hf_xxxxx
AUTO_CREATE_DATASET=true

# WhatsApp (configure in Control UI)
WHATSAPP_PHONE=+1234567890
WHATSAPP_CODE=ABC123
```

Result: Ultra-fast WhatsApp AI bot! ⚡


## API Reference

### Test Your Key

```bash
curl https://api.groq.com/openai/v1/models \
  -H "Authorization: Bearer gsk_xxxxx"
```

### Chat Completion

```bash
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer gsk_xxxxx" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
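The same chat-completion request can be built from Python using only the standard library. This sketch constructs the request object without sending it; substitute a real key and call `urllib.request.urlopen(req)` to execute it.

```python
import json
import urllib.request

def chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build the same chat-completion request as the curl example above."""
    body = json.dumps({
        "model": "llama-3.3-70b-versatile",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://api.groq.com/openai/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = chat_request("gsk_xxxxx", "Hello!")
print(req.full_url)
# To send: resp = urllib.request.urlopen(req); json.load(resp)
```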

## Best Practices

### 1. Choose the Right Model

  • Chat: llama-3.3-70b-versatile
  • Fast Q&A: llama-3.1-8b-instant
  • Complex tasks: meta-llama/llama-4-maverick-17b-128e-instruct
  • Long docs: llama-3.3-70b-versatile (128K context)

### 2. Monitor Usage

Check https://console.groq.com/usage

### 3. Secure Your Key

  • Never commit to git
  • Use HF Spaces secrets
  • Rotate keys periodically

### 4. Set Up Alerts

Configure usage alerts in Groq console.


## Next Steps

  1. Get API key from https://console.groq.com
  2. Set GROQ_API_KEY in HF Spaces secrets
  3. Deploy and test in Control UI
  4. Configure WhatsApp/Telegram channels
  5. 🎉 Enjoy sub-second AI responses!

## Speed Test

After setup, test Groq's speed:

1. Open Control UI
2. Select "Llama 3.3 70B (Versatile)"
3. Send: "Write a 100-word story about a robot"
4. Watch it generate in <0.5 seconds! ⚡⚡⚡
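Throughput is easy to compute from any timed run like the one above. A sketch, with one stated assumption: a 100-word story is roughly 130 output tokens, which is a rule-of-thumb estimate, not a measured value.

```python
def tokens_per_second(num_tokens: int, elapsed_s: float) -> float:
    """Throughput for a completion of num_tokens generated in elapsed_s seconds."""
    return num_tokens / elapsed_s

# e.g. ~130 tokens (about a 100-word story) generated in 0.26s:
print(round(tokens_per_second(130, 0.26)))  # 500
```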

## Available via OpenAI-Compatible API

All Groq models work via the OpenAI-compatible endpoint:

```
OPENAI_API_KEY=gsk_xxxxx
OPENAI_BASE_URL=https://api.groq.com/openai/v1
```

This allows using Groq with any OpenAI-compatible client!