# Groq API Setup Guide for HuggingClaw
## ⚡ Why Groq?
**Groq is the FASTEST inference engine available** - up to 500+ tokens/second!
| Feature | Groq | Others |
|---------|------|--------|
| Speed | ⚡⚡⚡⚡⚡ 500+ t/s | ⚡⚡ 50-100 t/s |
| Latency | <100ms | 500ms-2s |
| Free Tier | ✅ Yes, generous | ⚠️ Limited |
| Models | Llama 3/4, Qwen, Kimi, GPT-OSS | Varies |
---
## ⚠️ SECURITY WARNING
**Never share your API key publicly!** If you've shared it:
1. Go to https://console.groq.com/api-keys
2. Delete the compromised key
3. Create a new one
4. Store it securely (password manager, HF Spaces secrets)
---
## Quick Start
### Step 1: Get Your Groq API Key
1. Go to **https://console.groq.com**
2. Sign in or create account (free)
3. Navigate to **API Keys** in left sidebar
4. Click **Create API Key**
5. Copy your key (starts with `gsk_...`)
6. **Keep it secret!**
### Step 2: Configure HuggingFace Spaces
In your Space **Settings → Repository secrets**, add:
```bash
GROQ_API_KEY=gsk_your-actual-api-key-here
OPENCLAW_DEFAULT_MODEL=groq/llama-3.3-70b-versatile
```
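Gateways that use the `provider/model` convention shown above have to split the prefix off before calling the provider, and Groq model IDs can themselves contain slashes (e.g. the Llama 4 IDs). As a rough sketch of that parsing logic — the `resolve_model` helper is illustrative, not OpenClaw's actual code:

```python
import os

def resolve_model(default="groq/llama-3.3-70b-versatile"):
    """Split an OPENCLAW_DEFAULT_MODEL value into (provider, model_id).

    Only the first path segment is the provider prefix; the rest is the
    model ID, which may itself contain slashes (e.g. Llama 4 IDs).
    """
    value = os.environ.get("OPENCLAW_DEFAULT_MODEL", default)
    provider, _, model_id = value.partition("/")
    return provider, model_id
```

For example, `groq/meta-llama/llama-4-scout-17b-16e-instruct` resolves to provider `groq` and model `meta-llama/llama-4-scout-17b-16e-instruct`.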
### Step 3: Deploy
Push changes or redeploy the Space. Groq will be automatically configured.
### Step 4: Use
1. Open Space URL
2. Enter gateway token (default: `huggingclaw`)
3. Select "Llama 3.3 70B (Versatile)" from model dropdown
4. Experience blazing fast responses! ⚡
---
## Available Models (Verified 2025)
### Chat Models
| Model ID | Name | Context | Speed | Best For |
|----------|------|---------|-------|----------|
| `llama-3.3-70b-versatile` | Llama 3.3 70B | 128K | ⚡⚡⚡⚡ | **Best overall** |
| `llama-3.1-8b-instant` | Llama 3.1 8B | 128K | ⚡⚡⚡⚡⚡ | Ultra-fast |
| `meta-llama/llama-4-maverick-17b-128e-instruct` | Llama 4 Maverick | 128K | ⚡⚡⚡⚡ | Latest Llama 4 |
| `meta-llama/llama-4-scout-17b-16e-instruct` | Llama 4 Scout | 128K | ⚡⚡⚡⚡ | Latest Llama 4 |
| `qwen/qwen3-32b` | Qwen3 32B | 128K | ⚡⚡⚡ | Alibaba model |
| `moonshotai/kimi-k2-instruct` | Kimi K2 | 128K | ⚡⚡⚡ | Moonshot AI |
| `openai/gpt-oss-20b` | GPT-OSS 20B | 128K | ⚡⚡⚡ | OpenAI open-source |
| `allam-2-7b` | Allam-2 7B | 4K | ⚡⚡⚡⚡ | Arabic/English |
### Audio Models
| Model ID | Name | Purpose |
|----------|------|---------|
| `whisper-large-v3-turbo` | Whisper Large V3 Turbo | Speech-to-text |
| `whisper-large-v3` | Whisper Large V3 | Speech-to-text |
### Safety Models
| Model ID | Name | Purpose |
|----------|------|---------|
| `meta-llama/llama-guard-4-12b` | Llama Guard 4 | Content moderation |
| `meta-llama/llama-prompt-guard-2-86m` | Llama Prompt Guard 2 | Prompt injection detection |
---
## Configuration Options
### Basic Setup (Recommended)
```bash
GROQ_API_KEY=gsk_xxxxx
OPENCLAW_DEFAULT_MODEL=groq/llama-3.3-70b-versatile
```
### Multiple Providers
Use Groq as primary with fallbacks:
```bash
# Groq (primary - fastest)
GROQ_API_KEY=gsk_xxxxx
# OpenRouter (fallback - more models)
OPENROUTER_API_KEY=sk-or-v1-xxxxx
# Local Ollama (free backup)
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
```
Priority order:
1. **Groq** (if `GROQ_API_KEY` set) ← Fastest!
2. xAI (if `XAI_API_KEY` set)
3. OpenAI (if `OPENAI_API_KEY` set)
4. OpenRouter (if `OPENROUTER_API_KEY` set)
5. Local (if `LOCAL_MODEL_ENABLED=true`)
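The priority chain above can be sketched as a simple first-match lookup over the environment. This is a minimal illustration of the selection order, not OpenClaw's actual implementation:

```python
import os

# Order mirrors the priority list above; the first configured key wins.
PROVIDER_PRIORITY = [
    ("groq", "GROQ_API_KEY"),
    ("xai", "XAI_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("openrouter", "OPENROUTER_API_KEY"),
]

def pick_provider(env=None):
    """Return the highest-priority provider configured in `env`."""
    env = os.environ if env is None else env
    for provider, key_var in PROVIDER_PRIORITY:
        if env.get(key_var):
            return provider
    # Local Ollama is the last resort, gated by a boolean flag.
    if env.get("LOCAL_MODEL_ENABLED", "").lower() == "true":
        return "local"
    return None
```

With both `GROQ_API_KEY` and `OPENROUTER_API_KEY` set, Groq is selected; OpenRouter only takes over if the Groq key is removed.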
---
## Model Recommendations
### Best for General Use
```bash
OPENCLAW_DEFAULT_MODEL=groq/llama-3.3-70b-versatile
```
- Excellent quality
- 128K context window
- Fast (500+ tokens/s)
### Fastest Responses
```bash
OPENCLAW_DEFAULT_MODEL=groq/llama-3.1-8b-instant
```
- Instant responses
- Good for simple Q&A
- Highest rate limits
### Latest & Greatest
```bash
OPENCLAW_DEFAULT_MODEL=groq/meta-llama/llama-4-maverick-17b-128e-instruct
```
- Llama 4 architecture
- Best reasoning
- Cutting-edge performance
### Long Documents
```bash
OPENCLAW_DEFAULT_MODEL=groq/llama-3.3-70b-versatile
```
- 128K context window
- Can process entire books
- Excellent summarization
---
## Pricing
### Free Tier (Generous!)
| Model | Rate Limit |
|-------|-----------|
| Llama 3.1 8B | ~30 req/min |
| Llama 3.3 70B | ~30 req/min |
| Llama 4 Maverick | ~30 req/min |
| Llama 4 Scout | ~30 req/min |
| Qwen3 32B | ~30 req/min |
| Kimi K2 | ~30 req/min |
**Perfect for personal bots!** Most users never need the paid tier.
### Paid Plans
Check https://groq.com/pricing for enterprise pricing.
---
## Performance Comparison
| Provider | Tokens/sec | Latency | Cost |
|----------|-----------|---------|------|
| **Groq Llama 3.3** | 500+ | <100ms | Free |
| Groq Llama 4 | 400+ | <150ms | Free |
| xAI Grok | 100-200 | 200-500ms | $ |
| OpenAI GPT-4 | 50-100 | 500ms-1s | $$$ |
| Local Ollama | 20-50 | 100-200ms | Free |
---
## Troubleshooting
### "Invalid API key"
1. Verify key starts with `gsk_`
2. No spaces or newlines
3. Check key at https://console.groq.com/api-keys
4. **Regenerate if compromised**
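Points 1 and 2 can be checked mechanically before the key ever reaches the API. The helper below is a heuristic sanity check only — the `gsk_` prefix comes from this guide, but the exact character set after the prefix is an assumption:

```python
import re

def looks_like_groq_key(key: str) -> bool:
    """Heuristic format check for a Groq API key.

    Rejects the most common copy-paste problems: missing `gsk_` prefix,
    leading/trailing whitespace, or embedded newlines.
    """
    # Assumption: key body is alphanumeric after the gsk_ prefix.
    return bool(re.fullmatch(r"gsk_[A-Za-z0-9]+", key))
```

A failing check means the value in your Spaces secret is malformed, not that the key itself is invalid on Groq's side.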
### "Rate limit exceeded"
- Free tier: ~30 requests/minute
- Use `llama-3.1-8b-instant` for higher limits
- Add delays between requests
- Consider paid plan for heavy usage
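"Add delays between requests" is usually best done with exponential backoff rather than fixed sleeps. A minimal sketch, assuming the underlying client raises an exception whose message mentions "429" or "rate limit" when the ~30 req/min free-tier limit is hit:

```python
import time

def with_backoff(call, max_retries=5, base_delay=2.0):
    """Retry `call` on rate-limit errors with exponential backoff.

    Non-rate-limit errors are re-raised immediately; rate-limit errors
    trigger waits of base_delay, 2x, 4x, ... before retrying.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            msg = str(exc).lower()
            if "429" not in msg and "rate limit" not in msg:
                raise
            time.sleep(base_delay * (2 ** attempt))
    return call()  # final attempt; let any error propagate
```

Wrap your request function in `with_backoff` and bursts that briefly exceed the per-minute limit will recover on their own.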
### "Model not found"
- Use exact model ID from table above
- Check model is active in Groq console
- Some models may be region-restricted
### Slow Responses
- Groq should be <100ms
- Check internet connection
- HF Spaces region matters (US = fastest)
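To check whether you are actually getting Groq-class throughput, you can compute tokens/second from a timed request: take `completion_tokens` from the response's `usage` field and divide by wall-clock time. A trivial helper:

```python
def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Throughput estimate: completion tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return completion_tokens / elapsed_s
```

For example, 512 completion tokens generated in 1.0 s is 512 t/s, in line with the speeds quoted above; values far below ~100 t/s point at network or region issues rather than Groq itself.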
---
## Example: WhatsApp Bot with Groq
```bash
# HF Spaces secrets
GROQ_API_KEY=gsk_xxxxx
HF_TOKEN=hf_xxxxx
AUTO_CREATE_DATASET=true
# WhatsApp (configure in Control UI)
WHATSAPP_PHONE=+1234567890
WHATSAPP_CODE=ABC123
```
Result: **Ultra-fast** WhatsApp AI bot! ⚡
---
## API Reference
### Test Your Key
```bash
curl https://api.groq.com/openai/v1/models \
  -H "Authorization: Bearer gsk_xxxxx"
```
### Chat Completion
```bash
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer gsk_xxxxx" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
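The same call expressed in Python. To keep this runnable without a live key, the helper only *constructs* the request (URL, headers, JSON body) so you can hand it to any HTTP client; the function name is illustrative:

```python
import json

def build_chat_request(api_key: str, model: str, user_message: str):
    """Build the same chat-completion request as the curl example above."""
    url = "https://api.groq.com/openai/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    })
    return url, headers, body
```

Pass the three values to `requests.post(url, headers=headers, data=body)` (or any equivalent client) to send the request.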
---
## Best Practices
### 1. Choose Right Model
- **Chat**: `llama-3.3-70b-versatile`
- **Fast Q&A**: `llama-3.1-8b-instant`
- **Complex tasks**: `meta-llama/llama-4-maverick-17b-128e-instruct`
- **Long docs**: `llama-3.3-70b-versatile` (128K context)
### 2. Monitor Usage
Check https://console.groq.com/usage
### 3. Secure Your Key
- Never commit to git
- Use HF Spaces secrets
- Rotate keys periodically
### 4. Set Up Alerts
Configure usage alerts in Groq console.
---
## Next Steps
1. ✅ **Get API key** from https://console.groq.com
2. ✅ **Set `GROQ_API_KEY`** in HF Spaces secrets
3. ✅ **Deploy** and test in Control UI
4. ✅ **Configure** WhatsApp/Telegram channels
5. 🎉 Enjoy **sub-second** AI responses!
---
## Speed Test
After setup, test Groq's speed:
```
1. Open Control UI
2. Select "Llama 3.3 70B (Versatile)"
3. Send: "Write a 100-word story about a robot"
4. Watch it generate in <0.5 seconds! ⚡⚡⚡
```
---
## Support
- **Groq Docs**: https://console.groq.com/docs
- **API Status**: https://status.groq.com
- **HuggingClaw**: https://github.com/openclaw/openclaw/issues
---
## Available via OpenAI-Compatible API
All Groq models work via OpenAI-compatible endpoint:
```bash
OPENAI_API_KEY=gsk_xxxxx
OPENAI_BASE_URL=https://api.groq.com/openai/v1
```
This allows using Groq with any OpenAI-compatible client!
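In code, those two values are all an OpenAI-compatible client needs. A minimal sketch that reads them the way most SDKs do (the helper name is illustrative):

```python
import os

def groq_client_config(env=None):
    """Collect the two settings an OpenAI-compatible client needs for Groq."""
    env = os.environ if env is None else env
    return {
        "api_key": env["OPENAI_API_KEY"],
        "base_url": env.get("OPENAI_BASE_URL", "https://api.groq.com/openai/v1"),
    }
```

With the official `openai` Python SDK, for example, `openai.OpenAI(**groq_client_config())` yields a client whose requests go to Groq instead of OpenAI.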