# Groq API Setup Guide for HuggingClaw
## ⚡ Why Groq?
**Groq is the FASTEST inference engine available** - up to 500+ tokens/second!
| Feature | Groq | Others |
|---------|------|--------|
| Speed | ⚡⚡⚡⚡⚡ 500+ t/s | ⚡⚡ 50-100 t/s |
| Latency | <100ms | 500ms-2s |
| Free Tier | ✅ Yes, generous | ⚠️ Limited |
| Models | Llama 3/4, Qwen, Kimi, GPT-OSS | Varies |
---
## ⚠️ SECURITY WARNING
**Never share your API key publicly!** If you've shared it:
1. Go to https://console.groq.com/api-keys
2. Delete the compromised key
3. Create a new one
4. Store it securely (password manager, HF Spaces secrets)
---
## Quick Start
### Step 1: Get Your Groq API Key
1. Go to **https://console.groq.com**
2. Sign in or create account (free)
3. Navigate to **API Keys** in left sidebar
4. Click **Create API Key**
5. Copy your key (starts with `gsk_...`)
6. **Keep it secret!**
### Step 2: Configure HuggingFace Spaces
In your Space **Settings → Repository secrets**, add:
```bash
GROQ_API_KEY=gsk_your-actual-api-key-here
OPENCLAW_DEFAULT_MODEL=groq/llama-3.3-70b-versatile
```
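Gateways that use the `provider/model` convention shown above have to split the prefix off before calling the provider, and Groq model IDs can themselves contain slashes (e.g. the Llama 4 IDs). As a rough sketch of that parsing logic — the `resolve_model` helper is illustrative, not OpenClaw's actual code:

```python
import os

def resolve_model(default="groq/llama-3.3-70b-versatile"):
    """Split an OPENCLAW_DEFAULT_MODEL value into (provider, model_id).

    Only the first path segment is the provider prefix; the rest is the
    model ID, which may itself contain slashes (e.g. Llama 4 IDs).
    """
    value = os.environ.get("OPENCLAW_DEFAULT_MODEL", default)
    provider, _, model_id = value.partition("/")
    return provider, model_id
```

For example, `groq/meta-llama/llama-4-scout-17b-16e-instruct` resolves to provider `groq` and model `meta-llama/llama-4-scout-17b-16e-instruct`.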
### Step 3: Deploy
Push changes or redeploy the Space. Groq will be automatically configured.
### Step 4: Use
1. Open Space URL
2. Enter gateway token (default: `huggingclaw`)
3. Select "Llama 3.3 70B (Versatile)" from model dropdown
4. Experience blazing fast responses! ⚡
---
## Available Models (Verified 2025)
### Chat Models
| Model ID | Name | Context | Speed | Best For |
|----------|------|---------|-------|----------|
| `llama-3.3-70b-versatile` | Llama 3.3 70B | 128K | ⚡⚡⚡⚡ | **Best overall** |
| `llama-3.1-8b-instant` | Llama 3.1 8B | 128K | ⚡⚡⚡⚡⚡ | Ultra-fast |
| `meta-llama/llama-4-maverick-17b-128e-instruct` | Llama 4 Maverick | 128K | ⚡⚡⚡⚡ | Latest Llama 4 |
| `meta-llama/llama-4-scout-17b-16e-instruct` | Llama 4 Scout | 128K | ⚡⚡⚡⚡ | Latest Llama 4 |
| `qwen/qwen3-32b` | Qwen3 32B | 128K | ⚡⚡⚡ | Alibaba model |
| `moonshotai/kimi-k2-instruct` | Kimi K2 | 128K | ⚡⚡⚡ | Moonshot AI |
| `openai/gpt-oss-20b` | GPT-OSS 20B | 128K | ⚡⚡⚡ | OpenAI open-source |
| `allam-2-7b` | Allam-2 7B | 4K | ⚡⚡⚡⚡ | Arabic/English |
### Audio Models
| Model ID | Name | Purpose |
|----------|------|---------|
| `whisper-large-v3-turbo` | Whisper Large V3 Turbo | Speech-to-text |
| `whisper-large-v3` | Whisper Large V3 | Speech-to-text |
### Safety Models
| Model ID | Name | Purpose |
|----------|------|---------|
| `meta-llama/llama-guard-4-12b` | Llama Guard 4 | Content moderation |
| `meta-llama/llama-prompt-guard-2-86m` | Llama Prompt Guard 2 | Prompt injection detection |
---
## Configuration Options
### Basic Setup (Recommended)
```bash
GROQ_API_KEY=gsk_xxxxx
OPENCLAW_DEFAULT_MODEL=groq/llama-3.3-70b-versatile
```
### Multiple Providers
Use Groq as primary with fallbacks:
```bash
# Groq (primary - fastest)
GROQ_API_KEY=gsk_xxxxx
# OpenRouter (fallback - more models)
OPENROUTER_API_KEY=sk-or-v1-xxxxx
# Local Ollama (free backup)
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
```
Priority order:
1. **Groq** (if `GROQ_API_KEY` set) ← Fastest!
2. xAI (if `XAI_API_KEY` set)
3. OpenAI (if `OPENAI_API_KEY` set)
4. OpenRouter (if `OPENROUTER_API_KEY` set)
5. Local (if `LOCAL_MODEL_ENABLED=true`)
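The priority chain above can be sketched as a simple first-match lookup over the environment. This is a minimal illustration of the selection order, not OpenClaw's actual implementation:

```python
import os

# Order mirrors the priority list above; the first configured key wins.
PROVIDER_PRIORITY = [
    ("groq", "GROQ_API_KEY"),
    ("xai", "XAI_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("openrouter", "OPENROUTER_API_KEY"),
]

def pick_provider(env=None):
    """Return the highest-priority provider configured in `env`."""
    env = os.environ if env is None else env
    for provider, key_var in PROVIDER_PRIORITY:
        if env.get(key_var):
            return provider
    # Local Ollama is the last resort, gated by a boolean flag.
    if env.get("LOCAL_MODEL_ENABLED", "").lower() == "true":
        return "local"
    return None
```

With both `GROQ_API_KEY` and `OPENROUTER_API_KEY` set, Groq is selected; OpenRouter only takes over if the Groq key is removed.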
---
## Model Recommendations
### Best for General Use
```bash
OPENCLAW_DEFAULT_MODEL=groq/llama-3.3-70b-versatile
```
- Excellent quality
- 128K context window
- Fast (500+ tokens/s)
### Fastest Responses
```bash
OPENCLAW_DEFAULT_MODEL=groq/llama-3.1-8b-instant
```
- Instant responses
- Good for simple Q&A
- Highest rate limits
### Latest & Greatest
```bash
OPENCLAW_DEFAULT_MODEL=groq/meta-llama/llama-4-maverick-17b-128e-instruct
```
- Llama 4 architecture
- Best reasoning
- Cutting-edge performance
### Long Documents
```bash
OPENCLAW_DEFAULT_MODEL=groq/llama-3.3-70b-versatile
```
- 128K context window
- Can process entire books
- Excellent summarization
---
## Pricing
### Free Tier (Generous!)
| Model | Rate Limit |
|-------|-----------|
| Llama 3.1 8B | ~30 req/min |
| Llama 3.3 70B | ~30 req/min |
| Llama 4 Maverick | ~30 req/min |
| Llama 4 Scout | ~30 req/min |
| Qwen3 32B | ~30 req/min |
| Kimi K2 | ~30 req/min |
**Perfect for personal bots!** Most users never need the paid tier.
### Paid Plans
Check https://groq.com/pricing for enterprise pricing.
---
## Performance Comparison
| Provider | Tokens/sec | Latency | Cost |
|----------|-----------|---------|------|
| **Groq Llama 3.3** | 500+ | <100ms | Free |
| Groq Llama 4 | 400+ | <150ms | Free |
| xAI Grok | 100-200 | 200-500ms | $ |
| OpenAI GPT-4 | 50-100 | 500ms-1s | $$$ |
| Local Ollama | 20-50 | 100-200ms | Free |
---
## Troubleshooting
### "Invalid API key"
1. Verify key starts with `gsk_`
2. No spaces or newlines
3. Check key at https://console.groq.com/api-keys
4. **Regenerate if compromised**
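Points 1 and 2 can be checked mechanically before the key ever reaches the API. The helper below is a heuristic sanity check only — the `gsk_` prefix comes from this guide, but the exact character set after the prefix is an assumption:

```python
import re

def looks_like_groq_key(key: str) -> bool:
    """Heuristic format check for a Groq API key.

    Rejects the most common copy-paste problems: missing `gsk_` prefix,
    leading/trailing whitespace, or embedded newlines.
    """
    # Assumption: key body is alphanumeric after the gsk_ prefix.
    return bool(re.fullmatch(r"gsk_[A-Za-z0-9]+", key))
```

A failing check means the value in your Spaces secret is malformed, not that the key itself is invalid on Groq's side.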
### "Rate limit exceeded"
- Free tier: ~30 requests/minute
- Use `llama-3.1-8b-instant` for higher limits
- Add delays between requests
- Consider paid plan for heavy usage
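"Add delays between requests" is usually best done with exponential backoff rather than fixed sleeps. A minimal sketch, assuming the underlying client raises an exception whose message mentions "429" or "rate limit" when the ~30 req/min free-tier limit is hit:

```python
import time

def with_backoff(call, max_retries=5, base_delay=2.0):
    """Retry `call` on rate-limit errors with exponential backoff.

    Non-rate-limit errors are re-raised immediately; rate-limit errors
    trigger waits of base_delay, 2x, 4x, ... before retrying.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            msg = str(exc).lower()
            if "429" not in msg and "rate limit" not in msg:
                raise
            time.sleep(base_delay * (2 ** attempt))
    return call()  # final attempt; let any error propagate
```

Wrap your request function in `with_backoff` and bursts that briefly exceed the per-minute limit will recover on their own.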
### "Model not found"
- Use exact model ID from table above
- Check model is active in Groq console
- Some models may be region-restricted
### Slow Responses
- Groq should be <100ms
- Check internet connection
- HF Spaces region matters (US = fastest)
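To check whether you are actually getting Groq-class throughput, you can compute tokens/second from a timed request: take `completion_tokens` from the response's `usage` field and divide by wall-clock time. A trivial helper:

```python
def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Throughput estimate: completion tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return completion_tokens / elapsed_s
```

For example, 512 completion tokens generated in 1.0 s is 512 t/s, in line with the speeds quoted above; values far below ~100 t/s point at network or region issues rather than Groq itself.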
---
## Example: WhatsApp Bot with Groq
```bash
# HF Spaces secrets
GROQ_API_KEY=gsk_xxxxx
HF_TOKEN=hf_xxxxx
AUTO_CREATE_DATASET=true
# WhatsApp (configure in Control UI)
WHATSAPP_PHONE=+1234567890
WHATSAPP_CODE=ABC123
```
Result: **Ultra-fast** WhatsApp AI bot! ⚡
---
## API Reference
### Test Your Key
```bash
curl https://api.groq.com/openai/v1/models \
  -H "Authorization: Bearer gsk_xxxxx"
```
### Chat Completion
```bash
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer gsk_xxxxx" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
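The same call expressed in Python. To keep this runnable without a live key, the helper only *constructs* the request (URL, headers, JSON body) so you can hand it to any HTTP client; the function name is illustrative:

```python
import json

def build_chat_request(api_key: str, model: str, user_message: str):
    """Build the same chat-completion request as the curl example above."""
    url = "https://api.groq.com/openai/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    })
    return url, headers, body
```

Pass the three values to `requests.post(url, headers=headers, data=body)` (or any equivalent client) to send the request.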
---
## Best Practices
### 1. Choose Right Model
- **Chat**: `llama-3.3-70b-versatile`
- **Fast Q&A**: `llama-3.1-8b-instant`
- **Complex tasks**: `meta-llama/llama-4-maverick-17b-128e-instruct`
- **Long docs**: `llama-3.3-70b-versatile` (128K context)
### 2. Monitor Usage
Check https://console.groq.com/usage
### 3. Secure Your Key
- Never commit to git
- Use HF Spaces secrets
- Rotate keys periodically
### 4. Set Up Alerts
Configure usage alerts in Groq console.
---
## Next Steps
1. ✅ **Get API key** from https://console.groq.com
2. ✅ **Set `GROQ_API_KEY`** in HF Spaces secrets
3. ✅ **Deploy** and test in Control UI
4. ✅ **Configure** WhatsApp/Telegram channels
5. 🎉 Enjoy **sub-second** AI responses!
---
## Speed Test
After setup, test Groq's speed:
```
1. Open Control UI
2. Select "Llama 3.3 70B (Versatile)"
3. Send: "Write a 100-word story about a robot"
4. Watch it generate in <0.5 seconds! ⚡⚡⚡
```
---
## Support
- **Groq Docs**: https://console.groq.com/docs
- **API Status**: https://status.groq.com
- **HuggingClaw**: https://github.com/openclaw/openclaw/issues
---
## Available via OpenAI-Compatible API
All Groq models work via OpenAI-compatible endpoint:
```bash
OPENAI_API_KEY=gsk_xxxxx
OPENAI_BASE_URL=https://api.groq.com/openai/v1
```
This allows using Groq with any OpenAI-compatible client!
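In code, those two values are all an OpenAI-compatible client needs. A minimal sketch that reads them the way most SDKs do (the helper name is illustrative):

```python
import os

def groq_client_config(env=None):
    """Collect the two settings an OpenAI-compatible client needs for Groq."""
    env = os.environ if env is None else env
    return {
        "api_key": env["OPENAI_API_KEY"],
        "base_url": env.get("OPENAI_BASE_URL", "https://api.groq.com/openai/v1"),
    }
```

With the official `openai` Python SDK, for example, `openai.OpenAI(**groq_client_config())` yields a client whose requests go to Groq instead of OpenAI.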