---
title: OpenCode Hub
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
license: mit
short_description: OpenCode AI coding agent with AirLLM + ChromaDB + turbo
---

# OpenCode Hub - HF Space

Open-source AI coding agent with memory-optimized inference.

## Features

- **AirLLM** - Run 70B models on a 4GB GPU via layer-by-layer loading
- **ChromaDB** - Vector store for RAG (retrieval-augmented generation)
- **turbo (turbopuffer)** - High-performance vector search index
- **OpenCode** - Full open-source AI coding agent API
- **FastAPI** - REST API compatible with the Replit OpenCode Hub frontend
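
The layer-by-layer loading idea can be illustrated with a toy sketch. This is not AirLLM's code or API: `run_layer_by_layer`, `load_layer`, and `make_toy_loader` are hypothetical names, and each "layer" is just a scalar multiplication. It only shows why peak memory stays near the size of one layer: a layer is loaded, applied, and released before the next one is touched.

```python
from typing import Callable, List

# A "layer" here is any function mapping activations to activations.
Layer = Callable[[List[float]], List[float]]


def run_layer_by_layer(
    load_layer: Callable[[int], Layer],
    num_layers: int,
    activations: List[float],
) -> List[float]:
    """Run a forward pass while holding only one layer in memory at a time."""
    for index in range(num_layers):
        layer = load_layer(index)        # hypothetical: read one layer's weights from disk
        activations = layer(activations)  # forward pass through that single layer
        del layer                         # drop the reference so its weights can be freed
    return activations


def make_toy_loader(scales: List[float]) -> Callable[[int], Layer]:
    """Build a stand-in loader whose layers multiply every value by a constant."""
    def load_layer(index: int) -> Layer:
        scale = scales[index]
        return lambda values: [scale * value for value in values]
    return load_layer


output = run_layer_by_layer(make_toy_loader([2.0, 0.5, 3.0]), 3, [1.0, 2.0])
print(output)
```

The trade-off in a real system is speed: each layer has to be read from storage on every forward pass, so generation is slower than with a fully resident model.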

## Models Supported

- `meta-llama/Meta-Llama-3-70B-Instruct` (4GB VRAM via AirLLM)
- `Qwen/Qwen2.5-72B-Instruct`
- `mistralai/Mistral-7B-Instruct-v0.3`
- Any HuggingFace model

## API Endpoints

```
GET  /health                  - Health check
GET  /models                  - List available models
POST /generate                - Generate text with AirLLM
POST /embed                   - Generate embeddings
GET  /collections             - List ChromaDB collections
POST /collections/{n}/search  - Semantic search
POST /collections/{n}/add     - Add documents
GET  /stats                   - Memory and performance stats
```
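
A minimal client sketch for the endpoints above, using only the Python standard library. The request schemas are not documented here, so the JSON field names (`prompt`, `max_new_tokens`, `query`, `n_results`) and the base URL are assumptions; check them against the running service before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Space is reachable locally on port 7860.
BASE_URL = "http://localhost:7860"


def _post_json(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url=url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_generate_request(prompt: str, max_new_tokens: int = 64,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """POST /generate. Field names are hypothetical."""
    payload = {"prompt": prompt, "max_new_tokens": max_new_tokens}
    return _post_json(f"{base_url}/generate", payload)


def build_search_request(collection: str, query: str, n_results: int = 3,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """POST /collections/{n}/search. Field names are hypothetical."""
    name = urllib.parse.quote(collection, safe="")  # escape the path segment
    payload = {"query": query, "n_results": n_results}
    return _post_json(f"{base_url}/collections/{name}/search", payload)


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `send(build_generate_request("Write a hello world in Go"))` would return the decoded response. Building the request separately from sending it makes the payload easy to inspect when the server rejects a field.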

## Environment Variables

- `HF_TOKEN` - Hugging Face access token (auto-configured)
- `MODEL_ID` - Default model (default: `meta-llama/Meta-Llama-3-70B-Instruct`)
- `MAX_GPU_MEMORY_GB` - GPU memory limit in GB (default: `4`)
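
One way the service could read these variables is sketched below. The `Settings` class and `load_settings` function are hypothetical names, not taken from the project's code; only the variable names and defaults come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hf_token: Optional[str]      # None when HF_TOKEN is not set
    model_id: str
    max_gpu_memory_gb: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, applying the documented defaults."""
    return Settings(
        hf_token=env.get("HF_TOKEN"),
        model_id=env.get("MODEL_ID", "meta-llama/Meta-Llama-3-70B-Instruct"),
        max_gpu_memory_gb=float(env.get("MAX_GPU_MEMORY_GB", "4")),
    )


settings = load_settings({"MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.3"})
print(settings.model_id, settings.max_gpu_memory_gb)
```

Passing the environment as a mapping keeps the function easy to test; calling `load_settings()` with no arguments reads the real process environment.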