---
title: InferenceProxy
emoji: 💾
colorFrom: blue
colorTo: pink
sdk: docker
pinned: false
app_port: 4040
---

# inference-proxy

Lightweight proxy that stores LLM traces in a Hugging Face Dataset.

### How it works

This API acts as a proxy for OpenAI-compatible inference endpoints: it forwards each request to the upstream provider, records the request/response pair as a trace, and pushes the accumulated traces to a Hugging Face Dataset in batches. Two optional variables control the batching (a minimal sketch of this logic appears after the example below):

- `BATCH_SIZE_LIMIT` - the maximum number of traces to buffer before pushing to the dataset
- `BATCH_TIME_LIMIT` - the maximum time to wait before pushing a partial batch to the dataset

### Required Environment Variables

- `HF_ACCESS_TOKEN` - a Hugging Face access token with write access to the trace dataset
- `USER_NAME` - used to ensure the proxy only processes requests from this user

### Example

```js
import { OpenAI } from "openai";

// Point the OpenAI client at the proxy instead of the provider directly;
// the proxy forwards the request and records the trace.
const client = new OpenAI({
  baseURL: "http://localhost:4040/fireworks-ai/inference/v1",
  apiKey: process.env.HF_API_KEY,
});

let out = "";

const stream = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3",
  messages: [
    {
      role: "user",
      content: "What is the capital of France?",
    },
  ],
  stream: true,
  max_tokens: 500,
});

for await (const chunk of stream) {
  if (chunk.choices && chunk.choices.length > 0) {
    // Some chunks (e.g. the first) carry no content, so default to "".
    const newContent = chunk.choices[0].delta.content ?? "";
    out += newContent;
    console.log(newContent);
  }
}
```
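For comparison, here is the same request without streaming. This is a minimal variant of the example above, not a separately documented feature of the proxy:

```js
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4040/fireworks-ai/inference/v1",
  apiKey: process.env.HF_API_KEY,
});

// Without `stream: true`, the full answer arrives in a single response object.
const completion = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  max_tokens: 500,
});

console.log(completion.choices[0].message.content);
```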
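### Batching sketch

The snippet below is an illustrative sketch of the batching behavior described above, not the proxy's actual implementation: traces are buffered until either `BATCH_SIZE_LIMIT` is reached or `BATCH_TIME_LIMIT` elapses. `pushToDataset` is a hypothetical placeholder for the dataset upload, and the time limit is assumed here to be in milliseconds:

```js
const BATCH_SIZE_LIMIT = Number(process.env.BATCH_SIZE_LIMIT ?? 10);
const BATCH_TIME_LIMIT = Number(process.env.BATCH_TIME_LIMIT ?? 60_000); // assumed ms

let batch = [];
let timer = null;

// Hypothetical placeholder: the real proxy would upload the rows to the
// Hugging Face Dataset (e.g. via the Hub API) using HF_ACCESS_TOKEN.
async function pushToDataset(rows) {
  console.log(`pushing ${rows.length} trace(s) to the dataset`);
}

async function flush() {
  if (batch.length === 0) return;
  const toPush = batch;
  batch = [];
  clearTimeout(timer);
  timer = null;
  await pushToDataset(toPush);
}

function recordTrace(trace) {
  batch.push(trace);
  if (batch.length >= BATCH_SIZE_LIMIT) {
    // Flush as soon as the batch is full...
    void flush();
  } else if (!timer) {
    // ...or once BATCH_TIME_LIMIT has elapsed since the first buffered trace.
    timer = setTimeout(() => void flush(), BATCH_TIME_LIMIT);
  }
}

// Usage: call recordTrace() once per completed proxied request.
recordTrace({ model: "accounts/fireworks/models/deepseek-v3", prompt: "…", response: "…" });
```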