---
title: InferenceProxy
emoji: 💾
colorFrom: blue
colorTo: pink
sdk: docker
pinned: false
app_port: 4040
---

# inference-proxy

Lightweight proxy that stores LLM traces in a Hugging Face Dataset.

### How it works

This API acts as a proxy for OpenAI-compatible inference endpoints: it forwards each request to the upstream provider, records the request/response pair as a trace, and pushes the accumulated traces to a Hugging Face Dataset in batches. Two optional variables control the batching (a minimal sketch of this logic appears after the example below):

- `BATCH_SIZE_LIMIT` - the maximum number of traces to buffer before pushing to the dataset
- `BATCH_TIME_LIMIT` - the maximum time to wait before pushing a partial batch to the dataset

### Required Environment Variables

- `HF_ACCESS_TOKEN` - a Hugging Face access token with write access to the trace dataset
- `USER_NAME` - used to ensure the proxy only processes requests from this user

### Example

```js
import { OpenAI } from "openai";

// Point the OpenAI client at the proxy instead of the provider directly;
// the proxy forwards the request and records the trace.
const client = new OpenAI({
  baseURL: "http://localhost:4040/fireworks-ai/inference/v1",
  apiKey: process.env.HF_API_KEY,
});

let out = "";

const stream = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3",
  messages: [
    {
      role: "user",
      content: "What is the capital of France?",
    },
  ],
  stream: true,
  max_tokens: 500,
});

for await (const chunk of stream) {
  if (chunk.choices && chunk.choices.length > 0) {
    // Some chunks (e.g. the first) carry no content, so default to "".
    const newContent = chunk.choices[0].delta.content ?? "";
    out += newContent;
    console.log(newContent);
  }
}
```
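For comparison, here is the same request without streaming. This is a minimal variant of the example above, not a separately documented feature of the proxy:

```js
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4040/fireworks-ai/inference/v1",
  apiKey: process.env.HF_API_KEY,
});

// Without `stream: true`, the full answer arrives in a single response object.
const completion = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  max_tokens: 500,
});

console.log(completion.choices[0].message.content);
```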
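### Batching sketch

The snippet below is an illustrative sketch of the batching behavior described above, not the proxy's actual implementation: traces are buffered until either `BATCH_SIZE_LIMIT` is reached or `BATCH_TIME_LIMIT` elapses. `pushToDataset` is a hypothetical placeholder for the dataset upload, and the time limit is assumed here to be in milliseconds:

```js
const BATCH_SIZE_LIMIT = Number(process.env.BATCH_SIZE_LIMIT ?? 10);
const BATCH_TIME_LIMIT = Number(process.env.BATCH_TIME_LIMIT ?? 60_000); // assumed ms

let batch = [];
let timer = null;

// Hypothetical placeholder: the real proxy would upload the rows to the
// Hugging Face Dataset (e.g. via the Hub API) using HF_ACCESS_TOKEN.
async function pushToDataset(rows) {
  console.log(`pushing ${rows.length} trace(s) to the dataset`);
}

async function flush() {
  if (batch.length === 0) return;
  const toPush = batch;
  batch = [];
  clearTimeout(timer);
  timer = null;
  await pushToDataset(toPush);
}

function recordTrace(trace) {
  batch.push(trace);
  if (batch.length >= BATCH_SIZE_LIMIT) {
    // Flush as soon as the batch is full...
    void flush();
  } else if (!timer) {
    // ...or once BATCH_TIME_LIMIT has elapsed since the first buffered trace.
    timer = setTimeout(() => void flush(), BATCH_TIME_LIMIT);
  }
}

// Usage: call recordTrace() once per completed proxied request.
recordTrace({ model: "accounts/fireworks/models/deepseek-v3", prompt: "…", response: "…" });
```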