Minibase committed
Commit cdea8ff · verified · Parent: 3453a48

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +50 -46
README.md CHANGED
@@ -1,61 +1,65 @@
- ---
- license: mit
- language:
- - en
- pipeline_tag: text-generation
- tags:
- - detoxify
- - nano
- - small
- - vulgar
- - curse
- ---

- # Detoxify-Language-Small (GGUF, Q8_0)

- **TL;DR**: A compact detoxification model in **GGUF (Q8_0)** format for fast CPU inference via `llama.cpp` and compatible runtimes. File size: ~138.1 MiB.

- ## Files
- - `small-base_Detoxify-Small_high_Q8_0.gguf` (SHA256: `98945b1291812eb85275fbf2bf60ff92522e7b80026c8301ff43127fdd52826e`; size: 144810464 bytes)

- ## Intended use
- - **Task**: detoxification of text, without changing the context of that text.
- - **Hardware**: laptops/CPUs via `llama.cpp`; small GPUs with GGUF loaders.
- - **Not for**: safety-critical or clinical use.

- ## How to run (llama.cpp)
- > Replace the `-p` prompt with your own text. For classification, you can use a simple prompt like:
- > `"Classify the following text as TOXIC or NON-TOXIC: <text>"`

  ```bash
- # Build llama.cpp once (see upstream instructions), then:
- ./main -m small-base_Detoxify-Small_high_Q8_0.gguf -p "Classify the following text as TOXIC or NON-TOXIC: I hate you."
- ```

- If your downstream workflow expects logits/labels directly, consider adapting a small wrapper that maps generated tokens to labels (example Python script to be added).
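Such a wrapper could look like the following minimal sketch. The label strings and the keyword-matching rule are assumptions for illustration; adapt them to whatever your prompt actually elicits from the model.

```python
# Hypothetical post-processing sketch: map the model's free-form generation
# to a coarse TOXIC / NON-TOXIC label. The label vocabulary is an assumption.
def to_label(generated: str) -> str:
    """Map generated text to a label by keyword matching."""
    text = generated.strip().upper()
    if "NON-TOXIC" in text:  # check the longer label first, since it contains "TOXIC"
        return "NON-TOXIC"
    if "TOXIC" in text:
        return "TOXIC"
    return "UNKNOWN"

print(to_label("  non-toxic."))  # NON-TOXIC
```

Checking the longer label first matters because `"TOXIC"` is a substring of `"NON-TOXIC"`.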
- ## Model details
- - **Format**: GGUF (quantized: **Q8_0**)
- - **Architecture**: LlamaForCausalLM
- - **Tokenizer**: (embedded in GGUF; if you use a custom tokenizer, document it here)
- - **Context length**: (not explicitly extracted here; typical small models use 2048–4096 — fill if known)
- - **Base model / provenance**: Fine-tuned from the Minibase Small Base model at minibase.ai.

- > If you can share the base model and training data (even briefly), add a short bullet list here to improve discoverability.

- ## Training Data
- - Toxicity detection can reflect dataset and annotation biases. Use with caution, especially on dialects and minority language varieties.
- - Performance in languages other than English is likely reduced unless trained multi-lingually.

- ## Limitations & bias
- - Toxicity detection can reflect dataset and annotation biases. Use with caution, especially on dialects and minority language varieties.
- - Performance in languages other than English is likely reduced unless trained multi-lingually.

- ## License
- - **MIT**

- ## Checksums
- - `small-base_Detoxify-Small_high_Q8_0.gguf` `SHA256: 98945b1291812eb85275fbf2bf60ff92522e7b80026c8301ff43127fdd52826e`

- ## Changelog
- - Initial upload.
+ # Detoxify-Small - GGUF Model Package
+
+ This package contains a GGUF (GPT-Generated Unified Format) model file and all necessary configuration files to run the model locally.
+
+ ## Model Information
+
+ - **Model Name**: Detoxify-Small
+ - **Base Model**:
+ - **Architecture**: LlamaForCausalLM
+ - **Context Window**: 1024 tokens
+ - **Format**: GGUF (optimized for local inference)
+
+ ## Files Included
+
+ - `model.gguf` - The quantized model file
+ - `inference.lock.json` - Server configuration
+ - `model_info.json` - Model metadata
+ - `run_server.sh` - Script to start the inference server
+ - `README.md` - This file
+ - `USAGE.md` - Usage examples and instructions
+
+ ## Quick Start
+
+ 1. Make sure you have [llama.cpp](https://github.com/ggerganov/llama.cpp) installed
+ 2. Run the provided script:
+ ```bash
+ ./run_server.sh
+ ```
+ 3. The server will start on http://127.0.0.1:8000
+
+ ## Manual Setup
+
+ If you prefer to run manually:

  ```bash
+ # Start the server
+ llama-server \
+   -m model.gguf \
+   --host 127.0.0.1 \
+   --port 8000 \
+   --n-gpu-layers 0 \
+   --chat-template ""
+ ```

+ ## API Usage

+ Once the server is running, you can make requests to:

+ - **Health Check**: `GET http://127.0.0.1:8000/health`
+ - **Completion**: `POST http://127.0.0.1:8000/completion`
+ - **Tokenization**: `POST http://127.0.0.1:8000/tokenize`
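As a sketch of the completion endpoint, the following standard-library-only Python client builds and sends a `/completion` request. The payload fields (`prompt`, `n_predict`, `temperature`) follow llama.cpp's `llama-server` API; the detoxification prompt wording is an assumption, not something shipped with this package.

```python
# Minimal hypothetical client for the local llama-server instance.
import json
import urllib.request

SERVER = "http://127.0.0.1:8000"

def build_payload(text: str, n_predict: int = 128) -> dict:
    """Build the JSON body for POST /completion."""
    return {
        # Prompt wording is an assumption; adjust to your use case.
        "prompt": f"Detoxify the following text without changing its meaning: {text}",
        "n_predict": n_predict,
        "temperature": 0.2,
    }

def complete(text: str) -> str:
    """Send the request to the running server and return the generated text."""
    req = urllib.request.Request(
        f"{SERVER}/completion",
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

The request-building and HTTP steps are split so the payload can be inspected or reused with other HTTP clients.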
+ ## Requirements
+
+ - llama.cpp (latest version recommended)
+ - At least 8GB RAM (16GB recommended)
+ - For GPU acceleration: Metal (macOS), CUDA (Linux/Windows), or Vulkan
+
+ ## Troubleshooting
+
+ - If you get memory errors, reduce `--n-gpu-layers` or use a smaller model
+ - For slower machines, try a smaller `--ctx-size` (e.g. 512) to reduce the context window
+ - Check `USAGE.md` for detailed examples and troubleshooting tips
+
+ ---
+ Generated on 2025-09-17 20:07:11