Minibase committed
Commit cdea8ff · verified · Parent: 3453a48

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +50 -46
README.md CHANGED
@@ -1,61 +1,65 @@
- ---
- license: mit
- language:
- - en
- pipeline_tag: text-generation
- tags:
- - detoxify
- - nano
- - small
- - vulgar
- - curse
- ---

- # Detoxify-Language-Small (GGUF, Q8_0)

- **TL;DR**: A compact detoxification model in **GGUF (Q8_0)** format for fast CPU inference via `llama.cpp` and compatible runtimes. File size: ~138.1 MiB.

- ## Files
- - `small-base_Detoxify-Small_high_Q8_0.gguf` (SHA256: `98945b1291812eb85275fbf2bf60ff92522e7b80026c8301ff43127fdd52826e`; size: 144810464 bytes)

- ## Intended use
- - **Task**: detoxification of text, without changing the context of that text.
- - **Hardware**: laptops/CPUs via `llama.cpp`; small GPUs with GGUF loaders.
- - **Not for**: safety-critical or clinical use.

- ## How to run (llama.cpp)
- > Replace the `-p` prompt with your own text. For classification, you can use a simple prompt like:
- > `"Classify the following text as TOXIC or NON-TOXIC: <text>"`

  ```bash
- # Build llama.cpp once (see upstream instructions), then:
- ./main -m small-base_Detoxify-Small_high_Q8_0.gguf -p "Classify the following text as TOXIC or NON-TOXIC: I hate you."
- ```

- If your downstream workflow expects logits/labels directly, consider adapting a small wrapper that maps generated tokens to labels (example Python script to be added).
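Such a wrapper could look like the following minimal sketch. The label strings and the keyword-matching rule are assumptions for illustration; adapt them to whatever your prompt actually elicits from the model.

```python
# Hypothetical post-processing sketch: map the model's free-form generation
# to a coarse TOXIC / NON-TOXIC label. The label vocabulary is an assumption.
def to_label(generated: str) -> str:
    """Map generated text to a label by keyword matching."""
    text = generated.strip().upper()
    if "NON-TOXIC" in text:  # check the longer label first, since it contains "TOXIC"
        return "NON-TOXIC"
    if "TOXIC" in text:
        return "TOXIC"
    return "UNKNOWN"

print(to_label("  non-toxic."))  # NON-TOXIC
```

Checking the longer label first matters because `"TOXIC"` is a substring of `"NON-TOXIC"`.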
- ## Model details
- - **Format**: GGUF (quantized: **Q8_0**)
- - **Architecture**: LlamaForCausalLM
- - **Tokenizer**: (embedded in GGUF; if you use a custom tokenizer, document it here)
- - **Context length**: (not explicitly extracted here; typical small models use 2048–4096 — fill if known)
- - **Base model / provenance**: Fine-tuned from the Minibase Small Base model at minibase.ai.

- > If you can share the base model and training data (even briefly), add a short bullet list here to improve discoverability.

- ## Training Data
- - Toxicity detection can reflect dataset and annotation biases. Use with caution, especially on dialects and minority language varieties.
- - Performance in languages other than English is likely reduced unless trained multi-lingually.

- ## Limitations & bias
- - Toxicity detection can reflect dataset and annotation biases. Use with caution, especially on dialects and minority language varieties.
- - Performance in languages other than English is likely reduced unless trained multi-lingually.

- ## License
- - **MIT**

- ## Checksums
- - `small-base_Detoxify-Small_high_Q8_0.gguf` `SHA256: 98945b1291812eb85275fbf2bf60ff92522e7b80026c8301ff43127fdd52826e`

- ## Changelog
- - Initial upload.
+ # Detoxify-Small - GGUF Model Package
+
+ This package contains a GGUF (GPT-Generated Unified Format) model file and all necessary configuration files to run the model locally.
+
+ ## Model Information
+
+ - **Model Name**: Detoxify-Small
+ - **Base Model**:
+ - **Architecture**: LlamaForCausalLM
+ - **Context Window**: 1024 tokens
+ - **Format**: GGUF (optimized for local inference)
+
+ ## Files Included
+
+ - `model.gguf` - The quantized model file
+ - `inference.lock.json` - Server configuration
+ - `model_info.json` - Model metadata
+ - `run_server.sh` - Script to start the inference server
+ - `README.md` - This file
+ - `USAGE.md` - Usage examples and instructions
+
+ ## Quick Start
+
+ 1. Make sure you have [llama.cpp](https://github.com/ggerganov/llama.cpp) installed
+ 2. Run the provided script:
+ ```bash
+ ./run_server.sh
+ ```
+ 3. The server will start on http://127.0.0.1:8000
+
+ ## Manual Setup
+
+ If you prefer to run manually:

  ```bash
+ # Start the server
+ llama-server \
+   -m model.gguf \
+   --host 127.0.0.1 \
+   --port 8000 \
+   --n-gpu-layers 0 \
+   --chat-template ""
+ ```

+ ## API Usage

+ Once the server is running, you can make requests to:

+ - **Health Check**: `GET http://127.0.0.1:8000/health`
+ - **Completion**: `POST http://127.0.0.1:8000/completion`
+ - **Tokenization**: `POST http://127.0.0.1:8000/tokenize`
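As a sketch of the completion endpoint, the following standard-library-only Python client builds and sends a `/completion` request. The payload fields (`prompt`, `n_predict`, `temperature`) follow llama.cpp's `llama-server` API; the detoxification prompt wording is an assumption, not something shipped with this package.

```python
# Minimal hypothetical client for the local llama-server instance.
import json
import urllib.request

SERVER = "http://127.0.0.1:8000"

def build_payload(text: str, n_predict: int = 128) -> dict:
    """Build the JSON body for POST /completion."""
    return {
        # Prompt wording is an assumption; adjust to your use case.
        "prompt": f"Detoxify the following text without changing its meaning: {text}",
        "n_predict": n_predict,
        "temperature": 0.2,
    }

def complete(text: str) -> str:
    """Send the request to the running server and return the generated text."""
    req = urllib.request.Request(
        f"{SERVER}/completion",
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

The request-building and HTTP steps are split so the payload can be inspected or reused with other HTTP clients.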
+ ## Requirements
+
+ - llama.cpp (latest version recommended)
+ - At least 8GB RAM (16GB recommended)
+ - For GPU acceleration: Metal (macOS), CUDA (Linux/Windows), or Vulkan
+
+ ## Troubleshooting
+
+ - If you get memory errors, reduce `--n-gpu-layers` or use a smaller model
+ - For slower machines, try a smaller `--ctx-size` (e.g. 512) to reduce the context window
+ - Check `USAGE.md` for detailed examples and troubleshooting tips
+
+ ---
+ Generated on 2025-09-17 20:07:11