Cialtion committed (verified)
Commit d7e2b17 · Parent: c575922
Files changed (1): README.md (+36 −0)
README.md ADDED
# SimpleTool: Parallel Decoding for Real-Time LLM Function Calling

[**Hugging Face**](https://huggingface.co/Cialtion/SimpleTool) | [**ModelScope**](https://www.modelscope.cn/models/cialtion/SimpleTool) | [**GitHub**](https://github.com/HaxxorCialtion/SimpleTool)

This repository contains the weights for **RT-Qwen** (RealtimeTool), a series of models optimized for low-latency, parallel LLM function calling.
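
The point of parallel function calling is that one decode can yield several tool calls at once. Purely as an illustration — this README does not document RT-Qwen's actual output schema — a client might assume the model emits a single JSON array of calls and split it like this:

```python
import json

def parse_parallel_calls(text):
    """Split one decoded output into (name, arguments) pairs.

    Hypothetical format: a single JSON array of tool-call objects;
    the real RT-Qwen output schema may differ.
    """
    calls = json.loads(text)
    return [(c["name"], c["arguments"]) for c in calls]

# Example output carrying two tool calls produced in one pass.
raw = ('[{"name": "get_weather", "arguments": {"city": "Paris"}},'
       ' {"name": "get_time", "arguments": {"tz": "CET"}}]')

for name, args in parse_parallel_calls(raw):
    print(name, args)
```

The function names (`get_weather`, `get_time`) and the JSON-array envelope are placeholders for illustration only.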

## 📁 Model Directory Structure

The models are organized by scale, quantization format, and inference framework.

### 1. SFT & AWQ Models (vLLM / Transformers)
Use these folders directly for inference with `vLLM` or `Transformers`.
* **RT-Qwen2.5-0.5B** / **-0.5B-AWQ**
* **RT-Qwen2.5-1.5B** / **-1.5B-AWQ**
* **RT-Qwen2.5-3B** / **-3B-AWQ**
* **RT-Qwen2.5-7B** / **-7B-AWQ**
* **RT-Qwen2.5-14B** / **-14B-AWQ**
* **RT-Qwen3-4B** / **-4B-AWQ**
* **RT-Qwen3-30B** / **-30B-AWQ**
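
A minimal sketch of the Transformers path, assuming each variant lives as a subfolder of the `Cialtion/SimpleTool` repo (the subfolder names are taken from the listing above and may change while uploads are in progress):

```python
def load_rt_qwen(subfolder="RT-Qwen2.5-0.5B", repo_id="Cialtion/SimpleTool"):
    """Load one RT-Qwen variant from its repo subfolder (names assumed from the listing above)."""
    # Lazy import so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, subfolder=subfolder, torch_dtype="auto", device_map="auto"
    )
    return tok, model
```

The `-AWQ` folders hold pre-quantized checkpoints of the same models, intended for the same loaders.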

### 2. GGUF Models (llama.cpp)
* **`gguf_models/`**: Full-precision (F16) GGUF files for all versions.
* **`gguf_quantized/`**: Quantized GGUF versions including **Q4_K_M**, **Q5_K_M**, and **Q8_0**.
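
A rough way to choose among these formats is by memory footprint: file size ≈ parameter count × bits per weight ÷ 8. The bits-per-weight figures below are rule-of-thumb approximations (they vary slightly between models), not values published by this repo:

```python
# Approximate bits per weight for common GGUF formats (rule-of-thumb values).
APPROX_BPW = {"F16": 16.0, "Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.8}

def approx_gguf_size_gb(n_params_billion, quant):
    """Estimate GGUF file size in GB: params (billions) * bits per weight / 8."""
    return n_params_billion * APPROX_BPW[quant] / 8

# A 7B model at Q4_K_M comes out to roughly 4.2 GB.
print(round(approx_gguf_size_gb(7, "Q4_K_M"), 1))  # → 4.2
```

So the 14B and 30B variants are the ones where dropping from Q8_0 to Q4_K_M matters most for fitting on consumer GPUs.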

---

## 📝 TODO

- [ ] Release arXiv paper
- [ ] Complete GitHub documentation
- [ ] Add performance benchmarks
- [ ] Provide citation info

---
**License**: Apache-2.0
**Status**: Models uploading / placeholder README