Cialtion/SimpleTool

Text Generation

parallel-decoding


Cialtion committed on Feb 3

Commit d7e2b17 (verified) · 1 Parent(s): c575922

readme

Files changed (1)

README.md +36 -0

README.md ADDED

	@@ -0,0 +1,36 @@

+# SimpleTool: Parallel Decoding for Real-Time LLM Function Calling
+[**Hugging Face**](https://huggingface.co/Cialtion/SimpleTool) | [**ModelScope**](https://www.modelscope.cn/models/cialtion/SimpleTool) | [**GitHub**](https://github.com/HaxxorCialtion/SimpleTool)
+This repository contains the weights for **RT-Qwen** (RealtimeTool), a series of models optimized for low-latency, parallel LLM function calling.
+## 📁 Model Directory Structure
+The models are organized by scale, quantization format, and inference framework.
+### 1. SFT & AWQ Models (vLLM / Transformers)
+Load these folders directly for inference with `vLLM` or `Transformers`.
+*   **RT-Qwen2.5-0.5B** / **-0.5B-AWQ**
+*   **RT-Qwen2.5-1.5B** / **-1.5B-AWQ**
+*   **RT-Qwen2.5-3B** / **-3B-AWQ**
+*   **RT-Qwen2.5-7B** / **-7B-AWQ**
+*   **RT-Qwen2.5-14B** / **-14B-AWQ**
+*   **RT-Qwen3-4B** / **-4B-AWQ**
+*   **RT-Qwen3-30B** / **-30B-AWQ**
+### 2. GGUF Models (llama.cpp)
+*   **`gguf_models/`**: Unquantized half-precision (F16) GGUF files for all model versions.
+*   **`gguf_quantized/`**: Quantized GGUF versions including **Q4_K_M**, **Q5_K_M**, and **Q8_0**.
+---
+## 📝 TODO
+- [ ] Release arXiv Paper
+- [ ] Complete GitHub Documentation
+- [ ] Add Performance Benchmarks
+- [ ] Provide Citation Info
+---
+**License**: Apache-2.0
+**Status**: Models Uploading / Placeholder README
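
The README added in this commit says the SFT and AWQ folders can be used directly with `Transformers` or `vLLM`, but does not show how. The sketch below is one possible way to do that, not the author's documented workflow. It assumes that each folder name is `RT-<family>-<size>` with an optional `-AWQ` suffix (inferred from the list in section 1), that these folders sit at the top level of the `Cialtion/SimpleTool` repository, and that the upload is complete. The helper names (`model_folder`, `all_model_folders`, `download_folder`, `load_with_transformers`) are hypothetical and introduced only for illustration.

```python
import os

# Repository id taken from the Hugging Face link in the README.
REPO_ID = "Cialtion/SimpleTool"

# Families and sizes exactly as listed under "SFT & AWQ Models".
MODEL_SIZES = {
    "Qwen2.5": ["0.5B", "1.5B", "3B", "7B", "14B"],
    "Qwen3": ["4B", "30B"],
}


def model_folder(family: str, size: str, awq: bool = False) -> str:
    """Build a folder name such as 'RT-Qwen2.5-7B-AWQ'.

    The naming scheme is an assumption based on the README list.
    """
    if size not in MODEL_SIZES.get(family, []):
        raise ValueError(f"{family}-{size} is not listed in the README")
    name = f"RT-{family}-{size}"
    return name + "-AWQ" if awq else name


def all_model_folders() -> list[str]:
    """Enumerate every SFT and AWQ folder name implied by the README."""
    return [
        model_folder(family, size, awq)
        for family, sizes in MODEL_SIZES.items()
        for size in sizes
        for awq in (False, True)
    ]


def download_folder(folder: str) -> str:
    """Download a single model folder and return its local path."""
    # Imported lazily so the name helpers work without huggingface_hub.
    from huggingface_hub import snapshot_download

    # allow_patterns restricts the download to files under this folder.
    root = snapshot_download(repo_id=REPO_ID, allow_patterns=[f"{folder}/*"])
    return os.path.join(root, folder)


def load_with_transformers(folder: str):
    """Load tokenizer and model from a downloaded folder."""
    # Imported lazily because transformers is a heavy optional dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    path = download_folder(folder)
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path)
    return tokenizer, model
```

Under these assumptions, `load_with_transformers(model_folder("Qwen2.5", "0.5B"))` would fetch only the smallest model's folder rather than the whole repository. The local path returned by `download_folder` could likewise be passed as the `model` argument to vLLM's `LLM` class, and loading an `-AWQ` folder may require an AWQ-capable backend to be installed.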