---
license: apache-2.0
language: zh
tags:
- transformer
- t5
- text2text-generation
- chinese
- multitask
- tokenizer
---

# Randeng-T5-784M-MultiTask-Chinese-with-Tokenizer-JSON

This repository hosts a modified version of the [IDEA-CCNL/Randeng-T5-784M-MultiTask-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-T5-784M-MultiTask-Chinese) model. Its primary purpose is to **include the `tokenizer.json` file**, which was missing from the original release.

## Motivation for this Repository

The original `IDEA-CCNL/Randeng-T5-784M-MultiTask-Chinese` model is an excellent T5-based model for a range of Chinese NLP tasks. However, it was released with only a `spiece.model` file for its tokenizer, lacking the `tokenizer.json` file. While the Python `transformers` library can generally load the tokenizer from `spiece.model`, this absence caused issues in environments that strictly prefer or require `tokenizer.json` (e.g., certain versions or implementations of the Rust `tokenizers` library, or other frameworks that rely on this standardized format).

To improve usability and compatibility across platforms and libraries, this repository provides the model together with the commonly expected `tokenizer.json` file.

## Changes Made

The following modifications have been made to the original `IDEA-CCNL/Randeng-T5-784M-MultiTask-Chinese` model files:

* **Added `tokenizer.json`:** The primary change is the inclusion of the `tokenizer.json` file, generated from the original `spiece.model` using the Python `transformers` library's `save_pretrained()` method. This ensures broader compatibility and easier loading for various applications.
* **No changes to model weights:** **Crucially, the model weights (`pytorch_model.bin` or `model.safetensors`) themselves have not been altered in any way.** This repository provides the exact same powerful pre-trained model, just with an additional tokenizer serialization format.
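The export step above can be sketched as follows. Loading the original checkpoint with a fast tokenizer converts `spiece.model` into the `tokenizers`-backed representation, and `save_pretrained()` then writes `tokenizer.json` alongside the other tokenizer files. The output directory name is an arbitrary placeholder:

```python
from transformers import AutoTokenizer

# Loading with use_fast=True converts the SentencePiece model
# (spiece.model) into a fast, tokenizer.json-backed tokenizer.
tok = AutoTokenizer.from_pretrained(
    "IDEA-CCNL/Randeng-T5-784M-MultiTask-Chinese", use_fast=True
)

# Saving a fast tokenizer writes tokenizer.json into the target
# directory, alongside spiece.model and the tokenizer configs.
tok.save_pretrained("./randeng-t5-with-tokenizer-json")  # placeholder path
```

The resulting `tokenizer.json` is what this repository adds; the exact commands used to produce the hosted file may have differed slightly, but the mechanism is the same.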
## How to Use

You can load this model and its tokenizer with the Hugging Face `transformers` library:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "your-username/Randeng-T5-784M-MultiTask-Chinese-with-Tokenizer-JSON"  # Replace with your actual repository name

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "你好,这是一个测试。"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For Rust users (and anyone else requiring `tokenizer.json`), note that `Tokenizer::from_pretrained` in the `tokenizers` crate is synchronous and requires the crate's `http` feature:

```rust
use std::error::Error;
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn Error>> {
    let model_id = "your-username/Randeng-T5-784M-MultiTask-Chinese-with-Tokenizer-JSON"; // Replace with your actual repository name

    // Tokenizer::from_pretrained will now find and use tokenizer.json.
    let tokenizer = Tokenizer::from_pretrained(model_id, None)?;

    let text = "你好,这是一个中文文本。";
    let encoding = tokenizer.encode(text, true)?;

    println!("Original text: {}", text);
    println!("Tokens: {:?}", encoding.get_tokens());
    println!("IDs: {:?}", encoding.get_ids());

    let decoded_text = tokenizer.decode(encoding.get_ids(), true)?;
    println!("Decoded text: {}", decoded_text);

    Ok(())
}
```

## Original Model Information

For more details about the original `IDEA-CCNL/Randeng-T5-784M-MultiTask-Chinese` model, its training, capabilities, and benchmarks, please refer to its official repository: [IDEA-CCNL/Randeng-T5-784M-MultiTask-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-T5-784M-MultiTask-Chinese).