---
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- tensormind
- causal-lm
- text-generation
- chinese
- custom-code
language:
- zh
- en
model-index:
- name: TensorMind
  results:
  - task:
      type: text-generation
      name: Chinese Multiple-Choice Evaluation
    dataset:
      type: custom
      name: C-Eval
    metrics:
    - type: accuracy
      value: 27.27
      name: C-Eval (0-shot)
  - task:
      type: text-generation
      name: Chinese Multiple-Choice Evaluation
    dataset:
      type: custom
      name: CMMLU
    metrics:
    - type: accuracy
      value: 25.26
      name: CMMLU (0-shot)
  - task:
      type: text-generation
      name: Chinese Multiple-Choice Evaluation
    dataset:
      type: custom
      name: A-CLUE
    metrics:
    - type: accuracy
      value: 25.43
      name: A-CLUE (0-shot)
  - task:
      type: text-generation
      name: Chinese Multiple-Choice Evaluation
    dataset:
      type: custom
      name: TMMLU+
    metrics:
    - type: accuracy
      value: 24.96
      name: TMMLU+ (0-shot)
---

# TensorMind (0.5B)

TensorMind is a 536.9M-parameter causal language model for lightweight Chinese/English text generation.

## Model Details

- Architecture: Decoder-only Transformer (`TensorMindForCausalLM`)
- Layers: 32
- Hidden size: 1024
- Heads / KV heads: 16 / 8 (GQA)
- Context length: 32,768
- Vocab size: 32,768
- Positional encoding: RoPE
- Activation: SiLU
- Parameters: 536,941,568 (~0.5B)
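
With 16 query heads sharing 8 key/value heads, the attention layout is grouped-query attention (GQA). A minimal sketch of how those numbers relate, using plain arithmetic rather than the model's actual implementation:

```python
# How the attention dimensions above fit together under GQA.
# This is illustrative arithmetic, not TensorMind's own code.
hidden_size = 1024
num_heads = 16      # query heads
num_kv_heads = 8    # key/value heads

head_dim = hidden_size // num_heads       # per-head dimension
group_size = num_heads // num_kv_heads    # query heads sharing one KV head

def kv_head_for(query_head: int) -> int:
    """Map a query-head index to the KV head it reads from."""
    return query_head // group_size

print(head_dim)                            # 64
print(group_size)                          # 2
print([kv_head_for(q) for q in range(4)])  # [0, 0, 1, 1]
```

Relative to full multi-head attention, the KV cache shrinks by `num_heads / num_kv_heads` (here 2x), which matters at the 32,768-token context length.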


## Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "TensorMind/TensorMind"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)
# Move the model to the GPU when one is available; it loads on CPU otherwise.
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
model.eval()

prompt = "请用三句话介绍一下你自己。"  # "Please introduce yourself in three sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Benchmark Snapshot

Evaluation time: 2026-03-07 00:40 (UTC+8), zero-shot (`n-shot=0`).

| Model | Params | C-Eval | CMMLU | A-CLUE | TMMLU+ | AGIEval |
|---|---:|---:|---:|---:|---:|---:|
| TensorMind | 0.5B | 27.27 | 25.26 | 25.43 | 24.96 | 33.56 |
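
These benchmarks are scored as multiple-choice tasks: for each question the option with the highest model log-likelihood counts as the prediction, and accuracy is the fraction of correct picks. A toy sketch of that bookkeeping, with made-up log-probs standing in for real model scores:

```python
# Toy sketch of zero-shot multiple-choice scoring. The log-probs here
# are invented for illustration; a real harness would obtain them from
# the model for each answer option.

def pick_option(option_logprobs):
    """Return the index of the highest-scoring option."""
    return max(range(len(option_logprobs)), key=lambda i: option_logprobs[i])

def accuracy(examples):
    """examples: list of (option_logprobs, gold_index) pairs -> accuracy in %."""
    correct = sum(pick_option(lps) == gold for lps, gold in examples)
    return 100.0 * correct / len(examples)

toy = [
    ([-2.1, -0.5, -3.0, -1.8], 1),  # model prefers option 1, gold is 1
    ([-0.9, -1.2, -0.7, -2.4], 0),  # model prefers option 2, gold is 0
]
print(accuracy(toy))  # 50.0
```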
## Intended Use

- Lightweight chat and text generation
- Local experimentation and teaching
- Baseline model for research and fine-tuning

## Limitations

- This is a small model and can produce factual errors.
- The benchmark numbers above come from multiple-choice evaluations and do not fully reflect open-ended generation quality.
- Outputs may contain bias or unsafe content; apply filtering before production use.
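
As one hedged illustration of the filtering advice above, a deployment might at minimum gate generations through a post-hoc check. A real system would use a proper safety classifier; this toy version only checks a hypothetical blocklist:

```python
# Minimal sketch of a post-generation output filter. BLOCKLIST and the
# "[filtered]" placeholder are illustrative choices, not part of TensorMind.
BLOCKLIST = {"badword", "secret_token"}

def filter_output(text: str) -> str:
    """Return the text unchanged, or a placeholder if it trips the blocklist."""
    lowered = text.lower()
    if any(term in lowered for term in BLOCKLIST):
        return "[filtered]"
    return text

print(filter_output("hello world"))               # hello world
print(filter_output("this has a badword inside")) # [filtered]
```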

## License

MIT License.