---
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- tensormind
- causal-lm
- text-generation
- chinese
- custom-code
language:
- zh
- en
model-index:
- name: TensorMind
  results:
  - task:
      type: text-generation
      name: Chinese Multiple-Choice Evaluation
    dataset:
      type: custom
      name: C-Eval
    metrics:
    - type: accuracy
      value: 27.27
      name: C-Eval (0-shot)
  - task:
      type: text-generation
      name: Chinese Multiple-Choice Evaluation
    dataset:
      type: custom
      name: CMMLU
    metrics:
    - type: accuracy
      value: 25.26
      name: CMMLU (0-shot)
  - task:
      type: text-generation
      name: Chinese Multiple-Choice Evaluation
    dataset:
      type: custom
      name: A-CLUE
    metrics:
    - type: accuracy
      value: 25.43
      name: A-CLUE (0-shot)
  - task:
      type: text-generation
      name: Chinese Multiple-Choice Evaluation
    dataset:
      type: custom
      name: TMMLU+
    metrics:
    - type: accuracy
      value: 24.96
      name: TMMLU+ (0-shot)
---

# TensorMind (0.5B)

TensorMind is a 536.9M-parameter, decoder-only causal language model for lightweight Chinese and English text generation.

## Model Details

- Architecture: Decoder-only Transformer (`TensorMindForCausalLM`)
- Layers: 32
- Hidden size: 1024
- Heads / KV heads: 16 / 8 (GQA)
- Context length: 32,768
- Vocab size: 32,768
- Positional encoding: RoPE
- Activation: SiLU
- Parameters: 536,941,568 (~0.5B)
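The head configuration above implies a 2-to-1 grouping: with 16 query heads but only 8 KV heads over a hidden size of 1024, each KV head is shared by two query heads. A quick sketch of the derived shapes (pure arithmetic from the spec above, no model download; the variable names are illustrative, not taken from the repo):

```python
# Derived attention shapes for TensorMind's GQA configuration.
# All inputs come from the spec above; variable names are illustrative.
hidden_size = 1024
num_heads = 16        # query heads
num_kv_heads = 8      # key/value heads (GQA)

head_dim = hidden_size // num_heads       # 64
q_proj_width = num_heads * head_dim       # 1024 (full hidden size)
kv_proj_width = num_kv_heads * head_dim   # 512 (half the hidden size)
group_size = num_heads // num_kv_heads    # 2 query heads per KV head

print(head_dim, q_proj_width, kv_proj_width, group_size)
```

Because KV-cache memory per token scales with the K/V projection width, this GQA layout roughly halves cache size relative to full multi-head attention with 16 KV heads.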

## Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "TensorMind/TensorMind"

# trust_remote_code is required because the model ships its own
# TensorMindForCausalLM implementation.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
model.eval()

prompt = "请用三句话介绍一下你自己。"  # "Introduce yourself in three sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Benchmark Snapshot

Evaluation time: 2026-03-07 00:40 (UTC+8), zero-shot (`n-shot=0`).

| Model | Params | C-Eval | CMMLU | A-CLUE | TMMLU+ | AGIEval |
|---|---:|---:|---:|---:|---:|---:|
| TensorMind | 0.5B | 27.27 | 25.26 | 25.43 | 24.96 | 33.56 |
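Since C-Eval, CMMLU, A-CLUE, and TMMLU+ are largely four-option multiple-choice benchmarks, the random-guess baseline is about 25%. A quick check of how far each score sits above chance (scores copied from the table; a uniform 25% baseline is assumed throughout):

```python
# Margin over the 25% four-option random-guess baseline, in percentage points.
scores = {"C-Eval": 27.27, "CMMLU": 25.26, "A-CLUE": 25.43, "TMMLU+": 24.96}
baseline = 25.0
margins = {name: round(score - baseline, 2) for name, score in scores.items()}
print(margins)
```

C-Eval sits about two points above chance, while the other three scores are within half a point of it, which is worth keeping in mind when comparing against larger models.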


![TensorMind benchmark table](./assets/compare_table_tensormind.png)

![TensorMind benchmark radar](./assets/compare_radar_tensormind.png)

## Intended Use

- Lightweight chat and text generation
- Local experimentation and teaching
- Baseline model for research and fine-tuning

## Limitations

- This is a small model and can produce factual errors.
- The benchmark numbers above come from largely four-option multiple-choice evaluations (random-guess baseline is about 25%) and do not fully represent open-ended generation quality.
- Outputs may contain bias or unsafe content; apply filtering for production use.

## License

MIT License.