---
language:
- zh
library_name: transformers
pipeline_tag: text-generation
license: mit
datasets:
- telecomadm1145/esjzone_novel_cn
tags:
- mamba2
---

# mamba2_exp3

**mamba2_exp3** is a **Mamba2**-architecture model with approximately **0.4 billion parameters**. It was pre-trained on a dataset of Chinese light novels (esjzone) and is intended for text generation and story-continuation tasks in Chinese.

## Model Details

### Model Description

This model uses the Mamba2 state-space model architecture, designed for efficient inference. It was pre-trained from scratch on a corpus of uncleaned Chinese light novels.

**Note:** This is a **base model** (pre-trained only), meaning it has **not** undergone instruction tuning (SFT or RLHF). It is best suited for completing text from a prompt (continuation) rather than answering questions or following complex instructions.

- **Developed by:** telecomadm1145
- **Model type:** Mamba2 (State Space Model)
- **Language(s) (NLP):** Chinese (zh)
- **License:** MIT
- **Finetuned from model:** None (trained from scratch)
- **Model Size:** ~0.4B parameters
- **Context Length:** 1024 tokens

### Model Sources

- **Repository:** [https://huggingface.co/telecomadm1145/mamba2_exp3](https://huggingface.co/telecomadm1145/mamba2_exp3)
- **Dataset:** [telecomadm1145/esjzone_novel_cn](https://huggingface.co/datasets/telecomadm1145/esjzone_novel_cn)

## Uses

### Direct Use

The model is designed for:
- **Creative Writing:** Generating light-novel-style stories.
- **Text Completion:** Continuing a given narrative in Chinese.
- **Style Imitation:** Mimicking the tropes and writing styles found in web novels.

### Out-of-Scope Use

- **Factual Question Answering:** Since it is trained on fiction, it will likely hallucinate facts.
- **Instruction Following:** It has not been fine-tuned to follow commands (e.g., "Write a summary of...").
- **Code Generation:** Not trained on code.
- **Long-context retrieval:** The model was trained with a context window of 1024 tokens; performance may degrade significantly beyond this length.

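Because of the 1024-token window, long prompts are best truncated to their most recent tokens before generation, leaving room for the tokens to be generated. A minimal sketch of such a helper (`truncate_to_window` is illustrative, not part of this repository; integer lists stand in for tokenizer output):

```python
def truncate_to_window(token_ids, max_len=1024, reserve_new=100):
    """Keep only the most recent tokens so that the prompt plus
    the tokens to be generated fit in the model's context window."""
    budget = max_len - reserve_new
    if budget <= 0:
        raise ValueError("reserve_new must be smaller than max_len")
    return token_ids[-budget:]

# Toy usage: integer IDs stand in for real tokenizer output.
ids = list(range(2000))          # a "prompt" longer than the window
kept = truncate_to_window(ids)   # keeps the most recent 924 tokens
print(len(kept))
```

With a real tokenizer you would apply this to `tokenizer(text)["input_ids"]` before building the model inputs.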
## Bias, Risks, and Limitations

- **Dataset Quality:** The training data consists of **uncleaned** web novels. Consequently, the model may generate text containing typos, grammatical errors, or non-standard formatting present in the source material.
- **Content Warnings:** The model may generate content that includes violence, mature themes, or offensive language, reflecting the nature of some web fiction genres.
- **Hallucinations:** As a fiction-focused model, it invents content and should not be used as a knowledge base.

## How to Get Started with the Model

Use the code below to get started with the model.

**Note:** Depending on your environment, you may need to install `mamba-ssm` and `causal-conv1d` (e.g., `pip install mamba-ssm causal-conv1d`) for the optimized Mamba2 kernels.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_id = "telecomadm1145/mamba2_exp3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Generate text
text = "<replace your prompt here>"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

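The sampling flags above shape how each next token is drawn: `top_k` keeps only the k most probable tokens, and `top_p` (nucleus sampling) keeps the smallest set whose cumulative probability reaches p. A pure-Python sketch of this filtering, with a tiny vocabulary and smaller thresholds for readability (illustrative only, not the `transformers` implementation):

```python
import math

def top_k_top_p_filter(logits, top_k, top_p):
    """Return the (index, probability) pairs that survive top-k
    then top-p (nucleus) filtering, with probabilities renormalized."""
    # Numerically stable softmax over the raw logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    ranked = sorted(
        ((i, e / total) for i, e in enumerate(exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Top-k: keep only the k most likely tokens.
    ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cum = [], 0.0
    for i, p in ranked:
        kept.append((i, p))
        cum += p
        if cum >= top_p:
            break
    # Renormalize the surviving probabilities.
    z = sum(p for _, p in kept)
    return [(i, p / z) for i, p in kept]

# A peaked 5-token distribution: only the two dominant tokens survive.
filtered = top_k_top_p_filter([5.0, 4.0, 1.0, 0.5, 0.1], top_k=3, top_p=0.9)
print(filtered)
```

The next token is then sampled from the surviving distribution; `repetition_penalty` additionally downweights tokens that already appear in the context.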
## Training Details

### Training Data

- **Dataset Name:** [esjzone_novel_cn](https://huggingface.co/datasets/telecomadm1145/esjzone_novel_cn)
- **Data Type:** Chinese light novels (轻小说).
- **Data Size:** Approximately 1 GB.
- **Preprocessing:** None; the raw, **uncleaned** text was used for training.

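For causal-LM pre-training, a raw corpus like this is typically tokenized, concatenated, and packed into fixed-length blocks matching the 1024-token context. A minimal sketch of that packing step (illustrative; the exact preprocessing used for this model is not documented):

```python
def pack_into_blocks(token_ids, block_size=1024):
    """Concatenated corpus token IDs -> fixed-length training blocks.
    A trailing remainder shorter than block_size is dropped."""
    n_blocks = len(token_ids) // block_size
    return [
        token_ids[i * block_size : (i + 1) * block_size]
        for i in range(n_blocks)
    ]

# Toy corpus of 2500 integer "token IDs" -> two full 1024-token blocks.
blocks = pack_into_blocks(list(range(2500)))
print(len(blocks), len(blocks[0]))
```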
### Training Procedure

#### Training Hyperparameters

- **Context Length:** 1024 tokens
- **Training Stage:** Pre-training (causal language modeling)

#### Speeds, Sizes, Times

- **Hardware:** 2x NVIDIA T4 GPUs
- **Training Duration:** ~11.5 hours
- **Model Parameters:** ~0.4 billion

## Environmental Impact

- **Hardware Type:** NVIDIA T4 x2
- **Hours used:** 11.5
- **Compute Region:** [Unknown/Cloud]

## Technical Specifications

### Model Architecture and Objective

The model follows the **Mamba2** architecture, a type of state space model (SSM) designed to handle sequences efficiently. The training objective was standard causal language modeling (predicting the next token) on a dataset of fiction.

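Concretely, the causal LM objective minimizes the cross-entropy of the model's predicted next-token distribution at each position, i.e. `-log p(target)`. A tiny worked example with a made-up 3-token vocabulary (numbers are illustrative only):

```python
import math

def next_token_loss(logits, target):
    """Cross-entropy between softmax(logits) and the one-hot target,
    i.e. -log p(target): the per-position causal LM loss."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

# Toy vocabulary of 3 tokens; the model strongly favors token 0.
loss_correct = next_token_loss([4.0, 1.0, 0.0], target=0)  # small loss
loss_wrong   = next_token_loss([4.0, 1.0, 0.0], target=2)  # large loss
print(loss_correct < loss_wrong)
```

Training averages this loss over every position in every 1024-token block and minimizes it by gradient descent.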
---