---
language:
- zh
library_name: transformers
pipeline_tag: text-generation
license: mit
datasets:
- telecomadm1145/esjzone_novel_cn
tags:
- mamba2
---

# mamba2_exp2
**mamba2_exp2** is a **Mamba2**-architecture model with approximately **0.2 billion parameters**, pre-trained on a dataset of Chinese light novels (esjzone). It is intended for text generation and story continuation in Chinese.
## Model Details

### Model Description

This model utilizes the Mamba2 state-space model architecture, designed for efficient inference. It was pre-trained from scratch on a corpus of uncleaned Chinese light novels.
**Note:** This is a **base model** (pre-trained only): it has **not** undergone supervised fine-tuning (SFT) or preference alignment (RLHF). It is best suited for continuing a given text rather than answering questions or following complex instructions; prompt it with the opening of a story rather than an instruction such as "请写一个故事" ("write a story").
- **Developed by:** telecomadm1145
- **Model type:** Mamba2 (State Space Model)
- **Language(s) (NLP):** Chinese (zh)
- **License:** MIT
- **Finetuned from model:** None (trained from scratch)
- **Model Size:** ~0.2B parameters
- **Context Length:** 1024 tokens
### Model Sources

- **Repository:** [https://huggingface.co/telecomadm1145/mamba2_exp2](https://huggingface.co/telecomadm1145/mamba2_exp2)
- **Dataset:** [telecomadm1145/esjzone_novel_cn](https://huggingface.co/datasets/telecomadm1145/esjzone_novel_cn)
## Uses

### Direct Use

The model is designed for:

- **Creative Writing:** Generating light novel-style stories.
- **Text Completion:** Continuing a given text narrative in Chinese.
- **Style Imitation:** Mimicking the tropes and writing styles found in web novels.
### Out-of-Scope Use

- **Factual Question Answering:** Since it is trained on fiction, it will likely hallucinate facts.
- **Instruction Following:** It has not been fine-tuned to follow commands (e.g., "Write a summary of...").
- **Code Generation:** Not trained on code.
- **Long-context retrieval:** The model was trained with a context window of 1024 tokens; performance may degrade significantly beyond this length.
## Bias, Risks, and Limitations

- **Dataset Quality:** The training data consists of **uncleaned** web novels. Consequently, the model may generate text containing typos, grammatical errors, or non-standard formatting present in the source material.
- **Content Warnings:** The model may generate content that includes violence, mature themes, or offensive language, reflecting the nature of some web fiction genres.
- **Hallucinations:** As a fiction-focused model, it invents content and should not be used as a knowledge base.
## How to Get Started with the Model

Use the code below to get started with the model.
**Note:** To use the optimized Mamba2 kernels, you may need to install `mamba-ssm` and `causal-conv1d` (e.g. `pip install mamba-ssm causal-conv1d`); without them, `transformers` falls back to a slower pure-PyTorch implementation.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_id = "telecomadm1145/mamba2_exp2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Generate text (example story-opening prompt; replace with your own)
text = "少女推开了图书馆的门，"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
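Because the model was trained with a 1024-token context window, long prompts should be trimmed before generation. A minimal sketch of one way to do this, continuing from the snippet above (`tokenizer`, `model`, and `device` are already defined; the tail-truncation strategy is an illustration, not part of the original training or inference setup):

```python
max_context = 1024      # training context length stated in this card
max_new_tokens = 100

long_text = "..."       # replace with a (possibly very long) Chinese prompt

# Keep only the most recent tokens so prompt + generated tokens fit in the window.
input_ids = tokenizer(long_text, return_tensors="pt").input_ids
input_ids = input_ids[:, -(max_context - max_new_tokens):].to(device)

outputs = model.generate(input_ids, max_new_tokens=max_new_tokens, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```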
## Training Details
### Training Data

- **Dataset Name:** [esjzone_novel_cn](https://huggingface.co/datasets/telecomadm1145/esjzone_novel_cn)
- **Data Type:** Chinese light novels (轻小说).
- **Data Size:** Approximately 1 GB.
- **Preprocessing:** None; the data was used **uncleaned** (raw text) during training (see the inspection sketch below).
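The corpus can be pulled straight from the Hub for inspection. A minimal sketch (the `train` split name and record layout are assumptions; check the dataset card for the actual schema):

```python
from datasets import load_dataset

# Stream the raw corpus instead of downloading ~1 GB up front.
ds = load_dataset("telecomadm1145/esjzone_novel_cn", split="train", streaming=True)

for example in ds:
    print(example)  # one raw, uncleaned record
    break
```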
### Training Procedure

#### Training Hyperparameters

- **Context Length:** 1024 tokens
- **Training Stage:** Pre-training with a causal language modeling objective (a data-packing sketch follows below).
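For causal language modeling at this context length, raw text is typically tokenized, concatenated, and cut into fixed 1024-token blocks. The author's exact data pipeline is not documented here; the following is only a generic sketch of that packing step:

```python
import torch

def pack_into_blocks(token_ids, block_size=1024):
    """Concatenate token ids and cut them into fixed-size blocks for causal LM training."""
    usable = (len(token_ids) // block_size) * block_size
    blocks = [token_ids[i:i + block_size] for i in range(0, usable, block_size)]
    return torch.tensor(blocks)

# Dummy ids purely for illustration; in practice these come from tokenizing the corpus.
input_ids = pack_into_blocks(list(range(5000)), block_size=1024)

# With transformers' causal LM classes, the label shift happens inside the model,
# so the labels are simply a copy of input_ids.
labels = input_ids.clone()
print(input_ids.shape, labels.shape)  # torch.Size([4, 1024]) torch.Size([4, 1024])
```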
#### Speeds, Sizes, Times

- **Hardware:** 2x NVIDIA T4 GPUs
- **Training Duration:** ~23 hours
- **Model Parameters:** ~0.2 billion (the snippet below shows how to verify this from the checkpoint).
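To double-check the reported size, the parameter count can be read directly from the loaded model (reusing `model` from the usage snippet above):

```python
# Count the parameters of the loaded checkpoint.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")
```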
## Environmental Impact

- **Hardware Type:** NVIDIA T4 x2
- **Hours used:** 23 hours
- **Compute Region:** [Unknown/Cloud]
## Technical Specifications

### Model Architecture and Objective

The model follows the **Mamba2** architecture, which is a type of State Space Model (SSM) designed to handle sequences efficiently. The objective was standard Causal Language Modeling (predicting the next token) on a dataset of fiction.
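This card does not list the exact depth or hidden size; they can be read from the published config (assuming it loads through the standard `AutoConfig` path used above):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("telecomadm1145/mamba2_exp2", trust_remote_code=True)
print(config)  # hidden size, number of layers, vocabulary size, etc.
```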
---