---
language:
- zh
library_name: transformers
pipeline_tag: text-generation
license: mit
datasets:
- telecomadm1145/esjzone_novel_cn
tags:
- mamba2
---
# mamba2_exp2
**mamba2_exp2** is a **Mamba2**-architecture model with approximately **0.2 billion parameters**, pre-trained on a dataset of Chinese light novels from esjzone. It is intended for Chinese text generation and story-continuation tasks.
## Model Details
### Model Description
This model utilizes the Mamba2 state-space model architecture, designed for efficient inference. It was pre-trained from scratch on a corpus of uncleaned Chinese light novels.
**Note:** This is a **base model** (pre-trained only): it has **not** undergone supervised instruction tuning (SFT) or preference alignment (RLHF). It is best suited for continuing text from a prompt rather than answering questions or following complex instructions.
- **Developed by:** telecomadm1145
- **Model type:** Mamba2 (State Space Model)
- **Language(s) (NLP):** Chinese (zh)
- **License:** MIT
- **Finetuned from model:** None (Trained from scratch)
- **Model Size:** ~0.2B parameters (a quick verification sketch follows this list)
- **Context Length:** 1024 tokens
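The parameter count above is approximate. A quick way to verify it once the model is downloaded, a minimal sketch (this only counts tensors and is not specific to Mamba2):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "telecomadm1145/mamba2_exp2", trust_remote_code=True
)

# Sum the element counts of all parameter tensors
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")  # expected to land near 0.2B
```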
### Model Sources
- **Repository:** [https://huggingface.co/telecomadm1145/mamba2_exp2](https://huggingface.co/telecomadm1145/mamba2_exp2)
- **Dataset:** [telecomadm1145/esjzone_novel_cn](https://huggingface.co/datasets/telecomadm1145/esjzone_novel_cn)
## Uses
### Direct Use
The model is designed for:
- **Creative Writing:** Generating light novel-style stories.
- **Text Completion:** Continuing a given text narrative in Chinese.
- **Style Imitation:** Mimicking the tropes and writing styles found in web novels.
### Out-of-Scope Use
- **Factual Question Answering:** Since it is trained on fiction, it will likely hallucinate facts.
- **Instruction Following:** It has not been fine-tuned to follow commands (e.g., "Write a summary of...").
- **Code Generation:** Not trained on code.
- **Long-context retrieval:** The model was trained with a context window of 1024 tokens; performance may degrade significantly beyond this length (a truncation sketch follows this list).
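If a prompt does exceed the training window, truncating it while keeping the most recent text is a reasonable safeguard for continuation tasks. A minimal sketch, not an official recommendation; the placeholder prompt and the reserved generation budget are illustrative:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("telecomadm1145/mamba2_exp2")
tokenizer.truncation_side = "left"  # keep the end of the prompt, which matters most for continuation

long_text = "<a long Chinese prompt that may exceed the training window>"
max_new_tokens = 100  # illustrative generation budget

# Keep prompt + generated tokens within the 1024-token training window
inputs = tokenizer(
    long_text,
    return_tensors="pt",
    truncation=True,
    max_length=1024 - max_new_tokens,
)
```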
## Bias, Risks, and Limitations
- **Dataset Quality:** The training data consists of **uncleaned** web novels. Consequently, the model may generate text containing typos, grammatical errors, or non-standard formatting present in the source material.
- **Content Warnings:** The model may generate content that includes violence, mature themes, or offensive language, reflecting the nature of some web fiction genres.
- **Hallucinations:** As a fiction-focused model it freely invents content and should not be used as a knowledge base.
## How to Get Started with the Model
Use the code below to get started with the model.
**Note:** For the optimized Mamba2 CUDA kernels you may need to install `mamba-ssm` and `causal-conv1d`; without them, `transformers` falls back to a slower pure-PyTorch implementation.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer (trust_remote_code=True allows custom model code from the repo)
model_id = "telecomadm1145/mamba2_exp2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Move to GPU if available and switch to inference mode
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

# Generate a continuation of the prompt
text = "<replace your prompt here>"
inputs = tokenizer(text, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        repetition_penalty=1.1,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details
### Training Data
- **Dataset Name:** [esjzone_novel_cn](https://huggingface.co/datasets/telecomadm1145/esjzone_novel_cn)
- **Data Type:** Chinese Light Novels (轻小说).
- **Data Size:** Approximately 1GB.
- **Preprocessing:** None; the raw, **uncleaned** text was used as-is during training (a loading sketch follows this list).
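To inspect the corpus yourself, it can be loaded with the `datasets` library. A minimal sketch; the `train` split and the column layout are assumptions about the dataset, so check the schema first:

```python
from datasets import load_dataset

# Load the raw, uncleaned light-novel corpus (the split name is an assumption)
ds = load_dataset("telecomadm1145/esjzone_novel_cn", split="train")

print(ds.column_names)  # confirm which column holds the novel text
print(ds[0])
```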
### Training Procedure
#### Training Hyperparameters
- **Context Length:** 1024 tokens
- **Training Stage:** Pre-training (Causal Language Modeling); a data-packing sketch follows this list.
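The exact training script is not published in this card. As an illustration only, causal-LM pre-training at a 1024-token context is typically preceded by packing the tokenized corpus into fixed-length blocks; the sketch below assumes the dataset exposes a `text` column and uses the model's own tokenizer:

```python
from itertools import chain

from datasets import load_dataset
from transformers import AutoTokenizer

BLOCK_SIZE = 1024  # matches the training context length above

tokenizer = AutoTokenizer.from_pretrained("telecomadm1145/mamba2_exp2")
ds = load_dataset("telecomadm1145/esjzone_novel_cn", split="train")

def tokenize_fn(batch):
    return tokenizer(batch["text"])  # "text" is an assumed column name

def group_texts(batch):
    # Concatenate all documents, then split into fixed 1024-token blocks,
    # the usual packing step for causal language modeling
    concatenated = list(chain.from_iterable(batch["input_ids"]))
    total_len = (len(concatenated) // BLOCK_SIZE) * BLOCK_SIZE
    blocks = [concatenated[i : i + BLOCK_SIZE] for i in range(0, total_len, BLOCK_SIZE)]
    return {"input_ids": blocks, "labels": [list(b) for b in blocks]}

tokenized = ds.map(tokenize_fn, batched=True, remove_columns=ds.column_names)
lm_dataset = tokenized.map(group_texts, batched=True, remove_columns=tokenized.column_names)
```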
#### Speeds, Sizes, Times
- **Hardware:** 2x NVIDIA T4 GPUs
- **Training Duration:** ~23 hours
- **Model Parameters:** ~0.2 Billion
## Environmental Impact
- **Hardware Type:** NVIDIA T4 x2
- **Hours used:** 23 hours
- **Compute Region:** [Unknown/Cloud]
## Technical Specifications
### Model Architecture and Objective
The model follows the **Mamba2** architecture, which is a type of State Space Model (SSM) designed to handle sequences efficiently. The objective was standard Causal Language Modeling (predicting the next token) on a dataset of fiction.
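In `transformers`, this objective can be reproduced directly: passing `labels=input_ids` returns the shifted next-token cross-entropy loss, which also gives a quick perplexity sanity check. A minimal sketch; the sample sentence is arbitrary:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "telecomadm1145/mamba2_exp2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

text = "少女抬起头，望向远处的星空。"  # arbitrary light-novel-style sample
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels equal to input_ids, the model computes the causal LM loss internally
    out = model(**inputs, labels=inputs["input_ids"])

print(f"loss = {out.loss.item():.3f}, perplexity ~ {torch.exp(out.loss).item():.1f}")
```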
---