---
language:
- zh
library_name: transformers
pipeline_tag: text-generation
license: mit
datasets:
- telecomadm1145/esjzone_novel_cn
tags:
- mamba2
---

# mamba2_exp2

**mamba2_exp2** is a **Mamba2**-architecture model with approximately **0.2 billion parameters**, pre-trained on a dataset of Chinese light novels (esjzone). It is intended for text generation and story-continuation tasks in Chinese.

## Model Details

### Model Description

This model uses the Mamba2 state-space model architecture, designed for efficient inference. It was pre-trained from scratch on a corpus of uncleaned Chinese light novels.

**Note:** This is a **base model** (pre-trained only): it has **not** undergone instruction tuning (SFT or RLHF). It is best suited for completing text from a prompt (continuation) rather than answering questions or following complex instructions.

- **Developed by:** telecomadm1145
- **Model type:** Mamba2 (State Space Model)
- **Language(s) (NLP):** Chinese (zh)
- **License:** MIT
- **Finetuned from model:** None (trained from scratch)
- **Model Size:** ~0.2B parameters
- **Context Length:** 1024 tokens

### Model Sources

- **Repository:** [https://huggingface.co/telecomadm1145/mamba2_exp2](https://huggingface.co/telecomadm1145/mamba2_exp2)
- **Dataset:** [telecomadm1145/esjzone_novel_cn](https://huggingface.co/datasets/telecomadm1145/esjzone_novel_cn)

## Uses

### Direct Use

The model is designed for:

- **Creative writing:** generating light-novel-style stories.
- **Text completion:** continuing a given narrative in Chinese.
- **Style imitation:** mimicking the tropes and writing styles found in web novels.

### Out-of-Scope Use

- **Factual question answering:** trained on fiction, the model will readily hallucinate facts.
- **Instruction following:** it has not been fine-tuned to follow commands (e.g., "Write a summary of...").
- **Code generation:** it was not trained on code.
- **Long-context retrieval:** the model was trained with a 1024-token context window; performance may degrade significantly beyond that length.

## Bias, Risks, and Limitations

- **Dataset quality:** the training data consists of **uncleaned** web novels, so the model may reproduce typos, grammatical errors, and non-standard formatting present in the source material.
- **Content warnings:** the model may generate content that includes violence, mature themes, or offensive language, reflecting the nature of some web-fiction genres.
- **Hallucinations:** as a fiction-focused model, it invents content and should not be used as a knowledge base.

## How to Get Started with the Model

Use the code below to get started. **Note:** Depending on your environment, you may need to install `mamba-ssm` and `causal-conv1d` to run Mamba2 models efficiently.
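Both packages provide optional CUDA kernels (installable with e.g. `pip install mamba-ssm causal-conv1d`; exact versions depend on your PyTorch/CUDA setup); without them, `transformers` falls back to a slower pure-PyTorch path. A minimal sketch of checking for them:

```python
# Check whether the optional fast-path Mamba2 kernels are importable.
# If either import fails, the model still runs, just on the slow path.
try:
    import mamba_ssm      # from the `mamba-ssm` package
    import causal_conv1d  # from the `causal-conv1d` package
    print("Fast Mamba2 kernels available.")
except ImportError as e:
    print(f"Missing {e.name}; transformers will use the pure-PyTorch path.")
```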
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_id = "telecomadm1145/mamba2_exp2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Generate text (replace the empty string with your Chinese prompt)
text = ""
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Training Data

- **Dataset Name:** [esjzone_novel_cn](https://huggingface.co/datasets/telecomadm1145/esjzone_novel_cn)
- **Data Type:** Chinese light novels (轻小说)
- **Data Size:** approximately 1 GB
- **Preprocessing:** none; the raw text was used **uncleaned**

### Training Procedure

#### Training Hyperparameters

- **Context Length:** 1024 tokens
- **Training Stage:** pre-training (causal language modeling)

#### Speeds, Sizes, Times

- **Hardware:** 2x NVIDIA T4 GPUs
- **Training Duration:** ~23 hours
- **Model Parameters:** ~0.2 billion

## Environmental Impact

- **Hardware Type:** NVIDIA T4 x2
- **Hours used:** ~23 hours
- **Compute Region:** unknown (cloud)

## Technical Specifications

### Model Architecture and Objective

The model follows the **Mamba2** architecture, a State Space Model (SSM) designed to process sequences efficiently. The training objective was standard causal language modeling (predicting the next token) on the fiction corpus described above.
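As an illustration of this objective, here is a minimal sketch of scoring a sentence under the model with `transformers`; the sample sentence is a placeholder, the input is explicitly truncated to the 1024-token training context, and it is assumed the model's forward accepts `labels` like standard `transformers` causal-LM heads:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "telecomadm1145/mamba2_exp2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

# Next-token prediction: passing labels=input_ids lets transformers shift the
# labels internally, so position t is scored on predicting token t+1.
text = "少女推开了图书馆的门。"  # placeholder sentence in the fiction domain
batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)

with torch.no_grad():
    out = model(**batch, labels=batch["input_ids"])

# Cross-entropy over the shifted tokens; exp(loss) is the perplexity.
print(f"loss: {out.loss.item():.3f}  perplexity: {out.loss.exp().item():.1f}")
```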