---
language:
- zh
library_name: transformers
pipeline_tag: text-generation
license: mit
datasets:
- telecomadm1145/esjzone_novel_cn
tags:
- mamba2
---

# mamba2_exp2

**mamba2_exp2** is a **Mamba2**-architecture model with approximately **0.2 billion parameters**, pre-trained on a dataset of Chinese light novels (esjzone). It is intended for text generation and story-continuation tasks in Chinese.

## Model Details

### Model Description

This model uses the Mamba2 state-space model architecture, designed for efficient inference. It was pre-trained from scratch on a corpus of uncleaned Chinese light novels.

**Note:** This is a **base model** (pre-trained only): it has **not** undergone instruction tuning (SFT or RLHF). It is best suited for completing text from a prompt (continuation) rather than answering questions or following complex instructions.

- **Developed by:** telecomadm1145
- **Model type:** Mamba2 (State Space Model)
- **Language(s) (NLP):** Chinese (zh)
- **License:** MIT
- **Finetuned from model:** None (trained from scratch)
- **Model Size:** ~0.2B parameters
- **Context Length:** 1024 tokens

### Model Sources

- **Repository:** [https://huggingface.co/telecomadm1145/mamba2_exp2](https://huggingface.co/telecomadm1145/mamba2_exp2)
- **Dataset:** [telecomadm1145/esjzone_novel_cn](https://huggingface.co/datasets/telecomadm1145/esjzone_novel_cn)

## Uses

### Direct Use

The model is designed for:

- **Creative writing:** generating light-novel-style stories.
- **Text completion:** continuing a given narrative in Chinese.
- **Style imitation:** mimicking the tropes and writing styles found in web novels.

### Out-of-Scope Use

- **Factual question answering:** trained on fiction, the model will readily hallucinate facts.
- **Instruction following:** it has not been fine-tuned to follow commands (e.g., "Write a summary of...").
- **Code generation:** it was not trained on code.
- **Long-context retrieval:** the model was trained with a 1024-token context window; performance may degrade significantly beyond that length.

## Bias, Risks, and Limitations

- **Dataset quality:** the training data consists of **uncleaned** web novels, so the model may reproduce typos, grammatical errors, and non-standard formatting present in the source material.
- **Content warnings:** the model may generate content that includes violence, mature themes, or offensive language, reflecting the nature of some web-fiction genres.
- **Hallucinations:** as a fiction-focused model, it invents content and should not be used as a knowledge base.

## How to Get Started with the Model

Use the code below to get started. **Note:** Depending on your environment, you may need to install `mamba-ssm` and `causal-conv1d` to run Mamba2 models efficiently.
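Both packages provide optional CUDA kernels (installable with e.g. `pip install mamba-ssm causal-conv1d`; exact versions depend on your PyTorch/CUDA setup); without them, `transformers` falls back to a slower pure-PyTorch path. A minimal sketch of checking for them:

```python
# Check whether the optional fast-path Mamba2 kernels are importable.
# If either import fails, the model still runs, just on the slow path.
try:
    import mamba_ssm      # from the `mamba-ssm` package
    import causal_conv1d  # from the `causal-conv1d` package
    print("Fast Mamba2 kernels available.")
except ImportError as e:
    print(f"Missing {e.name}; transformers will use the pure-PyTorch path.")
```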
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_id = "telecomadm1145/mamba2_exp2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Generate text (replace the empty string with your Chinese prompt)
text = ""
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Training Data

- **Dataset Name:** [esjzone_novel_cn](https://huggingface.co/datasets/telecomadm1145/esjzone_novel_cn)
- **Data Type:** Chinese light novels (轻小说)
- **Data Size:** approximately 1 GB
- **Preprocessing:** none; the raw text was used **uncleaned**

### Training Procedure

#### Training Hyperparameters

- **Context Length:** 1024 tokens
- **Training Stage:** pre-training (causal language modeling)

#### Speeds, Sizes, Times

- **Hardware:** 2x NVIDIA T4 GPUs
- **Training Duration:** ~23 hours
- **Model Parameters:** ~0.2 billion

## Environmental Impact

- **Hardware Type:** NVIDIA T4 x2
- **Hours used:** ~23 hours
- **Compute Region:** unknown (cloud)

## Technical Specifications

### Model Architecture and Objective

The model follows the **Mamba2** architecture, a State Space Model (SSM) designed to process sequences efficiently. The training objective was standard causal language modeling (predicting the next token) on the fiction corpus described above.
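As an illustration of this objective, here is a minimal sketch of scoring a sentence under the model with `transformers`; the sample sentence is a placeholder, the input is explicitly truncated to the 1024-token training context, and it is assumed the model's forward accepts `labels` like standard `transformers` causal-LM heads:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "telecomadm1145/mamba2_exp2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

# Next-token prediction: passing labels=input_ids lets transformers shift the
# labels internally, so position t is scored on predicting token t+1.
text = "少女推开了图书馆的门。"  # placeholder sentence in the fiction domain
batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)

with torch.no_grad():
    out = model(**batch, labels=batch["input_ids"])

# Cross-entropy over the shifted tokens; exp(loss) is the perplexity.
print(f"loss: {out.loss.item():.3f}  perplexity: {out.loss.exp().item():.1f}")
```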