---
language:
- zh
library_name: transformers
pipeline_tag: text-generation
license: mit
datasets:
- telecomadm1145/esjzone_novel_cn
tags:
- mamba2
---

# mamba2_exp3

**mamba2_exp3** is a **Mamba2**-architecture model with approximately **0.4 billion parameters**, pre-trained on a dataset of Chinese light novels (esjzone). It is intended for Chinese text generation and story-continuation tasks.

## Model Details

### Model Description

This model uses the Mamba2 state-space architecture, designed for efficient inference. It was pre-trained from scratch on a corpus of uncleaned Chinese light novels.

**Note:** This is a **base model** (pre-trained only): it has **not** undergone instruction tuning (SFT or RLHF). It is best suited for continuing text from a prompt rather than answering questions or following complex instructions.

- **Developed by:** telecomadm1145
- **Model type:** Mamba2 (state space model)
- **Language(s) (NLP):** Chinese (zh)
- **License:** MIT
- **Finetuned from model:** None (trained from scratch)
- **Model size:** ~0.4B parameters
- **Context length:** 1024 tokens

### Model Sources

- **Repository:** [https://huggingface.co/telecomadm1145/mamba2_exp3](https://huggingface.co/telecomadm1145/mamba2_exp3)
- **Dataset:** [telecomadm1145/esjzone_novel_cn](https://huggingface.co/datasets/telecomadm1145/esjzone_novel_cn)

## Uses

### Direct Use

The model is designed for:

- **Creative Writing:** Generating light-novel-style stories.
- **Text Completion:** Continuing a given narrative in Chinese.
- **Style Imitation:** Mimicking the tropes and writing styles found in web novels.

### Out-of-Scope Use

- **Factual Question Answering:** Trained on fiction, the model will readily hallucinate facts.
- **Instruction Following:** It has not been fine-tuned to follow commands (e.g., "Write a summary of...").
- **Code Generation:** It was not trained on code.
- **Long-context retrieval:** The model was trained with a context window of 1024 tokens; performance may degrade significantly beyond this length (see the truncation sketch below).

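Because the window is fixed, long prompts are best truncated before generation. A minimal, self-contained sketch (the prompt string is a hypothetical stand-in):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("telecomadm1145/mamba2_exp3")
tokenizer.truncation_side = "left"  # keep the most recent context for continuation

# Hypothetical long prompt; in practice this would be real story text.
long_prompt = "黄昏的教室里，少女静静地合上了书。" * 200

# Reserve room inside the 1024-token training window for generated tokens.
max_new_tokens = 100
inputs = tokenizer(
    long_prompt,
    return_tensors="pt",
    truncation=True,
    max_length=1024 - max_new_tokens,
)
```
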
## Bias, Risks, and Limitations

- **Dataset Quality:** The training data consists of **uncleaned** web novels, so the model may reproduce typos, grammatical errors, and non-standard formatting present in the source material.
- **Content Warnings:** The model may generate violence, mature themes, or offensive language, reflecting the nature of some web-fiction genres.
- **Hallucinations:** As a fiction-focused model, it invents content and should not be used as a knowledge base.

## How to Get Started with the Model

Use the code below to get started with the model.

**Note:** Depending on your environment, you may need to install `mamba-ssm` and `causal-conv1d` (for example, `pip install mamba-ssm causal-conv1d`) to get the optimized Mamba2 kernels.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model_id = "telecomadm1145/mamba2_exp3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Generate text
text = "黄昏的教室里，"  # replace with your own prompt
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

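For interactive use, tokens can be printed as they are generated. A minimal sketch using transformers' `TextStreamer`, reusing `model`, `tokenizer`, and `inputs` from above:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout instead of waiting for the full output.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.95, streamer=streamer)
```
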
## Training Details

### Training Data

- **Dataset Name:** [esjzone_novel_cn](https://huggingface.co/datasets/telecomadm1145/esjzone_novel_cn)
- **Data Type:** Chinese light novels (轻小说)
- **Data Size:** Approximately 1 GB
- **Preprocessing:** None; the raw, **uncleaned** text was used as-is (see the loading sketch below)

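To inspect the corpus, the dataset can be loaded with the `datasets` library. A minimal sketch (the `train` split name is an assumption):

```python
from datasets import load_dataset

# Stream records instead of downloading the full ~1 GB corpus up front.
ds = load_dataset("telecomadm1145/esjzone_novel_cn", split="train", streaming=True)
print(next(iter(ds)))  # peek at one raw, uncleaned record
```
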
### Training Procedure

#### Training Hyperparameters

- **Context Length:** 1024 tokens
- **Training Stage:** Pre-training (causal language modeling); a common data-chunking recipe is sketched below

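A common way to prepare data for fixed-window causal-LM pre-training is to concatenate the tokenized corpus and split it into 1024-token blocks. This is a sketch of the general recipe, not necessarily the author's exact pipeline:

```python
def chunk_tokens(token_ids: list[int], block_size: int = 1024) -> list[list[int]]:
    """Split one long token stream into fixed-size training blocks."""
    # Drop the trailing remainder so every block is exactly block_size tokens.
    n_blocks = len(token_ids) // block_size
    return [token_ids[i * block_size : (i + 1) * block_size] for i in range(n_blocks)]
```
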
#### Speeds, Sizes, Times

- **Hardware:** 2x NVIDIA T4 GPUs
- **Training Duration:** ~11.5 hours
- **Model Parameters:** ~0.4 billion

## Environmental Impact

- **Hardware Type:** 2x NVIDIA T4
- **Hours Used:** ~11.5
- **Compute Region:** Unknown (cloud)

## Technical Specifications

### Model Architecture and Objective

The model follows the **Mamba2** architecture, a state space model (SSM) designed to process sequences efficiently. The training objective was standard causal language modeling (next-token prediction) on a corpus of fiction.

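Concretely, the objective corresponds to the standard `labels = input_ids` convention in transformers, which shifts the labels internally so that each token is predicted from the tokens before it. A minimal sketch (not the original training script):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "telecomadm1145/mamba2_exp3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

batch = tokenizer("黄昏的教室里，少女静静地合上了书。", return_tensors="pt")
# transformers shifts the labels internally for the causal LM loss.
out = model(**batch, labels=batch["input_ids"])
print(out.loss)  # mean cross-entropy over next-token predictions
```
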
---