mamba2_exp2
mamba2_exp2 is a Mamba2-architecture model with approximately 0.2 billion parameters. It has been pre-trained on a dataset of Chinese light novels (esjzone) and is intended for text generation and story-continuation tasks in Chinese.
Model Details
Model Description
This model uses the Mamba2 state-space architecture, which processes sequences in linear time and is designed for efficient inference. It was pre-trained from scratch on a corpus of uncleaned Chinese light novels.
Note: This is a base model (pre-trained only); it has not undergone instruction tuning (SFT) or preference alignment (RLHF). It is best suited for continuing a given text from a prompt rather than answering questions or following complex instructions.
- Developed by: telecomadm1145
- Model type: Mamba2 (State Space Model)
- Language(s) (NLP): Chinese (zh)
- License: MIT
- Finetuned from model: None (Trained from scratch)
- Model Size: ~0.2B parameters
- Context Length: 1024 tokens
Model Sources
- Repository: https://huggingface.co/telecomadm1145/mamba2_exp2
- Dataset: telecomadm1145/esjzone_novel_cn
Uses
Direct Use
The model is designed for:
- Creative Writing: Generating light novel-style stories.
- Text Completion: Continuing a given text narrative in Chinese.
- Style Imitation: Mimicking the tropes and writing styles found in web novels.
Out-of-Scope Use
- Factual Question Answering: Since it is trained on fiction, it will likely hallucinate facts.
- Instruction Following: It has not been fine-tuned to follow commands (e.g., "Write a summary of...").
- Code Generation: Not trained on code.
- Long-context retrieval: The model was trained with a context window of 1024 tokens; performance may degrade significantly beyond this length.
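Because of the 1024-token training window, it is safest to truncate long prompts before generation. A minimal sketch (long_prompt is a placeholder for your own text); setting truncation_side to "left" keeps the most recent context, which is usually what you want for continuation:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("telecomadm1145/mamba2_exp2")
tokenizer.truncation_side = "left"  # keep the end of the prompt, drop the oldest text
long_prompt = "..."  # placeholder: your (possibly over-long) Chinese prompt
inputs = tokenizer(long_prompt, return_tensors="pt", truncation=True, max_length=1024)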
Bias, Risks, and Limitations
- Dataset Quality: The training data consists of uncleaned web novels. Consequently, the model may generate text containing typos, grammatical errors, or non-standard formatting present in the source material.
- Content Warnings: The model may generate content that includes violence, mature themes, or offensive language, reflecting the nature of some web fiction genres.
- Hallucinations: As a fiction-focused model, it invents content by design and should not be used as a knowledge base.
How to Get Started with the Model
Use the code below to get started with the model.
Note: Depending on your environment, you may need to install the mamba-ssm and causal-conv1d packages (pip install mamba-ssm causal-conv1d) for the optimized Mamba2 kernels; without them, transformers may fall back to a slower implementation.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_id = "telecomadm1145/mamba2_exp2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Generate text with sampling
text = "<replace your prompt here>"
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
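For interactive use, you can stream tokens to stdout as they are generated instead of waiting for the full output. This sketch reuses model, tokenizer, and inputs from the snippet above and uses transformers' built-in TextStreamer:
from transformers import TextStreamer

# Prints tokens as they are sampled; skip_prompt avoids echoing the input.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1,
    streamer=streamer,
)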
Training Details
Training Data
- Dataset Name: esjzone_novel_cn
- Data Type: Chinese Light Novels (轻小说).
- Data Size: Approximately 1 GB.
- Preprocessing: None; the raw, uncleaned text was used as-is.
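To inspect the corpus yourself, it can be loaded with the datasets library. A minimal sketch (split names are an assumption and may differ on the Hub):
from datasets import load_dataset

ds = load_dataset("telecomadm1145/esjzone_novel_cn")
print(ds)  # shows the available splits and row counts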
Training Procedure
Training Hyperparameters
- Context Length: 1024 tokens
- Training Stage: Pre-training (Causal Language Modeling)
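For reference, causal language modeling trains the model to predict each next token via cross-entropy over shifted inputs. A schematic sketch, not the actual training script (model is the loaded model, input_ids a tokenized batch):
import torch.nn.functional as F

# input_ids: (batch, seq_len) tensor of token ids
logits = model(input_ids).logits        # (batch, seq_len, vocab_size)
shift_logits = logits[:, :-1, :]        # predictions for positions 1..T-1
shift_labels = input_ids[:, 1:]         # targets: the next token at each position
loss = F.cross_entropy(
    shift_logits.reshape(-1, shift_logits.size(-1)),
    shift_labels.reshape(-1),
)
loss.backward()  # standard pre-training step (optimizer omitted)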
Speeds, Sizes, Times
- Hardware: 2x NVIDIA T4 GPUs
- Training Duration: ~23 hours
- Model Parameters: ~0.2 billion
Environmental Impact
- Hardware Type: NVIDIA T4 x2
- Hours used: 23 hours
- Compute Region: [Unknown/Cloud]
Technical Specifications
Model Architecture and Objective
The model follows the Mamba2 architecture, a State Space Model (SSM) that processes sequences in time linear in their length. The training objective was standard causal language modeling (next-token prediction) on the fiction corpus described above.
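The sketch below is a heavily simplified, illustrative version of the recurrence an SSM layer computes; real Mamba2 uses multi-dimensional states, input-dependent (selective) parameters, and a hardware-efficient parallel scan rather than this Python loop:
import torch

def ssm_scan(a, b, c, x):
    """Toy SSM: h_t = a_t * h_{t-1} + b_t * x_t, then y_t = c_t * h_t.
    a, b, c, x: tensors of shape (seq_len, d); iterating a tensor yields rows."""
    h = torch.zeros_like(x[0])
    ys = []
    for a_t, b_t, c_t, x_t in zip(a, b, c, x):
        h = a_t * h + b_t * x_t      # state update (decay + new input)
        ys.append(c_t * h)           # readout
    return torch.stack(ys)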