Thai LLM with MoE-like Architecture
Model Description
This is a custom Transformer-based Large Language Model for the Thai language, trained from scratch with a focus on incorporating concepts that could support MoE-like characteristics, agent capabilities, enhanced reasoning, and thinking mode switching in future iterations. While the current model is a foundational Transformer, it is designed with a modular structure (see ThaiLLMBlock) to facilitate the integration of MoE layers or other specialized modules. The model is intended for text generation tasks in Thai.
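To make the modular layout concrete, the sketch below shows what a block with a swappable feed-forward sub-module might look like. Everything except the `ThaiLLMBlock` name and the use of `torch.nn.MultiheadAttention` is an assumption for illustration, not the actual implementation; the point is that the feed-forward sub-layer is kept as a self-contained module so an MoE routing layer could replace it later.

```python
import torch
import torch.nn as nn

class ThaiLLMBlock(nn.Module):
    """Illustrative sketch only: a pre-norm Transformer block whose feed-forward
    sub-layer is a separate module, so it could later be swapped for an MoE layer."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Kept as a standalone module so it can be replaced by an MoE layer later.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, attn_mask=None, key_padding_mask=None):
        # Self-attention sub-layer with residual connection
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask,
                                key_padding_mask=key_padding_mask, need_weights=False)
        x = x + self.dropout(attn_out)
        # Feed-forward sub-layer with residual connection
        x = x + self.dropout(self.ffn(self.norm2(x)))
        return x
```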
Dataset
This model was trained on the ZombitX64/Wikipedia-Thai dataset. This dataset consists of text content extracted from Thai Wikipedia articles.
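For reference, the dataset can be loaded with the `datasets` library. This is a minimal sketch; the split and column names are assumptions and may differ from the actual dataset layout.

```python
from datasets import load_dataset

# Pull the Thai Wikipedia dataset from the Hub (split/column names may differ)
dataset = load_dataset("ZombitX64/Wikipedia-Thai")
print(dataset)              # inspect the available splits and columns
print(dataset["train"][0])  # peek at the first record
```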
Training
The model was trained using the Hugging Face `Trainer` API from the transformers library.
- Base Architecture: Custom Transformer-based LLM (`ThaiLLMForCausalLM`)
- Dataset: ZombitX64/Wikipedia-Thai (a subset used for training and evaluation)
- Epochs: 3 (based on the configuration in `TrainingArguments`)
- Batch Size: 4 per device (training and evaluation)
- Optimizer: AdamW with a learning rate of 5e-5
- Loss Function: Cross-Entropy Loss (standard for causal language modeling)
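The hyperparameters above map roughly onto the following `Trainer` setup. This is a minimal sketch, not the actual training script: it assumes the tokenizer, a `ThaiLLMForCausalLM` instance, and pre-tokenized `train_dataset`/`eval_dataset` splits are already in scope, and every value not listed above is illustrative.

```python
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="./thai-llm-checkpoints",   # illustrative path
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=5e-5,                    # AdamW is the Trainer's default optimizer
    eval_strategy="epoch",                 # evaluate at the end of each epoch
                                           # ("evaluation_strategy" in older transformers releases)
    logging_steps=100,
)

trainer = Trainer(
    model=model,                           # the ThaiLLMForCausalLM instance
    args=training_args,
    train_dataset=train_dataset,           # pre-tokenized Wikipedia-Thai splits
    eval_dataset=eval_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal-LM labels
)

trainer.train()
```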
Training was performed with attention mask handling adapted for torch.nn.MultiheadAttention's key_padding_mask format. Evaluation was conducted at the end of each epoch based on evaluation loss.
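Concretely, Hugging Face attention masks mark real tokens with 1 and padding with 0, whereas `key_padding_mask` expects `True` at positions to ignore, so the conversion amounts to the following (a sketch of the idea, not the exact model code):

```python
import torch

# Hugging Face convention: 1 = real token, 0 = padding
attention_mask = torch.tensor([[1, 1, 1, 0, 0]])

# torch.nn.MultiheadAttention convention: True = position to ignore
key_padding_mask = attention_mask == 0
print(key_padding_mask)  # tensor([[False, False, False,  True,  True]])
```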
Usage
To use this model for text generation, you can load it using the transformers library.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Define the model name on the Hugging Face Hub
model_name = "YOUR_HUGGINGFACE_USERNAME/thai-llm-moe-agent"  # Replace with your username and repo name

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model.
# You might need to register the custom model architecture first if it's not a standard type:
# from your_model_file import ThaiLLMConfig, ThaiLLMForCausalLM
# AutoModelForCausalLM.register(ThaiLLMConfig, ThaiLLMForCausalLM)  # Register if needed
try:
    # Works if the model was saved in a format compatible with AutoModelForCausalLM.from_pretrained
    model = AutoModelForCausalLM.from_pretrained(model_name)
    print("Model loaded successfully with AutoModelForCausalLM.")
except Exception as e:
    print(f"Could not load with AutoModelForCausalLM: {e}")
    print("Attempting to load with custom class...")
    # This fallback assumes the custom classes (ThaiLLMConfig, ThaiLLMForCausalLM)
    # are importable in the current environment.
    try:
        config = ThaiLLMConfig.from_pretrained(model_name)
        model = ThaiLLMForCausalLM.from_pretrained(model_name, config=config)
        print("Model loaded successfully with custom class.")
    except Exception as load_e:
        print(f"Failed to load model even with custom class: {load_e}")
        model = None  # Set model to None if loading fails

if model:
    # Move the model to GPU if available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()  # Set the model to evaluation mode

    # Example text generation
    prompt = "ประเทศไทยมีเมืองหลวงชื่อ"  # "Thailand has a capital city named"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    # Generate text
    with torch.no_grad():
        outputs = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=50,
            num_return_sequences=1,
            no_repeat_ngram_size=2,
            early_stopping=True,
        )

    # Decode and print the generated text
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print("Generated text:")
    print(generated_text)
```
(Note: Replace YOUR_HUGGINGFACE_USERNAME with your actual Hugging Face username)
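Once the model and tokenizer are loaded as above, they can also be wrapped in a text-generation pipeline for convenience. This assumes the loading snippet above succeeded and that `model`, `tokenizer`, and `torch` are still in scope.

```python
from transformers import pipeline

# Reuse the model and tokenizer objects loaded in the snippet above.
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
)

print(generator("ประเทศไทยมีเมืองหลวงชื่อ", max_length=50, no_repeat_ngram_size=2))
```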
Limitations
- This is a base model trained on a single dataset and may not generalize well to all Thai text generation tasks.
- The MoE expertise, agent capabilities, advanced reasoning, and thinking mode switching features are aspirational for future development and are not fully implemented in this initial version. The current model is a standard Transformer.
- Performance metrics beyond evaluation loss (such as perplexity) were computed during training setup, but comprehensive human evaluation and task-specific evaluations have not been performed; a snippet for deriving perplexity from the evaluation loss follows this list.
- The model's knowledge is limited to the training data (Wikipedia).
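For reference, perplexity follows directly from the evaluation cross-entropy loss. A minimal sketch, assuming the `trainer` object from the training sketch above:

```python
import math

# eval_loss is the average cross-entropy (in nats per token) reported by the Trainer
eval_loss = trainer.evaluate()["eval_loss"]
print(f"Perplexity: {math.exp(eval_loss):.2f}")
```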
Acknowledgements
- The ZombitX64/Wikipedia-Thai dataset creators for providing the training data.
- The Hugging Face `transformers` and `datasets` libraries for providing the framework and tools used for development and training.
- The `airesearch/wangchanberta-base-att-spm-uncased` tokenizer developers.