Thai LLM with MoE-like Architecture
Model Description
This is a custom Transformer-based Large Language Model for the Thai language, trained from scratch with a focus on incorporating concepts that could support MoE-like characteristics, agent capabilities, enhanced reasoning, and thinking mode switching in future iterations. While the current model is a foundational Transformer, it is designed with a modular structure (see ThaiLLMBlock) to facilitate the integration of MoE layers or other specialized modules. The model is intended for text generation tasks in Thai.
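To make the modular layout concrete, the sketch below shows what a block with a swappable feed-forward sub-module might look like. Everything except the `ThaiLLMBlock` name and the use of `torch.nn.MultiheadAttention` is an assumption for illustration, not the actual implementation; the point is that the feed-forward sub-layer is kept as a self-contained module so an MoE routing layer could replace it later.

```python
import torch
import torch.nn as nn

class ThaiLLMBlock(nn.Module):
    """Illustrative sketch only: a pre-norm Transformer block whose feed-forward
    sub-layer is a separate module, so it could later be swapped for an MoE layer."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Kept as a standalone module so it can be replaced by an MoE layer later.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, attn_mask=None, key_padding_mask=None):
        # Self-attention sub-layer with residual connection
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask,
                                key_padding_mask=key_padding_mask, need_weights=False)
        x = x + self.dropout(attn_out)
        # Feed-forward sub-layer with residual connection
        x = x + self.dropout(self.ffn(self.norm2(x)))
        return x
```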
Dataset
This model was trained on the ZombitX64/Wikipedia-Thai dataset. This dataset consists of text content extracted from Thai Wikipedia articles.
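For reference, the dataset can be loaded with the `datasets` library. This is a minimal sketch; the split and column names are assumptions and may differ from the actual dataset layout.

```python
from datasets import load_dataset

# Pull the Thai Wikipedia dataset from the Hub (split/column names may differ)
dataset = load_dataset("ZombitX64/Wikipedia-Thai")
print(dataset)              # inspect the available splits and columns
print(dataset["train"][0])  # peek at the first record
```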
Training
The model was trained using the Hugging Face `Trainer` API from the transformers library.
- Base Architecture: Custom Transformer-based LLM (`ThaiLLMForCausalLM`)
- Dataset: ZombitX64/Wikipedia-Thai (a subset used for training and evaluation)
- Epochs: 3 (based on the configuration in `TrainingArguments`)
- Batch Size: 4 per device (training and evaluation)
- Optimizer: AdamW with a learning rate of 5e-5
- Loss Function: Cross-Entropy Loss (standard for causal language modeling)
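The hyperparameters above map roughly onto the following `Trainer` setup. This is a minimal sketch, not the actual training script: it assumes the tokenizer, a `ThaiLLMForCausalLM` instance, and pre-tokenized `train_dataset`/`eval_dataset` splits are already in scope, and every value not listed above is illustrative.

```python
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="./thai-llm-checkpoints",   # illustrative path
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=5e-5,                    # AdamW is the Trainer's default optimizer
    eval_strategy="epoch",                 # evaluate at the end of each epoch
                                           # ("evaluation_strategy" in older transformers releases)
    logging_steps=100,
)

trainer = Trainer(
    model=model,                           # the ThaiLLMForCausalLM instance
    args=training_args,
    train_dataset=train_dataset,           # pre-tokenized Wikipedia-Thai splits
    eval_dataset=eval_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal-LM labels
)

trainer.train()
```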
Training was performed with attention mask handling adapted for torch.nn.MultiheadAttention's key_padding_mask format. Evaluation was conducted at the end of each epoch based on evaluation loss.
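Concretely, Hugging Face attention masks mark real tokens with 1 and padding with 0, whereas `key_padding_mask` expects `True` at positions to ignore, so the conversion amounts to the following (a sketch of the idea, not the exact model code):

```python
import torch

# Hugging Face convention: 1 = real token, 0 = padding
attention_mask = torch.tensor([[1, 1, 1, 0, 0]])

# torch.nn.MultiheadAttention convention: True = position to ignore
key_padding_mask = attention_mask == 0
print(key_padding_mask)  # tensor([[False, False, False,  True,  True]])
```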
Usage
To use this model for text generation, you can load it using the transformers library.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Define the model name on the Hugging Face Hub
model_name = "YOUR_HUGGINGFACE_USERNAME/thai-llm-moe-agent"  # Replace with your username and repo name

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model.
# You might need to register the custom model architecture first if it's not a standard type:
# from your_model_file import ThaiLLMConfig, ThaiLLMForCausalLM
# AutoModelForCausalLM.register(ThaiLLMConfig, ThaiLLMForCausalLM)  # Register if needed
try:
    # Works if the model was saved in a format compatible with AutoModelForCausalLM.from_pretrained
    model = AutoModelForCausalLM.from_pretrained(model_name)
    print("Model loaded successfully with AutoModelForCausalLM.")
except Exception as e:
    print(f"Could not load with AutoModelForCausalLM: {e}")
    print("Attempting to load with custom class...")
    # This fallback assumes the custom classes (ThaiLLMConfig, ThaiLLMForCausalLM)
    # are importable in the current environment.
    try:
        config = ThaiLLMConfig.from_pretrained(model_name)
        model = ThaiLLMForCausalLM.from_pretrained(model_name, config=config)
        print("Model loaded successfully with custom class.")
    except Exception as load_e:
        print(f"Failed to load model even with custom class: {load_e}")
        model = None  # Set model to None if loading fails

if model:
    # Move the model to GPU if available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()  # Set the model to evaluation mode

    # Example text generation
    prompt = "ประเทศไทยมีเมืองหลวงชื่อ"  # "Thailand has a capital city named"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    # Generate text
    with torch.no_grad():
        outputs = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=50,
            num_return_sequences=1,
            no_repeat_ngram_size=2,
            early_stopping=True,
        )

    # Decode and print the generated text
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print("Generated text:")
    print(generated_text)
```
(Note: Replace YOUR_HUGGINGFACE_USERNAME with your actual Hugging Face username)
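Once the model and tokenizer are loaded as above, they can also be wrapped in a text-generation pipeline for convenience. This assumes the loading snippet above succeeded and that `model`, `tokenizer`, and `torch` are still in scope.

```python
from transformers import pipeline

# Reuse the model and tokenizer objects loaded in the snippet above.
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
)

print(generator("ประเทศไทยมีเมืองหลวงชื่อ", max_length=50, no_repeat_ngram_size=2))
```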
Limitations
- This is a base model trained on a single dataset and may not generalize well to all Thai text generation tasks.
- The MoE expertise, agent capabilities, advanced reasoning, and thinking mode switching features are aspirational for future development and are not fully implemented in this initial version. The current model is a standard Transformer.
- Performance metrics beyond evaluation loss (such as perplexity) were computed during training setup, but comprehensive human evaluation and task-specific evaluations have not been performed; a snippet for deriving perplexity from the evaluation loss follows this list.
- The model's knowledge is limited to the training data (Wikipedia).
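For reference, perplexity follows directly from the evaluation cross-entropy loss. A minimal sketch, assuming the `trainer` object from the training sketch above:

```python
import math

# eval_loss is the average cross-entropy (in nats per token) reported by the Trainer
eval_loss = trainer.evaluate()["eval_loss"]
print(f"Perplexity: {math.exp(eval_loss):.2f}")
```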
Acknowledgements
- The ZombitX64/Wikipedia-Thai dataset creators for providing the training data.
- The Hugging Face `transformers` and `datasets` libraries for providing the framework and tools used for development and training.
- The `airesearch/wangchanberta-base-att-spm-uncased` tokenizer developers.