Khmer Word Prediction Model

Caption

A Khmer language model for next-word prediction and text generation, designed to support applications such as autocomplete, intelligent typing assistants, and Khmer NLP research.

Model Description

This model is a neural language model trained to predict the next word in a Khmer sentence from the words that precede it.

Model file: khmer_lm_best.pt
Framework: PyTorch
Architecture: not yet specified (LSTM, GRU, or Transformer)
Task: Next-word prediction (Language Modeling)
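
For an autocomplete use case, next-word prediction usually means ranking candidate words by probability rather than returning a single answer. The following is a minimal, framework-free sketch of that ranking step. It assumes the model produces one raw score (logit) per vocabulary entry; the function name `top_k_words`, the four-word vocabulary, and the scores are all made up for illustration and are not taken from this model.

```python
import math


def top_k_words(logits, id_to_word, k=3):
    """Return the k most probable (word, probability) pairs from raw scores."""
    # Softmax, subtracting the maximum first for numerical stability.
    highest = max(logits)
    exps = [math.exp(score - highest) for score in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank vocabulary indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id_to_word[i], probs[i]) for i in ranked[:k]]


# Hypothetical vocabulary and scores (assumptions, not real model output).
id_to_word = ["<unk>", "ផ្សារ", "សាលា", "ផ្ទះ"]
logits = [0.1, 2.0, 1.0, 0.5]

suggestions = top_k_words(logits, id_to_word, k=2)
for word, prob in suggestions:
    print(f"{word}: {prob:.3f}")
```

With a real model, the list of logits would be the scores at the last position of the model output, and `id_to_word` would be the vocabulary used during training.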

Dataset

Source: A mix of synthetic and real Khmer text
Training samples: ~350,000 sentences
Validation samples: ~47,000 sentences
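
This card does not describe how the sentences were preprocessed. As a rough illustration only, a sentence-level corpus is often turned into next-word training examples by pairing each word with the words that precede it. The sketch below assumes sentences are already segmented into words; `make_training_pairs` and `context_size` are hypothetical names, not part of this model's training code.

```python
def make_training_pairs(words, context_size=3):
    """Turn one segmented sentence into (context, next_word) pairs."""
    pairs = []
    for i in range(1, len(words)):
        # Keep at most `context_size` words immediately before position i.
        context = words[max(0, i - context_size):i]
        pairs.append((context, words[i]))
    return pairs


# A sentence that has already been split into words (assumed segmentation).
sentence = ["ខ្ញុំ", "ចង់", "ទៅ", "ផ្សារ"]

pairs = make_training_pairs(sentence, context_size=2)
for context, target in pairs:
    print(context, "->", target)
```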

How to Use (Test Code)

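1. Install dependencies

pip install torch

2. Load the model and predict the next word

import torch

# Load the model. This assumes the file stores the whole pickled model object,
# so the model's class definition must be importable. PyTorch 2.6+ defaults to
# weights_only=True, which rejects such files; pass weights_only=False only for
# files you trust. If the file holds a state_dict, build the model first and
# call load_state_dict() instead.
model = torch.load("khmer_lm_best.pt", map_location="cpu", weights_only=False)
model.eval()


def predict_next_word(model, input_text):
    # TODO: replace with your tokenizer logic. Khmer is written without spaces
    # between words, so str.split() is only a placeholder for a word segmenter.
    tokens = input_text.split()

    # Dummy example (you must adapt to your model)
    input_ids = [0] * len(tokens)  # replace with real vocabulary IDs

    # Shape (1, seq_len): a batch containing one sequence
    input_tensor = torch.tensor([input_ids], dtype=torch.long)

    with torch.no_grad():
        output = model(input_tensor)

    # Assumes output has shape (batch, seq_len, vocab_size); recurrent models
    # often return (logits, hidden), in which case use output[0] here.
    predicted_id = output.argmax(dim=-1)[0, -1].item()

    # TODO: decode predicted_id to word
    return predicted_id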


# Example
text = "ខ្ញុំចង់ទៅ"
print("Input:", text)
print("Predicted next word ID:", predict_next_word(model, text))
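
The two TODOs in the test code (encoding words to IDs and decoding the predicted ID back to a word) both need a vocabulary mapping. The card does not ship one, so the following is only a sketch of what such a mapping could look like. The `Vocab` class, the `<unk>` token at ID 0, and the sample words are assumptions; a real setup must reuse the exact vocabulary the model was trained with.

```python
class Vocab:
    """Hypothetical word <-> ID mapping; ID 0 is reserved for unknown words."""

    def __init__(self, words, unk="<unk>"):
        self.unk = unk
        # Sort for a deterministic ID assignment across runs.
        self.id_to_word = [unk] + sorted(set(words) - {unk})
        self.word_to_id = {w: i for i, w in enumerate(self.id_to_word)}

    def encode(self, tokens):
        # Words outside the vocabulary map to the unknown ID (0).
        return [self.word_to_id.get(t, 0) for t in tokens]

    def decode(self, index):
        # Out-of-range IDs decode to the unknown token instead of raising.
        if 0 <= index < len(self.id_to_word):
            return self.id_to_word[index]
        return self.unk


# Illustrative vocabulary; the real one comes from the training data.
vocab = Vocab(["ខ្ញុំ", "ចង់", "ទៅ", "ផ្សារ"])

tokens = ["ខ្ញុំ", "ចង់", "ទៅ"]  # assumed output of a Khmer word segmenter
input_ids = vocab.encode(tokens)
print(input_ids)
print([vocab.decode(i) for i in input_ids])
```

In `predict_next_word`, `vocab.encode(tokens)` would replace the dummy `[0] * len(tokens)` line, and `vocab.decode(predicted_id)` would replace the final `return predicted_id`.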

Repository: phonsobon/khmer-word-prediction