Khmer Word Prediction Model

Caption

A Khmer language model for next-word prediction and text generation, designed to support applications such as autocomplete, intelligent typing assistants, and Khmer NLP research.


Model Description

This model is a neural language model trained to predict the next word of a Khmer sentence from the words that precede it.

  • Model file: khmer_lm_best.pt
  • Framework: PyTorch
  • Architecture: not yet specified (LSTM, GRU, or Transformer)
  • Task: Next-word prediction (Language Modeling)
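
To make the task concrete, the sketch below shows what next-word prediction typically computes: the model produces one score (logit) per vocabulary word, a softmax turns those scores into probabilities, and the most probable words become the suggestions. The vocabulary and scores here are invented for illustration and do not come from khmer_lm_best.pt.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_words(scores, vocab, k=3):
    """Return the k most probable (word, probability) pairs, best first."""
    probs = softmax(scores)
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical vocabulary and scores for the position after "ខ្ញុំចង់" ("I want").
vocab = ["ទៅ", "ញ៉ាំ", "រៀន", "<unk>"]  # go, eat, study, unknown
scores = [2.1, 1.3, 0.4, -1.0]

for word, prob in top_k_words(scores, vocab, k=3):
    print(f"{word}\t{prob:.3f}")
```

An autocomplete feature would usually show several of these candidates, whereas the test code further down keeps only the single best one (argmax).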

Dataset

  • Source: a mix of synthetic and real Khmer text
  • Training samples: ~350,000 sentences
  • Validation samples: ~47,000 sentences

How to Use (Test Code)

1. Install dependencies

pip install torch

2. Load the model and predict the next word

import torch

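# Load model.
# Assumption: khmer_lm_best.pt stores the whole pickled model object, not just
# a state_dict, so the model's class definition must be importable here.
# PyTorch 2.6+ defaults to weights_only=True, which rejects pickled model
# objects; pass weights_only=False only for checkpoints you trust.
model = torch.load("khmer_lm_best.pt", map_location="cpu", weights_only=False)
model.eval()


def predict_next_word(model, input_text):
    # TODO: replace with your tokenizer logic. Khmer is normally written
    # without spaces between words, so split() is only a placeholder.
    tokens = input_text.split()

    # Dummy example (you must adapt to your model)
    input_ids = [0] * len(tokens)  # replace with real encoding

    input_tensor = torch.tensor([input_ids], dtype=torch.long)

    with torch.no_grad():
        output = model(input_tensor)

    # Recurrent models often return (logits, hidden_state); keep the logits.
    logits = output[0] if isinstance(output, tuple) else output

    # Assumes logits has shape (batch, sequence, vocab): use the last position.
    predicted_id = logits[0, -1].argmax(dim=-1).item()

    # TODO: decode predicted_id to word
    return predicted_id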


# Example
text = "ខ្ញុំចង់ទៅ"  # "I want to go"
print("Input:", text)
print("Predicted next-word ID:", predict_next_word(model, text))
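
The two TODOs in the test code (encoding the input and decoding the prediction) need a word segmenter and a vocabulary. Because Khmer is usually written without spaces between words, split() will not separate them. Below is a minimal sketch in plain Python, assuming a small made-up vocabulary and greedy longest-match segmentation. VOCAB, segment, encode, and decode are hypothetical names, and this is a simple stand-in, not necessarily the tokenizer the model was trained with.

```python
# Hypothetical toy vocabulary; index 0 is reserved for unknown words.
# In practice this must be the vocabulary the model was trained with.
VOCAB = ["<unk>", "ខ្ញុំ", "ចង់", "ទៅ", "ផ្សារ"]  # unknown, I, want, go, market
WORD_TO_ID = {word: idx for idx, word in enumerate(VOCAB)}
MAX_WORD_LEN = max(len(word) for word in VOCAB[1:])


def segment(text):
    """Split text into words by greedy longest match against VOCAB."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink.
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in WORD_TO_ID:
                words.append(candidate)
                i += length
                break
        else:
            # No vocabulary word starts here: emit one character as unknown.
            words.append(text[i])
            i += 1
    return words


def encode(text):
    """Map text to a list of vocabulary ids (0 for unknown words)."""
    return [WORD_TO_ID.get(word, 0) for word in segment(text)]


def decode(token_id):
    """Map a vocabulary id back to its word, or "<unk>" if out of range."""
    return VOCAB[token_id] if 0 <= token_id < len(VOCAB) else VOCAB[0]


sample = "ខ្ញុំចង់ទៅ"  # "I want to go"
print(segment(sample))
print(encode(sample))
```

In predict_next_word, encode(input_text) would take the place of the dummy input_ids, and decode(predicted_id) would turn the returned id into a word.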
