English → Hindi Transformer (Ray Tune + Optuna Optimized)

A custom PyTorch Transformer tuned with Ray Tune + OptunaSearch + an ASHA scheduler. The best configuration, trained for only 50 epochs, beats the 100-epoch baseline's validation BLEU.

Evaluation

Both models are evaluated on the same held-out 20% validation set (not training data). This ensures a fair comparison: the baseline cannot inflate its score through memorisation.
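
For reference, a minimal sketch of that protocol, assuming an NLTK-style corpus BLEU on a 0-1 scale (consistent with the scores below) and a dataset that yields (src, tgt) sentence pairs. dataset and translate are hypothetical stand-ins for the repo's own data pipeline and decoding function:

from nltk.translate.bleu_score import corpus_bleu
from torch.utils.data import random_split

# Identical held-out split for both models (fractional lengths need torch >= 2.0).
train_set, val_set = random_split(dataset, [0.8, 0.2])

# Whitespace tokenisation as a simple default; one reference per sentence.
references = [[tgt.split()] for _, tgt in val_set]
hypotheses = [translate(model, src).split() for src, _ in val_set]
print(corpus_bleu(references, hypotheses))  # 0-1 scale, matching the table below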

Results

Model        Epochs   Train Loss   Val BLEU
Baseline     100      0.0857       0.1169
Best Tuned   50       0.2696       0.1748

Note that the tuned model wins on validation BLEU despite a higher training loss: the baseline's extra epochs buy memorisation, not generalisation.

Search Strategy

  • Algorithm: Optuna TPE Sampler via ray.tune.search.optuna.OptunaSearch
  • Scheduler: ASHA (kills underperforming trials early)
  • Metric optimised: Validation BLEU (not training loss)
  • Trials: 35, each capped at 50 epochs (setup sketched below)
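
A minimal sketch of this setup, assuming a recent Ray 2.x Tuner API. train_transformer and its helpers (build_model, train_one_epoch, evaluate_bleu) are hypothetical, and the search ranges are illustrative rather than the exact ones used:

from ray import tune
from ray.tune.search.optuna import OptunaSearch
from ray.tune.schedulers import ASHAScheduler

def train_transformer(config):
    model = build_model(config)                       # hypothetical helper
    for epoch in range(50):
        train_one_epoch(model, config)                # hypothetical helper
        tune.report({"val_bleu": evaluate_bleu(model)})  # report the tuned metric

search_space = {
    "d_model": tune.choice([128, 256, 512]),
    "num_heads": tune.choice([2, 4, 8]),
    "num_enc_layers": tune.choice([2, 3, 4]),
    "num_dec_layers": tune.choice([2, 3, 4]),
    "d_ff": tune.choice([512, 1024, 2048]),
    "dropout": tune.uniform(0.05, 0.3),
    "lr": tune.loguniform(1e-5, 1e-3),
    "batch_size": tune.choice([32, 64]),
}

tuner = tune.Tuner(
    train_transformer,
    param_space=search_space,
    tune_config=tune.TuneConfig(
        metric="val_bleu", mode="max", num_samples=35,      # 35 trials
        search_alg=OptunaSearch(),                          # TPE sampler by default
        scheduler=ASHAScheduler(max_t=50, grace_period=5),  # stops weak trials early
    ),
)
best = tuner.fit().get_best_result(metric="val_bleu", mode="max")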

Best Hyperparameters

{
  "d_model": 256,
  "num_heads": 4,
  "num_enc_layers": 2,
  "num_dec_layers": 3,
  "d_ff": 2048,
  "dropout": 0.07894942617730782,
  "lr": 0.00018938218352515765,
  "batch_size": 32
}

Usage

import torch, json, pickle  # pickle: for any pickled vocabulary files
from huggingface_hub import hf_hub_download

# Transformer, src_vocab and tgt_vocab come from the training script;
# they are not packaged with the weights, so define/load them first.
weights = hf_hub_download("DuckyDuck123/en-hi-transformer-tuned", "M25CSA007_ass_4_best_model.pth")
with open(hf_hub_download("DuckyDuck123/en-hi-transformer-tuned", "best_config.json")) as f:
    cfg = json.load(f)

model = Transformer(src_vocab, tgt_vocab, **{k: cfg[k] for k in
        ['d_model', 'num_heads', 'num_enc_layers', 'num_dec_layers', 'd_ff', 'dropout']})
model.load_state_dict(torch.load(weights, map_location='cpu'))
model.eval()
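
For inference, a hypothetical greedy decoding loop; the forward signature model(src_ids, tgt_ids) and the bos/eos token ids are assumptions about the custom Transformer, so adapt them to the training script:

def greedy_translate(model, src_ids, bos_id, eos_id, max_len=64):
    # src_ids: (1, src_len) LongTensor of source token ids
    tgt_ids = torch.tensor([[bos_id]])
    for _ in range(max_len):
        with torch.no_grad():
            logits = model(src_ids, tgt_ids)        # assumed output: (1, tgt_len, vocab)
        next_id = logits[0, -1].argmax().item()     # greedy pick of the next token
        tgt_ids = torch.cat([tgt_ids, torch.tensor([[next_id]])], dim=1)
        if next_id == eos_id:                       # stop at end-of-sentence
            break
    return tgt_ids[0].tolist()                      # token ids, incl. bos/eos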