# English → Hindi Transformer (Ray Tune + Optuna Optimized)
A custom PyTorch Transformer for English → Hindi translation, tuned with Ray Tune + OptunaSearch + the ASHA scheduler. The best configuration, trained for only 50 epochs, beats the 100-epoch baseline on validation BLEU.
## Evaluation
Both models are evaluated on the same held-out 20% validation set (never seen during training). This ensures a fair comparison: the baseline cannot inflate its score through memorisation.
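The card does not state which BLEU implementation was used; the 0–1 scores in the table below are consistent with NLTK's `corpus_bleu`, so here is a minimal sketch under that assumption (the tokenisation and smoothing choices are also guesses, not taken from the training script):

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Hypothetical example: `refs` holds one list of tokenised references per
# sentence, `hyps` the model's tokenised outputs for the same sentences.
refs = [[["यह", "एक", "उदाहरण", "है"]]]
hyps = [["यह", "उदाहरण", "है"]]
score = corpus_bleu(refs, hyps, smoothing_function=SmoothingFunction().method1)
print(f"Val BLEU: {score:.4f}")  # 0-1 scale, matching the table below
```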
## Results
| Model | Epochs | Train Loss | Val BLEU |
|---|---|---|---|
| Baseline | 100 | 0.0857 | 0.1169 |
| Best Tuned | 50 | 0.2696 | 0.1748 |
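The baseline's far lower training loss paired with its worse validation BLEU is the classic signature of overfitting, which is exactly what the held-out evaluation above is designed to expose.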
## Search Strategy
- **Algorithm:** Optuna TPE sampler via `ray.tune.search.optuna.OptunaSearch`
- **Scheduler:** ASHA (terminates underperforming trials early)
- **Metric optimised:** validation BLEU (not training loss)
- **Trials:** 35, each capped at 50 epochs (see the sketch below)
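A minimal sketch of this setup. The `train_translator` trainable is a placeholder, and the search-space ranges are illustrative guesses; only the best values (next section) are known from this card:

```python
from ray import tune
from ray.tune.search.optuna import OptunaSearch
from ray.tune.schedulers import ASHAScheduler

def train_translator(config):
    # Placeholder trainable: the real script builds the Transformer from
    # `config`, trains one epoch at a time, and reports validation BLEU so
    # ASHA can stop weak trials early.
    for epoch in range(50):
        val_bleu = 0.0  # replace with: train one epoch, then compute BLEU
        tune.report({"val_bleu": val_bleu})  # reporting API varies by Ray version

search_space = {
    # Ranges are illustrative; each contains the best value found below.
    "d_model": tune.choice([128, 256, 512]),
    "num_heads": tune.choice([2, 4, 8]),
    "num_enc_layers": tune.choice([2, 3, 4]),
    "num_dec_layers": tune.choice([2, 3, 4]),
    "d_ff": tune.choice([512, 1024, 2048]),
    "dropout": tune.uniform(0.05, 0.3),
    "lr": tune.loguniform(1e-5, 1e-3),
    "batch_size": tune.choice([16, 32, 64]),
}

tuner = tune.Tuner(
    train_translator,
    param_space=search_space,
    tune_config=tune.TuneConfig(
        metric="val_bleu",
        mode="max",
        search_alg=OptunaSearch(),                          # TPE sampler by default
        scheduler=ASHAScheduler(max_t=50, grace_period=5),  # early termination
        num_samples=35,                                     # 35 trials
    ),
)
best = tuner.fit().get_best_result()
print(best.config)
```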
## Best Hyperparameters
```json
{
  "d_model": 256,
  "num_heads": 4,
  "num_enc_layers": 2,
  "num_dec_layers": 3,
  "d_ff": 2048,
  "dropout": 0.07894942617730782,
  "lr": 0.00018938218352515765,
  "batch_size": 32
}
```
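With `"d_model": 256` and `"num_heads": 4`, each attention head operates on a 64-dimensional subspace, the same per-head width as the original Transformer base model.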
## Usage
```python
import json
import torch
from huggingface_hub import hf_hub_download

repo = "DuckyDuck123/en-hi-transformer-tuned"
weights = hf_hub_download(repo, "M25CSA007_ass_4_best_model.pth")
with open(hf_hub_download(repo, "best_config.json")) as f:
    cfg = json.load(f)

# Transformer is the custom class from the training script; src_vocab and
# tgt_vocab must match the vocabularies used at training time.
keys = ["d_model", "num_heads", "num_enc_layers", "num_dec_layers", "d_ff", "dropout"]
model = Transformer(src_vocab, tgt_vocab, **{k: cfg[k] for k in keys})
model.load_state_dict(torch.load(weights, map_location="cpu"))
model.eval()
```
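For inference, a minimal greedy-decoding sketch follows. Everything about the interfaces here is an assumption: that `model(src, tgt)` returns logits of shape `(batch, tgt_len, tgt_vocab_size)` and that the vocabularies provide `<sos>`/`<eos>` ids; adapt it to the actual training-script API.

```python
import torch

def greedy_translate(model, src_ids, sos_id, eos_id, max_len=64):
    """Hypothetical greedy decoder; assumes model(src, tgt) -> logits of
    shape (batch, tgt_len, tgt_vocab_size)."""
    src = torch.tensor([src_ids])                    # (1, src_len)
    tgt = torch.tensor([[sos_id]])                   # start with <sos>
    with torch.no_grad():
        for _ in range(max_len):
            logits = model(src, tgt)
            next_id = logits[0, -1].argmax().item()  # most likely next token
            tgt = torch.cat([tgt, torch.tensor([[next_id]])], dim=1)
            if next_id == eos_id:
                break
    return tgt[0, 1:].tolist()                       # target ids, minus <sos>
```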