# English → Hindi Transformer (Ray Tune + Optuna Optimized)
A custom PyTorch Transformer for English → Hindi translation, tuned with Ray Tune + OptunaSearch + the ASHA scheduler. The best configuration, trained for only 50 epochs, beats the 100-epoch baseline on validation BLEU.
## Evaluation
Both models are evaluated on the same held-out 20% validation set (never seen during training). This ensures a fair comparison: the baseline cannot inflate its score through memorisation.
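The card does not state which BLEU implementation was used; the 0–1 scores in the table below are consistent with NLTK's `corpus_bleu`, so here is a minimal sketch under that assumption (the tokenisation and smoothing choices are also guesses, not taken from the training script):

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Hypothetical example: `refs` holds one list of tokenised references per
# sentence, `hyps` the model's tokenised outputs for the same sentences.
refs = [[["यह", "एक", "उदाहरण", "है"]]]
hyps = [["यह", "उदाहरण", "है"]]
score = corpus_bleu(refs, hyps, smoothing_function=SmoothingFunction().method1)
print(f"Val BLEU: {score:.4f}")  # 0-1 scale, matching the table below
```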
## Results
| Model | Epochs | Train Loss | Val BLEU |
|---|---|---|---|
| Baseline | 100 | 0.0857 | 0.1169 |
| Best Tuned | 50 | 0.2696 | 0.1748 |
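The baseline's far lower training loss paired with its worse validation BLEU is the classic signature of overfitting, which is exactly what the held-out evaluation above is designed to expose.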
## Search Strategy
- **Algorithm:** Optuna TPE sampler via `ray.tune.search.optuna.OptunaSearch`
- **Scheduler:** ASHA (terminates underperforming trials early)
- **Metric optimised:** validation BLEU (not training loss)
- **Trials:** 35, each capped at 50 epochs (see the sketch below)
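A minimal sketch of this setup. The `train_translator` trainable is a placeholder, and the search-space ranges are illustrative guesses; only the best values (next section) are known from this card:

```python
from ray import tune
from ray.tune.search.optuna import OptunaSearch
from ray.tune.schedulers import ASHAScheduler

def train_translator(config):
    # Placeholder trainable: the real script builds the Transformer from
    # `config`, trains one epoch at a time, and reports validation BLEU so
    # ASHA can stop weak trials early.
    for epoch in range(50):
        val_bleu = 0.0  # replace with: train one epoch, then compute BLEU
        tune.report({"val_bleu": val_bleu})  # reporting API varies by Ray version

search_space = {
    # Ranges are illustrative; each contains the best value found below.
    "d_model": tune.choice([128, 256, 512]),
    "num_heads": tune.choice([2, 4, 8]),
    "num_enc_layers": tune.choice([2, 3, 4]),
    "num_dec_layers": tune.choice([2, 3, 4]),
    "d_ff": tune.choice([512, 1024, 2048]),
    "dropout": tune.uniform(0.05, 0.3),
    "lr": tune.loguniform(1e-5, 1e-3),
    "batch_size": tune.choice([16, 32, 64]),
}

tuner = tune.Tuner(
    train_translator,
    param_space=search_space,
    tune_config=tune.TuneConfig(
        metric="val_bleu",
        mode="max",
        search_alg=OptunaSearch(),                          # TPE sampler by default
        scheduler=ASHAScheduler(max_t=50, grace_period=5),  # early termination
        num_samples=35,                                     # 35 trials
    ),
)
best = tuner.fit().get_best_result()
print(best.config)
```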
## Best Hyperparameters
```json
{
  "d_model": 256,
  "num_heads": 4,
  "num_enc_layers": 2,
  "num_dec_layers": 3,
  "d_ff": 2048,
  "dropout": 0.07894942617730782,
  "lr": 0.00018938218352515765,
  "batch_size": 32
}
```
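With `"d_model": 256` and `"num_heads": 4`, each attention head operates on a 64-dimensional subspace, the same per-head width as the original Transformer base model.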
## Usage
```python
import json
import torch
from huggingface_hub import hf_hub_download

repo = "DuckyDuck123/en-hi-transformer-tuned"
weights = hf_hub_download(repo, "M25CSA007_ass_4_best_model.pth")
with open(hf_hub_download(repo, "best_config.json")) as f:
    cfg = json.load(f)

# Transformer is the custom class from the training script; src_vocab and
# tgt_vocab must match the vocabularies used at training time.
keys = ["d_model", "num_heads", "num_enc_layers", "num_dec_layers", "d_ff", "dropout"]
model = Transformer(src_vocab, tgt_vocab, **{k: cfg[k] for k in keys})
model.load_state_dict(torch.load(weights, map_location="cpu"))
model.eval()
```
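For inference, a minimal greedy-decoding sketch follows. Everything about the interfaces here is an assumption: that `model(src, tgt)` returns logits of shape `(batch, tgt_len, tgt_vocab_size)` and that the vocabularies provide `<sos>`/`<eos>` ids; adapt it to the actual training-script API.

```python
import torch

def greedy_translate(model, src_ids, sos_id, eos_id, max_len=64):
    """Hypothetical greedy decoder; assumes model(src, tgt) -> logits of
    shape (batch, tgt_len, tgt_vocab_size)."""
    src = torch.tensor([src_ids])                    # (1, src_len)
    tgt = torch.tensor([[sos_id]])                   # start with <sos>
    with torch.no_grad():
        for _ in range(max_len):
            logits = model(src, tgt)
            next_id = logits[0, -1].argmax().item()  # most likely next token
            tgt = torch.cat([tgt, torch.tensor([[next_id]])], dim=1)
            if next_id == eos_id:
                break
    return tgt[0, 1:].tolist()                       # target ids, minus <sos>
```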