# Hierarchical Attention Transformer

A Hierarchical Attention Transformer with disentangled lexical, syntactic, and semantic subspaces for paraphrase detection.
## Architecture

The model separates representation learning into three subspaces:

- lexical
- syntactic
- semantic
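A minimal sketch of how such disentanglement can be implemented: three independent projection heads over a shared encoder state. All dimensions, names, and initializations below are illustrative assumptions, not the released weights.

```python
import numpy as np

# Illustrative sketch: project a shared encoder state into three
# disentangled subspaces via separate linear maps. Dimensions and
# initialization here are assumptions, not the released weights.
rng = np.random.default_rng(0)
hidden_dim, sub_dim = 768, 256

W = {name: rng.standard_normal((hidden_dim, sub_dim)) * 0.02
     for name in ("lexical", "syntactic", "semantic")}

h = rng.standard_normal((12, hidden_dim))  # 12 token states from the encoder

subspaces = {name: h @ W[name] for name in W}  # one (12, 256) view per subspace
for name, z in subspaces.items():
    print(name, z.shape)
```

Each head sees the same token states but learns its own view, which is what allows the downstream attention and fusion modules to treat the three signals separately.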
Encoders used:

- `microsoft/deberta-v3-base`
- `roberta-base`
The model uses:

- additive attention pooling
- cross-attention semantic interaction
- gated bilinear fusion
- a meta-classifier
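Two of these components, additive attention pooling and gated bilinear fusion, can be sketched in NumPy. This is a hedged illustration of one common formulation (softmax-normalized additive scores; a low-rank bilinear interaction with a sigmoid gate), not the model's exact parameterization:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention_pool(H, W, v):
    """Pool token states H (T, d) into one vector with additive attention."""
    scores = np.tanh(H @ W) @ v  # (T,) unnormalized scores
    alpha = softmax(scores)      # attention weights, sum to 1
    return alpha @ H             # (d,) weighted sum of token states

def gated_bilinear_fusion(a, b, Wa, Wb, Wg, bg):
    """One low-rank gated bilinear fusion of two pooled vectors (assumed form)."""
    u = np.tanh(Wa @ a) * np.tanh(Wb @ b)                          # bilinear interaction
    g = 1.0 / (1.0 + np.exp(-(Wg @ np.concatenate([a, b]) + bg)))  # sigmoid gate
    return g * u                                                   # (k,) gated feature

rng = np.random.default_rng(0)
d, k = 64, 32
H = rng.standard_normal((10, d))
pooled_a = additive_attention_pool(H, rng.standard_normal((d, d)), rng.standard_normal(d))
pooled_b = additive_attention_pool(-H, rng.standard_normal((d, d)), rng.standard_normal(d))
fused = gated_bilinear_fusion(pooled_a, pooled_b,
                              rng.standard_normal((k, d)), rng.standard_normal((k, d)),
                              rng.standard_normal((k, 2 * d)), np.zeros(k))
print(fused.shape)
```

The fused vector would then feed the meta-classifier; the gate lets the model modulate how much of the bilinear interaction passes through for each example.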
## Repository Structure

- `model` — main model, baselines, ablations
- `evaluation` — plots, tables, analysis
## Training and Evaluation

Evaluation includes:

- ablation experiments
- calibration analysis
- robustness testing
- bootstrap confidence intervals
- paired bootstrap significance tests
- CKA representation similarity
- structural probing
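As one example from the list above, a paired bootstrap significance test resamples test items with replacement and measures how often one system's accuracy fails to beat the other's. A minimal sketch; the repository's exact resampling protocol, and the toy data below, are assumptions:

```python
import numpy as np

def paired_bootstrap(correct_a, correct_b, n_boot=2000, seed=0):
    """Paired bootstrap: p-value for 'system B is not better than system A',
    estimated by resampling test items with replacement."""
    rng = np.random.default_rng(seed)
    a = np.asarray(correct_a, dtype=float)
    b = np.asarray(correct_b, dtype=float)
    idx = rng.integers(0, len(a), size=(n_boot, len(a)))  # resampled item indices
    diffs = b[idx].mean(axis=1) - a[idx].mean(axis=1)     # per-resample accuracy gap
    return float((diffs <= 0.0).mean())

# Hypothetical per-item correctness vectors (not real results)
sys_a = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
sys_b = np.array([1, 1, 1, 1, 1, 0, 1, 0, 1, 1])
print(paired_bootstrap(sys_a, sys_b))
```

Pairing matters: both systems are scored on the same resampled items, so item difficulty cancels out of the accuracy difference.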
## Citation

If you use this work, please cite:

```bibtex
@misc{hierarchical_attention_transformer_2026,
  title={Hierarchical Attention Transformer},
  author={Govind},
  year={2026}
}
```