# Hierarchical Attention Transformer

A Hierarchical Attention Transformer with disentangled lexical, syntactic, and semantic subspaces for paraphrase detection.
## Architecture

The model separates representation learning into three subspaces:

- lexical
- syntactic
- semantic
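A minimal sketch of how such disentanglement can be implemented: three independent projection heads over a shared encoder state. All dimensions, names, and initializations below are illustrative assumptions, not the released weights.

```python
import numpy as np

# Illustrative sketch: project a shared encoder state into three
# disentangled subspaces via separate linear maps. Dimensions and
# initialization here are assumptions, not the released weights.
rng = np.random.default_rng(0)
hidden_dim, sub_dim = 768, 256

W = {name: rng.standard_normal((hidden_dim, sub_dim)) * 0.02
     for name in ("lexical", "syntactic", "semantic")}

h = rng.standard_normal((12, hidden_dim))  # 12 token states from the encoder

subspaces = {name: h @ W[name] for name in W}  # one (12, 256) view per subspace
for name, z in subspaces.items():
    print(name, z.shape)
```

Each head sees the same token states but learns its own view, which is what allows the downstream attention and fusion modules to treat the three signals separately.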
Encoders used:

- `microsoft/deberta-v3-base`
- `roberta-base`
The model uses:

- additive attention pooling
- cross-attention semantic interaction
- gated bilinear fusion
- a meta-classifier
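Two of these components, additive attention pooling and gated bilinear fusion, can be sketched in NumPy. This is a hedged illustration of one common formulation (softmax-normalized additive scores; a low-rank bilinear interaction with a sigmoid gate), not the model's exact parameterization:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention_pool(H, W, v):
    """Pool token states H (T, d) into one vector with additive attention."""
    scores = np.tanh(H @ W) @ v  # (T,) unnormalized scores
    alpha = softmax(scores)      # attention weights, sum to 1
    return alpha @ H             # (d,) weighted sum of token states

def gated_bilinear_fusion(a, b, Wa, Wb, Wg, bg):
    """One low-rank gated bilinear fusion of two pooled vectors (assumed form)."""
    u = np.tanh(Wa @ a) * np.tanh(Wb @ b)                          # bilinear interaction
    g = 1.0 / (1.0 + np.exp(-(Wg @ np.concatenate([a, b]) + bg)))  # sigmoid gate
    return g * u                                                   # (k,) gated feature

rng = np.random.default_rng(0)
d, k = 64, 32
H = rng.standard_normal((10, d))
pooled_a = additive_attention_pool(H, rng.standard_normal((d, d)), rng.standard_normal(d))
pooled_b = additive_attention_pool(-H, rng.standard_normal((d, d)), rng.standard_normal(d))
fused = gated_bilinear_fusion(pooled_a, pooled_b,
                              rng.standard_normal((k, d)), rng.standard_normal((k, d)),
                              rng.standard_normal((k, 2 * d)), np.zeros(k))
print(fused.shape)
```

The fused vector would then feed the meta-classifier; the gate lets the model modulate how much of the bilinear interaction passes through for each example.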
## Repository Structure

- `model` — main model, baselines, ablations
- `evaluation` — plots, tables, analysis
## Training and Evaluation

Evaluation includes:

- ablation experiments
- calibration analysis
- robustness testing
- bootstrap confidence intervals
- paired bootstrap significance tests
- CKA representation similarity
- structural probing
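As one example from the list above, a paired bootstrap significance test resamples test items with replacement and measures how often one system's accuracy fails to beat the other's. A minimal sketch; the repository's exact resampling protocol, and the toy data below, are assumptions:

```python
import numpy as np

def paired_bootstrap(correct_a, correct_b, n_boot=2000, seed=0):
    """Paired bootstrap: p-value for 'system B is not better than system A',
    estimated by resampling test items with replacement."""
    rng = np.random.default_rng(seed)
    a = np.asarray(correct_a, dtype=float)
    b = np.asarray(correct_b, dtype=float)
    idx = rng.integers(0, len(a), size=(n_boot, len(a)))  # resampled item indices
    diffs = b[idx].mean(axis=1) - a[idx].mean(axis=1)     # per-resample accuracy gap
    return float((diffs <= 0.0).mean())

# Hypothetical per-item correctness vectors (not real results)
sys_a = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
sys_b = np.array([1, 1, 1, 1, 1, 0, 1, 0, 1, 1])
print(paired_bootstrap(sys_a, sys_b))
```

Pairing matters: both systems are scored on the same resampled items, so item difficulty cancels out of the accuracy difference.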
## Citation

If you use this work, please cite:

```bibtex
@misc{hierarchical_attention_transformer_2026,
  title={Hierarchical Attention Transformer},
  author={Govind},
  year={2026}
}
```