# Model Card for ethicalabs/Echo-SmolTools-114M-Intent-PEFT

Echo-SmolTools-114M-Intent-PEFT is a LoRA-based adapter trained on top of the Echo-DSRN-114M-v0.1.2 base RNN architecture, optimized as a multilingual intent classifier.

Gradio app now available: 🎙️ Echo Intent: Multilingual Intent Classifier

## Model Usage

You can load the fine-tuned PEFT adapter on top of the base model and run intent classification as follows:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer
base_model_name = "ethicalabs/Echo-DSRN-114M-v0.1.2"
base_model = AutoModelForCausalLM.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)

# Load the fine-tuned PEFT adapter (example: Round 28)
peft_model_name = "ethicalabs/Echo-SmolTools-114M-Intent-PEFT"
model = PeftModel.from_pretrained(base_model, peft_model_name, trust_remote_code=True)

# Prepare the prompt for multilingual intent classification
utt = "Che ore sono a Roma?" # "What time is it in Rome?"
messages = [
    {"role": "system", "content": "You are a helpful multilingual intent classification assistant."},
    {"role": "user", "content": f"Classify the intent of the following request: {utt}"}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize and generate
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=15, do_sample=False)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()

print(f"User: {utt}")
print(f"Intent: {response}")

Output:

```text
User: Che ore sono a Roma?
Intent: datetime_query
```
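
Because the adapter is multilingual, the same pipeline works unchanged across locales. Below is a minimal sketch that wraps the steps above in a helper and reuses the `model` and `tokenizer` objects already loaded; the `classify` helper name and the example utterances are illustrative, not part of the released API:

```python
def classify(utterance: str) -> str:
    """Greedy-decode an intent label for a single utterance (illustrative helper)."""
    messages = [
        {"role": "system", "content": "You are a helpful multilingual intent classification assistant."},
        {"role": "user", "content": f"Classify the intent of the following request: {utterance}"},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=15, do_sample=False)
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()

# Example utterances in several of the supported locales
for utt in [
    "Wie wird das Wetter morgen in Berlin?",  # de-DE: "What will the weather be like in Berlin tomorrow?"
    "Réveille-moi à sept heures demain.",     # fr-FR: "Wake me up at seven tomorrow."
    "Pon música relajante, por favor.",       # es-ES: "Play relaxing music, please."
]:
    print(utt, "->", classify(utt))
```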

**Base Model:** ethicalabs/Echo-DSRN-114M-v0.1.2

๐Ÿ—๏ธ Architecture Details

| Property | Value |
| --- | --- |
| Model Type | `echo_dsrn` |
| Layers | 8 |
| Hidden Dim | 512 |
| Attention Heads | 4 |
| MLP Ratio | 8.0 |
| Vocab Size | 32011 |
| Hybrid Attention | True |
| RMSNorm | True |
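
These properties can be cross-checked against the published configuration. A minimal sketch follows; only `model_type` and `vocab_size` are confirmed by the table above, and the other attribute names are assumptions about what the custom remote code exposes:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ethicalabs/Echo-DSRN-114M-v0.1.2", trust_remote_code=True)

print(config.model_type)  # expected: "echo_dsrn"
print(config.vocab_size)  # expected: 32011
# Layer-count / hidden-size attribute names are assumptions about the custom config:
print(getattr(config, "num_hidden_layers", None), getattr(config, "hidden_size", None))
```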

## 📊 Parameter Breakdown

| Component | Parameters | % of Total |
| --- | --- | --- |
| Total | 114.69M (114,687,488) | 100% |
| Embeddings | 16.39M | 14.29% |
| DSRN Blocks (Aggregate) | 81.91M | 71.42% |
| LM Head | 16.39M | 14.29% |

## 🧩 Internal Block Structure (Per Layer)

| Sub-Component | Parameters | Description |
| --- | --- | --- |
| MLP (Feed-Forward) | 4.20M | Upscaled hidden layers |
| DSRN Slow State | 3.15M | Constant-time memory gates |
| GRU Fast State | 1.58M | Recurrent fast path |
| Surprise Gating | 264,192 | Dynamic focus mechanism |
| Normalization | 1,024 | LayerNorm / RMSNorm |
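
Totals like these can be reproduced by grouping `named_parameters()` over the base model. A minimal sketch, assuming the `base_model` object from the usage section; the coarse bucketing by top-level module name is a simplification, and the actual module names depend on the remote code:

```python
from collections import Counter

counts = Counter()
for name, p in base_model.named_parameters():
    # Bucket parameters by their top-level module name (remote-code dependent).
    bucket = name.split(".")[0]
    counts[bucket] += p.numel()

total = sum(counts.values())
for bucket, n in counts.most_common():
    print(f"{bucket:20s} {n / 1e6:7.2f}M  {100 * n / total:5.2f}%")
print(f"{'total':20s} {total / 1e6:7.2f}M")
```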

## Benchmarks

### 📊 Absolute Multilingual Census

The following metrics form an exhaustive baseline rather than a sampled estimate: every example in the full 36,594-sample validation set was evaluated with deterministic greedy decoding.

| Locale | Accuracy | Matches / Total |
| --- | --- | --- |
| 🇺🇸 English (en-US) | 79.39% | 4,842 / 6,099 |
| 🇮🇹 Italian (it-IT) | 73.49% | 4,482 / 6,099 |
| 🇪🇸 Spanish (es-ES) | 72.50% | 4,422 / 6,099 |
| 🇫🇷 French (fr-FR) | 72.16% | 4,401 / 6,099 |
| 🇵🇹 Portuguese (pt-PT) | 71.77% | 4,377 / 6,099 |
| 🇩🇪 German (de-DE) | 65.52% | 3,996 / 6,099 |
| 🌍 **OVERALL** | **72.47%** | **26,520 / 36,594** |

## Training procedure

This LoRA adapter was fine-tuned (SFT) on a single AMD Radeon™ AI PRO R9700 (32 GB VRAM) using the Flower framework and TRL, in a simulated federated learning scenario.
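
A minimal sketch of what such a setup can look like, assuming Flower's `NumPyClient` API and TRL's `SFTTrainer`; the client below is illustrative rather than the actual training code, and the only parameters exchanged with the server are the small LoRA adapter weights:

```python
import flwr as fl
import torch
from peft import get_peft_model_state_dict, set_peft_model_state_dict
from trl import SFTConfig, SFTTrainer


class IntentClient(fl.client.NumPyClient):
    """Illustrative Flower client that trains and ships only the LoRA adapter."""

    def __init__(self, model, tokenizer, train_dataset):
        self.model = model
        self.tokenizer = tokenizer
        self.train_dataset = train_dataset

    def get_parameters(self, config):
        # Serialize just the adapter weights; the 114M base model stays local.
        return [v.cpu().numpy() for v in get_peft_model_state_dict(self.model).values()]

    def fit(self, parameters, config):
        # Load the server-aggregated adapter weights into the local model.
        keys = get_peft_model_state_dict(self.model).keys()
        set_peft_model_state_dict(
            self.model, {k: torch.tensor(v) for k, v in zip(keys, parameters)}
        )
        # One local SFT pass per federated round (hyperparameters are placeholders).
        trainer = SFTTrainer(
            model=self.model,
            args=SFTConfig(output_dir="out", num_train_epochs=1, report_to=[]),
            train_dataset=self.train_dataset,
            processing_class=self.tokenizer,
        )
        trainer.train()
        return self.get_parameters(config), len(self.train_dataset), {}
```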

### Training Metrics

*(Figure: training metrics plot showing entropy, mean token accuracy, and train loss over the 30 federated rounds; the full values appear in the log below.)*

```text
INFO :      aggregate_fit: received 2 results and 0 failures
INFO :      Communication budget: used 16716.43 MB (+278.61 MB this round) / 200,000 MB
Loading weights: 100%|██████████| 139/139 [00:00<00:00, 3686.31it/s, Materializing param=model.final_norm.weight]
INFO :      fit progress: (30, 0.0, {}, 926.4947838319931)
INFO :      configure_evaluate: no clients selected, skipping evaluation
INFO :      
INFO :      [SUMMARY]
INFO :      Run finished 30 round(s) in 926.49s
...
INFO :      	History (metrics, distributed, fit):
INFO :      	{'entropy': [(1, 3.1767146694660187),
INFO :      	             (2, 2.6490936136245726),
INFO :      	             (3, 2.582885365486145),
INFO :      	             (4, 2.403850073814392),
INFO :      	             (5, 2.404714601635933),
INFO :      	             (6, 2.397827633917332),
INFO :      	             (7, 2.3419336032867433),
INFO :      	             (8, 2.330485168099403),
INFO :      	             (9, 2.2885197573900222),
INFO :      	             (10, 2.3625612980127335),
INFO :      	             (11, 2.2621049478650095),
INFO :      	             (12, 2.2685215598344803),
INFO :      	             (13, 2.193116867244244),
INFO :      	             (14, 2.16418510556221),
INFO :      	             (15, 2.1816292345523833),
INFO :      	             (16, 2.2237485074996948),
INFO :      	             (17, 2.240292007625103),
INFO :      	             (18, 2.1769691184163094),
INFO :      	             (19, 2.22042086571455),
INFO :      	             (20, 2.2185776421427725),
INFO :      	             (21, 2.18163181245327),
INFO :      	             (22, 2.15154937684536),
INFO :      	             (23, 2.175434983074665),
INFO :      	             (24, 2.160964986979961),
INFO :      	             (25, 2.158632977604866),
INFO :      	             (26, 2.1607184839248657),
INFO :      	             (27, 2.1532266357541086),
INFO :      	             (28, 2.1472932541370393),
INFO :      	             (29, 2.155655029118061),
INFO :      	             (30, 2.149040196239948)],
INFO :      	 'mean_token_accuracy': [(1, 0.7308300926908851),
INFO :      	                         (2, 0.8405319826304912),
INFO :      	                         (3, 0.8734206096827983),
INFO :      	                         (4, 0.8932938988506794),
INFO :      	                         (5, 0.922560573220253),
INFO :      	                         (6, 0.9258149369060993),
INFO :      	                         (7, 0.945394709855318),
INFO :      	                         (8, 0.9518885087966918),
INFO :      	                         (9, 0.9530007430911064),
INFO :      	                         (10, 0.9602652615308762),
INFO :      	                         (11, 0.9809961877763271),
INFO :      	                         (12, 0.9679327207803726),
INFO :      	                         (13, 0.9806139521300793),
INFO :      	                         (14, 0.986053352355957),
INFO :      	                         (15, 0.9900700397789478),
INFO :      	                         (16, 0.9693947829306125),
INFO :      	                         (17, 0.9722562806308269),
INFO :      	                         (18, 0.9861463868618011),
INFO :      	                         (19, 0.995743811428547),
INFO :      	                         (20, 0.9937835520505905),
INFO :      	                         (21, 0.9716044381260872),
INFO :      	                         (22, 0.999490964114666),
INFO :      	                         (23, 0.991388481259346),
INFO :      	                         (24, 0.9873159317672253),
INFO :      	                         (25, 0.985576259046793),
INFO :      	                         (26, 0.9877067220211029),
INFO :      	                         (27, 0.9989752702414989),
INFO :      	                         (28, 0.9998132897913456),
INFO :      	                         (29, 0.9981746172904968),
INFO :      	                         (30, 0.9881662499904632)],
INFO :      	 'train_loss': [(1, 1.1626044915243983),
INFO :      	                (2, 0.5194999669492245),
INFO :      	                (3, 0.40047831716015936),
INFO :      	                (4, 0.3398165641538799),
INFO :      	                (5, 0.24711219662043732),
INFO :      	                (6, 0.2427184192603454),
INFO :      	                (7, 0.17238803381100298),
INFO :      	                (8, 0.15867113580141448),
INFO :      	                (9, 0.15771765831450466),
INFO :      	                (10, 0.12902066930488218),
INFO :      	                (11, 0.06335343442275189),
INFO :      	                (12, 0.11254073319127202),
INFO :      	                (13, 0.06188895832223352),
INFO :      	                (14, 0.05105806810490321),
INFO :      	                (15, 0.03345653552742078),
INFO :      	                (16, 0.10704563170089386),
INFO :      	                (17, 0.09614553463034098),
INFO :      	                (18, 0.04905555162069503),
INFO :      	                (19, 0.016399495133809977),
INFO :      	                (20, 0.022009438261029572),
INFO :      	                (21, 0.10738116894423001),
INFO :      	                (22, 0.0036633076894486295),
INFO :      	                (23, 0.03189122898129426),
INFO :      	                (24, 0.043155781974319324),
INFO :      	                (25, 0.05212347019591107),
INFO :      	                (26, 0.04447804041239579),
INFO :      	                (27, 0.004012700952171144),
INFO :      	                (28, 0.0014076900614327314),
INFO :      	                (29, 0.006376274699809983),
INFO :      	                (30, 0.04514436764531638)]}
INFO :
```
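
To visualize these histories outside of Flower, the printed tuples can be pasted into a quick plot. A minimal sketch using a few of the `train_loss` values from the log above (abbreviated for brevity):

```python
import matplotlib.pyplot as plt

# (round, train_loss) pairs copied from the Flower log above, abbreviated.
train_loss = [(1, 1.1626), (5, 0.2471), (10, 0.1290), (15, 0.0335),
              (20, 0.0220), (25, 0.0521), (30, 0.0451)]

rounds, values = zip(*train_loss)
plt.plot(rounds, values, marker="o")
plt.xlabel("federated round")
plt.ylabel("train loss")
plt.title("SFT train loss over 30 federated rounds")
plt.show()
```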

## Framework versions

- TRL: 1.1.0
- Transformers: 5.2.0
- PyTorch: 2.10.0+rocm7.1
- Datasets: 4.8.4
- Tokenizers: 0.22.2
- Flwr: 1.28.0
- Flwr-datasets: 0.6.0

## Citation

If you use this model in your research, please cite it as follows:

```bibtex
@misc{scamarcia_echo_dsrn_114m,
  title     = {Echo-DSRN-114M: Surprise-Gated Dual-State Recurrent Architecture for Efficient Language Modeling and Classification},
  author    = {Scamarcia, Massimo Roberto},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19848279}
}
```