Model Card for ethicalabs/Echo-SmolTools-114M-Intent-PEFT
Echo-SmolTools-114M-Intent-PEFT is a LoRA-based adapter trained on top of the Echo-DSRN-114M-v0.1.2 base RNN architecture and optimized as a multilingual intent classifier.
Gradio app now available: Echo Intent: Multilingual Intent Classifier
Model Usage
You can load the fine-tuned PEFT adapter over the base model and run intent classification inference as follows:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer
base_model_name = "ethicalabs/Echo-DSRN-114M-v0.1.2"
base_model = AutoModelForCausalLM.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)

# Load the fine-tuned PEFT adapter (example: Round 28)
peft_model_name = "ethicalabs/Echo-SmolTools-114M-Intent-PEFT"
model = PeftModel.from_pretrained(base_model, peft_model_name, trust_remote_code=True)

# Prepare the prompt for multilingual intent classification
utt = "Che ore sono a Roma?"  # "What time is it in Rome?"
messages = [
    {"role": "system", "content": "You are a helpful multilingual intent classification assistant."},
    {"role": "user", "content": f"Classify the intent of the following request: {utt}"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize and generate
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=15, do_sample=False)

# Strip the prompt tokens and decode only the generated intent label
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()

print(f"User: {utt}")
print(f"Intent: {response}")
```
Output:

```text
User: Che ore sono a Roma?
Intent: datetime_query
```
Base Model: ethicalabs/Echo-DSRN-114M-v0.1.2
Architecture Details
| Property | Value |
|---|---|
| Model Type | echo_dsrn |
| Layers | 8 |
| Hidden Dim | 512 |
| Attention Heads | 4 |
| MLP Ratio | 8.0 |
| Vocab Size | 32011 |
| Hybrid Attention | True |
| RMSNorm | True |
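The values above can be cross-checked against the checkpoint's config. A minimal sketch; the attribute names printed by the remote code may differ from the table's labels, so inspect the full config rather than relying on the guesses in the comments:

```python
from transformers import AutoConfig

# Load the custom config shipped with the base checkpoint.
config = AutoConfig.from_pretrained(
    "ethicalabs/Echo-DSRN-114M-v0.1.2", trust_remote_code=True
)

# model_type is defined for every PretrainedConfig; expected: "echo_dsrn".
print("model_type:", config.model_type)

# Print everything the remote code defines (hidden size, layer count, etc.)
# and map the attribute names onto the table above.
print(config)
```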
Parameter Breakdown
| Component | Parameters | % of Total |
|---|---|---|
| Total | 114.69M (114,687,488) | 100% |
| Embeddings | 16.39M | 14.29% |
| DSRN Blocks (Aggregate) | 81.91M | 71.42% |
| LM Head | 16.39M | 14.29% |
Internal Block Structure (Per Layer)
| Sub-Component | Parameters | Description |
|---|---|---|
| MLP (Feed-Forward) | 4.20M | Upscaled hidden layers |
| DSRN Slow State | 3.15M | Constant-time memory gates |
| GRU Fast State | 1.58M | Recurrent fast path |
| Surprise Gating | 264,192 | Dynamic focus mechanism |
| Normalization | 1,024 | LayerNorm / RMSNorm |
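The two breakdowns above can be reproduced by grouping the model's parameters by module path. A minimal sketch; the prefixes it prints (e.g. `model.embed_tokens`, `model.layers`) depend entirely on the remote modeling code, so use the output to map them onto the tables rather than assuming specific names:

```python
from collections import defaultdict

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ethicalabs/Echo-DSRN-114M-v0.1.2", trust_remote_code=True
)

# Group parameter counts by the first two levels of the module path.
counts = defaultdict(int)
for name, param in model.named_parameters():
    prefix = ".".join(name.split(".")[:2])
    counts[prefix] += param.numel()

total = sum(counts.values())
for prefix, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{prefix:40s} {n / 1e6:8.2f}M  {100 * n / total:5.2f}%")
print(f"{'TOTAL':40s} {total / 1e6:8.2f}M")
```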
Benchmarks
Absolute Multilingual Census
The following metrics represent an exhaustive baseline: the full 36,594-sample validation set was evaluated with a deterministic greedy-decoding policy (no sampling). A sketch of a reproduction loop follows the table.
| Locale | Accuracy | Matches / Total |
|---|---|---|
| English (en-US) | 79.39% | 4,842 / 6,099 |
| Italian (it-IT) | 73.49% | 4,482 / 6,099 |
| Spanish (es-ES) | 72.50% | 4,422 / 6,099 |
| French (fr-FR) | 72.16% | 4,401 / 6,099 |
| Portuguese (pt-PT) | 71.77% | 4,377 / 6,099 |
| German (de-DE) | 65.52% | 3,996 / 6,099 |
| **Overall** | **72.47%** | **26,520 / 36,594** |
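The census reduces to an exact-match loop with greedy decoding. A minimal sketch, reusing the `model` and `tokenizer` objects from the usage example above; the `records` wiring and the exact-match criterion are illustrative assumptions, not the authors' evaluation harness:

```python
from collections import Counter

def classify(model, tokenizer, utterance: str) -> str:
    """Greedy-decode the intent label for a single utterance."""
    messages = [
        {"role": "system", "content": "You are a helpful multilingual intent classification assistant."},
        {"role": "user", "content": f"Classify the intent of the following request: {utterance}"},
    ]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=15, do_sample=False)
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    ).strip()

# Hypothetical evaluation records: (locale, utterance, gold_intent) triples.
# Replace with the actual validation split used for the census.
records = [
    ("it-IT", "Che ore sono a Roma?", "datetime_query"),
]

hits, totals = Counter(), Counter()
for locale, utterance, gold in records:
    totals[locale] += 1
    hits[locale] += int(classify(model, tokenizer, utterance) == gold)

for locale in totals:
    print(f"{locale}: {hits[locale] / totals[locale]:.2%} ({hits[locale]}/{totals[locale]})")
```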
Training procedure
This LoRA adapter was fine-tuned with supervised fine-tuning (SFT) on a single AMD Radeon™ AI PRO R9700 (32 GB) using the Flower framework and TRL, in a simulated federated learning scenario; a minimal sketch of the client-side setup follows.
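A minimal sketch of what one client in such a setup can look like: a Flower `NumPyClient` that exchanges only the LoRA adapter weights and runs a TRL `SFTTrainer` for the local step. `build_local_model`, the LoRA hyperparameters, `target_modules`, and the dataset wiring are illustrative assumptions, not the actual training recipe:

```python
import torch
from flwr.client import NumPyClient
from peft import (LoraConfig, get_peft_model,
                  get_peft_model_state_dict, set_peft_model_state_dict)
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer


def build_local_model():
    # Wrap the frozen base model with a fresh LoRA adapter.
    # Hyperparameters and target_modules are placeholders; the right module
    # names depend on the remote DSRN modeling code.
    base = AutoModelForCausalLM.from_pretrained(
        "ethicalabs/Echo-DSRN-114M-v0.1.2", trust_remote_code=True
    )
    lora_cfg = LoraConfig(
        r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    return get_peft_model(base, lora_cfg)


class IntentClient(NumPyClient):
    """One simulated client: local SFT on its partition, LoRA weights only."""

    def __init__(self, train_dataset):
        self.model = build_local_model()
        self.train_dataset = train_dataset

    def get_parameters(self, config):
        # Only the adapter weights travel over the (simulated) network.
        return [v.cpu().numpy() for v in get_peft_model_state_dict(self.model).values()]

    def fit(self, parameters, config):
        # Install the aggregated global adapter before local training.
        keys = get_peft_model_state_dict(self.model).keys()
        set_peft_model_state_dict(
            self.model, {k: torch.tensor(v) for k, v in zip(keys, parameters)}
        )
        trainer = SFTTrainer(
            model=self.model,
            args=SFTConfig(output_dir="out", num_train_epochs=1),
            train_dataset=self.train_dataset,
        )
        result = trainer.train()
        return (
            self.get_parameters(config),
            len(self.train_dataset),
            {"train_loss": result.training_loss},
        )
```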
Training Metrics
```
INFO : aggregate_fit: received 2 results and 0 failures
INFO : Communication budget: used 16716.43 MB (+278.61 MB this round) / 200,000 MB
Loading weights: 100%|██████████| 139/139 [00:00<00:00, 3686.31it/s, Materializing param=model.final_norm.weight]
INFO : fit progress: (30, 0.0, {}, 926.4947838319931)
INFO : configure_evaluate: no clients selected, skipping evaluation
INFO :
INFO : [SUMMARY]
INFO : Run finished 30 round(s) in 926.49s
...
INFO : History (metrics, distributed, fit):
INFO : {'entropy': [(1, 3.1767146694660187),
INFO : (2, 2.6490936136245726),
INFO : (3, 2.582885365486145),
INFO : (4, 2.403850073814392),
INFO : (5, 2.404714601635933),
INFO : (6, 2.397827633917332),
INFO : (7, 2.3419336032867433),
INFO : (8, 2.330485168099403),
INFO : (9, 2.2885197573900222),
INFO : (10, 2.3625612980127335),
INFO : (11, 2.2621049478650095),
INFO : (12, 2.2685215598344803),
INFO : (13, 2.193116867244244),
INFO : (14, 2.16418510556221),
INFO : (15, 2.1816292345523833),
INFO : (16, 2.2237485074996948),
INFO : (17, 2.240292007625103),
INFO : (18, 2.1769691184163094),
INFO : (19, 2.22042086571455),
INFO : (20, 2.2185776421427725),
INFO : (21, 2.18163181245327),
INFO : (22, 2.15154937684536),
INFO : (23, 2.175434983074665),
INFO : (24, 2.160964986979961),
INFO : (25, 2.158632977604866),
INFO : (26, 2.1607184839248657),
INFO : (27, 2.1532266357541086),
INFO : (28, 2.1472932541370393),
INFO : (29, 2.155655029118061),
INFO : (30, 2.149040196239948)],
INFO : 'mean_token_accuracy': [(1, 0.7308300926908851),
INFO : (2, 0.8405319826304912),
INFO : (3, 0.8734206096827983),
INFO : (4, 0.8932938988506794),
INFO : (5, 0.922560573220253),
INFO : (6, 0.9258149369060993),
INFO : (7, 0.945394709855318),
INFO : (8, 0.9518885087966918),
INFO : (9, 0.9530007430911064),
INFO : (10, 0.9602652615308762),
INFO : (11, 0.9809961877763271),
INFO : (12, 0.9679327207803726),
INFO : (13, 0.9806139521300793),
INFO : (14, 0.986053352355957),
INFO : (15, 0.9900700397789478),
INFO : (16, 0.9693947829306125),
INFO : (17, 0.9722562806308269),
INFO : (18, 0.9861463868618011),
INFO : (19, 0.995743811428547),
INFO : (20, 0.9937835520505905),
INFO : (21, 0.9716044381260872),
INFO : (22, 0.999490964114666),
INFO : (23, 0.991388481259346),
INFO : (24, 0.9873159317672253),
INFO : (25, 0.985576259046793),
INFO : (26, 0.9877067220211029),
INFO : (27, 0.9989752702414989),
INFO : (28, 0.9998132897913456),
INFO : (29, 0.9981746172904968),
INFO : (30, 0.9881662499904632)],
INFO : 'train_loss': [(1, 1.1626044915243983),
INFO : (2, 0.5194999669492245),
INFO : (3, 0.40047831716015936),
INFO : (4, 0.3398165641538799),
INFO : (5, 0.24711219662043732),
INFO : (6, 0.2427184192603454),
INFO : (7, 0.17238803381100298),
INFO : (8, 0.15867113580141448),
INFO : (9, 0.15771765831450466),
INFO : (10, 0.12902066930488218),
INFO : (11, 0.06335343442275189),
INFO : (12, 0.11254073319127202),
INFO : (13, 0.06188895832223352),
INFO : (14, 0.05105806810490321),
INFO : (15, 0.03345653552742078),
INFO : (16, 0.10704563170089386),
INFO : (17, 0.09614553463034098),
INFO : (18, 0.04905555162069503),
INFO : (19, 0.016399495133809977),
INFO : (20, 0.022009438261029572),
INFO : (21, 0.10738116894423001),
INFO : (22, 0.0036633076894486295),
INFO : (23, 0.03189122898129426),
INFO : (24, 0.043155781974319324),
INFO : (25, 0.05212347019591107),
INFO : (26, 0.04447804041239579),
INFO : (27, 0.004012700952171144),
INFO : (28, 0.0014076900614327314),
INFO : (29, 0.006376274699809983),
INFO : (30, 0.04514436764531638)]}
INFO :
```
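To eyeball convergence, the per-round history above can be plotted directly. A minimal sketch (matplotlib assumed available; paste the remaining `(round, value)` pairs from the log):

```python
import matplotlib.pyplot as plt

# (round, train_loss) pairs copied from the history above; extend through round 30.
train_loss = [(1, 1.1626), (2, 0.5195), (3, 0.4005)]

rounds, losses = zip(*train_loss)
plt.plot(rounds, losses, marker="o")
plt.xlabel("Federated round")
plt.ylabel("Train loss")
plt.title("Echo-SmolTools-114M-Intent-PEFT: SFT convergence")
plt.show()
```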
Framework versions
- TRL: 1.1.0
- Transformers: 5.2.0
- Pytorch: 2.10.0+rocm7.1
- Datasets: 4.8.4
- Tokenizers: 0.22.2
- Flwr: 1.28.0
- Flwr-datasets: 0.6.0
Citation
If you use this model in your research, please cite it as follows:
```bibtex
@misc{scamarcia_echo_dsrn_114m,
  author    = {Scamarcia, Massimo Roberto},
  title     = {Echo-DSRN-114M: Surprise-Gated Dual-State Recurrent Architecture for Efficient Language Modeling and Classification},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19848279}
}
```