Model Card for ethicalabs/Echo-SmolTools-114M-Intent-PEFT
Echo-SmolTools-114M-Intent-PEFT is a LoRA-based adapter trained on top of the Echo-DSRN-114M-v0.1.2 base RNN architecture and optimized as a multilingual intent classifier.
Gradio app now available: Echo Intent: Multilingual Intent Classifier
Model Usage
You can load the fine-tuned PEFT adapter over the base model and run intent classification inference as follows:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer
base_model_name = "ethicalabs/Echo-DSRN-114M-v0.1.2"
base_model = AutoModelForCausalLM.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)

# Load the fine-tuned PEFT adapter (example: Round 28)
peft_model_name = "ethicalabs/Echo-SmolTools-114M-Intent-PEFT"
model = PeftModel.from_pretrained(base_model, peft_model_name, trust_remote_code=True)

# Prepare the prompt for multilingual intent classification
utt = "Che ore sono a Roma?"  # "What time is it in Rome?"
messages = [
    {"role": "system", "content": "You are a helpful multilingual intent classification assistant."},
    {"role": "user", "content": f"Classify the intent of the following request: {utt}"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize and generate
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=15, do_sample=False)

# Strip the prompt tokens and decode only the generated intent label
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()

print(f"User: {utt}")
print(f"Intent: {response}")
```
Output:

```text
User: Che ore sono a Roma?
Intent: datetime_query
```
Base Model: ethicalabs/Echo-DSRN-114M-v0.1.2
Architecture Details
| Property | Value |
|---|---|
| Model Type | echo_dsrn |
| Layers | 8 |
| Hidden Dim | 512 |
| Attention Heads | 4 |
| MLP Ratio | 8.0 |
| Vocab Size | 32011 |
| Hybrid Attention | True |
| RMSNorm | True |
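The values above can be cross-checked against the checkpoint's config. A minimal sketch; the attribute names printed by the remote code may differ from the table's labels, so inspect the full config rather than relying on the guesses in the comments:

```python
from transformers import AutoConfig

# Load the custom config shipped with the base checkpoint.
config = AutoConfig.from_pretrained(
    "ethicalabs/Echo-DSRN-114M-v0.1.2", trust_remote_code=True
)

# model_type is defined for every PretrainedConfig; expected: "echo_dsrn".
print("model_type:", config.model_type)

# Print everything the remote code defines (hidden size, layer count, etc.)
# and map the attribute names onto the table above.
print(config)
```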
Parameter Breakdown
| Component | Parameters | % of Total |
|---|---|---|
| Total | 114.69M (114,687,488) | 100% |
| Embeddings | 16.39M | 14.29% |
| DSRN Blocks (Aggregate) | 81.91M | 71.42% |
| LM Head | 16.39M | 14.29% |
Internal Block Structure (Per Layer)
| Sub-Component | Parameters | Description |
|---|---|---|
| MLP (Feed-Forward) | 4.20M | Upscaled hidden layers |
| DSRN Slow State | 3.15M | Constant-time memory gates |
| GRU Fast State | 1.58M | Recurrent fast path |
| Surprise Gating | 264,192 | Dynamic focus mechanism |
| Normalization | 1,024 | LayerNorm / RMSNorm |
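The two breakdowns above can be reproduced by grouping the model's parameters by module path. A minimal sketch; the prefixes it prints (e.g. `model.embed_tokens`, `model.layers`) depend entirely on the remote modeling code, so use the output to map them onto the tables rather than assuming specific names:

```python
from collections import defaultdict

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ethicalabs/Echo-DSRN-114M-v0.1.2", trust_remote_code=True
)

# Group parameter counts by the first two levels of the module path.
counts = defaultdict(int)
for name, param in model.named_parameters():
    prefix = ".".join(name.split(".")[:2])
    counts[prefix] += param.numel()

total = sum(counts.values())
for prefix, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{prefix:40s} {n / 1e6:8.2f}M  {100 * n / total:5.2f}%")
print(f"{'TOTAL':40s} {total / 1e6:8.2f}M")
```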
Benchmarks
Absolute Multilingual Census
The following metrics represent an exhaustive baseline: the full 36,594-sample validation set was evaluated with a deterministic greedy-decoding policy (no sampling). A sketch of a reproduction loop follows the table.
| Locale | Accuracy | Matches / Total |
|---|---|---|
| English (en-US) | 79.39% | 4,842 / 6,099 |
| Italian (it-IT) | 73.49% | 4,482 / 6,099 |
| Spanish (es-ES) | 72.50% | 4,422 / 6,099 |
| French (fr-FR) | 72.16% | 4,401 / 6,099 |
| Portuguese (pt-PT) | 71.77% | 4,377 / 6,099 |
| German (de-DE) | 65.52% | 3,996 / 6,099 |
| **Overall** | **72.47%** | **26,520 / 36,594** |
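The census reduces to an exact-match loop with greedy decoding. A minimal sketch, reusing the `model` and `tokenizer` objects from the usage example above; the `records` wiring and the exact-match criterion are illustrative assumptions, not the authors' evaluation harness:

```python
from collections import Counter

def classify(model, tokenizer, utterance: str) -> str:
    """Greedy-decode the intent label for a single utterance."""
    messages = [
        {"role": "system", "content": "You are a helpful multilingual intent classification assistant."},
        {"role": "user", "content": f"Classify the intent of the following request: {utterance}"},
    ]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=15, do_sample=False)
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    ).strip()

# Hypothetical evaluation records: (locale, utterance, gold_intent) triples.
# Replace with the actual validation split used for the census.
records = [
    ("it-IT", "Che ore sono a Roma?", "datetime_query"),
]

hits, totals = Counter(), Counter()
for locale, utterance, gold in records:
    totals[locale] += 1
    hits[locale] += int(classify(model, tokenizer, utterance) == gold)

for locale in totals:
    print(f"{locale}: {hits[locale] / totals[locale]:.2%} ({hits[locale]}/{totals[locale]})")
```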
Training procedure
This LoRA adapter was fine-tuned with supervised fine-tuning (SFT) on a single AMD Radeon™ AI PRO R9700 (32 GB) using the Flower framework and TRL, in a simulated federated learning scenario; a minimal sketch of the client-side setup follows.
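A minimal sketch of what one client in such a setup can look like: a Flower `NumPyClient` that exchanges only the LoRA adapter weights and runs a TRL `SFTTrainer` for the local step. `build_local_model`, the LoRA hyperparameters, `target_modules`, and the dataset wiring are illustrative assumptions, not the actual training recipe:

```python
import torch
from flwr.client import NumPyClient
from peft import (LoraConfig, get_peft_model,
                  get_peft_model_state_dict, set_peft_model_state_dict)
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer


def build_local_model():
    # Wrap the frozen base model with a fresh LoRA adapter.
    # Hyperparameters and target_modules are placeholders; the right module
    # names depend on the remote DSRN modeling code.
    base = AutoModelForCausalLM.from_pretrained(
        "ethicalabs/Echo-DSRN-114M-v0.1.2", trust_remote_code=True
    )
    lora_cfg = LoraConfig(
        r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    return get_peft_model(base, lora_cfg)


class IntentClient(NumPyClient):
    """One simulated client: local SFT on its partition, LoRA weights only."""

    def __init__(self, train_dataset):
        self.model = build_local_model()
        self.train_dataset = train_dataset

    def get_parameters(self, config):
        # Only the adapter weights travel over the (simulated) network.
        return [v.cpu().numpy() for v in get_peft_model_state_dict(self.model).values()]

    def fit(self, parameters, config):
        # Install the aggregated global adapter before local training.
        keys = get_peft_model_state_dict(self.model).keys()
        set_peft_model_state_dict(
            self.model, {k: torch.tensor(v) for k, v in zip(keys, parameters)}
        )
        trainer = SFTTrainer(
            model=self.model,
            args=SFTConfig(output_dir="out", num_train_epochs=1),
            train_dataset=self.train_dataset,
        )
        result = trainer.train()
        return (
            self.get_parameters(config),
            len(self.train_dataset),
            {"train_loss": result.training_loss},
        )
```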
Training Metrics
```
INFO : aggregate_fit: received 2 results and 0 failures
INFO : Communication budget: used 16716.43 MB (+278.61 MB this round) / 200,000 MB
Loading weights: 100%|██████████| 139/139 [00:00<00:00, 3686.31it/s, Materializing param=model.final_norm.weight]
INFO : fit progress: (30, 0.0, {}, 926.4947838319931)
INFO : configure_evaluate: no clients selected, skipping evaluation
INFO :
INFO : [SUMMARY]
INFO : Run finished 30 round(s) in 926.49s
...
INFO : History (metrics, distributed, fit):
INFO : {'entropy': [(1, 3.1767146694660187),
INFO : (2, 2.6490936136245726),
INFO : (3, 2.582885365486145),
INFO : (4, 2.403850073814392),
INFO : (5, 2.404714601635933),
INFO : (6, 2.397827633917332),
INFO : (7, 2.3419336032867433),
INFO : (8, 2.330485168099403),
INFO : (9, 2.2885197573900222),
INFO : (10, 2.3625612980127335),
INFO : (11, 2.2621049478650095),
INFO : (12, 2.2685215598344803),
INFO : (13, 2.193116867244244),
INFO : (14, 2.16418510556221),
INFO : (15, 2.1816292345523833),
INFO : (16, 2.2237485074996948),
INFO : (17, 2.240292007625103),
INFO : (18, 2.1769691184163094),
INFO : (19, 2.22042086571455),
INFO : (20, 2.2185776421427725),
INFO : (21, 2.18163181245327),
INFO : (22, 2.15154937684536),
INFO : (23, 2.175434983074665),
INFO : (24, 2.160964986979961),
INFO : (25, 2.158632977604866),
INFO : (26, 2.1607184839248657),
INFO : (27, 2.1532266357541086),
INFO : (28, 2.1472932541370393),
INFO : (29, 2.155655029118061),
INFO : (30, 2.149040196239948)],
INFO : 'mean_token_accuracy': [(1, 0.7308300926908851),
INFO : (2, 0.8405319826304912),
INFO : (3, 0.8734206096827983),
INFO : (4, 0.8932938988506794),
INFO : (5, 0.922560573220253),
INFO : (6, 0.9258149369060993),
INFO : (7, 0.945394709855318),
INFO : (8, 0.9518885087966918),
INFO : (9, 0.9530007430911064),
INFO : (10, 0.9602652615308762),
INFO : (11, 0.9809961877763271),
INFO : (12, 0.9679327207803726),
INFO : (13, 0.9806139521300793),
INFO : (14, 0.986053352355957),
INFO : (15, 0.9900700397789478),
INFO : (16, 0.9693947829306125),
INFO : (17, 0.9722562806308269),
INFO : (18, 0.9861463868618011),
INFO : (19, 0.995743811428547),
INFO : (20, 0.9937835520505905),
INFO : (21, 0.9716044381260872),
INFO : (22, 0.999490964114666),
INFO : (23, 0.991388481259346),
INFO : (24, 0.9873159317672253),
INFO : (25, 0.985576259046793),
INFO : (26, 0.9877067220211029),
INFO : (27, 0.9989752702414989),
INFO : (28, 0.9998132897913456),
INFO : (29, 0.9981746172904968),
INFO : (30, 0.9881662499904632)],
INFO : 'train_loss': [(1, 1.1626044915243983),
INFO : (2, 0.5194999669492245),
INFO : (3, 0.40047831716015936),
INFO : (4, 0.3398165641538799),
INFO : (5, 0.24711219662043732),
INFO : (6, 0.2427184192603454),
INFO : (7, 0.17238803381100298),
INFO : (8, 0.15867113580141448),
INFO : (9, 0.15771765831450466),
INFO : (10, 0.12902066930488218),
INFO : (11, 0.06335343442275189),
INFO : (12, 0.11254073319127202),
INFO : (13, 0.06188895832223352),
INFO : (14, 0.05105806810490321),
INFO : (15, 0.03345653552742078),
INFO : (16, 0.10704563170089386),
INFO : (17, 0.09614553463034098),
INFO : (18, 0.04905555162069503),
INFO : (19, 0.016399495133809977),
INFO : (20, 0.022009438261029572),
INFO : (21, 0.10738116894423001),
INFO : (22, 0.0036633076894486295),
INFO : (23, 0.03189122898129426),
INFO : (24, 0.043155781974319324),
INFO : (25, 0.05212347019591107),
INFO : (26, 0.04447804041239579),
INFO : (27, 0.004012700952171144),
INFO : (28, 0.0014076900614327314),
INFO : (29, 0.006376274699809983),
INFO : (30, 0.04514436764531638)]}
INFO :
```
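To eyeball convergence, the per-round history above can be plotted directly. A minimal sketch (matplotlib assumed available; paste the remaining `(round, value)` pairs from the log):

```python
import matplotlib.pyplot as plt

# (round, train_loss) pairs copied from the history above; extend through round 30.
train_loss = [(1, 1.1626), (2, 0.5195), (3, 0.4005)]

rounds, losses = zip(*train_loss)
plt.plot(rounds, losses, marker="o")
plt.xlabel("Federated round")
plt.ylabel("Train loss")
plt.title("Echo-SmolTools-114M-Intent-PEFT: SFT convergence")
plt.show()
```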
Framework versions
- TRL: 1.1.0
- Transformers: 5.2.0
- Pytorch: 2.10.0+rocm7.1
- Datasets: 4.8.4
- Tokenizers: 0.22.2
- Flwr: 1.28.0
- Flwr-datasets: 0.6.0
Citation
If you use this model in your research, please cite it as follows:
```bibtex
@misc{scamarcia_echo_dsrn_114m,
  author    = {Scamarcia, Massimo Roberto},
  title     = {Echo-DSRN-114M: Surprise-Gated Dual-State Recurrent Architecture for Efficient Language Modeling and Classification},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19848279}
}
```