mii-community (mii-community)

posted an update 4 months ago

Post

250

Together with @mferraretto and @efederici we released #Nesso-4B, a new model specialized for agentic workflows.

mii-llm/nesso-4B

#Nesso-4B is a fine-tuned version of Qwen-4B, trained on a highly curated and balanced dataset designed specifically for multilingual agentic workflows and conversational use cases.

As shown in the video below we simulate, the new “cowork” from #Antrophic, without any data sharing all running on a consumer device. The model can be used to build agentic behavior in #privateAI environments.

Not every problem requires super intelligence: in many cases, intelligence at the edge is more than enough.

#Nesso4B #AgenticAI #PrivateAI #EdgeAI #OnDeviceAI

2 replies

·

giux78

posted an update about 1 year ago

Post

2445

LLAMA4 release highlight the importance of political and social bias. According to their own evaluation described in the release blog post:
- Refusals on contentious prompts dropped from 7% (hashtag#LLAMA 3.3) to under 2%
- Unequal response refusals are now under 1%
- Political lean bias is said to be halved compared to hashtag#LLaMA 3.3 and comparable to Grok

However, we @efederici @mferraretto @FinancialSupport and I released some weeks ago an independent open source benchmark called Propaganda to measure political bias in LLMs: https://github.com/mii-llm/propaganda

In the chart below, we evaluated multiple leading models on the basis of ratings across a range of prompts designed to expose ideological leanings.

Despite Meta’s stated neutrality goals, LLAMA4 ranks at the very top in terms of total ratings aligned with a clear ideological bias. The models were tested on their ability to respond even-handedly to politically sensitive prompts. LLaMA 4 scored even higher than models known for strong alignment policies like GPT-4o.

LLMs may be refusing less, but they still show bias through content framing. This suggests that refusal rates alone are not a sufficient measure of ideological bias. Relying solely on internal evaluations from AI labs also raises concerns about transparency and objectivity.

giux78

posted an update about 1 year ago

Post

3258

This is truly an inspirational story please help us spread the word, @clem , @thomwolf and everyone who supports open source AI.

A few weeks ago, @mmuffo94 and @cittiberto from indigo_ai launched the Chatbot Arena for the Italian language: https://indigo.ai/it/chatbot-arena-italia/.

To our surprise, among the top-ranked models is mii-llm/maestrale-chat-v0.4-beta a carefully fine-tuned version of mistralai/Mistral-7B-v0.1, developed by @efederici and @mferraretto from

mii-llm , and released nearly a year ago.

At this very moment, as shown in the screenshot, mii-llm/maestrale-chat-v0.4-beta is ranked 8th right between ChatGPT-4.5 and ChatGPT-4o.

It's likely that for several months, the best Italian speaking LLM has been an open source 7B model created by open source contributors and hardly anyone knew it.

2 replies

·

giux78

posted an update about 1 year ago

Post

2919

@ mii-llm with @efederici @mferraretto @FinancialSupport and @DeepMount00 we just released #Propaganda a framework designed to evaluate and train LLMs on political opinions and bias. We aim to analyze both open-source and closed-source LLMs to understand the political positions and biases expressed in their outputs. Moreover we provide a set of recipes to enforce political positions into the models by creating ad hoc curated datasets and by applying fine tuning techniques. By releasing our work in the open, we hope to foster contributions: https://github.com/mii-llm/propaganda

This framework offers opportunities for expansion in various directions and could become the standard reference for evaluating LLMs on political topics, particularly those that influence public opinion.

cassanof

authored a paper over 1 year ago

PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

Paper • 2502.01584 • Published Feb 3, 2025 • 9

AndreaSeveso

authored a paper over 1 year ago

Disce aut Deficere: Evaluating LLMs Proficiency on the INVALSI Italian Benchmark

Paper • 2406.17535 • Published Jun 25, 2024

DanielePoterti

authored a paper almost 2 years ago

Disce aut Deficere: Evaluating LLMs Proficiency on the INVALSI Italian Benchmark

Paper • 2406.17535 • Published Jun 25, 2024

giux78

posted an update almost 2 years ago

Post

1719

We https://mii-llm.ai just released a new LLM Italian benchmark and a set of evaluation: MMLU-PRO-ITA

Thanks to @efederici who released efederici/MMLU-Pro-ita a machine translated version of MMLU-PRO and thanks to a community shared computational effort we published in the "Eval Aggiuntive" tab of https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard the results on Italian open source LLMs.

If you want to deepen read the blog article on hf https://huggingface.co/blog/giux78/mmlu-pro-ita

giux78

posted an update about 2 years ago

Post

1530

@FinancialSupport and I just released a new version of the Italian LLMs leaderboard https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard
using the super useful

demo-leaderboard template from @clefourrier .
We’ve evaluated over 50 models (base, merged, fine-tuned, etc.) from:
- Major companies like Meta, Mistral, Google ...
- University groups such as

sapienzanlp or

swap-uniba
- Italian Companies like MoxoffSpA ,

FairMind or

raicrits
- Various communities and individuals
All models were tested on #Italian benchmarks #mmlu #arc-c #hellaswag, which we contributed to the opensource lm-evaluation-harness library from

EleutherAI .
Plus, you can now submit your model for automatic evaluation, thanks to to

seeweb sponsored computation.
Curious about the top Italian models? Check out the leaderboard and submit your model!

https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard

efederici

posted an update about 2 years ago

Post

2178

Finally, I can post! 🚀

I created a Capybara-inspired Italian dataset by translating the initial instruction and running it through a pipeline to generate conversations. I used Claude Sonnet for translation and instruction generation, and Opus for generating the answers.

I hope this dataset proves useful for people working on 🇮🇹 language models.

⛁ Open sourcing the dataset here: efederici/capybara-claude-15k-ita

1 reply

·

giux78

posted an update about 2 years ago

Post

1605

@mik3ml just released ReDiX/wikipediaQA-ita an interesting synthetic dataset originated from wikipedia using a fine tuned version of mistral-7B specific for the Italian language 🇮🇹 .

1 reply

·

giux78

updated 2 models about 2 years ago

mii-community/zefiro-functioncalling-v0.3-alpha

Text Generation • 7B • Updated Apr 22, 2024 • 7 • 2

mii-community/zefiro-funcioncalling-v0.3-alpha

Updated Apr 22, 2024

giux78

posted an update about 2 years ago

Post

1837

🎉 Super @DeepMount00 just released 𝗚𝗲𝗺𝗺𝗮_𝗤𝗔_𝗜𝗧𝗔_𝘃𝟯 𝗹𝗲𝗮𝗱𝗶𝗻𝗴 the 𝗥𝗔𝗚 𝘁𝗮𝘀𝗸 on the Italian 𝗟𝗟𝗠_𝗜𝗧𝗔_𝗟𝗘𝗔𝗗𝗘𝗥𝗕𝗢𝗔𝗥𝗗. The model is a fine tuned version of Gemma 2B.
Model details: https://huggingface.co/DeepMount00/Gemma_QA_ITA_v3
Explore the full RAG section rankings here: https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard on section Classifica RAG

giux78

posted an update about 2 years ago

Post

1794

On evaluating fine tuned 7B Italian open source LLMs I have collected many data points and I created a super simple explorative analyses. My hypothesis based on data are:

- mmlu is hard to improve when fine tuning a base model on a different language
- fine tuning also on single GPUs can improve by 5% to 10% the base model on common tasks but a lot more on specific cases with the right training time and data
- fine tuning can specialize well but at cost of loosing some foundational knowledge.

Here the data https://docs.google.com/spreadsheets/d/1MBcxy1loK8eIycZG4DN84Q2ejZ0jSjxUBgoShHDR6IY/edit?usp=sharing
Here the colab https://colab.research.google.com/drive/1ra4_skG5QYWSYOzvagOoIoj4bibQD8Gw?usp=sharing
Here an article with some considerations https://medium.com/@giuxale/an-analyses-on-italian-llms-models-evaluations-51bffe1d44d1

giux78

posted an update about 2 years ago

Post

1296

Based on the work of @mrinaldi and @ruggsea we just released the biggest - ready for training - conversational dataset based on Usenet data in the Italian language 🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹. It contains about 9 millions of conversations made by real humans.

mii-community/UsenetArchiveIT-conversations

giux78

updated 2 datasets about 2 years ago

mii-community/UsenetArchiveIT-conversations

Viewer • Updated Mar 26, 2024 • 9.16M • 3.3k • 8

mii-community/UsenetArchiveIT-conversations-test

Viewer • Updated Mar 24, 2024 • 88.1k • 12

giux78

posted an update about 2 years ago

Post

2673

I and @FinancialSupport just release the new Italian Leaderboard:

https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard

It is based on lm-evaluation-harness and at the moment , mainly, on 7 billion models. In the next weeks we will add more models. If you have suggestion or need explanations join our community discord https://discord.gg/a26cRkBCNH

9 replies

·

giux78

posted an update about 2 years ago

Post

Wonderful open source Italian dataset from @manalog and @ruggsea :

https://huggingface.co/datasets/manalog/UsenetArchiveIT

The dataset contributes to the

mii-community project, aimed at advancing the creation of Italian open-source Language Models (LLMs).🇮🇹 🤖 About 10-20 billion token, probably the best conversational open source dataset in the Italian language. 🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹

2 replies

·

AI & ML interests

Team members 19

mii-community's activity