Alvaro Bartolome
AI & ML interests
machine learning + tech lead @huggingface (inference + cloud)
Recent Activity
updated a dataset about 16 hours ago
huggingface/DEH-image-scan-data
updated a dataset 4 days ago
alvarobartt/hf-mem
Critique Models (CM) on the 🤗 Hub
This collection contains some Critique Models (CM) for LLM evaluation available on the Hugging Face Hub
- openbmb/UltraCM-13b
  Text Generation • Updated • 41 • 20
- prometheus-eval/prometheus-7b-v1.0
  Text Generation • Updated • 388 • 31
- prometheus-eval/prometheus-13b-v1.0
  Text Generation • Updated • 854 • 145
- prometheus-eval/prometheus-7b-v2.0
  Text Generation • 7B • Updated • 44.9k • 106
AIF Datasets (with distilabel)
Small to medium-sized datasets that are synthetically generated, labelled with AI Feedback (AIF), or both
NER in Spanish
Fine-tuned models for NER in Spanish, built with the SpanMarker framework using different encoders and datasets
- alvarobartt/bert-base-multilingual-cased-ner-spanish
  Token Classification • 0.2B • Updated • 19 • 3
- alvarobartt/span-marker-xlm-roberta-large-conll-2002-es
  Token Classification • Updated • 10 • 2
- alvarobartt/span-marker-roberta-base-bne-conll-2002-es
  Token Classification • Updated • 12 • 1
From zero to GPT-hero
Reading list to fully understand GPT (and GPT-2) and to be able to implement them from scratch
- Neural Machine Translation of Rare Words with Subword Units
  Paper • 1508.07909 • Published • 4
- Attention Is All You Need
  Paper • 1706.03762 • Published • 122
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 27
- Generating Wikipedia by Summarizing Long Sequences
  Paper • 1801.10198 • Published • 3
Studio Ghibli Diffusion
Text-to-Image fine-tunes in the Studio Ghibli style
- FLUX.1 Studio Ghibli LoRA
  Running on Zero • 23
  Generate Studio Ghibli-style images from text prompts
- alvarobartt/ghibli-characters
  Viewer • Updated • 9 • 90 • 9
- black-forest-labs/FLUX.1-dev
  Text-to-Image • Updated • 746k • 12.8k
- alvarobartt/ghibli-characters-flux-lora
  Text-to-Image • Updated • 327 • 64
About ORPO
Contains some information and experiments on fine-tuning LLMs using 🤗 `trl.ORPOTrainer`
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 72
- HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
  Text Generation • 141B • Updated • 150 • 269
- alvarobartt/mistral-orpo-mix
  Text Generation • 7B • Updated • 8 • 1
- alvarobartt/Mistral-7B-v0.1-ORPO
  Text Generation • 7B • Updated • 5 • 14
Apple MLX-compatible 7B LLMs on the 🤗 Hub
This collection contains the model weights for 7B LLMs for Apple's MLX framework. Find more information at https://github.com/ml-explore/mlx
🇪🇸 Datasets in Spanish for LLM Evaluation
This collection contains some datasets for LLM evaluation in Spanish, from nlp.uoregon.edu, translated using ChatGPT (including English counterparts)
Models for PII Protection
A collection with text-classification and token-classification models for PII Protection