Classic OpenAI: Overhyped and censored by OpenAI
Why was the only normal discussion here closed? Here
By the way, someone has already written code that uses this model not to filter confidential information but to search for it. Link
Also, this is just an ordinary classifier...
Why does it have more likes than DeepSeek V4 Flash? Are you actually going to use this model? Does it filter medical data? Legal data? Or anything else at all?
Let's imagine I want to publish a dataset of my chats (this is the use case many in the OSS community praise, calling the model a "gift").
But this gift only works in English... Well, some gift.
I won't even mention that anyone can train this model at home. And by "train" I don't mean fine-tuning — I mean training a similar model from scratch on public datasets.
Datasets for this exist: https://huggingface.co/datasets?sort=trending&search=PII
Models for this already exist: https://huggingface.co/models?sort=trending&search=PII
Nvidia's model is more powerful and even multilingual, and it was originally trained as a classifier. You can try it here.
But OpenAI got the most attention, and I think in this case it's purely because of the name. If they wanted to release a model so that people could publish their chats from different companies and conceal their personal data, they would have stated that and encouraged people to do so. But in reality, this is just one of many OpenAI models that was sitting in their archive.
you were given something for free and yet you complain. all you have to do is not use this model, it's not like its existence hurts you :)
> you were given something for free and yet you complain. all you have to do is not use this model, it's not like its existence hurts you :)
What did I get for free? Only a more modern architecture than other solutions, and increased awareness (among both the community and people far from AI) of text classifiers, including specifically for this use case. For that alone, OpenAI deserves praise.
Or did I get for free the closure of a discussion that was inconvenient for OpenAI, plus a model that's half-baked (few classes, no multilingual support)? Yes, fine-tuning will partially solve these issues, but it's not a silver bullet. What's the problem with releasing a proper model from the start instead of making the community finish it? By the way, creating fine-tunes of a model probably boosts its trending status on Hugging Face, and there are already plenty of them: almost every language has its own fine-tune. Just great. GPT-OSS was weak in languages other than English from the start.
Why is such a model being released at all, and why did OpenAI want to train and publish it? I already speculated earlier that it's just some dusty model sitting in their archive — but why would OpenAI train this model on GPT-OSS for their own use? They 1000% have a much better model.
There was news that Mark Zuckerberg wants to train AI on data from his employees. Applying a model like openai/privacy-filter directly on the user's computer fits this scenario perfectly. Companies could also play this up: in applications (e.g., for agentic AI use), they could offer to enable data collection for improving their core models in exchange for extra perks. Models like openai/privacy-filter are well suited for this: they give users a guarantee that their data will be anonymized, and on top of that, the computation runs on the user's side, not on the company's servers.
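The client-side flow described above can be sketched roughly as follows. Since no actual API for the privacy-filter model is documented in this thread, this is a minimal sketch using a toy regex-based detector as a stand-in for the real classifier; `anonymize` and `PII_PATTERNS` are names I introduce for illustration.

```python
import re

# Toy stand-in for a local PII classifier such as the one discussed above.
# A real deployment would run the model's token classifier here; the
# regexes below are illustrative only.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
}

def anonymize(text: str) -> str:
    """Mask detected PII spans before any data leaves the user's machine."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

chat_line = "Reach me at jane.doe@example.com or +1 (555) 123-4567."
print(anonymize(chat_line))
# -> Reach me at [EMAIL] or [PHONE].
```

The key point is the direction of data flow: masking happens locally, and only the anonymized text would ever be uploaded for training.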
@evewashere Quite the contrary: every second that OpenAI still functions as a company, my body is in physical pain, because of how cringe they are.
Folks, let's apply the principle of "assume the best intent" and work together to make things better, rather than discussing negatives, hypotheticals, and exaggerations. In the end, that's why the other discussion topic was closed.
Our goal is to show a small model that can be used for PII detection with a defined set of classes. We know it can be improved, and we released it to the world to see what can be built on top of it and where the areas for improvement are. In true open-source spirit, contributions to make this amazing are welcome.
Given that this discussion also has derailed, we might need to close it too.
@mihaimaruseac The problem isn't this model specifically. It's the terrible practices of the company, not to mention the lying CEO who's basically ruined an entire economy. What the model does, however, is highlight the absolute irony of its release coming from this specific company, while the leadership hasn't changed and has in fact gotten worse over time.
That's why we are taking our frustrations out anywhere there's an outlet for them.
> Given that this discussion also has derailed, we might need to close it too.
CensorAI
Point me to a company that does this.
Here's an example of one that doesn't: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/discussions/131
You even removed 4o just because it was causing problems for the company. Now on Sam Altman's X, under every post, people are still begging for the model weights to be released. How much talk was there about an adult mode? And where is it? It would have caused the company even more problems than 4o, so they dropped it.
I love Image 2 from GPT, but I don't pay a cent for it, and beyond that, ChatGPT is simply pointless.
By the way, ChatGPT itself generated all of this. I don't mean to personally offend anyone, except maybe OpenAI's CEO. As for why it depicted mihaimaruseac as a rat, those are questions for ChatGPT itself.
None of these discussions are relevant for this model. Please, let's keep the discussion focused to privacy-filter.
We are just one team in a larger company
> None of these discussions are relevant for this model.
Pure lies.
> We are just one team in a larger company
But the methods are the same. No company closes all discussions on HF.
> No company closes all discussions on HF.
They do, see example https://huggingface.co/nvidia/nemotron-ocr-v2/discussions/3


