MikeDoes posted an update about 9 hours ago
What happens when an LLM "forgets" your data? A new paper reports it might not be gone for good.

The "Janus Interface" paper details a new attack that could recover forgotten PII through fine-tuning APIs. This is a solution-oriented paper because it highlights a problem that needs fixing.

Testing such a high-stakes attack requires equally high-stakes data. The Ai4Privacy 300k dataset was a key part of their evaluation, providing a testbed for extracting Social Security Numbers. Our dataset, with its synthetic structured SSN data, helped the researchers at Indiana University, Stanford, CISPA, and elsewhere demonstrate that their attack works on more than just email addresses: it can reach highly sensitive personal identifiers.
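Here is a rough sketch of how such an SSN testbed could be pulled from the dataset on the Hugging Face Hub. The field names (`privacy_mask`, `label`) and the SSN label strings are assumptions; check the dataset card for the exact schema before relying on this filter.

```python
from datasets import load_dataset

# Load the Ai4Privacy corpus from the Hugging Face Hub.
ds = load_dataset("ai4privacy/pii-masking-300k", split="train")

# Assumed label names for SSN-type entities; verify against the dataset card.
SSN_LABELS = {"SOCIALNUMBER", "SSN"}

def has_ssn(example):
    # `privacy_mask` is assumed to be a list of span annotations,
    # each carrying a `label` that names the PII type.
    return any(span.get("label") in SSN_LABELS for span in example["privacy_mask"])

# Keep only records containing synthetic SSNs to use as extraction targets.
ssn_testbed = ds.filter(has_ssn)
print(f"{len(ssn_testbed)} records with synthetic SSNs")
```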

We're excited to see our open-source dataset used in such cutting-edge security research. It's a win for the community when researchers can use our resources to stress-test the safety of modern AI systems. This work is an explicit call for stronger protections on fine-tuning interfaces.

🔗 This is why open data for security research is so important. Check out the full paper: https://arxiv.org/pdf/2310.15469

🚀 Stay updated on the latest in privacy-preserving AI—follow us on LinkedIn: https://www.linkedin.com/company/ai4privacy/posts/

When does the planned context become the signifier of that context in the code itself? Only when something is stable in code. Even the fact that "forgotten" data can be recovered, or needs to be, means the model is storing far too much about a context without ever getting to the context itself. All language needs the same simplification. Or maybe I just don't see reflexivity in AI yet: maybe I don't see it building itself with awareness of what it is to others, the way NASNet does.