unmodeled-tyler 
posted an update 23 days ago
Just started a fun project!

unmodeled-tyler/DoW-UFO-UAP-1

I'm getting the recently released DoW UFO/UAP documents (https://war.gov/ufo) cleaned and converted into a dataset here on Hugging Face!

There are 161 different files in the gov release (PDFs, images, videos, audio, etc.) and my current plan is to do it all in 1 dataset with 4 different shards - that way you can just call whichever tables you want/need when you import the dataset.
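
For anyone who wants to try pulling just one table, here is a minimal sketch using the `datasets` library. It assumes the shards are exposed as separate dataset configurations, which the post does not confirm, so it asks the Hub for the configuration names instead of guessing them. `pick_config` and `load_table` are hypothetical helper names made up for this example.

```python
REPO_ID = "unmodeled-tyler/DoW-UFO-UAP-1"


def pick_config(available, wanted):
    """Return the first config name containing `wanted` (case-insensitive)."""
    wanted = wanted.lower()
    for name in available:
        if wanted in name.lower():
            return name
    raise ValueError(f"no config matching {wanted!r}; available: {list(available)}")


def load_table(wanted, repo_id=REPO_ID):
    """Load one table from the dataset repo.

    Assumption: each shard is published as its own configuration.
    Requires `pip install datasets` and network access.
    """
    # Imported here so the helper above works without the library installed.
    from datasets import get_dataset_config_names, load_dataset

    configs = get_dataset_config_names(repo_id)  # ask the Hub what exists
    return load_dataset(repo_id, pick_config(configs, wanted))
```

Calling something like `load_table("image")` would then fetch only the matching table, if one exists. If the repo turns out to use a single configuration with multiple splits instead, `load_dataset(REPO_ID, split=...)` would be the thing to reach for.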

This is an ongoing project (I'm doing it on the side + my regular projects) so it's a bit of a growing entity. I'll also continuously refine the data over time to make sure it's as clean as possible.

Check it out! Who knows what you'll find in there?

A thought keeps occurring to me throughout this project so far:

"What happens to declassified data when the public suddenly has tools powerful enough to actually understand it?"

For instance, these types of record releases are not new - the US Government has been declassifying and releasing records since its inception. What IS new is the average person's ability to parse, synthesize, and make connections that were not previously possible for a single human - particularly in tandem with frontier AI that can complete work that would previously have required a specialized research team.

A single file within the 161-file release could contain upwards of 300 pages. Those pages are loosely in a logical order - but not necessarily. The released records follow the same archival conventions that are used internally by the government. The release is, by its very nature, "not for public consumption." It is not made to be publicly comprehensible. If you aren't familiar with those systems, you're cooked.

What access to AI systems allows us to do is take those previously muddy, incomprehensible (but very valuable!) releases and make them legible to the average person.

If this data then becomes comprehensible to more people, we now have even more humans thinking about the very strange problems that may lie buried deep inside - maybe some of those humans will make a connection that was missed the first time? What if that previously missed connection leads to a groundbreaking discovery?

Either way, this is bigger than UFO/UAP documents. It applies to any public data that has been available IN THEORY but inaccessible in practice.

Happy building!

·

I just constantly have the feeling that people haven't figured it out yet -- but I also realize I'm in a highly niche area. But it does seem like.. I mean.. so many things are so much easier now because of the tools we use -- but how many people even know these tools exist in the larger scheme of things?

For example, I noticed this headline today from CNBC:

Anthropic’s Mythos set off a cybersecurity ‘hysteria.’ Experts say the threat was already here

But anybody who knows anything realizes that you haven't had to wait for Mythos to build a decent harness around tons of different uncensored models to do even more. It's just that the layman is currently catching up, I feel.

We live in an AI bubble I think.. not the kind that is going to 'pop' and destroy the economy, but the kind of bubble where if you're really good at what you already do in a research / academic sense, you're going to be unstoppable with the current AI tools. And they just keep getting better.

Excellent, thought-provoking post! Have a good weekend!