I guess the reason it's slow is because llama.cpp is not optimized...
appvoid
Correct! It's causal modeling (for now) with a char level tokenizer with only 8 tokens.
The model learns by looking for relationships across sequences to predict a single token, so the only way it learns is literally by nudging weights towards a generalized solution using pure sequences.
In short, it learns to learn.
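A char-level tokenizer with only 8 tokens can be sketched in a few lines. The actual symbol set used by the dot model isn't stated here, so this alphabet is purely an assumption for illustration:

```python
# Hypothetical sketch of a char-level tokenizer with a tiny 8-symbol
# vocabulary. The real symbols used by the dot model are not public;
# this alphabet is an assumption.
VOCAB = list(".01 ()|#")  # 8 hypothetical characters
assert len(VOCAB) == 8
STOI = {ch: i for i, ch in enumerate(VOCAB)}  # char -> token id
ITOS = {i: ch for ch, i in STOI.items()}      # token id -> char

def encode(text: str) -> list[int]:
    """Map each character to its token id; characters outside the vocab are skipped."""
    return [STOI[ch] for ch in text if ch in STOI]

def decode(ids: list[int]) -> str:
    """Inverse mapping from token ids back to a string."""
    return "".join(ITOS[i] for i in ids)

print(encode("..01"))          # -> [0, 0, 1, 2]
print(decode(encode("..01")))  # -> ..01
```

With a vocabulary this small, all the signal has to come from sequence structure rather than token semantics, which matches the "pure sequences" framing above.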
Will there be any app to... convert the dots to something meaningful?
Not yet, I'm focusing on getting the core right first. But once the model is general enough, I don't see why not. Though you might need to finetune it for your use case.
It's already decent at some tasks, with the next version coming in a few weeks.
appvoid/dot
Just published: the first model proudly trained from scratch on "physical" reasoning instead of chunky language tokens.
if you need raw power and can tolerate slowness, rwkv 0.4b has you covered; if you need something in between, choose lfm2 350m
indeed
The first project, as far as I know, that focuses purely on few-shot prompting results rather than zero-shot, as is usually done with decoder-only transformer models. This model excels at few-shot tasks compared to most 0.6b and even bigger models. It also outperforms the base model on some popular language-modeling benchmarks.
appvoid/arco-3
Try it yourself!
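The difference between few-shot and zero-shot prompting is just whether solved examples are prepended before the query. A minimal sketch, with illustrative demonstrations that are assumptions rather than the actual arco-3 evaluation setup:

```python
# Sketch of few-shot prompting for a decoder-only model: prepend solved
# input/output pairs so the model can infer the task pattern in context.
# The example task (small additions) is a hypothetical illustration.
def build_few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Format demonstrations and the query as Q/A pairs in one prompt string."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {query}\nA:"

prompt = build_few_shot_prompt(
    [("2+2", "4"), ("3+5", "8")],  # demonstrations (the "shots")
    "7+1",                          # the actual query to complete
)
print(prompt)
```

A zero-shot prompt would be the same call with an empty demonstration list; few-shot evaluation measures how well the model picks up the pattern from the shots alone.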
Do you have your raspberry pis and phones ready for this new model yet?
New model, new architecture, more power:
Since I'm unable to post for about 11 hours, I will post it here: https://huggingface.co/appvoid/arco-3
the issue is, we have yet to find how to apply that to language space
Current transformer-based, self-supervised systems have driven massive gains, but important gaps remain on the path to AGI. Key missing pieces are continual, curiosity-driven learning; grounded multimodal perception; reliable, contextual long-term memory with forgetting; motivated (hot) executive control and dynamic attention; metacognition and coherent causal world-models; and robust fluid reasoning, planning and decision-making. Progress will require hybrid architectures (neuromorphic/Hebbian + gradients + symbolic modules), active-inference and intrinsic-motivation objectives, and new lifelong, embodied benchmarks to evaluate safety and competence.
https://huggingface.co/blog/KnutJaegersberg/whats-missing-for-agi-in-todays-tech-trajectories
all i can say to this question is i don't know; maybe it could rapidly develop into an asi, and if the amount of compute for a superintelligence ends up being at human-brain level or even a little more than that, then it's easier to picture the implications
in your hypothetical scenario, there could be rich people who become criminals with such power, btw; corruption is universal, and the power of knowing non-obvious things about reality itself, with no higher intelligence to check it, can be pointed at the masses
obviously gpt-2 was the kickstarter for openai, but they didn't actually know the power of gpt-4 when they created gpt-1
same thing could happen with reasoning, it might or might not have bigger implications, who knows
good point, if someone creates an ai that extrapolates to any dataset, it might just advance science more quickly than the average damage bad guys cause
i know right?