AI & ML interests

Computer Vision Technology and Data Collection for Anime Waifu

Recent Activity

eienmojiki 
posted an update 3 days ago
AbstractPhil 
posted an update 3 days ago
Anima - Brent JSON (PREVIEW) - Subject Bucketing

Full article available at https://huggingface.co/blog/AbstractPhil/subject-bucketing

There is also a Civitai model release:
https://civitai.com/models/2730503/anima-jsonenglish

AbstractPhil/anima-prelim-1k-r64
This is the JSON multi-prompt diffusion model prototype, which uses the Anima 1.0 base as the pretrained model and finetunes it toward the JSON target. The upcoming JSON LoRA is being cached and trained with 40,000 of the full 83,000 valid images from the qwen set.

This first preview version is ready to use as a ComfyUI-compatible LoRA, so you can load the epoch you want in ComfyUI without any special setup and have at it. You can currently use plain English in conjunction with tagging to produce useful and meaningful prompt targets without the JSON.

AbstractPhil/anima-prelim-1k-r64
The ComfyUI nodes are present and work for testing, but they are not ready for production use just yet.

-- Technical --
The primary target was the VLM JSON captions, with the AnimeTIMM ViT captions processed through the VLM JSON processor as the follow-up. The first 12 epochs trained on images with VLM JSON formatting; the last 8 epochs, from epoch 12 onward to 20, finetuned on the AnimeTIMM captions converted to JSON instead.
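
The two-phase schedule described above can be sketched as a simple epoch-to-caption-source rule. This is only an illustration: the function name `caption_source`, the two labels, and the 1-based epoch numbering are assumptions made for the example and are not taken from the actual training code.

```python
# Hypothetical sketch of the two-phase caption schedule described above.
# Names, labels, and 1-based epoch numbering are assumptions for illustration.
VLM_EPOCHS = 12     # epochs trained on VLM-produced JSON captions
TOTAL_EPOCHS = 20   # 12 VLM epochs followed by 8 AnimeTIMM epochs


def caption_source(epoch: int) -> str:
    """Return which caption set a given (1-based) epoch would draw from."""
    if not 1 <= epoch <= TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in 1..{TOTAL_EPOCHS}, got {epoch}")
    return "vlm_json" if epoch <= VLM_EPOCHS else "animetimm_json"


schedule = [caption_source(e) for e in range(1, TOTAL_EPOCHS + 1)]
print(schedule.count("vlm_json"), schedule.count("animetimm_json"))  # 12 8
```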

The Anima model itself accepted the 1,000 images and the JSON prompting works quite well. In the process I set up a couple of ComfyUI nodes that can translate base prompts into the same language the model is learning. Those are present in the repo.
AbstractPhil 
posted an update 11 days ago
The article for aleph attention routing needs more work on vision, as the vision portion has not been fully validated, while the LM prototype has been semi-validated for small and medium-small scale. I will post my findings in the coming days, including the consequences of training an LM and a ViT with the prototype system.

The current structure for the Geometric Vocabulary nearly reflects the intended shape discussed in the earlier posts and articles, so that's coming along nicely, but there are stipulations and problems involved that I did not foresee.

My apologies for the incomplete article I just released on a whim. I jumped to the conclusion a bit early in anticipation before the formulas were fully converged. I also released an early post the other day speaking about the prototype AlephLM - which I removed as an invalid conclusion.

I'm doing my best to release only validated empirical information rather than speculation; however, I do sometimes jump to conclusions without proper validation. Occasionally I get a bit theory-overzealous and need to tidy up through thorough experimentation, which I'm currently approaching directly.
prithivMLmods 
posted an update 13 days ago
Wan2.2-I2V-Fast with highly upscaled sequential frame sampling is now available as a Spaces demo, built using Wan2.2-I2V and FLUX.2-Klein. Try the demo using the links below.👇

➠ wan2.2-i2v-fast : prithivMLmods/wan2.2-i2v-fast
➠ github: https://github.com/prithivsakthiur/wan2.2-i2v-fast
➠ collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

⤷ To learn more, visit the app page or the respective model pages.
AbstractPhil 
posted an update 16 days ago
Claude Fable 5 was temp/perma? banned for security reasons.

Working with Fable, I have to say the model is capable of handling highly complex geometric mathematics, ACTUALLY to the point of me getting some work done without a headache. I hope Fable returns soon so I can finish cobbling without going back to a headache and a week per prototype.

During Fable's existence I managed to cobble together a multi-series aleph paradigm that can handle direct implicit and explicit learning for an LM with a trigram context window. This essentially provides expert directional utilization based on a stable codebook, without requiring distillation into singular experts or duplicating them.

Details soon. There are over 20 functional formula prototypes and around 8 potential heads that all lead to the same outcome. The math is rock solid, and each has its own benefits and downsides based on the assigned text tasks.
AbstractPhil 
posted an update 27 days ago
The first large scale distillation is coming using the geolip-aleph-void architecture as the mathematical aleph procrustes geofractal addressed language latent.

In short, a single geometric patchwork vocabulary chunk. Which ironically needs chunking to properly prepare.

The address structure I have been meticulously refining is about to show its genuine distillation muscle.

This is heavily due to the discovery and refinement of a specific logit I've named an aleph logit. This logit is baked clean into the architecture with the void-based codebook, and is available for review at https://github.com/AbstractEyes/geolip-svae/blob/main/geolip_svae/aleph_model.py

This model provides solid MSE, recon, cosine sim, and many other elements directly aligned to the SVD and H2 procrustes paradigm. Prelims are not smart, but the scaling principle is perfectly attuned to scale.
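
For readers unfamiliar with the two reconstruction metrics named above, here is a minimal sketch of their standard definitions. It is generic stdlib Python written for illustration, not code from the geolip-svae repository; the function names and the toy vectors are assumptions.

```python
import math
from typing import Sequence


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between an input and its reconstruction."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Toy example: the reconstruction differs from the original in one element.
original = [1.0, 2.0, 3.0, 4.0]
reconstruction = [1.0, 2.0, 3.0, 5.0]
print(mse(original, reconstruction))  # 0.25
print(cosine_similarity(original, reconstruction))
```

A low MSE says the values are close element by element, while a cosine similarity near 1.0 says the two vectors point in nearly the same direction regardless of scale, which is why the two are often reported together.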

This invention will allow for direct internalized tokenization and utilization of compressed information, entirely internally within the model's latent structure. This allows direct control capabilities baked into the model itself, which requires a few robustness tests to solidify the full structure. The first validation tests run clean, so it will work when correctly aligned.

In short, the first step towards the geometric encoder system that will work with all tested data types.

prithivMLmods 
posted an update 28 days ago
prithivMLmods 
posted an update about 1 month ago
PiD — Pixel Diffusion Decoder Image Edit Upscale and Image Generation Upscale, an all-in-one demo, is now live on Spaces! Great improvements in realism-based image generation and editing are powered by FLUX.2-Klein, while image generation is paired with Z-Image, and upscaling is enabled by default!

🤗 Space: prithivMLmods/PiD-Image-Upscaler
🔗 Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

🤗 > To learn more, visit the app page or the respective model pages.
AbstractPhil 
posted an update about 1 month ago
The transformer prototype v2 is operational. It takes the behavior of the H2 battery and directly forces a projected rigid behavior into a multiscale structure. This turns roughly 57k params into around 90k params for the preliminary version, and with this behavior the model converges SEMI-CLOSE to the current SVAE spectrum in considerably fewer epochs. So stay tuned on that one; the transformer did converge. The behavior itself is validated and convergent in the H2 protocol spectrum.

The transformer operates with the "single" setting.

AbstractPhil/geolip-svae-transformer

I've implanted a rigid formula that allows this direct behavior from the H2 battery to superimpose onto adjacent structural boundaries, and with that built aleph and void into the system as well. These are guarantees.


As for the centrifuge concept, the optimization on the centrifuge was quite lackluster. The hardware doesn't support such behavior. You can access the current operating version of the centrifuge by using the "stacked" configuration. Four lenses were too much when running a quaternion bank to handle such complex interactions reasonably, so I will need to work something out in the future to get a full centrifuge system working.

Crusher is ready, transformer_v3.

You might be curious WHY these converge at such low raw MSE in the later stages. The reasoning is kind of difficult to explain, so I'll try to make it simple. The direction is very subtle in the later stages of training with AdamW, so the curves start to create much more accurate shifts towards the goals. This allows the model to rapidly converge after earlier heavier training. You can't simply train it low, it takes too long. This allows the model to KIND OF get everything NEAR where it's supposed to be, which allows the really small twitches of MSE to provide massive corrections without needing hard logits or more difficult to finetune features.
prithivMLmods 
posted an update about 1 month ago
I've made 8 Spaces in the Qwen-Image-Edit series, and out of them, 5 Spaces reached “Space of the Week”! A few Spaces are still topping the list even after many months.

Cumulatively, the series has crossed 8.2 million+ ZeroGPU runs and nearly 4 million visitors overall.

Thanks for all the community support! 🤗❤️

🔗 Spaces: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
Tonic 
posted an update about 2 months ago
🙋🏻‍♂️ Hey there folks ,

Turns out : if we predict 🌏 earth we can spend more time looking for interesting things and less time looking at things that we expect to see.

Sentinel-2 imagery 🛰️ basically takes a long time to download to earth , so our "near real time" systems are quite far from that in practical terms.

meanwhile , if we "predict" what we will see , based on what we do see , we can send down much less data in a timely way , and prioritize 📡earth-bound response .

I'm talking about illegal fishing , logging , mining or building in nature reserves , the more of that we predict early the more we're able to stop it on time.

At least that's the concept !
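
To make the concept concrete, here is a minimal sketch of "send down only what the prediction got wrong". Everything in it is an assumption made for illustration: tiles are flattened lists of pixel values keyed by grid coordinates, the error measure is mean absolute difference, and the names `tile_error` and `tiles_to_downlink` are hypothetical. It is not the NuTonic pipeline.

```python
from typing import Dict, List, Tuple

Tile = List[float]        # flattened pixel values for one image tile
Coord = Tuple[int, int]   # (row, column) of the tile in the scene grid


def tile_error(predicted: Tile, observed: Tile) -> float:
    """Mean absolute difference between a predicted and an observed tile."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("tiles must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def tiles_to_downlink(
    predicted: Dict[Coord, Tile],
    observed: Dict[Coord, Tile],
    threshold: float,
) -> List[Coord]:
    """Return coordinates of tiles that departed from the prediction,
    ordered with the largest surprise first. Assumes both dicts share keys."""
    scored = [(tile_error(predicted[c], observed[c]), c) for c in observed]
    surprising = [(err, c) for err, c in scored if err > threshold]
    surprising.sort(reverse=True)
    return [c for _, c in surprising]


# Toy scene: one tile matches the prediction, two do not.
predicted = {(0, 0): [0.2, 0.2], (0, 1): [0.5, 0.5], (1, 0): [0.1, 0.1]}
observed = {(0, 0): [0.2, 0.21], (0, 1): [0.9, 0.8], (1, 0): [0.4, 0.1]}
print(tiles_to_downlink(predicted, observed, threshold=0.1))
# [(0, 1), (1, 0)]
```

Tiles that match the prediction never leave the queue, so the limited downlink budget goes to the places where something unexpected happened.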

check out the blog : https://huggingface.co/blog/Tonic/save-patagonia-by-predicting-earth


- Collection: https://huggingface.co/collections/NuTonic/earth-observation-with-temporal-and-general-understanding
- Code: https://github.com/Josephrp/Nutonic
- Dataset: NuTonic/sat-vl-sft-training-ready-v1
- Model: NuTonic/lspace
- Training: NuTonic/lspace-trackio
- Evals: NuTonic/Patagonia_Eval
AbstractPhil 
posted an update about 2 months ago
By trying to disprove the Omega H2 battery I have discovered:
* Each topology formed by the H2 battery is deviant; none have a uniformly shared substrate of behavior. They are each uniquely independent per training set, all with perfect recon.
* Image recon can be tracked and mapped, yielding a consistently mapped and responsive 16.77m vocabulary potential. Current spectrum testing is at around 5 million unicode bytes.
* The model scale shows patch size is related to how much data you want the model to represent within the model itself, and this has yet to hit a capacity limit to this day. The MSE recons and yields - and the more data fed, the more they yield.
* The scaling principle shows that the model indefinitely scales upward and each level of the model can be iteratively captured upward to form deviant and uniformly consistent repeatable pathways of implicit codewise response, not just arbitrary bitwise recall. Meaningful implicit learned utility.
* Image recon patch size should match the slice of image you want to represent, as it uses patch smoothing per patch internally from identity.
* Byte trigrams are channel-agnostic; they do not require a channel count, just a formula for recall, with n-gram recall at 99.6% for byte-by-byte representations. With those comes an adjacently capable codebook.
* Sentencepiece preliminary tests show validity and reconstruction just like the byte trigrams; using the new byte trigram, it would be arbitrarily convenient to recon a codebook for the structure.
* Binary trees learn a uniformly potent and powerful gating mechanism that requires further exploration; each of them produces direct, responsive, independent capacity and the responses are controllable.
* Ternary experiments show the models are directly responsive to -1, 0, +1 behavior, so the quantization is very much a valid potential.
* Preliminary tests with the H2O1 series of batteries show the models are responding similarly to natural universal elements in the universe itself.
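
As a side note on the byte-trigram points above: three bytes give 256^3 = 16,777,216 possible trigrams, which matches the 16.77m figure if that figure counts byte trigrams (an assumption on my part). The sketch below shows one plain way to index overlapping byte trigrams and recover the original bytes from them. The packing scheme, the overlapping window, and all function names are assumptions for illustration, not the battery's actual codebook.

```python
from typing import Iterator, List

TRIGRAM_VOCAB = 256 ** 3  # 16,777,216 distinct 3-byte sequences


def byte_trigram_ids(data: bytes) -> Iterator[int]:
    """Yield one integer id per overlapping 3-byte window of `data`."""
    for i in range(len(data) - 2):
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        yield (b0 << 16) | (b1 << 8) | b2  # pack three bytes into 24 bits


def decode(ids: List[int]) -> bytes:
    """Rebuild the byte string from overlapping trigram ids.

    The first id contributes all three bytes; each later id adds only its
    last byte, because the first two overlap with the previous window.
    """
    if not ids:
        return b""
    out = bytearray(ids[0].to_bytes(3, "big"))
    for trigram_id in ids[1:]:
        out.append(trigram_id & 0xFF)
    return bytes(out)


text = "anime".encode("utf-8")
ids = list(byte_trigram_ids(text))
print(TRIGRAM_VOCAB, len(ids))  # 16777216 3
print(decode(ids) == text)      # True
```

Inputs shorter than three bytes produce no ids in this scheme, so a real system would need padding or a fallback for them.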
prithivMLmods 
posted an update about 2 months ago
Multimodal-Edge Demo, a node-based inference canvas demo, is now live on Spaces. It features node-based Transformers for fast inference across 10+ edge-device multimodal models on the Hub, all within a single space. The series includes models from Qwen3.5, Qwen3-VL, Gemma 4, and the LFM 2.5 VL model series, with support for reasoning and grounding tasks.

🤗 Demo: prithivMLmods/Multimodal-Edge-Node
🔗 GitHub: https://github.com/PRITHIVSAKTHIUR/Multimodal-Edge-Node
✅ Multimodal Apps Collections: https://huggingface.co/collections/prithivMLmods/hall-of-multimodal-apps

🤗 > To learn more, visit the app page or the respective model pages.
Tonic 
posted an update 2 months ago
🙋🏻‍♂️ Hey there folks,

since everyone liked my previous announcement post ( https://huggingface.co/posts/Tonic/338509028435394 ) so much , i'm back with more high quality procedural datasets in the geospatial domain for SFT training !

Check this one out :
NuTonic/sat-bbox-metadata-sft-v1

the goal is to be able to train vision models on multiple images for remote sensing analysis with one shot .

hope you like it ! 🚀
AbstractPhil 
posted an update 2 months ago
Today, I'll be determining the codebook capacity and utility potential for the larger batteries: the Fresnel, Johanna, Grandmaster, Freckles, and Johanna-F variants, which should give a good indication of which models are capable of handling codebooks and which are more errant. The earlier models all use SVD while the later ones do not. The differences are noted per model and the behavior is divergent.

I anticipate the D=16 will be more errant, and the final-state variants of those could very well be much more difficult or costly to inference as their axis bends are likely considerably harder to track. However, I'm confident that enough bounces will give the yield required so I'll set up some high-yield noise barrages to determine how much of them we can in fact extract from Johanna, and then set up similar barrages for images to map the internals of Fresnel and Grandmaster.

Grandmaster will be tricky, as it was an experimental Johanna-256 finetuned series meant to map sigma noised image inputs to recreate Fresnel behavioral output. Noised image goes in -> Fresnel-grade replication comes out in high res.

This allowed preliminary Dall-E Mini-esque VAE generation and will be explored further for the stereoscopic translation subsystem, to allow image generation in the unique format of diffusion that I was working out. I anticipate this system to be more than capable of making monstrosities, so I won't be posting TOO MANY prelims on this one, but the high-capacity potential of these noise makers is meaningfully powerful. Getting uniform codebooks in place for these models will allow full transformer mapping downstream instead of just guessing at the MSE piecemeal, which the earlier versions and variants were doing.

I'm straying from the CLS specifically for this series because CLS creates adjudicated pools of bias orbiting the INCORRECT orbiter in some SVAE. The orbital target IS the soft-hand accumulated bias with the sphere-norm, so having a competitor isn't going to be a good option.
AbstractPhil 
posted an update 2 months ago
My recent study in a nutshell shows a few important elements and everything else is technical.

* There are most definitely invariant architectural geometric states that persist and can be taught.
* They are not coincidental and the process works effectively on multiple data types and processes, not just noise. Noise is just fast to test with.
* Systems like SVD, Eigh, Conv, and the like - HELP align those systems for larger structures to produce amplified stability, but are not required for smaller structures, and the tests show even attention gets in the way at the smallest.
* Batched arrays, stacks, queues, and so on - all improve performance depending on the task.
* An SVAE battery is resolution agnostic, meaning with simple processing and logic you can scan space and record meshes fairly optimally to record large amounts of inference data.
* Batteries when trained on one specific task often can be directly used for other tasks once a codebook is fitted with the necessary data. Meaning a battery trained on gaussian noise can be fed imagenet snippets and downstream the MSE rates from the 64 battery array can be consumed for statistics aggregation to a fair degree of accuracy without actually training the array on images themselves.
* The battery codebook is a pointwise rigid map within the battery and can be used for pairwise learning when using the H2, H2a, and H2b batteries.
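
The "resolution agnostic" point in the list above can be illustrated with a patch scan: a fixed-size reconstructor is applied tile by tile, so the same code handles any image size and returns a grid of per-patch MSE values. This is a sketch under loud assumptions: `mean_fill` is a trivial stand-in for a trained battery (it reconstructs a patch as its own mean), and the names `extract_patch` and `scan_mse` are hypothetical.

```python
from typing import Callable, List

Image = List[List[float]]  # 2D grid of pixel values (single channel)
Patch = List[float]        # flattened square patch


def extract_patch(image: Image, top: int, left: int, size: int) -> Patch:
    """Flatten the size x size window whose top-left corner is (top, left)."""
    return [image[r][c]
            for r in range(top, top + size)
            for c in range(left, left + size)]


def mean_fill(patch: Patch) -> Patch:
    """Stand-in 'battery' (assumption): reconstruct a patch as its mean."""
    mean = sum(patch) / len(patch)
    return [mean] * len(patch)


def scan_mse(image: Image, size: int,
             reconstruct: Callable[[Patch], Patch] = mean_fill) -> List[List[float]]:
    """Scan non-overlapping patches and record each patch's recon MSE."""
    rows, cols = len(image), len(image[0])
    grid = []
    for top in range(0, rows - size + 1, size):
        row = []
        for left in range(0, cols - size + 1, size):
            patch = extract_patch(image, top, left, size)
            recon = reconstruct(patch)
            row.append(sum((a - b) ** 2 for a, b in zip(patch, recon)) / len(patch))
        grid.append(row)
    return grid


# The left half is flat (recon is exact); the right half is a checkerboard.
image = [[0, 0, 0, 1],
         [0, 0, 1, 0],
         [0, 0, 0, 1],
         [0, 0, 1, 0]]
print(scan_mse(image, size=2))  # [[0.0, 0.25], [0.0, 0.25]]
```

Passing a larger image changes only the size of the returned grid, which is the sense in which the scan does not care about resolution.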

So this is the evolved state of the geometric vocabulary in some ways, and a completely new and unexpected systemic development in others. They stack, you can reuse them, they are so small you can swap them at runtime with no time loss, they align rapidly, and downstream tasks can consume their information.

There are many untested avenues that I need to make a full writeup for because quite frankly it's messy currently and Claude is only making it more messy instead of cleaner.
Tonic 
posted an update 2 months ago
🙋🏻‍♂️ Hey there folks ,

I'm sharing Hugging Face's largest dataset of annotated satellite images today.

check it out here : NuTonic/sat-image-boundingbox-sft-full

I hope you like it , the idea is to be able to use this with small vision models 🚀
prithivMLmods 
posted an update 2 months ago
Now, a collection of various compression schemes for Qwen3.6 and the abliterated version 1 of dense models is available on the Hub. Check it out via the links below. 👇

🔗 Qwen3.6-MoE: https://huggingface.co/collections/prithivMLmods/qwen36-35b-a3b-compressions
🔗 Qwen3.6-27B Compressions: https://huggingface.co/collections/prithivMLmods/qwen36-27b-compressions

🤗 > To learn more, visit the app page or the respective model pages.
AbstractPhil 
posted an update 2 months ago
Ever see a 1024x1024 3 channel a little over 1m param noise classifier? This is one. This is phase 1 of the omega experiments and it's successful on a very high-accuracy selectivity level through statistics aggregation and pooling via... a tiny MLP attached to the battery array.

SVAE don't care what resolution you use. They never had that concern, they are solvers that fly through solutions. Perfect for math solutions of many formats and many structures, exactly what I need for the next stages.
AbstractPhil/geolip-svae-h2-64

Currently the primary use case for tests is noise format identification. There are multiple experiments to go before a full nth classification system is ready, however as it stands the only stopping point is training batteries now. They mostly train within about 10 million samples of tiny data so they will fly out hundreds a day if I find purposes for them.

Also I trained too many gaussian-related batteries, so there's really only about 50-100 or so batteries useful in the 192 array I set up. There's really only 64 batteries trained total but there are multiple epochs involved.

Now that there is a 57k parameter variation that converges on 16 variants of random noise like Johanna and Freckles before, you ask this model questions differently. You check the MSE to train downstream models, so if your array isn't conclusively working it won't work just yet.
It's not perfect yet, but it's improving daily.
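
The idea of "checking the MSE to train downstream models" can be sketched as follows: each battery in an array reconstructs the input, the per-battery MSE values form a feature vector, and a small downstream model classifies the noise format from those features. This sketch swaps in loud stand-ins for everything real: the "batteries" are simple clipping functions rather than trained SVAE models, and a nearest-centroid rule replaces the tiny MLP. All names are hypothetical.

```python
import random
from typing import Callable, Dict, List

Sample = List[float]
Battery = Callable[[Sample], Sample]


def make_clip_battery(limit: float) -> Battery:
    """Stand-in 'battery' (assumption): reconstruct by clipping to +/-limit."""
    def reconstruct(x: Sample) -> Sample:
        return [max(-limit, min(limit, v)) for v in x]
    return reconstruct


def mse_features(x: Sample, batteries: List[Battery]) -> List[float]:
    """One reconstruction MSE per battery; this is the downstream feature vector."""
    feats = []
    for battery in batteries:
        recon = battery(x)
        feats.append(sum((a - r) ** 2 for a, r in zip(x, recon)) / len(x))
    return feats


def centroid(rows: List[List[float]]) -> List[float]:
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def classify(feats: List[float], centroids: Dict[str, List[float]]) -> str:
    """Nearest-centroid rule, standing in for the tiny MLP."""
    def dist2(a: List[float], b: List[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda name: dist2(feats, centroids[name]))


rng = random.Random(0)  # fixed seed so the example is repeatable
N = 2048                # values per noise sample
generators = {
    "gaussian": lambda: [rng.gauss(0.0, 1.0) for _ in range(N)],
    "uniform": lambda: [rng.uniform(-1.0, 1.0) for _ in range(N)],
}
batteries = [make_clip_battery(limit) for limit in (0.5, 1.0, 2.0)]

# "Train" the downstream model: average the MSE features of a few samples.
centroids = {
    name: centroid([mse_features(gen(), batteries) for _ in range(10)])
    for name, gen in generators.items()
}

prediction = classify(mse_features(generators["uniform"](), batteries), centroids)
print(prediction)
```

Because the downstream model only ever sees the MSE vector, one stand-in battery can be replaced with another without touching the rest of the pipeline, as long as the centroids are refit afterwards.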

A bad battery in the mix can be replaced at runtime.
==============================================================================
PHASE J VERDICT
==============================================================================
Subset: 18 batteries, 1,029,870 params (vs 10.9M for full array)

Resolution       A (summary)   B (attn-pool)
256                   96.6%          93.1%
512                   95.4%          92.0%
1024                  95.4%          95.4%
prithivMLmods 
posted an update 2 months ago
HY-World-2.0 — A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds is now available on Spaces, and it works both as native Gradio components and in Gradio server mode.

> HY-World-2.0-Demo: prithivMLmods/HY-World-2.0-Demo
> HY-World-2.0 [Server Mode]: prithivMLmods/HY-World-2.0-Demo
> Featuring 3D reconstruction and Gaussian splats with the Rerun viewer, along with camera poses, depth maps, and surface normals.
> In Server Mode, Gradio is served via FastAPI, with FastAPI remaining the top-level server.
> Model: tencent/HY-World-2.0
> GitHub: https://github.com/PRITHIVSAKTHIUR/HY-World-2.0-Demo

🤗To learn more, visit the app page or the respective model pages.