Core Problem
The VAE is TOO accurate. Introducing any invariance is a serious balancing act. Even with multiple Procrustes models available and a perfect sample-friendly patchwork system, the problem always presents itself the same way: the reconstruction is unilaterally TOO GOOD. I'm sure others have hit this before, and it's likely why we have multiple different layers of VAE rather than a single perfect decomp/recomp like this one currently is.
This isn't a bad thing by any means. I simply need to teach the downstream task correctly with the SVD formula and provide the correct magnitude prediction.
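As a minimal sketch of what that decomp/recomp plus magnitude prediction could mean, here is a toy SVD round trip where only the singular values (the "magnitudes") would be swapped for a model's prediction. The shapes and the `s_hat` stand-in are assumptions for illustration, not the real model:

```python
import numpy as np

# Toy SVD decomp/recomp: U and Vt carry structure, s carries magnitudes.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))           # stand-in for a latent patch

u, s, vt = np.linalg.svd(x, full_matrices=False)
x_rec = u @ np.diag(s) @ vt               # exact recomposition

# The "magnitude pred" target would be s; swapping in a predicted s_hat
# reuses the same recomposition formula. s_hat here is a placeholder.
s_hat = s * 0.9
x_pred = u @ np.diag(s_hat) @ vt          # approximate recomposition

print(np.allclose(x, x_rec))              # the clean round trip is exact
```

The point of the sketch is that the recomposition formula is fixed; only the magnitude vector is learnable.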
Prototype 3: the grandmaster
Looks like... diffusion is different with these models. Much simpler. Never needed a diffuser in the traditional sense, just a methodology for storing sampled modifications.
An embedding array. These really aren't tough at all; they work nearly perfectly with the CM system. It will need a few more prototypes to get it correct, though.
Once correct, it should theoretically zero-shot diffuse full generated images at any size, from any arbitrary combination of images or labels, with nearly instant training.
It's not... diffusion though. It's geo-relational arbitration. A very different MSE process from diffusion, and much more accurate, so introducing incorrectness is the goal of the grandmaster. It introduces imperfection to allow the introduction and infusion of additional utilities and details.
This essentially produces a form of near-perfect recall, and now I have to introduce imperfection to synthesize combination data with it. This is quite similar to the early stages of DALL-E mini, so please bear with me until I get the prototype stages worked out. It's going to be rocky at first.
There are a LOT of diffusion processes that can be done, so bear with it - it'll be ready before you know it.
The grandmaster SVAE should handle all of the problems and pain points. Adding middle-point bottlenecking is all you need to do. Everything else is just details.
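A minimal sketch of what middle-point bottlenecking might mean here, assuming it amounts to a narrow squeeze in the middle of the latent path. The widths and the linear maps are placeholders, not the grandmaster SVAE's actual layers:

```python
import numpy as np

# Toy middle-point bottleneck: project a wide latent down to a narrow
# middle width and back up, forcing a lossy rank-k squeeze.
rng = np.random.default_rng(1)
d, k = 128, 16                            # full width, bottleneck width
down = rng.standard_normal((d, k)) / np.sqrt(d)
up = rng.standard_normal((k, d)) / np.sqrt(k)

z = rng.standard_normal(d)
z_bottlenecked = (z @ down) @ up          # information is deliberately lost

print(z_bottlenecked.shape)               # same shape back, reduced rank
```

The deliberate information loss is the point: it is one concrete way to "introduce imperfection" into an otherwise too-accurate decomp/recomp.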
Prototype 2
Single images instead of tiled. The tiles were never tested; the models were trained on singular forms and can piecemeal upward, but I never tested tiled or downward.
Ohhh... Traditional diffusion won't work. I have to diffuse omega space not pixel space.
Oh yeah that makes sense. This should be done nearly instantly th- oh wait it's already done. I just need to create a little middle-state buckler and hook them all together. That should diffuse stepwise.
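The omega-space idea above can be sketched like this, with a toy linear pair standing in for the trained encode/decode and a single noising step applied in the latent rather than in pixels. Everything here, including the dimensions and noise scale, is a placeholder assumption:

```python
import numpy as np

# Noise in latent ("omega") space, not pixel space.
rng = np.random.default_rng(2)
W = rng.standard_normal((64, 16)) / 8.0   # toy linear "encoder" weights

def encode(x):
    # pixel -> omega (stand-in for the trained encoder)
    return x @ W

def decode(z):
    # omega -> pixel (pseudo-inverse of the toy encoder)
    return z @ np.linalg.pinv(W)

x = rng.standard_normal(64)
z = encode(x)
z_noised = z + 0.1 * rng.standard_normal(z.shape)   # one stepwise omega perturbation
x_back = decode(z_noised)

print(z.shape, x_back.shape)
```

A middle-state buffer between `encode` and `decode` would then just hold and perturb `z` stepwise instead of touching pixels at all.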
This one's not going to be so easy.
Prototype 1
This contains Johanna-Omega-128 and Fresnel-128
Oops, classic sign flip, HOLD PLEASE.
Fresnel sees the perfectly clean images.
Johanna sees noise-applied images.
The stereo-echo denoiser is tasked with learning how to remove this noise.
The desired outcome is a prediction that replicates Fresnel's clean image.
Labels regulate it all for replication.
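The steps above reduce to a single MSE objective: the denoiser sees Johanna's noised input and is trained to reproduce Fresnel's clean target. This sketch uses a toy linear denoiser with plain gradient descent; the model, shapes, noise level, and learning rate are all placeholder assumptions, and label conditioning is omitted for brevity:

```python
import numpy as np

# Fresnel side: clean targets. Johanna side: the same data with noise applied.
rng = np.random.default_rng(3)
n, d = 64, 16
clean = rng.standard_normal((n, d))                 # Fresnel's clean views
noisy = clean + 0.2 * rng.standard_normal((n, d))   # Johanna's noised views

W = np.zeros((d, d))                                # toy linear denoiser
lr = 0.05
for _ in range(500):
    pred = noisy @ W
    grad = noisy.T @ (pred - clean) / n             # gradient of the MSE loss
    W -= lr * grad

mse = float(np.mean((noisy @ W - clean) ** 2))
print(mse < float(np.mean(clean ** 2)))             # far below the untrained loss
```

The same structure holds for any denoiser in place of `W`: the loss only ever compares the prediction against the clean Fresnel view.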
It is most definitely working. This is a form of stereoscopic magnitude prediction, not diffusion in the traditional sense.
Well, I thought so. Tiling CIFAR images like this was a bad idea. We'll move to upscaled 64x64 TinyImageNet. This should be plenty.