Training data samples?
I'm (sort of) aware you use a special training method, but are there samples of the exact prompt syntax you train on anywhere?
Specifically, I want to "match" my prompts to the format the model was trained on, so even a single example would be helpful.
I would hold off on this model for now, as I think I have finally fixed all the weird tokenizer bugs that were causing the end-of-line characters to become mangled.
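For anyone debugging similar tokenizer issues, a quick sanity check is to round-trip your samples through encode/decode and confirm the newlines survive. This is just an illustrative sketch with a toy byte-level encoder standing in for the real tokenizer (which I'm not assuming anything about):

```python
def roundtrip_ok(encode, decode, text):
    """Return True if decode(encode(text)) reproduces text exactly,
    including trailing newlines (the part that tends to get mangled)."""
    return decode(encode(text)) == text

# Toy byte-level "tokenizer" standing in for the real one.
encode = lambda s: list(s.encode("utf-8"))
decode = lambda ids: bytes(ids).decode("utf-8")

samples = ["Line one.\nLine two.\n", "No trailing newline", "Ends with two\n\n"]
for s in samples:
    assert roundtrip_ok(encode, decode, s), f"mangled: {s!r}"
```

Swap in the actual tokenizer's encode/decode and the check will catch dropped or altered end-of-line characters before training.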
As for the actual format, I explained it just now in this post:
I'm basically not trying to change the model's prompt format at all - just trying to get it to choose better prose!
Have you had any luck with the new v3 version of this model?
I'm still a bit pissed off that it's started adding extra spaces after periods, after I went to a lot of effort to stop it mangling the end-of-line characters lol.
Considering it used around 10x the data (~2.5B vs ~250M tokens), it should be significantly better even with this problem.
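In the meantime, the doubled spaces are easy to clean up in post-processing. A rough sketch (the exact regex is my own guess at the artifact, not anything from the training pipeline):

```python
import re

def collapse_double_spaces(text: str) -> str:
    # Collapse runs of 2+ spaces after sentence-ending punctuation
    # into a single space; newlines are left untouched.
    return re.sub(r'([.!?]) {2,}', r'\1 ', text)

print(collapse_double_spaces("First.  Second.   Third."))
# -> First. Second. Third.
```

Obviously retraining is the real fix, but this works as a stopgap when sampling from the current weights.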
I will quantize it and see!