Training data samples?

#2
by Downtown-Case - opened

I'm (sort of) aware you use a special training method, but are there samples of the exact prompt syntax you train on anywhere?

Specifically, I want to "match" my prompts to the format the model was trained on, so even a single example would be helpful.

I would hold off on this model, as I think I have finally fixed all the weird tokenizer bugs that were causing end-of-line characters to become mangled.

As for the actual format, I explained it just now in this post:

https://huggingface.co/jukofyork/creative-writing-control-vectors-v3.0/discussions/14#689b4b1b2c8b75c3d4ad8090

I'm basically not trying to change the model's prompt format at all - just simply get it to choose better prose!

Have you had any luck with the new v3 version of this model?

I'm still a bit pissed off that it's started adding extra spaces after periods, after going to a lot of effort to stop it mangling end-of-line characters lol.

Considering it used around 10x the data (~2.5B vs ~250M tokens), it should be significantly better even with this problem.
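If the extra spaces after periods are a problem in practice, they're easy to strip in a post-processing pass. A minimal sketch (this is a hypothetical workaround, not part of the model's own tooling; the function name is my own):

```python
import re

def collapse_extra_spaces(text: str) -> str:
    """Collapse runs of two or more spaces after sentence-ending
    punctuation down to a single space, leaving everything else intact."""
    return re.sub(r"([.!?]) {2,}", r"\1 ", text)

sample = "First sentence.  Second sentence.   Third!  Done."
print(collapse_extra_spaces(sample))
# -> "First sentence. Second sentence. Third! Done."
```

This only touches spaces following `.`, `!`, or `?`, so intentional indentation or code blocks elsewhere in the output are left alone.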

I will quantize it and see!
