Naphula/Goetia-24B-v1 · Hands down – this is the best 24b model for daily RP atm.

Nov 21, 2025

edited Nov 21, 2025

Okay, hands down – this is the best 24b model for daily RP, no questions about it.

The model writes exactly as much as needed: a lot when something happens, and only a little when it judges that a short reply is enough.

Yes, SOMETIMES it acts as {{user}} when it could have been avoided, but a couple of edits to a message won't take long.

The model writes good RP and ERP.

Most importantly, the model follows the character's instructions, uses the lorebook, uses the scenario, creates other characters to advance the story, SHOWS INITIATIVE AND ADVANCES THE PLOT. The model is original enough that I don't get bored.

THE MESSAGE FORMAT IS CONSISTENT AND PLEASANT!
I don't need an English degree to reread a single five-line sentence over and over again to understand how {{char}} reflects, experiences, and describes past emotions, "and now they've supposedly changed, but still, something inside is kindling in the heart of a cold person who's never felt this way before because he's had a hard life, and now a small hope appears in the soul of this character who's never experienced anything like this, and now blah-blah-blah." And that's in one sentence.

I've seen many RP models.

Some ignore the script;

or are passive as a rock;

or are bland SFW, waiting for YOU to tell them to do something (even though the prompt and the script force them to act);

or are identical, just churning out the same model 10 times (probably with the goal of getting into the UGI leaderboard);

or they break the formatting, making further RP impossible unless you constantly edit the messages. I don't want to delete "**" after a couple of words every time, simply because even the token ban in SillyTavern doesn't work against such models.

I can't even point out any particular downsides to this model.

Perhaps RARE hallucinations, where {{char}} mixes up 'me' and 'you' in a reply: {{char}} means to say something about themselves but accidentally attributes it to {{user}}. Perhaps it's a quirk of the Mistral model.

I use the 5_M GGUF.

'...But his work was interrupted by a message on his phone.'

Celia: "Dear {{user}},

It's unfortunate you were too rude to enjoy the company of Miss ******. However, your employment at OASIS Corporation has come under review. I believe Rio will understand when she learns you are a rude and disrespectful person towards important partners like me. Enjoy your weekend.

Yours,
Miss ****** Celia"

Naphula

Owner Nov 21, 2025

Very nice, did you compare to 1.1 at all to see if it was better/worse at RP? If 1.0 is superior then maybe I should switch back to hierarchical slerps for v1.2 instead of karcher.

PavPav

Nov 22, 2025

(Will test it tomorrow. Will write back after a few days of chatting.)

PavPav

Nov 24, 2025

Welp.. I'm afraid I don't like v1.1 so far.

The features of 1.1 are that the model is slightly more creative and freer. It writes more, describing the actions and surroundings of the characters but...

One thing immediately catches my eye:
during a dialogue where short messages are exchanged, it immediately writes for {{user}}.

Also, when playing with a bot card that has several characters in it, it mixes up their clothes, scenery, and events. The roleplay turns into a mess after some time. (16k context)

I've seen some model authors say, "Hey, the model is very creative on its own, use the 0.4 temperature, otherwise it will be a mess." I don't like this approach for testing. I use '1.0' as the temperature for the test.

Naphula

Owner Nov 24, 2025

edited Nov 24, 2025

Thanks for the feedback. I usually don't have time to test high temp creativity (for benching I use temp 0.01), so these results are valuable.
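
For reference, a rough sketch of what the temperature values mentioned here (1.0, 0.4, 0.01) do to token sampling. The logits are made-up numbers and the function name is only illustrative; real backends usually apply other samplers on top of this.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into sampling probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens (not from any real model).
logits = [2.0, 1.0, 0.5]
for t in (1.0, 0.4, 0.01):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperature pushes almost all probability onto the top token, so temp 0.01 is close to greedy decoding, while temp 1.0 leaves the distribution as the model produced it. That is why a problem can show up at 1.0 and stay hidden in a near-greedy benchmark run.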

If 1.0 has better context retention, while 1.1 writes more but is prone to impersonating the user, it could indicate that one or more of the added component models has some incompatibility.

1.1 is karcher:

  - model: dphn/Dolphin-Mistral-24B-Venice-Edition
  - model: FlareRebellion/WeirdCompound-v1.7-24b
  - model: Naphula/BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
  - model: Naphula/Evilmind-24B-v1
  - model: OddTheGreat/Rotor_24B_V.1
  - model: TheDrummer/Cydonia-24B-v4.2.0
  - model: TheDrummer/Magidonia-24B-v4.2.0
  - model: TheDrummer/Rivermind-24B-v1
  - model: trashpanda-org/MS3.2-24B-Mullein-v2 
  - model: zerofata/MS3.2-PaintedFantasy-v2-24B

my theory: finding a 'geometric mean' of several finetunes, SLERPs, model_stock, and ties could be causing some problems and breakdown of longer context. or maybe just dulling the model in some ways too much.

i found that karcher produced more output on average than slerp, but was sometimes more censored. however, new abliteration tools change this (goetia might have an ablated version next)
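
To make the "geometric mean" idea concrete, here is a minimal sketch of a Karcher (Riemannian) mean on the unit sphere, treating each model's weights as one flat vector. This is only an illustration under that assumption: it is not taken from any merge tool, the helper names are hypothetical, and it assumes the inputs are non-zero and do not cancel each other out.

```python
import math

def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def _norm(u):
    return math.sqrt(_dot(u, u))

def _unit(u):
    n = _norm(u)
    return [x / n for x in u]

def log_map(mu, p):
    """Tangent vector at mu that points toward p; its length is the angle between them."""
    c = max(-1.0, min(1.0, _dot(mu, p)))
    theta = math.acos(c)
    residual = [pk - c * mk for pk, mk in zip(p, mu)]
    rn = _norm(residual)
    if rn < 1e-12:  # p coincides with mu
        return [0.0] * len(mu)
    return [theta * r / rn for r in residual]

def exp_map(mu, v):
    """Walk from mu along tangent vector v, staying on the sphere."""
    n = _norm(v)
    if n < 1e-12:
        return list(mu)
    return [math.cos(n) * mk + math.sin(n) * vk / n for mk, vk in zip(mu, v)]

def karcher_mean(points, iters=50, tol=1e-10):
    """Iteratively find the point whose tangent-space offsets to all inputs average to zero."""
    pts = [_unit(p) for p in points]
    mu = _unit([sum(c) / len(pts) for c in zip(*pts)])  # start from the normalized average
    for _ in range(iters):
        tangents = [log_map(mu, p) for p in pts]
        step = [sum(c) / len(pts) for c in zip(*tangents)]
        if _norm(step) < tol:
            break
        mu = _unit(exp_map(mu, step))
    return mu

# Toy 2-D "models": two identical vectors and one orthogonal vector.
print(karcher_mean([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
```

Unlike a chain of pairwise SLERPs, every input is averaged in a single step, so with ten components each one only contributes a small share of the result.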

1.0 is SLERP:

Checkpoint A (SLERP):
    Darkhn/M3.2-24B-Animus-v7.1
    Fentible/BlackDolpin-24B [TroyDoesAI/BlackSheep-24B] [dphn/Dolphin-Mistral-24B-Venice-Edition]
Checkpoint B (SLERP):
    OddTheGreat/Circuitry_24B_V.2 [Delta-Vector/Rei-24B-KTO] [TheDrummer/Cydonia-24B-v4.1] [zerofata/MS3.2-PaintedFantasy-v2-24B]
    FlareRebellion/WeirdCompound-v1.6-24b [aixonlab/Eurydice-24b-v3.5] [TheDrummer/Cydonia-24B-v4.1] [PocketDoc/Dans-PersonalityEngine-V1.3.0-24b] [CrucibleLab/M3.2-24B-Loki-V1.3] [zerofata/MS3.2-PaintedFantasy-v2-24B] [Doctor-Shotgun/MS3.2-24B-Magnum-Diamond] [anthracite-core/Mistral-Small-3.2-24B-Instruct-2506-Text-Only]
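
A rough sketch of the hierarchical SLERP layout above, again treating each model as one flat weight vector. The toy vectors, the 0.5 interpolation factor, and the final SLERP of Checkpoint A with Checkpoint B are assumptions for illustration, not the actual v1.0 recipe.

```python
import math

def slerp(a, b, t):
    """Spherical linear interpolation between two flat weight vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < 1e-6:
        # Nearly parallel (or opposite) vectors: fall back to plain linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    wa = math.sin((1 - t) * omega) / sin_omega
    wb = math.sin(t * omega) / sin_omega
    return [wa * x + wb * y for x, y in zip(a, b)]

# Hypothetical stand-ins for the four top-level models (toy 3-D vectors).
animus = [1.0, 0.0, 0.0]
blackdolphin = [0.0, 1.0, 0.0]
circuitry = [0.0, 0.0, 1.0]
weirdcompound = [1.0, 1.0, 0.0]

checkpoint_a = slerp(animus, blackdolphin, 0.5)
checkpoint_b = slerp(circuitry, weirdcompound, 0.5)
final = slerp(checkpoint_a, checkpoint_b, 0.5)  # assumed last step of the hierarchy
print(final)
```

Each SLERP only ever blends two inputs, so the weight of every original model in the final result is set by where it sits in the tree.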

I don't think switching from Circuitry to Rotor would cause this. It could be a Karcher quirk, or it could be one of the models like Rivermind / Mullein causing this, or the removal of Animus. Idk

But for 1.2 I'll probably reduce the number of models. I plan on making version 1.2 after the Psychosis 14B tests are finished. They are going quite well. In fact, Goetia v1.2 may have a special surprise: a different merge algorithm altogether.

PavPav

Nov 24, 2025

model: dphn/Dolphin-Mistral-24B-Venice-Edition
model: FlareRebellion/WeirdCompound-v1.7-24b
model: Naphula/BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
model: Naphula/Evilmind-24B-v1
model: OddTheGreat/Rotor_24B_V.1
model: TheDrummer/Cydonia-24B-v4.2.0
model: TheDrummer/Magidonia-24B-v4.2.0
model: TheDrummer/Rivermind-24B-v1
model: trashpanda-org/MS3.2-24B-Mullein-v2
model: zerofata/MS3.2-PaintedFantasy-v2-24B

Can't say much about this list, but:

Rotor is a good model for RP (better than Circuitry). I used it before finding your Goetia. Rotor's flaw: it's good at RP and at moving scenes forward, but it barely uses the scenario section.

I have used your Evilmind, but it was quite messy from the start, so without further testing it was a 'no'.

Cydonia (and Drummer's models in general) tends to talk for {{user}}, that's for sure. That's what I clearly remember. But hey, some people like to be world builders or masterminds, so we can't say it's a bad thing when {{char}} sometimes talks for {{user}}. Different tastes.