frankentalkie
A merge derived from talkie-lm/talkie-1930-13b-it,
built using MergeMonster (https://github.com/Gryphe/MergeMonster).
The checkpoint keeps Talkie's tokenizer, embeddings, LM head, and custom Transformers
runtime, then relayers the decoder with an overlapping mid-stack
recipe. The retained model expands Talkie from 40 to 60 blocks using these inclusive block ranges:
[0-11] + [12-23] + [14-25] + [16-27] + [28-39]
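The recipe above can be checked with a few lines of Python. This is only an illustrative sketch: it treats each range as inclusive (an assumption, though it is the reading that yields 60 blocks), and the names `RANGES`, `expand_plan`, `plan`, and `uses` are hypothetical, not part of MergeMonster or the Talkie runtime.

```python
from collections import Counter

# Inclusive block ranges copied from the recipe above.
RANGES = [(0, 11), (12, 23), (14, 25), (16, 27), (28, 39)]


def expand_plan(ranges):
    """Return the source-block index used at each position of the new stack."""
    return [i for lo, hi in ranges for i in range(lo, hi + 1)]


plan = expand_plan(RANGES)
uses = Counter(plan)  # how many times each source block appears

print(len(plan))  # 60
print(sorted(i for i, n in uses.items() if n > 1))  # source blocks 14 through 25
```

Under this reading every one of the 40 source blocks appears at least once, and only blocks 14 through 25 are duplicated, with blocks 16 through 23 appearing three times.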
Repeated blocks use damped residual gains.
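One way to picture a damped residual gain is to scale a block's contribution before adding it back to the residual stream. The sketch below is hypothetical: the value 0.5, the choice to damp only the second and later copies of a block, and the names `DAMPED_GAIN`, `residual_gains`, and `residual_step` are all assumptions, since this card does not state the actual gains or how they are applied.

```python
# Inclusive block ranges from the recipe in this card.
RANGES = [(0, 11), (12, 23), (14, 25), (16, 27), (28, 39)]

# Hypothetical value; the card does not give the real gain.
DAMPED_GAIN = 0.5


def residual_gains(ranges, damped_gain):
    """Gain per stack position: 1.0 on first use of a block, damped on repeats."""
    seen = set()
    gains = []
    for lo, hi in ranges:
        for i in range(lo, hi + 1):
            gains.append(damped_gain if i in seen else 1.0)
            seen.add(i)
    return gains


def residual_step(x, block_fn, gain):
    """Residual update with a scaled block output: x + gain * block_fn(x)."""
    return x + gain * block_fn(x)


gains = residual_gains(RANGES, DAMPED_GAIN)
print(len(gains), gains.count(DAMPED_GAIN))  # 60 positions, 20 of them damped
```

With these assumptions, 20 of the 60 positions are repeats and would receive the reduced gain, which limits how far the duplicated blocks push the residual stream beyond what the original 40-block model produced.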
Usage
Install the runtime dependencies first:
pip install "transformers>=5.6.0" "accelerate>=1.0.0" "safetensors>=0.5.0" "tiktoken>=0.6.0"
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Abstract4700/frankentalkie"

# trust_remote_code=True is needed because the checkpoint ships Talkie's custom runtime.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,
    dtype="bfloat16",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write an essay predicting what life will be like in the year 1960."}
]

# Build the prompt with the chat template and move it to the model's device.
inputs = tok.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
out = model.generate(inputs, max_new_tokens=300, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
Model tree for Abstract4700/frankentalkie-20b
Base model: talkie-lm/talkie-1930-13b-base
Finetuned: talkie-lm/talkie-1930-13b-it