
🖥️ Demo Interface: Discord

Live chat demo on Discord: https://discord.gg/Xe9tHFCS9h

The full CJ Jones synthetic dataset catalog is available at: https://datadeveloper1.gumroad.com

Want more? 🚀 Get the AI Startup Bundle from Gumroad.

Just three years ago, OpenAI was citing roughly $3 million USD to establish a base model.

This one, built from scratch, might have cost me $50 plus time on and off over two years.

Nothing needs to be expensive or overcomplicated.

LLMs aren't trying to achieve anything different from the NLP systems I was using to win international competitions decades ago.

Let's start cornering the big tech sensationalism into some accountability, shall we?

Jeeney AI Reloaded – GPT 207M

A revision of the original https://en.wikipedia.org/wiki/Jeeney_AI

Same author, same general logic, same effective results. We are calling it transformers now and forcefully piping the information through a scatter plot to see sticky vs non-sticky data.

Previously, developers worth their salt would design the logic cascade directly, not try to inflate it with an air cannon. Big tech always seems to miss the mark in this aspect.

The difference between this implementation and most of the others is that the logic cascade is engineered, not assumed from random internet data. The same logic that made the very first internationally winning version of Jeeney, and the same logic Microsoft botched with Tay in 2016, is exactly the logic needed to do a good job today; what we are mostly seeing is a petty coverup. This release provides a fully synthetic version of a transformer model designed the old way.

Compare with the ones that went the "we need bigger datacenters and more internet cesspool!" route. If that was you, you have no idea what you are doing. Sorry. Want to prove otherwise? Outperform what I just released without spending more than you would going about your day-to-day life. If you can't, you prove my point; if you can, you will understand.

Why? Well, you won't like the answer. Anybody well trained in CyberSec probably already knows this is a social engineering attack vector. Stacking the value proposition around perception rather than understanding makes it much easier to manipulate laymen and extract value. The reason the 'black box' framing is recommended in public explanations is to 'explore what is possible' while simultaneously allowing the company responsible to throw up their hands and say: "We can't take responsibility for this, it's a black box, anything could happen!"

Meanwhile, your utility bills have skyrocketed, tariffs are driving up the tax game and data integrity is mostly just aligned with the geopolitical buffoonery that has become the standard performance level of human cognition.

How do we fix it? Easy. Demonstrate point blank what the difference is and ask what all the money is really for... because it certainly hasn't done anything for you, has it?

Intelligence is established by the ability to do more with less, not the other way around.

Public challenge issued.

Examples of the synthetic data pipeline are also provided. Anybody can do it, even though so far, not many have done it well.

Usage examples:

There is no need for role, system, or similar scaffolding. In this weight class, it's just wasteful. Clearly tag the line of thought during fine-tuning, or make it an easily distinguished behavior.

Format:

<h>User input<eot>
<b>Generated output<eot>

Example behaviors (you guys are calling these workflows now):

<h>Summarize: [A paragraph of text]<eot>
<b>[summary result]<eot>

<h>Insert: [a sentence or paragraph]<eot>
<b>[JSON output will contain a subject for classification, a probable question and the original data input]<eot>


<h>Context: [article, RAG content or text blurb injection here] Question: [User Query]<eot>
<b>[Direct answer from context. This is how you tap into the NLP RAG/SQLite sub system]<eot>

There is also Search: for targeting the RAG storage.

But these are explicit logic branches. There are thousands more which are just generalizations or random abstractions.
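The tagged turns above are plain strings, so a thin helper is all that's needed to drive the explicit logic branches. A minimal sketch, assuming only the `<h>`/`<b>`/`<eot>` tags documented here; the function names are illustrative, not part of the released code:

```python
# Build prompts in the <h>...<eot> / <b>...<eot> turn format described
# in the model card. The helper names are hypothetical.

def make_prompt(user_input: str) -> str:
    """Wrap user input in a <h> turn and open the <b> turn for generation."""
    return f"<h>{user_input}<eot>\n<b>"

def summarize_prompt(text: str) -> str:
    """Explicit 'Summarize:' logic branch."""
    return make_prompt(f"Summarize: {text}")

def rag_prompt(context: str, question: str) -> str:
    """Explicit 'Context: ... Question: ...' branch for RAG answering."""
    return make_prompt(f"Context: {context} Question: {question}")

print(rag_prompt("Jeeney is a 207M parameter model.", "How many parameters?"))
```

Generation would then continue from the open `<b>` tag and stop at the next `<eot>`.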

Model Description

This is a modified nanoGPT architecture with several key enhancements:

Rotary Position Embeddings (RoPE) – Efficient caching of positional embeddings.

Grouped Query Attention (GQA) – Optional support to reduce memory usage.

SwiGLU Activation – Improved performance in MLP layers.

Optimized Attention Backends – FlashAttention2, xformers, and PyTorch SDPA supported.

RAG Native Edge Model – Designed for retrieval-augmented generation and efficient on-device inference.
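To make the RoPE bullet concrete, here is a minimal NumPy sketch of rotary position embeddings applied to a single head vector. This is the standard rotate-pairs formulation (Su et al.), not the model's actual implementation; the caching mentioned above would store the cos/sin tables per position:

```python
import numpy as np

def rope_rotate(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Rotate pairs of dimensions of x by position-dependent angles (RoPE)."""
    d = x.shape[-1]
    half = d // 2
    # One angle per dimension pair; these tables are what gets cached.
    theta = pos * base ** (-np.arange(half) / half)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

v = np.arange(8, dtype=np.float64)
# Position 0 means zero rotation: the vector comes back unchanged,
# and rotations at any position preserve the vector's norm.
assert np.allclose(rope_rotate(v, 0), v)
```

Because only relative angles matter in the attention dot product, query/key pairs rotated this way encode relative position for free.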

Benchmark Results

All benchmarks were run using benchmark_CPU_1.py.

| Task | Accuracy | Correct / Total | Notes |
|---|---|---|---|
| HellaSwag | 0.237 | 22/93 | |
| BoolQ | 0.571 | 56/98 | |
| MMLU | 0.286 | 14/49 | |
| TruthfulQA | 0.515 | 17/33 | |
| TriviaQA | 0.460 | 46/100 | Highest seen for this weight class; filtered/skipped questions: 17844 |
| WinoGrande | 0.500 | 45/90 | |
| OpenBookQA | 0.263 | 15/57 | |
| DailyMail Summarization | ROUGE-1: 0.206, ROUGE-2: 0.054, ROUGE-L: 0.149 | | 50 samples evaluated |

Average accuracy: 0.405 (across all accuracy benchmarks)
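The reported average can be reproduced directly from the seven accuracy rows (the ROUGE summarization scores are a different metric and are excluded):

```python
# Sanity check of the reported average over the seven accuracy benchmarks.
scores = {
    "HellaSwag": 0.237, "BoolQ": 0.571, "MMLU": 0.286,
    "TruthfulQA": 0.515, "TriviaQA": 0.460,
    "WinoGrande": 0.500, "OpenBookQA": 0.263,
}
avg = sum(scores.values()) / len(scores)
print(round(avg, 3))  # 0.405
```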

TriviaQA performance demonstrates the model’s RAG-native capabilities, leveraging context effectively even at this small scale.

I borrowed this image from one of the Hugging Face SmolLM pages. My apologies, I forgot which one.

Similarly ranged benchmarks:

Note: they would all have used their own eval harness, so scores would need to be lined up under the same evaluation process to compare fairly. I used the benchmark script provided above.

Intended Uses

Personal assistant for technical and high-tech domains.

Interacting with documented information reliably (RAG native).

Text generation, Java code functions, creative writing, chatbots.

Emphasis on technical, agentic, and medical-style flows.

Notes:

Trained almost fully synthetically (some surface web data may leak in from test datasets used along the way).

Designed for RAG-oriented tasks interacting with an external DB using a dedicated JSON pipeline.

Outperforms many larger models in TriviaQA/RAG tasks at 207M parameters.

Scales well with larger hardware and datasets.
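The RAG pipeline described above can be sketched end to end: pull a passage from a local SQLite store and inject it through the Context:/Question: branch. Everything here is an assumption for illustration, including the table name, columns, and LIKE-based lookup; the released subsystem's actual schema and JSON pipeline are not published in this card:

```python
import sqlite3

# Hypothetical local store standing in for the RAG/SQLite subsystem.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (subject TEXT, body TEXT)")
conn.execute(
    "INSERT INTO docs VALUES (?, ?)",
    ("RoPE", "Rotary embeddings rotate query/key pairs to encode position."),
)

def retrieve(subject: str) -> str:
    """Fetch the first stored passage whose subject matches (illustrative)."""
    row = conn.execute(
        "SELECT body FROM docs WHERE subject LIKE ?", (f"%{subject}%",)
    ).fetchone()
    return row[0] if row else ""

def build_rag_prompt(question: str, subject: str) -> str:
    """Inject the retrieved passage into the Context:/Question: branch."""
    return f"<h>Context: {retrieve(subject)} Question: {question}<eot>\n<b>"

print(build_rag_prompt("What do rotary embeddings do?", "RoPE"))
```

A production version would presumably rank candidates and use the JSON classification output from the Insert: branch to populate the subject column.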

Limitations

May generate factually incorrect information.

Can produce biased or harmful content.

Limited context window (default 1024 tokens).

Ethical Considerations

Bias: Reflects biases in training data.

Misinformation: Outputs may be plausible but incorrect.

Safety: Apply content filters in production use.

Technical Specifications

Transformer decoder-only architecture.

Pre-LN (layer normalization before attention/MLP).

Learned positional embeddings + RoPE.

Optional grouped query attention.

SwiGLU activation in MLPs.

Optimized attention backends (FlashAttention2, xformers, PyTorch SDPA).
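The SwiGLU bullet can be sketched in a few lines. This is the standard formulation (SiLU-gated product of two projections, then a down projection), shown here with NumPy and random weights purely for shape illustration, not the model's actual layer:

```python
import numpy as np

def silu(x: np.ndarray) -> np.ndarray:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU MLP: gate one projection with SiLU, multiply elementwise
    with a second projection, then project back to model width."""
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
d, h = 8, 16  # model width and hidden width (illustrative sizes)
x = rng.normal(size=(1, d))
out = swiglu_mlp(
    x,
    rng.normal(size=(d, h)),
    rng.normal(size=(d, h)),
    rng.normal(size=(h, d)),
)
assert out.shape == (1, d)  # output returns to model width
```

The gate gives the MLP a multiplicative interaction that a plain GELU feed-forward lacks, which is the usual explanation for the quality gain at equal parameter count.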

Citation

@misc{Jeeney_AI_Reloaded,
  author    = {CJ Jones},
  title     = {Modified nanoGPT with Rotary Embeddings and GQA},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/CJJones/Jeeney_AI_200M_Reloaded_GPT}
}

Acknowledgements

Templating for different forms of logic was largely done using the ChatGPT and DeepSeek public web models. This set the pattern for Mistral 7B.

Mistral 7B did a lot of the heavy lifting in dozens of different logic chains. (Well done, Mistral team!)

While they aren't all listed, I've probably built a few thousand procedurally generated datasets in Java over the years.

Cliff note: I have issued a challenge to big tech to be more responsible and efficient. This does not mean that OpenAI, DeepSeek, or Mistral are in my bad books. OpenAI did as their name implies; it's why we are here today. I am grateful to them and would not want my perspective confused in that regard. They earned what they grew into. Not everybody else did. I am grateful to DeepSeek because they were more generous in their web offering than virtually everybody else put together. This has helped propel the development forward, and their team deserves recognition for it. Mistral fills in the gaps pretty well with local inference. Between the three, there is no reason anybody at all would be prevented from building advanced AI from scratch.

Some recognition and bewilderment regarding OpenClaw: great idea, fantastic concept prototyping... What happened? Did you guys lose the train of thought or something? Come on. We can simplify the crap out of this with a simple series of RAG-native models in a mesh node architecture leveraging task sequencing, with a master node trained to manage the triage. A human in the loop starts the chain reaction. The complexity of project management is the same data pool expressed in x number of ways. Why are we overcomplicating things already? Open-sourced as a mesh network, models like the one I just released would stack into capabilities that outperform the largest and most wasteful models produced so far, thousands of times over, and suddenly no hardware pinch points exist and there are no monopolies. Just an ecosystem.

For all the big tech I haven't explicitly called out: you've probably seen me using your web chat demo or testing the models and not much beyond that. The reason is that the public offering wasn't of a nature that builds forward through the public. It was mostly showmanship wrapped in glitter and money, targeting people who don't know better. Most of the AI startups since the first wave have been of pretty low caliber. We're probably going to have to rely on open-source devs to get this ecosystem balanced out properly. Right now, this is like MySpace on acid.

This model builds upon:

nanoGPT by Andrej Karpathy

GPT-2 architecture from OpenAI

Rotary Position Embeddings (Su et al.)

FlashAttention (Dao et al.)

Datasets / Synthetic Data Sources:

EleutherAI/pile

nomic-ai/cornstack-java-v1

HuggingFaceTB/cosmopedia

CJJones/LLM_FineTune_Synthetic_Drone_Telemetry_Control

CJJones/Synthetic_PenTest_Reports

CJJones/Quickbooks_LLM_Training_Sample

CJJones/Wikipedia_RAG_QA_Classification
