Bakwas v1 Alpha
Bakwas (Hindi for "Noise" or "Nonsense") is a high-performance, extractive prompt compression model designed to identify and strip redundant tokens from LLM prompts. By acting as a semantic filter, it reduces the "Bloat Tax", saving costs and cutting KV-cache latency in downstream LLM workflows.
Model Details
Model Description
Bakwas v1 Alpha is a token classification model fine-tuned from distilbert-base-uncased. Unlike generative compressors that rewrite text, Bakwas is purely extractive: it predicts a binary mask for every token, Keep (Signal) or Discard (Bakwas). Because no new text is generated, the compressed prompt stays close to the structure the downstream LLM saw in its training data.
- Developed by: Meet Sonawane
- Model type: Token Classification (Binary)
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: distilbert-base-uncased
Model Sources
- Demo: https://huggingface.co/spaces/meet447/Bakwas-v1-alpha-demo
- Philosophy: "Context Engineering over Prompt Engineering."
Uses
Direct Use
The model is intended to sit as middleware (a proxy) between the user and an LLM API; a minimal sketch of this pattern follows the list below.
- Use Case: Reducing the length of long technical documentation, system prompts, or RAG results.
- Target Audience: Developers building high-concurrency LLM applications where context window costs are a bottleneck.
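The sketch below is one way to wire that proxy up, not a released package: `compress_prompt` and `call_llm` are hypothetical names, and the compression step mirrors the Getting Started snippet further down.

```python
# Hypothetical middleware sketch: compress the prompt, then forward it.
# `call_llm` stands in for whatever LLM client you already use.
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("meet447/bakwas-v1-alpha")
model = AutoModelForTokenClassification.from_pretrained("meet447/bakwas-v1-alpha")

def compress_prompt(prompt: str) -> str:
    """Strip tokens the model labels as Bakwas (0) and return the kept text."""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    keep = logits.argmax(dim=-1)[0] == 1          # 1 = Keep, 0 = Bakwas
    kept_ids = inputs["input_ids"][0][keep]
    return tokenizer.decode(kept_ids, skip_special_tokens=True)

def proxy(user_prompt: str, call_llm) -> str:
    """Middleware entry point: strip the Bakwas, then hit the real API."""
    return call_llm(compress_prompt(user_prompt))
```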
Out-of-Scope Use
- Not intended for summarization where a creative rewrite is required.
- Not a replacement for safety filters.
- Avoid using it on highly sensitive legal or medical data, where every single "stop word" might carry legal weight.
Bias, Risks, and Limitations
- Clipping: At high thresholds (>0.7), the model may delete the subject of a sentence, leading to "telegram-style" fragments.
- Ghost Words: Occasionally, deleting sub-tokens can create non-standard word joinings (e.g., "compressing" -> "comess").
- Context Limit: The base model has a 512-token limit; for 20k+ token prompts, pair it with a Sliding Window Inference strategy (a sketch follows this list).
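A minimal sketch of that strategy, assuming non-overlapping 510-token windows; the window size and the manual [CLS]/[SEP] handling are illustrative choices, not tuned values:

```python
# Illustrative sliding-window inference for prompts beyond 512 tokens.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("meet447/bakwas-v1-alpha")
model = AutoModelForTokenClassification.from_pretrained("meet447/bakwas-v1-alpha")

def compress_long(prompt: str, window: int = 510) -> str:
    ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    kept_ids = []
    for start in range(0, len(ids), window):
        chunk = ids[start:start + window]
        # Re-add [CLS]/[SEP] so each window looks like a normal input
        input_ids = torch.tensor(
            [[tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]]
        )
        with torch.no_grad():
            logits = model(input_ids=input_ids).logits
        labels = logits.argmax(dim=-1)[0][1:-1]   # drop [CLS]/[SEP] positions
        kept_ids += [t for t, l in zip(chunk, labels.tolist()) if l == 1]
    return tokenizer.decode(kept_ids)
```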
Recommendations
Users should implement a "Structure Guard" (force-keeping punctuation) and a "Heuristic Heal" script to re-join fragments and keep the output readable for the downstream LLM; a hypothetical sketch of both follows.
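Neither helper ships with the model; the sketch below is one plausible reading of those two ideas, operating on WordPiece tokens and their predicted labels.

```python
import string

def structure_guard(tokens, labels):
    """Force-keep punctuation tokens so sentence structure survives."""
    return [1 if tok in string.punctuation else lab
            for tok, lab in zip(tokens, labels)]

def heuristic_heal(tokens, labels):
    """Make decisions word-consistent: each WordPiece continuation ('##...')
    inherits its head token's label, avoiding ghost words like 'comess'."""
    healed = list(labels)
    for i, tok in enumerate(tokens):
        if i > 0 and tok.startswith("##"):
            healed[i] = healed[i - 1]
    return healed
```

Apply both to `tokenizer.convert_ids_to_tokens(input_ids)` and the prediction list before decoding the kept tokens back into text.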
How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("meet447/bakwas-v1-alpha")
model = AutoModelForTokenClassification.from_pretrained("meet447/bakwas-v1-alpha")

prompt = "Actually, I was just wondering if you could potentially help me with a small tiny Python script."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# 1 = Keep, 0 = Bakwas
predictions = torch.argmax(logits, dim=-1)

# Keep only the tokens labelled 1 and decode them back into text
kept_ids = inputs["input_ids"][0][predictions[0] == 1]
print(tokenizer.decode(kept_ids, skip_special_tokens=True))
```
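Argmax corresponds to a 0.5 keep-threshold. To trade compression ratio against the clipping risk noted above, you can threshold the Keep probability explicitly; the lines below continue from the snippet above (reusing `logits` and `inputs`) and assume, as that snippet does, that label index 1 is Keep.

```python
# Threshold P(Keep) instead of taking the argmax.
probs = torch.softmax(logits, dim=-1)[0, :, 1]   # per-token P(Keep)
keep_mask = probs >= 0.5                         # raise toward 0.7 to compress harder
kept_ids = inputs["input_ids"][0][keep_mask]
print(tokenizer.decode(kept_ids, skip_special_tokens=True))
```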