Bakwas v1 Alpha

Bakwas (Hindi for "Noise" or "Nonsense") is a high-performance, extractive prompt compression model designed to identify and strip redundant tokens from LLM prompts. By acting as a semantic filter, it reduces the "Bloat Tax", saving costs and reducing KV-cache latency in downstream LLM workflows.

Model Details

Model Description

Bakwas v1 Alpha is a token classification model fine-tuned from distilbert-base-uncased. Unlike generative compressors that rewrite text, Bakwas is purely extractive. It predicts a binary mask for every token: Keep (Signal) or Discard (Bakwas). This ensures that the original prompt's structure remains familiar to the downstream LLM's training data.

  • Developed by: Meet Sonawane
  • Model type: Token Classification (Binary)
  • Language(s) (NLP): English
  • License: Apache 2.0
  • Finetuned from model: distilbert-base-uncased

Uses

Direct Use

The model is intended to sit as a middleware/proxy between the user and an LLM API.

  • Use Case: Reducing the length of long technical documentation, system prompts, or RAG results.
  • Target Audience: Developers building high-concurrency LLM applications where context window costs are a bottleneck.
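A minimal sketch of the middleware pattern described above. The `compress_prompt` stub below stands in for an actual Bakwas inference call, and the filler-word list and `call_llm` placeholder are illustrative only:

```python
# Hypothetical middleware sketch: a toy extractive filter stands in for Bakwas.
# In production, compress_prompt would run the token-classification model.
FILLER = {"actually", "just", "potentially", "basically", "really", "very"}

def compress_prompt(prompt: str) -> str:
    """Toy extractive filter: drop words from an illustrative filler list."""
    kept = [w for w in prompt.split() if w.lower().strip(",.") not in FILLER]
    return " ".join(kept)

def call_llm(prompt: str) -> str:
    """Placeholder for the downstream LLM API call."""
    return f"<response to: {prompt}>"

def middleware(prompt: str) -> str:
    # Compress before forwarding to the (more expensive) LLM.
    return call_llm(compress_prompt(prompt))

print(middleware("Actually, could you just help me with a very small Python script?"))
```

Because the filter is extractive, the forwarded prompt stays a subsequence of the original, which is what keeps it in-distribution for the downstream model.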

Out-of-Scope Use

  • Not intended for summarization where a creative rewrite is required.
  • Not a replacement for safety filters.
  • Avoid using on highly sensitive legal or medical data where every single "stop word" might carry legal weight.

Bias, Risks, and Limitations

  • Clipping: At high thresholds (>0.7), the model may delete the subject of a sentence, leading to "telegram-style" fragments.
  • Ghost Words: Occasionally, deleting sub-tokens can create non-standard word joinings (e.g., "compressing" -> "comess").
  • Context Limit: The base model has a 512-token limit; for 20k+ token prompts, use a sliding-window inference strategy.
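The sliding-window strategy mentioned above can be sketched as follows. The window and stride sizes are illustrative, and `classify_window` is a stand-in for a model forward pass over one chunk:

```python
def sliding_window_mask(token_ids, classify_window, window=512, stride=384):
    """Classify a long sequence in overlapping windows and merge the masks.

    Overlapping predictions are merged with a logical OR on "keep", so a
    token kept in any window survives (a conservative choice).
    """
    n = len(token_ids)
    keep = [False] * n
    start = 0
    while start < n:
        end = min(start + window, n)
        window_mask = classify_window(token_ids[start:end])  # list of 0/1 per token
        for i, k in enumerate(window_mask):
            keep[start + i] = keep[start + i] or bool(k)
        if end == n:
            break
        start += stride
    return keep

# Dummy classifier for demonstration: keep every even-valued token id.
mask = sliding_window_mask(list(range(1000)), lambda ids: [i % 2 == 0 for i in ids])
```

A stride smaller than the window gives each token context from at least one window where it is not at the very edge.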

Recommendations

Users should implement a "Structure Guard" (keeping punctuation) and a "Heuristic Heal" script to re-join fragments and ensure the output is readable for the downstream LLM.
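A minimal sketch of both guards, assuming WordPiece subwords (the `##` continuation prefix used by the distilbert-base-uncased tokenizer); the function name is illustrative:

```python
import string

def structure_guard_and_heal(tokens, keep_mask):
    """Post-process a keep/discard mask before detokenizing.

    Structure Guard: always keep punctuation tokens.
    Heuristic Heal: a '##' continuation piece follows the fate of the word
    it belongs to, so fragments like "comess" cannot be produced.
    """
    mask = list(keep_mask)
    for i, tok in enumerate(tokens):
        if tok in string.punctuation:
            mask[i] = True          # Structure Guard
        if tok.startswith("##"):
            mask[i] = mask[i - 1]   # Heuristic Heal: tie subword to its root
    kept = [t for t, k in zip(tokens, mask) if k]
    # Re-join WordPiece continuations into whole words.
    text = ""
    for t in kept:
        if t.startswith("##"):
            text += t[2:]
        elif t in string.punctuation:
            text += t
        else:
            text += (" " + t) if text else t
    return text

tokens = ["compress", "##ing", "is", "very", "useful", "."]
print(structure_guard_and_heal(tokens, [True, False, True, False, True, False]))
```

Tying `##` pieces to their root token directly addresses the "Ghost Words" failure mode listed above.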

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("meet447/bakwas-v1-alpha")
model = AutoModelForTokenClassification.from_pretrained("meet447/bakwas-v1-alpha")

prompt = "Actually, I was just wondering if you could potentially help me with a small tiny Python script."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# 1 = Keep, 0 = Bakwas
predictions = torch.argmax(logits, dim=-1)

# Reconstruct the compressed prompt from the kept tokens
kept_ids = inputs["input_ids"][0][predictions[0].bool()]
print(tokenizer.decode(kept_ids, skip_special_tokens=True))
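The snippet above uses argmax; to apply a probability threshold instead (the >0.7 regime mentioned under limitations), convert the two class logits to a keep-probability. A pure-Python sketch with made-up logits; in practice you would use `torch.softmax(logits, dim=-1)[..., 1]` on the model output:

```python
import math

def keep_probability(logit_discard, logit_keep):
    """Softmax over the two class logits; returns P(keep)."""
    m = max(logit_discard, logit_keep)  # subtract max for numerical stability
    e_d = math.exp(logit_discard - m)
    e_k = math.exp(logit_keep - m)
    return e_k / (e_d + e_k)

# Illustrative per-token logits: (discard, keep)
token_logits = [(-1.2, 2.3), (0.8, 0.2), (-0.5, 1.9)]
THRESHOLD = 0.7  # raising this keeps fewer tokens, i.e. compresses harder
mask = [keep_probability(d, k) >= THRESHOLD for d, k in token_logits]
```

Raising the threshold trades fidelity for compression ratio, which is where the "Clipping" failure mode appears.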
Model size: 66.4M parameters (Safetensors, F32)