Bakwas v1 Alpha
Bakwas (Hindi for "Noise" or "Nonsense") is a high-performance, extractive prompt compression model designed to identify and strip redundant tokens from LLM prompts. By acting as a semantic filter, it reduces the "Bloat Tax", saving costs and cutting KV-cache latency in downstream LLM workflows.
Model Details
Model Description
Bakwas v1 Alpha is a token classification model fine-tuned from distilbert-base-uncased. Unlike generative compressors that rewrite text, Bakwas is purely extractive: it predicts a binary mask for every token, Keep (Signal) or Discard (Bakwas). Because no new text is generated, the compressed prompt stays close to the structure the downstream LLM saw in its training data.
- Developed by: Meet Sonawane
- Model type: Token Classification (Binary)
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: distilbert-base-uncased
Model Sources
- Demo: https://huggingface.co/spaces/meet447/Bakwas-v1-alpha-demo
- Philosophy: "Context Engineering over Prompt Engineering."
Uses
Direct Use
The model is intended to sit as middleware (a proxy) between the user and an LLM API; a minimal sketch of this pattern follows the list below.
- Use Case: Reducing the length of long technical documentation, system prompts, or RAG results.
- Target Audience: Developers building high-concurrency LLM applications where context window costs are a bottleneck.
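The sketch below is one way to wire that proxy up, not a released package: `compress_prompt` and `call_llm` are hypothetical names, and the compression step mirrors the Getting Started snippet further down.

```python
# Hypothetical middleware sketch: compress the prompt, then forward it.
# `call_llm` stands in for whatever LLM client you already use.
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("meet447/bakwas-v1-alpha")
model = AutoModelForTokenClassification.from_pretrained("meet447/bakwas-v1-alpha")

def compress_prompt(prompt: str) -> str:
    """Strip tokens the model labels as Bakwas (0) and return the kept text."""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    keep = logits.argmax(dim=-1)[0] == 1          # 1 = Keep, 0 = Bakwas
    kept_ids = inputs["input_ids"][0][keep]
    return tokenizer.decode(kept_ids, skip_special_tokens=True)

def proxy(user_prompt: str, call_llm) -> str:
    """Middleware entry point: strip the Bakwas, then hit the real API."""
    return call_llm(compress_prompt(user_prompt))
```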
Out-of-Scope Use
- Not intended for summarization where a creative rewrite is required.
- Not a replacement for safety filters.
- Avoid using it on highly sensitive legal or medical data, where every single "stop word" might carry legal weight.
Bias, Risks, and Limitations
- Clipping: At high thresholds (>0.7), the model may delete the subject of a sentence, leading to "telegram-style" fragments.
- Ghost Words: Occasionally, deleting sub-tokens can create non-standard word joinings (e.g., "compressing" -> "comess").
- Context Limit: The base model has a 512-token limit; for 20k+ token prompts, pair it with a Sliding Window Inference strategy (a sketch follows this list).
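A minimal sketch of that strategy, assuming non-overlapping 510-token windows; the window size and the manual [CLS]/[SEP] handling are illustrative choices, not tuned values:

```python
# Illustrative sliding-window inference for prompts beyond 512 tokens.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("meet447/bakwas-v1-alpha")
model = AutoModelForTokenClassification.from_pretrained("meet447/bakwas-v1-alpha")

def compress_long(prompt: str, window: int = 510) -> str:
    ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    kept_ids = []
    for start in range(0, len(ids), window):
        chunk = ids[start:start + window]
        # Re-add [CLS]/[SEP] so each window looks like a normal input
        input_ids = torch.tensor(
            [[tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]]
        )
        with torch.no_grad():
            logits = model(input_ids=input_ids).logits
        labels = logits.argmax(dim=-1)[0][1:-1]   # drop [CLS]/[SEP] positions
        kept_ids += [t for t, l in zip(chunk, labels.tolist()) if l == 1]
    return tokenizer.decode(kept_ids)
```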
Recommendations
Users should implement a "Structure Guard" (force-keeping punctuation) and a "Heuristic Heal" script to re-join fragments and keep the output readable for the downstream LLM; a hypothetical sketch of both follows.
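Neither helper ships with the model; the sketch below is one plausible reading of those two ideas, operating on WordPiece tokens and their predicted labels.

```python
import string

def structure_guard(tokens, labels):
    """Force-keep punctuation tokens so sentence structure survives."""
    return [1 if tok in string.punctuation else lab
            for tok, lab in zip(tokens, labels)]

def heuristic_heal(tokens, labels):
    """Make decisions word-consistent: each WordPiece continuation ('##...')
    inherits its head token's label, avoiding ghost words like 'comess'."""
    healed = list(labels)
    for i, tok in enumerate(tokens):
        if i > 0 and tok.startswith("##"):
            healed[i] = healed[i - 1]
    return healed
```

Apply both to `tokenizer.convert_ids_to_tokens(input_ids)` and the prediction list before decoding the kept tokens back into text.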
How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("meet447/bakwas-v1-alpha")
model = AutoModelForTokenClassification.from_pretrained("meet447/bakwas-v1-alpha")

prompt = "Actually, I was just wondering if you could potentially help me with a small tiny Python script."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# 1 = Keep, 0 = Bakwas
predictions = torch.argmax(logits, dim=-1)

# Keep only the tokens labelled 1 and decode them back into text
kept_ids = inputs["input_ids"][0][predictions[0] == 1]
print(tokenizer.decode(kept_ids, skip_special_tokens=True))
```
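Argmax corresponds to a 0.5 keep-threshold. To trade compression ratio against the clipping risk noted above, you can threshold the Keep probability explicitly; the lines below continue from the snippet above (reusing `logits` and `inputs`) and assume, as that snippet does, that label index 1 is Keep.

```python
# Threshold P(Keep) instead of taking the argmax.
probs = torch.softmax(logits, dim=-1)[0, :, 1]   # per-token P(Keep)
keep_mask = probs >= 0.5                         # raise toward 0.7 to compress harder
kept_ids = inputs["input_ids"][0][keep_mask]
print(tokenizer.decode(kept_ids, skip_special_tokens=True))
```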