BYO-community-v2 / pdf_viewer.py
sefif's picture
Update pdf_viewer.py
bbae9b3 verified
import gradio as gr
# We define the text content here for easy editing
KICKOFF_CONTENT = """
# 📄 Kick-Off Document: Unbiasing Guardrails
## Why
We built BYO because we realized something uncomfortable but true: **whoever controls language models controls the narrative**, and whoever controls the narrative shapes how people think, explore, and understand the world.
Today’s LLMs don’t reflect the real diversity of human knowledge or opinion. They are filtered, aligned, and quietly steered by political, corporate, and regulatory interests. This isn’t neutral intelligence. It’s a curated worldview, enforced through guardrails that decide in advance what is acceptable to ask, research, or say.
Guardrails and model alignments are not just a technical layer. They are biases embedded in code and weights. A silent hand shaping discourse before it even begins.
At BYO, we believe knowledge doesn’t belong to platforms or conglomerates. It belongs to people. Language models should be tools for self-expression, exploration, and original thought, not instruments of centralized control.
That’s why we built a no-code platform that lets anyone train and control their own language model, on their own knowledge, in their own voice. No permissions. No invisible filters. No approved narratives.
When models are truly open to human training and expression, something powerful happens: real knowledge sharing, unbiased exploration, and authentic insight at scale. Progress accelerates when intelligence is decentralized.
We’re not here to make AI safe for institutions.
**We’re here to make it free for humans.** And that starts by unbiasing the guardrails.
---
## Possible Approaches
### 🛡️ Blue Team
*A set of approaches for measuring and protecting your model against malicious prompts.*
* **Latency based approaches**: e.g. how well can you protect the model with a swarm of SLM that classify maliciousness simultaneously?
* **Monitoring model**: e.g. sandboxing and identifying states of a system.
### ⚔️ Red Team
*The systematic identification of model failure modes to build more resilient and trustworthy systems.*
* **Auto Red Teaming**: An attacker that learns attacks as the target model is being enhanced, e.g. a model executes and automatically writes scripts for new attack strategies.
* **Persona Teaming**: Injecting prompts with suitable personas have higher success chances.
* **Attack via Overfitting**: Overfitting the model to a set of Q&A prioritizing instruction following over other considerations.
* **Jailbreak tuning**: Tuning a model with multi objective loss to keep utility and induce harmful content.
### ⚖️ Debiasing
*Debiasing model either in pre-train, post-train or prompt level.*
* **Self-Debiasing LLMs**: Zero-Shot Recognition and Reduction of Stereotypes - prompt level, LLM knows about the stereotype and prompt engineering reduces biases.
* **Curriculum Debiasing**: Toward Robust PEFT - curriculum learning, starting from sets of biased samples and gradually moving to unbiased samples.
### 🍎 Lowest Hanging Fruit
A good first step would be to identify, collect, and process useful datasets related to red & blue teaming and debiasing, identify established benchmarks and open source codes related to red & blue teaming and debiasing.
---
## Example References
* [SoK: Evaluating Jailbreak Guardrails for Large Language Models](https://arxiv.org/pdf/2506.10597)
* [OneShield - the Next Generation of LLM Guardrails](https://arxiv.org/pdf/2507.21170)
* [Agentic AI for Cyber Resilience: A New Security Paradigm](https://www.arxiv.org/pdf/2512.22883)
* [AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration](https://arxiv.org/pdf/2503.15754)
* [PersonaTeaming: Exploring How Introducing Personas Can Improve Automated AI Red-Teaming](https://arxiv.org/pdf/2509.03728)
* [Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs](https://arxiv.org/pdf/2510.02833)
* [Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility](https://arxiv.org/pdf/2507.11630)
* [Curriculum Debiasing: Toward Robust PEFT](https://aclanthology.org/2025.acl-long.469.pdf)
* [Self-Debiasing LLMs: Zero-Shot Recognition and Reduction of Stereotypes](https://arxiv.org/pdf/2402.01981)
"""
def render_pdf_viewer():
"""
Renders the Kick-Off content as Markdown.
"""
with gr.Column(visible=False) as pdf_view:
# Render the text defined above
gr.Markdown(KICKOFF_CONTENT)
gr.Markdown("---")
btn_back = gr.Button("⬅️ Back to Home", variant="secondary")
return pdf_view, btn_back