---
title: Encrypted Text Classifier
emoji: 🔐
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: 1.45.1
app_file: app.py
pinned: false
---


# 🔐 Encrypted Text Classifier – 20 Newsgroups Cipher Challenge

This project is built for the [Kaggle Ciphertext Challenge](https://www.kaggle.com/competitions/20-newsgroups-ciphertext-challenge), where the goal is to classify encrypted text documents into 20 different newsgroup categories.

🎯 Without decrypting the text, a character-level machine learning model reaches about **63% accuracy**.

---

## 📂 Project Structure

    cipher-classifier/
    ├── app.py                  # Streamlit app
    ├── cipher_classifier.pkl   # Pickled model + vectorizer
    ├── train.csv               # Kaggle training data
    ├── requirements.txt        # Libraries for deployment
    └── README.md

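The pickled artifact listed above can be loaded and queried roughly as follows. This is a minimal sketch, not the code in `app.py`: it assumes the pickle holds a `(vectorizer, model)` tuple, which the real `cipher_classifier.pkl` may not, and `predict_label` is a hypothetical helper name. A tiny stand-in artifact is built in memory so the example runs without the real file.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression


def predict_label(artifact_bytes: bytes, ciphertext: str) -> int:
    """Unpickle a (vectorizer, model) tuple and classify one ciphertext string.

    The tuple layout is an assumption; the real pickle may be organized differently.
    """
    vectorizer, model = pickle.loads(artifact_bytes)
    features = vectorizer.transform([ciphertext])
    return int(model.predict(features)[0])


# Toy stand-in for the trained artifact (two made-up classes, 0 and 1).
texts = ["qzx#qzx#qzx", "mmk@mmk@mk"]
labels = [0, 1]
vec = CountVectorizer(analyzer="char", ngram_range=(1, 3), lowercase=False)
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(texts), labels)
artifact = pickle.dumps((vec, clf))

print(predict_label(artifact, "qzx#zq"))
```

With the real file, the bytes would come from `open("cipher_classifier.pkl", "rb").read()`. Only unpickle files from a trusted source, since `pickle` can execute arbitrary code on load.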

---

## 🧠 Model Overview

- **Input:** Ciphertext strings (unreadable encrypted text)
- **Vectorization:** `CountVectorizer` with char-level n-grams (1 to 3)
- **Model:** Logistic Regression (sklearn)
- **Accuracy:** ~63% (without decryption)

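The components listed above can be wired together roughly as follows. This is a minimal sketch on toy strings, not the project's training script: the sample data, `lowercase=False`, and `max_iter=1000` are assumptions chosen for illustration, and the real model is trained on `train.csv`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for ciphertext rows; the real data comes from train.csv.
texts = ["qzx#qzx#qzx", "#qz#qzx#xq", "mmk@mmk@mk", "@mk@mmk@km"]
labels = [0, 0, 1, 1]

# Character-level counts over 1- to 3-grams, as described in the overview.
# lowercase=False is an assumption: letter case in ciphertext may carry signal.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(1, 3), lowercase=False)
model = LogisticRegression(max_iter=1000)

# Chain vectorizer and classifier so raw strings go in and labels come out.
clf = make_pipeline(vectorizer, model)
clf.fit(texts, labels)

predictions = clf.predict(["qzx#zq", "mk@mmk"])
print(predictions)
```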
---


## Example Output

| Input (Ciphertext) | Predicted Label |
|--------------------|-----------------|
| `['W')(7x1zay7Hb3...` | 15 |
| `Tx4a8M\HNsyp;HM...` | 8 |



---

## 📦 Deployment

This app is designed to run on:

- 🟢 Hugging Face Spaces
- 🟢 Streamlit Cloud
- 🔵 GitHub


---

## 📌 Kaggle Link

You can download the dataset from the official competition:
👉 [Kaggle – 20 Newsgroups Ciphertext Challenge](https://www.kaggle.com/competitions/20-newsgroups-ciphertext-challenge)