Commit 16b18d7 · verified · 1 Parent(s): 193c14a

Upload 4 files

Files changed (4)
  1. README.md +50 -20
  2. app.py +20 -0
  3. cipher_classifier.pkl +3 -0
  4. requirements.txt +3 -3
README.md CHANGED
@@ -1,20 +1,50 @@
- ---
- title: Cipher Classifier
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 8501
- tags:
- - streamlit
- pinned: false
- short_description: Streamlit template space
- license: mit
- ---
-
- # Welcome to Streamlit!
-
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
+ # 🔐 Encrypted Text Classifier – 20 Newsgroups Cipher Challenge
+
+ This project is built for the [Kaggle Ciphertext Challenge](https://www.kaggle.com/competitions/20-newsgroups-ciphertext-challenge), where the goal is to classify encrypted text documents into 20 newsgroup categories.
+
+ 🎯 Without decrypting the text, we trained a character-level machine learning model that achieves about **63% accuracy**.
+
+ ---
+
+ ## 📂 Project Structure
+
+     cipher-classifier/
+     ├── app.py                  # Streamlit app
+     ├── cipher_classifier.pkl   # Pickled vectorizer + model
+     ├── train.csv               # Kaggle training data
+     ├── requirements.txt        # Libraries for deployment
+     └── README.md
+
+ ---
+
+ ## 🧠 Model Overview
+
+ - **Input:** Ciphertext strings (unreadable encrypted text)
+ - **Vectorization:** `CountVectorizer` with character-level n-grams (1 to 3)
+ - **Model:** Logistic Regression (scikit-learn)
+ - **Accuracy:** ~63% (without decryption)
+
+ ---
+
+
+ ## Example Output
+
+ | Input (Ciphertext) | Predicted Label |
+ | --- | --- |
+ | `['W')(7x1zay7Hb3...` | 15 |
+ | `Tx4a8M\HNsyp;HM...` | 8 |
+
+ ## 📦 Deployment
+
+ This app is designed to run on:
+
+ - 🟢 Hugging Face Spaces
+ - 🟢 Streamlit Cloud
+ - 🔵 GitHub
+
+ ## 📌 Kaggle Link
+
+ You can download the dataset from the official competition:
+
+ 👉 [Kaggle – 20 Newsgroups Ciphertext Challenge](https://www.kaggle.com/competitions/20-newsgroups-ciphertext-challenge)
+
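Note on the README's "char-level n-grams (1 to 3)": the sketch below shows, using only the standard library, what counting such n-grams amounts to. It is an illustration under stated assumptions, not code from this commit. `char_ngrams` is a hypothetical helper, and scikit-learn's `CountVectorizer` applies its own preprocessing (for example, lowercasing by default) that is not reproduced here.

```python
from collections import Counter


def char_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> Counter:
    """Count every contiguous character n-gram with n_min <= n <= n_max."""
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the string.
        for start in range(len(text) - n + 1):
            counts[text[start:start + n]] += 1
    return counts


features = char_ngrams("abab")
print(features["a"], features["ab"], features["aba"])  # 2 2 1
```

Each distinct n-gram becomes one feature column, so a classifier can pick up on symbol frequencies and short symbol sequences even when the text itself is unreadable.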
app.py ADDED
@@ -0,0 +1,20 @@
+ import streamlit as st
+ import pickle
+
+ # Title
+ st.title("🧠 Encrypted Text Classifier (20 Newsgroups)")
+
+ # Load the model
+ with open("cipher_classifier.pkl", "rb") as f:
+     vectorizer, model = pickle.load(f)
+
+ # Input area
+ text = st.text_area("🔐 Enter encrypted (ciphertext) input:", height=200)
+
+ if st.button("🧪 Predict"):
+     if text.strip() == "":
+         st.warning("Please enter an encrypted text.")
+     else:
+         X_input = vectorizer.transform([text])
+         prediction = model.predict(X_input)[0]
+         st.success(f"📂 Predicted category: **{prediction}**")
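Note on the unpickling step: `app.py` unpacks the pickle directly into `vectorizer, model`, which fails with an unhelpful error if the file holds anything else. A minimal sketch of a more defensive loader follows. `load_artifacts` is a hypothetical helper, not part of this commit, and the demo uses stand-in strings rather than real scikit-learn objects.

```python
import pickle
import tempfile
from pathlib import Path


def load_artifacts(path):
    """Unpickle a (vectorizer, model) pair and fail clearly on any other shape.

    Only call this on files you trust: unpickling can run arbitrary code.
    """
    with open(path, "rb") as f:
        obj = pickle.load(f)
    if not isinstance(obj, (tuple, list)) or len(obj) != 2:
        raise ValueError(
            f"expected a (vectorizer, model) pair, got {type(obj).__name__}"
        )
    return obj


# Demo with stand-in strings instead of real scikit-learn objects.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.pkl"
    demo_path.write_bytes(pickle.dumps(("vectorizer-stand-in", "model-stand-in")))
    vectorizer, model = load_artifacts(demo_path)
```

In the Streamlit app, the same function could be called once with `"cipher_classifier.pkl"` in place of the inline `with open(...)` block.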
cipher_classifier.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f7d71b1fcda5760dbc2df3c4bb7de6ba87832e480b6a0ae337b9bd9bebc8aaf5
+ size 185929
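Note on the Git LFS pointer: the pointer records the SHA-256 digest and byte size of the real `cipher_classifier.pkl`, so a downloaded copy can be checked before it is unpickled. The sketch below is one way to do that with the standard library. `matches_lfs_pointer` is a hypothetical helper, and the constants are copied from the pointer above.

```python
import hashlib
from pathlib import Path

# Values copied from the pointer file in this commit.
EXPECTED_OID = "f7d71b1fcda5760dbc2df3c4bb7de6ba87832e480b6a0ae337b9bd9bebc8aaf5"
EXPECTED_SIZE = 185929


def matches_lfs_pointer(path, oid: str, size: int) -> bool:
    """Return True if the file's byte size and SHA-256 digest match the pointer."""
    data = Path(path).read_bytes()
    return len(data) == size and hashlib.sha256(data).hexdigest() == oid
```

A call such as `matches_lfs_pointer("cipher_classifier.pkl", EXPECTED_OID, EXPECTED_SIZE)` returning `False` would suggest the file is either still the pointer stub or not the object this commit refers to.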
requirements.txt CHANGED
@@ -1,3 +1,3 @@
- altair
- pandas
- streamlit
+ streamlit
+ scikit-learn
+ pandas