Space: handecarkci/feedback-ell-streamlit

hç committed on Jun 1, 2025

Commit 4db119a (verified) · 1 parent: 722e410

Upload 6 files

Files changed (6)

README.md +67 -20
app.py +33 -0
project_description.txt +69 -0
requirements.txt +5 -3
ridge_model.pkl +3 -0
tfidf_vectorizer.pkl +3 -0

README.md CHANGED

@@ -1,20 +1,67 @@
----
-title: Feedback Ell Streamlit
-emoji: 🚀
-colorFrom: red
-colorTo: red
-sdk: docker
-app_port: 8501
-tags:
-- streamlit
-pinned: false
-short_description: Streamlit template space
-license: mit
----
-# Welcome to Streamlit!
-Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
-forums](https://discuss.streamlit.io).

+# 📝 Feedback Prize - English Language Learning (Simplified Version)
+This project offers a simplified solution to the Kaggle competition "Feedback Prize - English Language Learning". It predicts six language-skill scores from student essays:
+- Cohesion
+- Syntax
+- Vocabulary
+- Phraseology
+- Grammar
+- Conventions
+---
+## 📁 Dataset
+- `train.csv`: student essays and their scores
+- `test.csv`: essays to be scored
+- `sample_submission.csv`: sample output format
+The data can be downloaded from the [Kaggle competition page](https://www.kaggle.com/competitions/feedback-prize-english-language-learning/data).
+---
+## 🔧 Methods
+- **TF-IDF** for text vectorization
+- **Ridge Regression** for predicting multiple scores
+- `MultiOutputRegressor` to learn all 6 targets at once
+- A simple and effective approach (validation RMSE ≈ 0.56)
+---
+## 💻 Streamlit App
+```bash
+streamlit run app.py
+```
+## 📦 Installation
+Run `pip install -r requirements.txt`.
+## 🧠 Model and Vectorizer
+- `ridge_model.pkl`: trained regression model
+- `tfidf_vectorizer.pkl`: fitted TF-IDF vectorizer
+## 📤 Kaggle Submission
+The model predicts on `test.csv` and writes `submission.csv`, which can be uploaded directly to Kaggle.
+## 📌 Possible Improvements
+- Stronger NLP models (BERT, DeBERTa)
+- Ensemble approaches
+- Tokenizer-based embeddings
+- LSTM/Transformer-based deep models
+## 🧑‍🎓 Purpose
+This project was developed to understand a simplified solution to a real competition, to learn the NLP modeling process, and to build a working prototype.
+## 🏷️ License
+MIT License
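
The README names the method, but the commit does not include the training script. The following is a minimal sketch of how TF-IDF, Ridge, and `MultiOutputRegressor` could fit together, assuming scikit-learn and NumPy are installed. The toy essays, the scores, and `alpha=1.0` are made up for illustration, and only two of the six targets are shown. The stop-word and feature-limit settings follow project_description.txt. Unlike the commit, which stores the model and vectorizer as separate pickles, the sketch bundles them in one pipeline.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: four essays, two target columns (e.g. cohesion, syntax).
essays = [
    "I like school because my friends are there.",
    "The school day is long and students get tired.",
    "Learning English helps people find better jobs.",
    "My family moved to a new city last year.",
]
scores = np.array([[3.0, 2.5], [3.5, 3.0], [4.0, 3.5], [2.5, 3.0]])

# TF-IDF turns each essay into a sparse vector; MultiOutputRegressor fits
# one independent Ridge model per target column.
pipeline = make_pipeline(
    TfidfVectorizer(stop_words="english", max_features=10_000),
    MultiOutputRegressor(Ridge(alpha=1.0)),
)
pipeline.fit(essays, scores)

preds = pipeline.predict(["School helps students learn English."])
print(preds.shape)  # (1, 2): one row per essay, one column per target
```

`Ridge` also accepts a 2-D target directly, so the wrapper is optional here; it mainly makes the one-model-per-score structure explicit.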

app.py ADDED

	@@ -0,0 +1,33 @@

+# app.py
+import streamlit as st
+import pandas as pd
+import numpy as np
+import joblib
+from sklearn.feature_extraction.text import TfidfVectorizer
+from sklearn.linear_model import Ridge
+from sklearn.multioutput import MultiOutputRegressor
+# Title
+st.title("📝 English Essay Skill Predictor")
+st.markdown("Enter your essay and we will predict 6 language scores (cohesion, syntax, etc.)")
+# Get the essay text from the user
+user_text = st.text_area("✍️ Write your essay here", height=250)
+# Load the pre-trained model and the fitted TF-IDF vectorizer
+model = joblib.load("ridge_model.pkl")
+tfidf = joblib.load("tfidf_vectorizer.pkl")
+# Predict button
+if st.button("📊 Predict"):
+    if not user_text.strip():
+        st.warning("Please enter some text.")
+    else:
+        # Vectorize the essay and predict all six scores at once
+        text_vec = tfidf.transform([user_text])
+        preds = model.predict(text_vec)[0]
+        # Show the results, one line per skill
+        labels = ['Cohesion', 'Syntax', 'Vocabulary', 'Phraseology', 'Grammar', 'Conventions']
+        for label, score in zip(labels, preds):
+            st.write(f"**{label}**: {score:.2f} / 5")
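
A linear model such as Ridge is unbounded, so a raw prediction can fall outside the scale shown in the UI. The following is a small sketch of clamping before display, assuming the competition's 1.0 to 5.0 score range. `clamp_score` and `format_scores` are hypothetical helpers, not part of the commit.

```python
from typing import Iterable, List

LABELS = ["Cohesion", "Syntax", "Vocabulary", "Phraseology", "Grammar", "Conventions"]


def clamp_score(score: float, low: float = 1.0, high: float = 5.0) -> float:
    """Clamp a raw prediction into [low, high] (assumed rubric range)."""
    return max(low, min(high, float(score)))


def format_scores(preds: Iterable[float]) -> List[str]:
    """Pair each prediction with its label and format it to two decimals."""
    return [f"{label}: {clamp_score(p):.2f} / 5" for label, p in zip(LABELS, preds)]


lines = format_scores([3.127, 5.4, 0.8, 2.5, 4.0, 3.333])
print(lines[1])  # Syntax: 5.00 / 5
```

In the app, each formatted line could be passed to `st.write` in place of the raw score.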

project_description.txt ADDED

	@@ -0,0 +1,69 @@

+PROJECT TITLE: Feedback Prize - English Language Learning (Simplified Kaggle Project)
+ASSIGNMENT OBJECTIVE: Choose a Kaggle competition, process the data, build a machine learning model, and visualize the results.
+SELECTED COMPETITION:
+Kaggle Challenge: https://www.kaggle.com/competitions/feedback-prize-english-language-learning
+The goal of this competition is to predict 6 language proficiency scores from student-written essays:
+1. Cohesion
+2. Syntax
+3. Vocabulary
+4. Phraseology
+5. Grammar
+6. Conventions
+---
+STEPS COMPLETED:
+1. DATA LOADING AND EXPLORATION
+   - Loaded the `train.csv` file.
+   - Inspected the content: student essays (`full_text`) and 6 target scores.
+   - Explored text length distributions and score histograms.
+2. TEXT PROCESSING (Vectorization)
+   - Used `TfidfVectorizer` from scikit-learn to convert essays into numerical format.
+   - Removed English stopwords and limited features to 10,000.
+3. MODEL TRAINING
+   - Chose Ridge Regression (with L2 regularization).
+   - Used `MultiOutputRegressor` to predict all 6 scores simultaneously.
+   - Split data into training and validation sets (80% / 20%).
+   - Achieved a validation RMSE of **0.5632**.
+4. TEST PREDICTIONS AND SUBMISSION
+   - Applied TF-IDF on `test.csv` and made predictions.
+   - Created a `submission.csv` file matching the `sample_submission.csv` format for Kaggle.
+5. STREAMLIT USER INTERFACE
+   - Built `app.py` with Streamlit to accept custom essays and predict scores.
+   - Used saved model: `ridge_model.pkl`
+   - Used saved TF-IDF vectorizer: `tfidf_vectorizer.pkl`
+6. INCLUDED FILES
+   - `requirements.txt` includes all Python dependencies.
+   - Saved trained model and vectorizer as `.pkl` files.
+   - Project folder is ready for GitHub or ZIP submission.
+---
+LIBRARIES USED:
+- pandas
+- numpy
+- scikit-learn
+- joblib
+- streamlit
+---
+PROJECT SUMMARY:
+In this project:
+- A real Kaggle NLP competition was selected.
+- All key ML stages were covered: data exploration, feature extraction, modeling, evaluation, prediction, and web UI.
+- The project serves as both a practical learning experience and a simplified working prototype for multi-target regression in natural language processing.
+---
+COMPLETED BY: [Enter Your Name]
+SUBMISSION DATE: [Enter Date]
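
The description reports a single validation RMSE but does not say how the six targets were aggregated. The competition's official metric is the mean column-wise RMSE (MCRMSE). A minimal pure-Python sketch of that metric follows, with made-up numbers and a hypothetical function name; it is not necessarily how the 0.5632 figure was computed.

```python
import math
from typing import List, Sequence


def mcrmse(y_true: Sequence[Sequence[float]], y_pred: Sequence[Sequence[float]]) -> float:
    """Mean column-wise RMSE: RMSE per target column, then the mean over columns."""
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    per_column: List[float] = []
    for j in range(n_cols):
        squared_error = sum((y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n_rows))
        per_column.append(math.sqrt(squared_error / n_rows))
    return sum(per_column) / n_cols


# Made-up example: two essays, two targets.
y_true = [[3.0, 4.0], [2.0, 3.0]]
y_pred = [[3.5, 4.0], [2.5, 2.0]]
print(round(mcrmse(y_true, y_pred), 4))  # 0.6036
```

Pooling all cells into one RMSE gives a slightly different number (about 0.6124 for this example), so it is worth stating which variant a reported score uses.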

requirements.txt CHANGED

@@ -1,3 +1,5 @@
-altair
-pandas
-streamlit

+streamlit
+pandas
+scikit-learn
+joblib
+numpy
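
The requirements are unpinned, while the `.pkl` files were written by whatever scikit-learn version trained the model, and unpickling under a different version is not guaranteed to work. One way to record the versions to pin, using only the standard library, is sketched below. `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

PACKAGES = ["streamlit", "pandas", "scikit-learn", "joblib", "numpy"]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    # Print in requirements.txt syntax; missing packages are left unpinned.
    print(f"{name}=={ver}" if ver else name)
```

Running this in the training environment and pasting the output into requirements.txt would keep the Space on the same library versions as the saved artifacts.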

ridge_model.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:68c068bc0a684d581f4c350662cca089f2b7126a79c2ede0412fe075778b6743
+size 481432

tfidf_vectorizer.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fa30b4afda71944c53a5f76c65fea2c987763fbf43bb796f22ba328e5a5dce07
+size 371125
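
Both `.pkl` entries are Git LFS pointers, not the binaries themselves: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. The following is a minimal sketch of parsing a pointer and checking a downloaded file against it. The helper names are hypothetical and the payload in the example is made up.

```python
import hashlib
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(data: bytes, pointer: Dict[str, str]) -> bool:
    """Check that the bytes have the size and SHA-256 digest the pointer records."""
    algo, _, digest = pointer["oid"].partition(":")
    return (
        algo == "sha256"
        and len(data) == int(pointer["size"])
        and hashlib.sha256(data).hexdigest() == digest
    )


# Made-up payload; a real check would read ridge_model.pkl from disk instead.
payload = b"example bytes"
pointer_text = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
    f"size {len(payload)}\n"
)
pointer = parse_lfs_pointer(pointer_text)
print(matches_pointer(payload, pointer))  # True
```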