---
title: "Dacon Broadcast Article Performance Predictor"
emoji: "📰"
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
short_description: "AI-powered article KPI predictions and SEO recommendations"
tags:
- flask
- seo
- analytics
- journalism
datasets: []
models: []
suggested_hardware: cpu-upgrade
suggested_storage: medium
pinned: false
---
# Dacon Broadcast Article Performance Predictor
This project hosts a Flask web application that predicts article performance and provides AI-powered SEO recommendations.
## Local development
1. Create a virtual environment and install dependencies.
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```
2. Ensure the model artifacts are generated:
```powershell
.\.venv\Scripts\python.exe train_and_save_models.py
```
3. Add your Google Generative AI key to a `.env` file:
```ini
GEMINI_API_KEY=your-api-key
```
4. Run the development server:
```powershell
.\.venv\Scripts\python.exe app.py
```
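The app presumably loads `GEMINI_API_KEY` from `.env` at startup (e.g. via `python-dotenv`). As a rough stdlib sketch of that lookup, shown mainly to illustrate the expected `KEY=VALUE` file format (the function name and parsing rules here are illustrative, not the app's actual code):

```python
import os
import tempfile

def load_dotenv_minimal(path):
    """Parse simple KEY=VALUE lines into os.environ; skip blanks and comments."""
    parsed = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            parsed[key.strip()] = value.strip()
            # Like python-dotenv's default, don't override variables already set.
            os.environ.setdefault(key.strip(), value.strip())
    return parsed

# Demo with a throwaway file standing in for the project's .env:
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as fh:
        fh.write("# local secrets\nGEMINI_API_KEY=your-api-key\n")
    keys = load_dotenv_minimal(env_path)
```

Real deployments should prefer `python-dotenv` (or the systemd `EnvironmentFile=` described below) rather than hand-rolled parsing.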
## Production deployment (Gunicorn + Nginx)
1. **Copy project to server** (e.g., `/srv/dacon_broadcast_paper`).
2. **Create virtual environment** and install requirements as above.
3. **Generate artifacts** on the server or copy them from local build.
4. **Configure environment variables**:
```bash
echo "GEMINI_API_KEY=your-api-key" | sudo tee /etc/dacon_app.env
```
5. **Test Gunicorn manually**:
```bash
cd /srv/dacon_broadcast_paper
source .venv/bin/activate
gunicorn --bind 127.0.0.1:8000 --workers 3 --timeout 120 wsgi:application
```
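The `wsgi:application` argument tells Gunicorn to import a module `wsgi.py` and look up a WSGI callable named `application`; in this project that file presumably just re-exports the Flask app (`from app import app as application`). A bare-bones stand-in callable, exercised the way a WSGI server would call it:

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    """Minimal WSGI callable; the real wsgi.py re-exports the Flask app instead."""
    body = b"ok"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

# Simulate one request without a server:
environ = {}
setup_testing_defaults(environ)  # fills in required WSGI environ keys
seen = {}

def start_response(status, headers):
    seen["status"] = status

response_body = b"".join(application(environ, start_response))
```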
### systemd service
Use `deploy/dacon_app.service` as a template:
```bash
sudo cp deploy/dacon_app.service /etc/systemd/system/dacon_app.service
sudo systemctl daemon-reload
sudo systemctl enable dacon_app
sudo systemctl start dacon_app
sudo systemctl status dacon_app
```
Adjust the `WorkingDirectory`, `ExecStart`, and `Environment` entries to match your server paths, or point the unit at `/etc/dacon_app.env` via `EnvironmentFile=` if preferred.
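The shipped `deploy/dacon_app.service` is authoritative; for orientation, a unit along these lines (user and paths are placeholders) matches the Gunicorn invocation above:

```ini
[Unit]
Description=Dacon broadcast article predictor (Gunicorn)
After=network.target

[Service]
User=www-data
WorkingDirectory=/srv/dacon_broadcast_paper
EnvironmentFile=/etc/dacon_app.env
ExecStart=/srv/dacon_broadcast_paper/.venv/bin/gunicorn --bind 127.0.0.1:8000 --workers 3 --timeout 120 wsgi:application
Restart=on-failure

[Install]
WantedBy=multi-user.target
```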
### Nginx reverse proxy
1. Install Nginx (`sudo apt install nginx`).
2. Copy the provided config:
```bash
sudo cp deploy/dacon_app.nginx.conf /etc/nginx/sites-available/dacon_app
sudo ln -s /etc/nginx/sites-available/dacon_app /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
```
3. Update `server_name` and any path aliases before reloading.
4. (Optional) Enable HTTPS via Certbot:
```bash
sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d your-domain.com
```
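The provided `deploy/dacon_app.nginx.conf` likely follows the standard reverse-proxy pattern; a sketch with placeholder names (the real file may differ in paths and extras such as static aliases):

```nginx
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Match Gunicorn's --timeout 120 so slow predictions aren't cut off.
        proxy_read_timeout 120s;
    }
}
```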
### Firewall and health checks
- Open ports 80/443 via `ufw` or your cloud provider’s security group.
- Use the `/healthz` endpoint for health monitoring.
- Logs:
- Application: `journalctl -u dacon_app`
- Nginx: `/var/log/nginx/access.log`, `/var/log/nginx/error.log`
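A monitor for `/healthz` can be as simple as an HTTP GET expecting a 200. A self-contained sketch of such a probe, using a stub server in place of the real app (the actual route lives in `app.py`, and its response body may differ):

```python
import http.server
import threading
import urllib.request

class _StubHealth(http.server.BaseHTTPRequestHandler):
    """Stand-in for the app's /healthz route, for local demonstration only."""
    def do_GET(self):
        if self.path == "/healthz":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):  # silence per-request logging
        pass

def is_healthy(url, timeout=5):
    """Return True if the endpoint answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers HTTPError, URLError, and socket timeouts
        return False

server = http.server.HTTPServer(("127.0.0.1", 0), _StubHealth)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
healthy = is_healthy(f"http://127.0.0.1:{port}/healthz")
missing = is_healthy(f"http://127.0.0.1:{port}/nope")
server.shutdown()
```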
## File overview
- `app.py` – Flask application with prediction and SEO endpoints.
- `wsgi.py` – WSGI entrypoint for production servers.
- `deploy/dacon_app.service` – sample systemd unit for Gunicorn.
- `deploy/dacon_app.nginx.conf` – sample Nginx reverse proxy configuration.
- `train_and_save_models.py` – pipeline that creates required artifacts.
- `data_csv/` – CSV inputs used by the app.
## Troubleshooting
- If Gunicorn crashes, check for missing artifacts under `artifacts/`.
- Ensure the `.env` file or environment variables include `GEMINI_API_KEY`.
- Increase `client_max_body_size` in Nginx if large payloads are expected.
- For Windows hosting, consider running Gunicorn/Nginx via WSL2 or using IIS + FastCGI with `wsgi.py`.
## Hugging Face Spaces deployment (Docker Space)
Hugging Face Spaces support custom web apps through Docker. Use the provided `Dockerfile` to containerize the app and expose it via Gunicorn.
1. **Prepare the repository**
- Ensure all required artifacts (`*.pkl`) and the `data_csv/` folder are committed (Spaces pull the repo directly).
- Keep individual files under 1 GB (Spaces limit); use Git LFS for large artifacts if needed.
2. **Create a new Space**
- On Hugging Face, click **Create Space** → select the **Docker** SDK → name it (e.g., `username/dacon-predictor`).
- Leave hardware as default unless more RAM is required (~16 GB recommended because of NLP dependencies).
3. **Push the code**
- Create the Space from the CLI, add it as a remote, and push:
```bash
huggingface-cli repo create username/dacon-predictor --type space --space_sdk docker
git remote add space https://huggingface.co/spaces/username/dacon-predictor
git push space main
```
- Alternatively, clone the empty Space repo and copy the project files into it before pushing.
4. **Secrets & configuration**
- In the Space settings, add a secret named `GEMINI_API_KEY` with your Google Generative AI key.
- Optional: set `GUNICORN_WORKERS` to tune concurrency.
5. **Container build**
- Spaces builds the image from the `Dockerfile`. It installs system deps (OpenJDK, MeCab) and Python requirements, then launches Gunicorn bound to `$PORT` (HF uses port 7860 by default).
- The app serves `index.html` via Flask, so no additional frontend wiring is required.
6. **Testing & monitoring**
- Once the build finishes, open the Space URL to verify predictions and SEO generation.
- Check the Space logs (Settings → Logs) for build/runtime issues, especially MeCab/Java errors.
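The repository's `Dockerfile` is authoritative; based on the description above (OpenJDK + MeCab system deps, Gunicorn bound to `$PORT`, `GUNICORN_WORKERS` as a tuning knob), its overall shape is presumably something like the following. Package names and the base image here are illustrative guesses, not the real file:

```dockerfile
FROM python:3.11-slim

# System deps for the NLP stack; check the real Dockerfile for exact packages.
RUN apt-get update && apt-get install -y --no-install-recommends \
        default-jdk mecab libmecab-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Spaces inject PORT (7860 by default); bind Gunicorn to it.
ENV PORT=7860
CMD ["sh", "-c", "gunicorn --bind 0.0.0.0:${PORT} --workers ${GUNICORN_WORKERS:-3} wsgi:application"]
```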
### Space-specific tips
- **Cold start latency**: Spaces sleep when idle; first request may take longer as the model artifacts load.
- **Resource usage**: If memory spikes occur (pandas + scikit-learn + MeCab), upgrade to a larger hardware tier.
- **Background tasks**: This setup serves HTTP requests only; long-running offline jobs should be run outside Spaces.
- **Security**: Secrets set in HF UI aren’t exposed in the repo. Avoid committing `.env` with real keys.
- **Custom domains**: Hugging Face supports domain mapping on paid tiers if you need branding.