---
title: "Dacon Broadcast Article Performance Predictor"
emoji: "📰"
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
short_description: "AI-powered article KPI predictions and SEO recommendations"
tags:
  - flask
  - seo
  - analytics
  - journalism
datasets: []
models: []
suggested_hardware: cpu-upgrade
suggested_storage: medium
pinned: false
---
# Dacon Broadcast Article Performance Predictor

This project hosts a Flask web application that predicts article performance and provides AI-powered SEO recommendations.

## Local development

1. Create a virtual environment and install dependencies:

   ```powershell
   python -m venv .venv
   .\.venv\Scripts\Activate.ps1
   pip install -r requirements.txt
   ```

2. Ensure the model artifacts are generated:

   ```powershell
   .\.venv\Scripts\python.exe train_and_save_models.py
   ```

3. Add your Google Generative AI key to a `.env` file:

   ```ini
   GEMINI_API_KEY=your-api-key
   ```

4. Run the development server:

   ```powershell
   .\.venv\Scripts\python.exe app.py
   ```
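Wherever the app reads the key, a defensive lookup surfaces a missing `.env` early instead of failing deep inside a request. A minimal sketch (the helper name is hypothetical, not necessarily how `app.py` does it):

```python
import os


def get_gemini_key(env=os.environ):
    """Return the Gemini API key, failing fast with a clear message."""
    key = env.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env")
    return key
```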
## Production deployment (Gunicorn + Nginx)

1. **Copy the project to the server** (e.g., `/srv/dacon_broadcast_paper`).
2. **Create a virtual environment** and install requirements as above.
3. **Generate artifacts** on the server, or copy them from a local build.
4. **Configure environment variables**:

   ```bash
   echo "GEMINI_API_KEY=your-api-key" | sudo tee /etc/dacon_app.env
   ```

5. **Test Gunicorn manually**:

   ```bash
   cd /srv/dacon_broadcast_paper
   source .venv/bin/activate
   gunicorn --bind 127.0.0.1:8000 --workers 3 --timeout 120 wsgi:application
   ```
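Gunicorn's `wsgi:application` target means "a module named `wsgi` exposing a WSGI callable named `application`"; in this project `wsgi.py` presumably just re-exports the Flask app (`from app import app as application`). A stdlib-only stand-in illustrating the callable's contract:

```python
# Pure-WSGI stand-in for what Gunicorn loads as "wsgi:application".
# The real wsgi.py presumably re-exports the Flask app instead:
#   from app import app as application
def application(environ, start_response):
    body = b"ok"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```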
### systemd service

Use `deploy/dacon_app.service` as a template:

```bash
sudo cp deploy/dacon_app.service /etc/systemd/system/dacon_app.service
sudo systemctl daemon-reload
sudo systemctl enable dacon_app
sudo systemctl start dacon_app
sudo systemctl status dacon_app
```

Adjust the `WorkingDirectory`, `ExecStart`, and `Environment` entries to match your server paths, or reference `/etc/dacon_app.env` with `EnvironmentFile=` if preferred.
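If you write the unit from scratch instead of copying the template, it might look roughly like this (paths, worker count, and restart policy are assumptions to adapt; the shipped `deploy/dacon_app.service` is authoritative):

```ini
[Unit]
Description=Dacon article predictor (Gunicorn)
After=network.target

[Service]
WorkingDirectory=/srv/dacon_broadcast_paper
EnvironmentFile=/etc/dacon_app.env
ExecStart=/srv/dacon_broadcast_paper/.venv/bin/gunicorn \
    --bind 127.0.0.1:8000 --workers 3 --timeout 120 wsgi:application
Restart=on-failure

[Install]
WantedBy=multi-user.target
```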
### Nginx reverse proxy

1. Install Nginx (`sudo apt install nginx`).
2. Copy the provided config and enable the site:

   ```bash
   sudo cp deploy/dacon_app.nginx.conf /etc/nginx/sites-available/dacon_app
   sudo ln -s /etc/nginx/sites-available/dacon_app /etc/nginx/sites-enabled/
   ```

3. Update `server_name` and any path aliases, then test and reload:

   ```bash
   sudo nginx -t
   sudo systemctl reload nginx
   ```

4. (Optional) Enable HTTPS via Certbot:

   ```bash
   sudo apt install certbot python3-certbot-nginx
   sudo certbot --nginx -d your-domain.com
   ```
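For reference, a reverse-proxy config of this shape typically looks roughly like the following (the `server_name` and upstream port are placeholders; the shipped `deploy/dacon_app.nginx.conf` is authoritative):

```nginx
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```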
### Firewall and health checks

- Open ports 80/443 via `ufw` or your cloud provider's security group.
- Use the `/healthz` endpoint for health monitoring.
- Logs:
  - Application: `journalctl -u dacon_app`
  - Nginx: `/var/log/nginx/access.log`, `/var/log/nginx/error.log`
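A monitoring probe for `/healthz` can be as simple as checking for an HTTP 200. A stdlib-only sketch (the URL is an assumption for a local Gunicorn bind):

```python
import urllib.request


def is_healthy(url="http://127.0.0.1:8000/healthz", timeout=5):
    """Return True if the health endpoint answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, HTTP errors, DNS failures
        return False
```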
## File overview

- `app.py` – Flask application with prediction and SEO endpoints.
- `wsgi.py` – WSGI entry point for production servers.
- `deploy/dacon_app.service` – sample systemd unit for Gunicorn.
- `deploy/dacon_app.nginx.conf` – sample Nginx reverse-proxy configuration.
- `train_and_save_models.py` – pipeline that creates the required artifacts.
- `data_csv/` – CSV inputs used by the app.
## Troubleshooting

- If Gunicorn crashes, check for missing artifacts under `artifacts/`.
- Ensure the `.env` file or environment variables include `GEMINI_API_KEY`.
- Increase `client_max_body_size` in Nginx if large payloads are expected.
- For Windows hosting, consider running Gunicorn/Nginx via WSL2, or use IIS + FastCGI with `wsgi.py`.
## Hugging Face Spaces deployment (Docker Space)

Hugging Face Spaces supports custom web apps through Docker. Use the provided `Dockerfile` to containerize the app and expose it via Gunicorn.

1. **Prepare the repository**
   - Ensure all required artifacts (`*.pkl`) and the `data_csv/` folder are committed (Spaces pulls the repo directly).
   - Keep individual files under 1 GB (Spaces limit); use Git LFS for large artifacts if needed.
2. **Create a new Space**
   - On Hugging Face, click **Create Space** → choose the `Docker` SDK → name it (e.g., `username/dacon-predictor`).
   - Leave hardware at the default unless more RAM is required (~16 GB recommended because of the NLP dependencies).
3. **Push the code**
   - Create the Space repo and add it as a remote:

     ```bash
     huggingface-cli repo create username/dacon-predictor --type=space --space-sdk=docker
     git remote add space https://huggingface.co/spaces/username/dacon-predictor
     git push space main
     ```

   - Alternatively, clone the empty Space repo and copy the project files into it before pushing.
4. **Secrets & configuration**
   - In the Space settings, add a secret named `GEMINI_API_KEY` with your Google Generative AI key.
   - Optional: set `GUNICORN_WORKERS` to tune concurrency.
5. **Container build**
   - Spaces builds the `Dockerfile`. It installs system dependencies (OpenJDK, MeCab) and Python requirements, then launches Gunicorn bound to `$PORT` (Spaces uses port 7860 by default).
   - The app serves `index.html` via Flask, so no additional frontend wiring is required.
6. **Testing & monitoring**
   - Once the build finishes, open the Space URL to verify predictions and SEO generation.
   - Check the Space logs (Settings → Logs) for build or runtime issues, especially MeCab/Java errors.
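The build described in step 5 corresponds to a Dockerfile roughly like the following (base image, package names, and worker count are assumptions; the repository's actual `Dockerfile` is authoritative):

```dockerfile
FROM python:3.11-slim

# System dependencies noted above: Java for NLP tooling, MeCab for tokenizing
RUN apt-get update && apt-get install -y --no-install-recommends \
        default-jdk mecab libmecab-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Spaces injects PORT (7860 by default); bind Gunicorn to it
ENV PORT=7860
CMD gunicorn --bind 0.0.0.0:${PORT} --workers ${GUNICORN_WORKERS:-2} --timeout 120 wsgi:application
```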
### Space-specific tips

- **Cold-start latency**: Spaces sleep when idle; the first request may take longer while the model artifacts load.
- **Resource usage**: if memory spikes occur (pandas + scikit-learn + MeCab), upgrade to a larger hardware tier.
- **Background tasks**: this setup serves HTTP requests only; long-running offline jobs should run outside Spaces.
- **Security**: secrets set in the HF UI aren't exposed in the repo. Avoid committing `.env` with real keys.
- **Custom domains**: Hugging Face supports domain mapping on paid tiers if you need branding.