---
title: "Dacon Broadcast Article Performance Predictor"
emoji: "📰"
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
short_description: "AI-powered article KPI predictions and SEO recommendations"
tags:
  - flask
  - seo
  - analytics
  - journalism
datasets: []
models: []
suggested_hardware: cpu-upgrade
suggested_storage: medium
pinned: false
---
# Dacon Broadcast Article Performance Predictor

This project hosts a Flask web application that predicts article performance and provides AI-powered SEO recommendations.

## Local development

1. Create a virtual environment and install dependencies:

   ```powershell
   python -m venv .venv
   .\.venv\Scripts\Activate.ps1
   pip install -r requirements.txt
   ```

2. Ensure the model artifacts are generated:

   ```powershell
   .\.venv\Scripts\python.exe train_and_save_models.py
   ```

3. Add your Google Generative AI key to a `.env` file:

   ```ini
   GEMINI_API_KEY=your-api-key
   ```

4. Run the development server:

   ```powershell
   .\.venv\Scripts\python.exe app.py
   ```
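Wherever the app reads the key, a defensive lookup surfaces a missing `.env` early instead of failing deep inside a request. A minimal sketch (the helper name is hypothetical, not necessarily how `app.py` does it):

```python
import os


def get_gemini_key(env=os.environ):
    """Return the Gemini API key, failing fast with a clear message."""
    key = env.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env")
    return key
```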
## Production deployment (Gunicorn + Nginx)

1. **Copy the project to the server** (e.g., `/srv/dacon_broadcast_paper`).
2. **Create a virtual environment** and install requirements as above.
3. **Generate artifacts** on the server, or copy them from a local build.
4. **Configure environment variables**:

   ```bash
   echo "GEMINI_API_KEY=your-api-key" | sudo tee /etc/dacon_app.env
   ```

5. **Test Gunicorn manually**:

   ```bash
   cd /srv/dacon_broadcast_paper
   source .venv/bin/activate
   gunicorn --bind 127.0.0.1:8000 --workers 3 --timeout 120 wsgi:application
   ```
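Gunicorn's `wsgi:application` target means "a module named `wsgi` exposing a WSGI callable named `application`"; in this project `wsgi.py` presumably just re-exports the Flask app (`from app import app as application`). A stdlib-only stand-in illustrating the callable's contract:

```python
# Pure-WSGI stand-in for what Gunicorn loads as "wsgi:application".
# The real wsgi.py presumably re-exports the Flask app instead:
#   from app import app as application
def application(environ, start_response):
    body = b"ok"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```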
### systemd service

Use `deploy/dacon_app.service` as a template:

```bash
sudo cp deploy/dacon_app.service /etc/systemd/system/dacon_app.service
sudo systemctl daemon-reload
sudo systemctl enable dacon_app
sudo systemctl start dacon_app
sudo systemctl status dacon_app
```

Adjust the `WorkingDirectory`, `ExecStart`, and `Environment` entries to match your server paths, or reference `/etc/dacon_app.env` with `EnvironmentFile=` if preferred.
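If you write the unit from scratch instead of copying the template, it might look roughly like this (paths, worker count, and restart policy are assumptions to adapt; the shipped `deploy/dacon_app.service` is authoritative):

```ini
[Unit]
Description=Dacon article predictor (Gunicorn)
After=network.target

[Service]
WorkingDirectory=/srv/dacon_broadcast_paper
EnvironmentFile=/etc/dacon_app.env
ExecStart=/srv/dacon_broadcast_paper/.venv/bin/gunicorn \
    --bind 127.0.0.1:8000 --workers 3 --timeout 120 wsgi:application
Restart=on-failure

[Install]
WantedBy=multi-user.target
```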
### Nginx reverse proxy

1. Install Nginx (`sudo apt install nginx`).
2. Copy the provided config and enable the site:

   ```bash
   sudo cp deploy/dacon_app.nginx.conf /etc/nginx/sites-available/dacon_app
   sudo ln -s /etc/nginx/sites-available/dacon_app /etc/nginx/sites-enabled/
   ```

3. Update `server_name` and any path aliases, then test and reload:

   ```bash
   sudo nginx -t
   sudo systemctl reload nginx
   ```

4. (Optional) Enable HTTPS via Certbot:

   ```bash
   sudo apt install certbot python3-certbot-nginx
   sudo certbot --nginx -d your-domain.com
   ```
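For reference, a reverse-proxy config of this shape typically looks roughly like the following (the `server_name` and upstream port are placeholders; the shipped `deploy/dacon_app.nginx.conf` is authoritative):

```nginx
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```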
### Firewall and health checks

- Open ports 80/443 via `ufw` or your cloud provider's security group.
- Use the `/healthz` endpoint for health monitoring.
- Logs:
  - Application: `journalctl -u dacon_app`
  - Nginx: `/var/log/nginx/access.log`, `/var/log/nginx/error.log`
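A monitoring probe for `/healthz` can be as simple as checking for an HTTP 200. A stdlib-only sketch (the URL is an assumption for a local Gunicorn bind):

```python
import urllib.request


def is_healthy(url="http://127.0.0.1:8000/healthz", timeout=5):
    """Return True if the health endpoint answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, HTTP errors, DNS failures
        return False
```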
## File overview

- `app.py` – Flask application with prediction and SEO endpoints.
- `wsgi.py` – WSGI entry point for production servers.
- `deploy/dacon_app.service` – sample systemd unit for Gunicorn.
- `deploy/dacon_app.nginx.conf` – sample Nginx reverse-proxy configuration.
- `train_and_save_models.py` – pipeline that creates the required artifacts.
- `data_csv/` – CSV inputs used by the app.
## Troubleshooting

- If Gunicorn crashes, check for missing artifacts under `artifacts/`.
- Ensure the `.env` file or environment variables include `GEMINI_API_KEY`.
- Increase `client_max_body_size` in Nginx if large payloads are expected.
- For Windows hosting, consider running Gunicorn/Nginx via WSL2, or use IIS + FastCGI with `wsgi.py`.
## Hugging Face Spaces deployment (Docker Space)

Hugging Face Spaces supports custom web apps through Docker. Use the provided `Dockerfile` to containerize the app and expose it via Gunicorn.

1. **Prepare the repository**
   - Ensure all required artifacts (`*.pkl`) and the `data_csv/` folder are committed (Spaces pulls the repo directly).
   - Keep individual files under 1 GB (Spaces limit); use Git LFS for large artifacts if needed.
2. **Create a new Space**
   - On Hugging Face, click **Create Space** → choose the `Docker` SDK → name it (e.g., `username/dacon-predictor`).
   - Leave hardware at the default unless more RAM is required (~16 GB recommended because of the NLP dependencies).
3. **Push the code**
   - Create the Space repo and add it as a remote:

     ```bash
     huggingface-cli repo create username/dacon-predictor --type=space --space-sdk=docker
     git remote add space https://huggingface.co/spaces/username/dacon-predictor
     git push space main
     ```

   - Alternatively, clone the empty Space repo and copy the project files into it before pushing.
4. **Secrets & configuration**
   - In the Space settings, add a secret named `GEMINI_API_KEY` with your Google Generative AI key.
   - Optional: set `GUNICORN_WORKERS` to tune concurrency.
5. **Container build**
   - Spaces builds the `Dockerfile`. It installs system dependencies (OpenJDK, MeCab) and Python requirements, then launches Gunicorn bound to `$PORT` (Spaces uses port 7860 by default).
   - The app serves `index.html` via Flask, so no additional frontend wiring is required.
6. **Testing & monitoring**
   - Once the build finishes, open the Space URL to verify predictions and SEO generation.
   - Check the Space logs (Settings → Logs) for build or runtime issues, especially MeCab/Java errors.
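The build described in step 5 corresponds to a Dockerfile roughly like the following (base image, package names, and worker count are assumptions; the repository's actual `Dockerfile` is authoritative):

```dockerfile
FROM python:3.11-slim

# System dependencies noted above: Java for NLP tooling, MeCab for tokenizing
RUN apt-get update && apt-get install -y --no-install-recommends \
        default-jdk mecab libmecab-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Spaces injects PORT (7860 by default); bind Gunicorn to it
ENV PORT=7860
CMD gunicorn --bind 0.0.0.0:${PORT} --workers ${GUNICORN_WORKERS:-2} --timeout 120 wsgi:application
```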
### Space-specific tips

- **Cold-start latency**: Spaces sleep when idle; the first request may take longer while the model artifacts load.
- **Resource usage**: if memory spikes occur (pandas + scikit-learn + MeCab), upgrade to a larger hardware tier.
- **Background tasks**: this setup serves HTTP requests only; long-running offline jobs should run outside Spaces.
- **Security**: secrets set in the HF UI aren't exposed in the repo. Avoid committing `.env` with real keys.
- **Custom domains**: Hugging Face supports domain mapping on paid tiers if you need branding.