Commit History

Add eval_loss and perplexity params to notify_aggregator signature
07200fa

Zeke Long committed on

Merge pull request #1 from madch3m/madeleine/metrics
4ee511b
unverified

Devin committed on

Add eval loss and perplexity metrics with publication quality plots
4bde6e2

madskirk87 committed on

Refresh contributors
e02392f

Zeke Long committed on

Update gradio frontend to allow interactive function in the UI
738725a

Zeke Long committed on

Updated the fed average implementation to be layer aware
95ba784

Zeke Long committed on

Maintain history and information from gradients across sessions for each round
22c2c17

Zeke Long committed on

removing flash attention
6df923f

Zeke Long committed on

fix the flash attention hanging
c11dae9

Zeke Long committed on

updating the memory allocation and config for flash attention to work with A100
424ef9b

Zeke Long committed on

Updated and increased speed of HF downloads for model
7ea95dd

Zeke Long committed on

added constants to notebook so it can easily be started
2ecdc14

Zeke Long committed on

Updated the Space so that it maintains consistent data between updates and persists it
46c9c7e

Zeke Long committed on

Updating to address the bottleneck in the poll for next round
0d4e1b9

Zeke Long committed on

Fix aggregator client url
ecec64b

Zeke Long committed on

normalizing the url so that it does not get mismatched
4e0124e

Zeke Long committed on

fixed aggregator and submission issue
be27548

Zeke Long committed on

update the configuration so we can see dashboard visualization
b38ba84

Zeke Long committed on

fix(deps): pin Starlette <1 for Gradio 4.44 dashboard
f94768d

Zeke Long committed on

chore(requirements): add upper bounds for runtime stability
2d1157e

Zeke Long committed on

feat(aggregator): optional ADMIN_SECRET and STATUS_READ_SECRET
c0326b6

Zeke Long committed on

fix(notebook): catch AggregatorMergeFailed in round_end_sync
2c06bc9

Zeke Long committed on

test: guard rate-bucket reset; chore: add .dockerignore
5a6afb4

Zeke Long committed on

ci: Docker build and pytest on push and PR
aabb518

Zeke Long committed on

docs(README): operator runbook and hardening endpoints
2d9a179

Zeke Long committed on

feat(client): raise on merge_failed and add health_aggregator()
64bfe9a

Zeke Long committed on

feat(aggregator): structured logging for security and merge events
cfc60ec

Zeke Long committed on

feat(aggregator): validate /submit payload and round_num
48d74a7

Zeke Long committed on

feat(aggregator): rate limit POST /submit per client
b49eae2

Zeke Long committed on

feat(aggregator): add GET /health liveness probe
0177bb7

Zeke Long committed on

Update the application code as there was an issue with the Space
4a85747

Zeke Long committed on

fixed basemodel and pin
8c0db55

Zeke Long committed on

Merge hf/main into main (keep local notebook; HF had duplicate tail)
a321edb

Zeke Long committed on

updated repo condition for first node
d57e13a

Zeke Long committed on

Updated the requirements to reduce size of gradio space
8f260d9

Zeke Long committed on

Missed the repository import
20fb1ac

Dev-the-dev91 committed on

Fix import for repo
b90818a

Zeke Long committed on

Added entry point for the Hugging Face Space to start the application
46d0cff

Dev-the-dev91 committed on

downgraded huggingface version
1d8c27a

Dev-the-dev91 committed on

Switch Space SDK from gradio to docker
eb4ae95

Dev-the-dev91 committed on

Add Dockerfile to pin Python 3.10 for Gradio 4.44.1 compatibility
6e0129a

Dev-the-dev91 committed on

Remove gradio from requirements to fix HF Space build conflict
c2c2303

Dev-the-dev91 committed on

Fix Gradio version for Python 3.13 compatibility
3b06182

Dev-the-dev91 committed on

Fix Python compatibility issue
f0f8be3

Dev-the-dev91 committed on

Updated the dashboard to show progress in the space
bce2b9a

Dev-the-dev91 committed on

updated readme to have better instructions for the aggregator
ce6359d

Dev-the-dev91 committed on

Initial repo changes for scaffolding
ecca7bf

Dev-the-dev91 committed on