namer / AGENTS.md
Edwin Jose Palathinkal
Add AGENTS.md with publishing procedures for GitHub and HuggingFace
d46a06a
# Agent Publishing Guide
This document explains how to publish changes to both GitHub and HuggingFace repositories.
## Repository Structure
This project is mirrored on two platforms:
| Platform | Remote Name | URL | Purpose |
|----------|-------------|-----|---------|
| **GitHub** | `origin` | `git@github.com:edwinhere/namer.git` | Source code, issues, development |
| **HuggingFace** | `hf` | `https://huggingface.co/edwinhere/namer` | Model distribution, inference API |
## Initial Setup
```bash
# Clone from GitHub (primary development repo)
git clone git@github.com:edwinhere/namer.git
cd namer
# Add HuggingFace as a second remote
git remote add hf https://huggingface.co/edwinhere/namer
# Verify remotes
git remote -v
```
## File Storage Configuration
Different files use different storage backends:
| File | GitHub | HuggingFace |
|------|--------|-------------|
| `*.py`, `*.md`, `*.json` | Git | Git |
| `namer_model.pt` | Git LFS | Git LFS |
| `model.safetensors` | Git LFS | Xet (via LFS pointer) |
### Git Attributes (`.gitattributes`)
```gitattributes
# For GitHub: namer_model.pt uses git-lfs
*.pt filter=lfs diff=lfs merge=lfs -text
# For HuggingFace: model.safetensors uses Xet for faster downloads
model.safetensors filter=lfs diff=lfs merge=lfs -text
```
## Publishing Workflow
### 1. Make Changes
Edit files normally, then commit:
```bash
git add <files>
git commit -m "Description of changes"
```
### 2. Push to GitHub (Origin)
```bash
git push origin main
```
This uploads:
- All code files to GitHub
- LFS objects (`namer_model.pt`, `model.safetensors`)
### 3. Push to HuggingFace
```bash
GIT_LFS_SKIP_SMUDGE=1 git push hf main
```
**Why `GIT_LFS_SKIP_SMUDGE=1`?**
- HuggingFace uses **Xet** storage for `model.safetensors` (faster than LFS)
- Without this flag, git tries to download LFS objects from HF that may not exist
- The flag skips the smudge filter, pushing only the LFS pointer file
- HF's Xet backend then serves the actual file content
### 4. Upload Safetensors to HuggingFace (if updated)
If `model.safetensors` changed, use the HF CLI for Xet upload:
```bash
# Upload via HF CLI (uses Xet for fast transfers)
hf upload edwinhere/namer model.safetensors model.safetensors \
--commit-message "Update model weights vX.Y"
```
## Complete Publishing Example
```bash
# 1. Make changes to code
cd /big/home/edwin/dev/namer
vim namer/data.py
# 2. Commit
git add namer/data.py
git commit -m "Fix edge case handling for numbers with many zeros"
# 3. Push code to both platforms
git push origin main
GIT_LFS_SKIP_SMUDGE=1 git push hf main
# 4. If model weights changed, upload safetensors
hf upload edwinhere/namer model.safetensors model.safetensors \
--commit-message "Update model weights with improved training"
```
## One-Line Push to Both
For convenience, push to both in one command:
```bash
git push origin main && GIT_LFS_SKIP_SMUDGE=1 git push hf main
```
Or with explicit checks:
```bash
# Push to GitHub
git push origin main
# Push to HuggingFace (skip LFS smudge to avoid download issues)
GIT_LFS_SKIP_SMUDGE=1 git push hf main
```
## Troubleshooting
### Diverged Branches
If `hf` and `origin` have diverged:
```bash
# Pull from HuggingFace first (skipping LFS downloads)
GIT_LFS_SKIP_SMUDGE=1 git pull hf main --rebase
# Then push back
git push origin main
GIT_LFS_SKIP_SMUDGE=1 git push hf main
```
### LFS Object Not Found
If you see "Object does not exist on the server":
```bash
# Skip smudge filter to avoid downloading missing objects
GIT_LFS_SKIP_SMUDGE=1 git pull hf main
```
### Force Push (Use with Caution)
If history was rewritten and you need to force sync:
```bash
# Force push to GitHub
git push origin main --force-with-lease
# Force push to HuggingFace
GIT_LFS_SKIP_SMUDGE=1 git push hf main --force-with-lease
```
## Verification
After publishing, verify on both platforms:
```bash
# Check GitHub latest commit
git log origin/main --oneline -3
# Check HuggingFace latest commit
git log hf/main --oneline -3
# Both should show the same commits
```
## Model Files Reference
| File | Size | Purpose | Platform |
|------|------|---------|----------|
| `namer_model.pt` | 3.6 MB | PyTorch checkpoint (training/inference) | GitHub (LFS) |
| `model.safetensors` | 3.5 MB | Safetensors format (HF compatible) | HuggingFace (Xet) |
## Commands Cheat Sheet
```bash
# View remotes
git remote -v
# View LFS tracked files
git lfs ls-files
# View LFS files with sizes
git lfs ls-files --size
# Check status on both platforms
git fetch origin && git fetch hf
git log --oneline --graph --decorate --all -5
# Push to both
git push origin main && GIT_LFS_SKIP_SMUDGE=1 git push hf main
```
## Badges (Cross-Platform Links)
The README maintains badges linking both platforms:
```markdown
[![HuggingFace](https://img.shields.io/badge/🤗_HuggingFace-Model_Card-yellow)](https://huggingface.co/edwinhere/namer)
[![GitHub](https://img.shields.io/badge/🐙_GitHub-Source_Code-blue)](https://github.com/edwinhere/namer)
```
These should always point to each other regardless of which platform the user is viewing from.