Commit a55cd01 · Parent(s): 5477bad
update agents.md file with guidelines for bibliography
AGENTS.md
CHANGED
|
@@ -76,6 +76,30 @@ Use these blog posts as inspiration for writing style:
|
|
| 76 |
- Deep-link anything with `id` prop on `Image`, `HtmlEmbed`, `Reference` components
|
| 77 |
- Link format: `[Figure 1](#my-figure-id)`
|
| 78 |
|
| 79 |
## Components Quick Reference
|
| 80 |
|
| 81 |
| Component | Use for |
|
|
@@ -145,13 +169,18 @@ After completing any content edit, perform this reference audit:
|
|
| 145 |
|
| 146 |
1. **Scan for uncited claims**: Look for statements about methods, benchmarks, or prior work that lack citations
|
| 147 |
2. **Search for BibTeX entries**: For each missing reference, search online (Google Scholar, Semantic Scholar, arXiv) for the correct BibTeX entry
|
| 148 |
-
3. **Add to bibliography**: Place new entries in `app/src/content/bibliography.bib`
|
| 149 |
-
4. **Insert citations**: Add `[@key]` inline
|
| 150 |
5. **Ask if unsure**: If multiple papers match or authorship is ambiguous, ask the user which reference is correct before adding
|
| 151 |
|
| 152 |
**What needs citations:**
|
| 153 |
-
- Model names (
|
| 154 |
-
- Datasets (
|
| 155 |
-
- Methods and techniques (
|
| 156 |
- Benchmark results or comparisons
|
| 157 |
- Claims about prior work ("X showed that...")
|
| 76 |
- Deep-link anything with `id` prop on `Image`, `HtmlEmbed`, `Reference` components
|
| 77 |
- Link format: `[Figure 1](#my-figure-id)`
|
| 78 |
|
| 79 |
+
### Bibliography Organization
|
| 80 |
+
|
| 81 |
+
The bibliography file is organized into these sections (in order). Always place new entries in the correct section:
|
| 82 |
+
|
| 83 |
+
| Section | Comment header | What belongs here |
|
| 84 |
+
|---------|---------------|-------------------|
|
| 85 |
+
| Datasets | `% Datasets` | Training/eval datasets (FineWeb, DCLM, Cosmopedia, s1K, etc.) |
|
| 86 |
+
| Synthetic data methods | `% Synthetic data methods` | Papers about data generation/rephrasing (WRAP, REWIRE, BeyondWeb, model collapse) |
|
| 87 |
+
| Models | `% Models` | LLM papers and technical reports (Qwen, Llama, Gemma, SmolLM, etc.) |
|
| 88 |
+
| Architecture | `% Architecture` | Architectural components (GQA, RoPE, etc.) |
|
| 89 |
+
| Inference | `% Inference` | Serving/inference engines and techniques (vLLM, SGLang, FlashAttention, speculative decoding) |
|
| 90 |
+
| Training | `% Training` | Optimizers, schedules, training methods (AdamW, WSD/MiniCPM, etc.) |
|
| 91 |
+
| Tools | `% Tools` | Software tools and frameworks (DSPy, DataTrove, etc.) |
|
| 92 |
+
| Benchmarks | `% Benchmarks` | Evaluation benchmarks (ARC, HellaSwag, MMLU, GSM8K, etc.) |
|
| 93 |
+
|
| 94 |
+
### Citation Placement Rules
|
| 95 |
+
|
| 96 |
+
- **Cite on first occurrence only**: Place `[@key]` the first time a paper/model/dataset/tool appears in the blog post
|
| 97 |
+
- **Blog chapter order matters**: The rendering order is defined in `app/src/content/article.mdx` (Introduction → Infrastructure → Setup → Experiments → Conclusions → Appendix). "First occurrence" means first across the entire concatenated blog, not first within a single chapter file
|
| 98 |
+
- **Everything citable gets cited**: Models, datasets, benchmarks, architectural techniques, optimizers, inference engines, training schedules, and tools all need citations if they have a paper or official reference
|
| 99 |
+
- **Hyperlinks are not citations**: Even if something is already linked (e.g., `[DataTrove](https://github.com/...)`) it still needs a `[@key]` citation
|
| 100 |
+
- **Use `@software` for code-only references**: Libraries/tools without a paper get a `@software{...}` entry pointing to the repo or blog post
|
| 101 |
+
- **Use `@misc` with `note = {Blog post}`** for references that only exist as blog posts
|
| 102 |
+
|
| 103 |
## Components Quick Reference
|
| 104 |
|
| 105 |
| Component | Use for |
|
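The "Bibliography Organization" rule added in this hunk (place each new entry under its section's comment header, never at the end of the file) can be sketched as a small helper. This is only an illustration, not part of the commit: it assumes each comment header from the table sits alone on its own line in `bibliography.bib`, and `insert_entry` and the `example_tool` entry are hypothetical names.

```python
# Section headers copied from the table in AGENTS.md, in file order.
SECTION_HEADERS = [
    "% Datasets",
    "% Synthetic data methods",
    "% Models",
    "% Architecture",
    "% Inference",
    "% Training",
    "% Tools",
    "% Benchmarks",
]


def insert_entry(bib_text: str, header: str, entry: str) -> str:
    """Return bib_text with entry appended at the end of the section named by header."""
    if header not in SECTION_HEADERS:
        raise ValueError(f"unknown section header: {header!r}")
    lines = bib_text.splitlines()
    # Assumption: each header sits alone on its own line.
    try:
        start = next(i for i, line in enumerate(lines) if line.strip() == header)
    except StopIteration:
        raise ValueError(f"header not found in file: {header!r}") from None
    # The section ends at the next known header, or at end of file.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if lines[i].strip() in SECTION_HEADERS:
            end = i
            break
    body = lines[start + 1:end]
    while body and not body[-1].strip():
        body.pop()  # drop trailing blank lines so spacing stays uniform
    if body:
        body.append("")  # blank line between existing entries and the new one
    body.extend(entry.strip().splitlines())
    if end < len(lines):
        body.append("")  # keep a blank line before the next section header
    return "\n".join(lines[:start + 1] + body + lines[end:]) + "\n"


if __name__ == "__main__":
    # Hypothetical two-section file and a placeholder @software entry.
    sample = "% Tools\n@software{tool_a,\n  title = {Tool A}\n}\n\n% Benchmarks\n"
    new_entry = "@software{example_tool,\n  title = {Example Tool}\n}"
    print(insert_entry(sample, "% Tools", new_entry))
```

The helper refuses headers that are not in the table, which mirrors the instruction not to dump entries outside the defined sections.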
| 169 |
|
| 170 |
1. **Scan for uncited claims**: Look for statements about methods, benchmarks, or prior work that lack citations
|
| 171 |
2. **Search for BibTeX entries**: For each missing reference, search online (Google Scholar, Semantic Scholar, arXiv) for the correct BibTeX entry
|
| 172 |
+
3. **Add to bibliography**: Place new entries in `app/src/content/bibliography.bib` **in the correct section** (see Bibliography Organization above). Do not dump entries at the end of the file
|
| 173 |
+
4. **Insert citations**: Add `[@key]` inline at the first occurrence across the whole blog (respecting chapter rendering order from `article.mdx`)
|
| 174 |
5. **Ask if unsure**: If multiple papers match or authorship is ambiguous, ask the user which reference is correct before adding
|
| 175 |
|
| 176 |
**What needs citations:**
|
| 177 |
+
- Model names and families (Qwen, Llama, Gemma, SmolLM, etc.)
|
| 178 |
+
- Datasets (FineWeb, DCLM, Cosmopedia, s1K, etc.)
|
| 179 |
+
- Methods and techniques (speculative decoding, RLHF, etc.)
|
| 180 |
+
- Architectural components (GQA, RoPE, Flash Attention, etc.)
|
| 181 |
+
- Training details (AdamW optimizer, WSD learning rate schedule, etc.)
|
| 182 |
+
- Inference engines (vLLM, SGLang, FlashInfer, etc.)
|
| 183 |
+
- Evaluation benchmarks (ARC, HellaSwag, MMLU, GSM8K, etc.)
|
| 184 |
+
- Tools and frameworks (DataTrove, DSPy, etc.)
|
| 185 |
- Benchmark results or comparisons
|
| 186 |
- Claims about prior work ("X showed that...")
|
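The "cite on first occurrence only" rule in this commit depends on chapter rendering order, which makes it easy to check mechanically. The sketch below is an illustration under stated assumptions, not part of the commit: it assumes citations use only the single-key `[@key]` form shown in AGENTS.md, that the caller supplies chapter texts already ordered as in `article.mdx`, and the function name `audit_citations` and the keys in the demo are hypothetical.

```python
import re

# Assumption: citations use the single-key form `[@key]` shown in AGENTS.md.
CITATION = re.compile(r"\[@([^\]\s;,]+)\]")


def audit_citations(chapters):
    """Find first and repeated citations across chapters.

    chapters: list of (name, text) pairs, already in rendering order.
    Returns (first_seen, repeats): first_seen maps each key to the chapter
    of its first citation; repeats lists (key, chapter) for every later
    citation of a key that was already cited.
    """
    first_seen = {}
    repeats = []
    for name, text in chapters:
        for match in CITATION.finditer(text):
            key = match.group(1)
            if key in first_seen:
                repeats.append((key, name))
            else:
                first_seen[key] = name
    return first_seen, repeats


if __name__ == "__main__":
    # Hypothetical chapter texts and citation keys, for illustration only.
    chapters = [
        ("Introduction", "We filter FineWeb [@fineweb] with DataTrove [@datatrove]."),
        ("Experiments", "FineWeb [@fineweb] is sampled again here."),
    ]
    first_seen, repeats = audit_citations(chapters)
    print(first_seen)
    print(repeats)
```

Entries in `repeats` are candidates for removal, since the guideline asks for a citation only at the first occurrence across the whole concatenated blog. Plain hyperlinks such as `[DataTrove](https://github.com/...)` are not matched, consistent with the rule that hyperlinks are not citations.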