Improve model card: Add `library_name`, abstract, and GitHub details
#1
by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,25 +1,47 @@
 ---
-
+base_model:
+- meta-llama/Llama-3.1-8B
 datasets:
 - Ashenone3/LM-Searcher-Trajectory-228K
 language:
 - en
+license: apache-2.0
 metrics:
 - accuracy
-base_model:
-- meta-llama/Llama-3.1-8B
 pipeline_tag: text-generation
 tags:
 - nas
 - optimization
 - agent
+library_name: transformers
 ---
+
 # LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding
 
-
+Paper: [LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding](https://huggingface.co/papers/2509.05657)
+
+## Abstract
+Recent progress in Large Language Models (LLMs) has opened new avenues for solving complex optimization problems, including Neural Architecture Search (NAS). However, existing LLM-driven NAS approaches rely heavily on prompt engineering and domain-specific tuning, limiting their practicality and scalability across diverse tasks. In this work, we propose LM-Searcher, a novel framework that leverages LLMs for cross-domain neural architecture optimization without the need for extensive domain-specific adaptation. Central to our approach is NCode, a universal numerical string representation for neural architectures, which enables cross-domain architecture encoding and search. We also reformulate the NAS problem as a ranking task, training LLMs to select high-performing architectures from candidate pools using instruction-tuning samples derived from a novel pruning-based subspace sampling strategy. Our curated dataset, encompassing a wide range of architecture-performance pairs, encourages robust and transferable learning. Comprehensive experiments demonstrate that LM-Searcher achieves competitive performance in both in-domain (e.g., CNNs for image classification) and out-of-domain (e.g., LoRA configurations for segmentation and generation) tasks, establishing a new paradigm for flexible and generalizable LLM-based architecture search. The datasets and models will be released at this https URL.
+
+## GitHub Repository
+Code: [https://github.com/Ashone3/LM-Searcher](https://github.com/Ashone3/LM-Searcher)
 
-
-
+<br>
+<div align="center">
+<img src="https://github.com/Ashone3/LM-Searcher/raw/main/figures/lm_searcher_fig2.png" width="100%" title="Figure2">
+</div>
+
+## Datasets and Models
+🤗 [LM-Searcher-Trajectory-228k Dataset](https://huggingface.co/datasets/Ashenone3/LM-Searcher-Trajectory-228K)
+
+🤗 [LM-Searcher Checkpoint](https://huggingface.co/Ashenone3/LM-Searcher/tree/main)
+
+## Training
+
+We leverage [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) to train LM-Searcher. Below is the script we use for full fine-tuning on the LLaMA-3.1 model:
+```shell
+FORCE_TORCHRUN=1 llamafactory-cli train configs/llama3_full_sft_ds2.yaml
+```
 
 ## Usage
 
@@ -31,7 +53,7 @@ vllm serve path-to-the-checkpoint --dtype auto --api-key token-abc123 --chat-tem
 ```
 
 ### Inference
-
+A minimal example, `search.py`, is provided to show how LM-Searcher can be used to search for the optimal solution to a given problem:
 ```python
 import os
 import re
@@ -99,4 +121,16 @@ for iteration in range(num_iters, args.trial_num):
 # Save all historical results to file
 with open('{}/historical_results.json'.format(args.output_dir), 'w') as f:
     json.dump(trial_dict, f)
+```
+
+## Citation
+If our work has been helpful to you, please consider citing it. Your citation serves as encouragement for our research.
+
+```bibtex
+@article{luo2024lmsearcher,
+  title={LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding},
+  author={Luo, Junyu and Luo, Xiao and Chen, Xiusi and Xiao, Zhiping and Ju, Wei and Zhang, Ming},
+  journal={arXiv preprint arXiv:2509.05657},
+  year={2024}
+}
 ```
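The inference loop the card describes (encode candidate architectures as numerical strings, ask the served model to pick the best one, parse its reply, and log the trial) can be sketched as below. This is a minimal illustration only: the `encode_candidate` helper, the prompt layout, and the mocked model reply are assumptions for demonstration, not the repository's actual `search.py` or the official NCode scheme, and a real run would obtain `reply` from the vLLM chat endpoint instead of a literal string.

```python
import json
import re

def encode_candidate(choices):
    # Hypothetical NCode-style encoding: join per-dimension option
    # indices into one numerical string (illustrative, not official).
    return "".join(str(c) for c in choices)

# A small candidate pool of architecture configurations.
candidates = [[0, 2, 1, 3], [1, 1, 2, 0], [2, 0, 3, 1]]
pool = {i: encode_candidate(c) for i, c in enumerate(candidates)}

# Build a ranking-style prompt listing the encoded candidates.
prompt = "Candidates:\n" + "\n".join(f"[{i}] {s}" for i, s in pool.items())

# Mocked model reply; in practice this would come from the served checkpoint.
reply = "After comparing the candidates, the best architecture is [1]."

# Parse the chosen index out of the reply, as the script's `re` import suggests.
best = int(re.search(r"\[(\d+)\]", reply).group(1))

# Record the trial, mirroring the historical_results.json bookkeeping.
trial_dict = {"iteration_0": {"chosen": best, "ncode": pool[best]}}
print(json.dumps(trial_dict))
```

In a real search loop the prompt would also include previously evaluated architecture-performance pairs from `trial_dict`, so each iteration conditions on the search history.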
|