## Language Specific Neuron SLA

This pipeline targets the Qwen2.5 family of models specifically.

## Guide

1. Run `load_data.py` to fetch data from https://huggingface.co/datasets/wikimedia/wikipedia/viewer/20231101
2. Compute neuron activations on the fetched data with `activation.py`.
3. Identify language-specific neurons with `identify.py`.
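
To make the last step more concrete, here is a minimal sketch of one common selection criterion, the language activation probability entropy (LAPE) used in the RUCAIBox repository listed under the references. It is an assumption that `identify.py` works this way; the function names, the `act_prob` layout, and the `top_k` / `min_prob` parameters are hypothetical and chosen only for illustration.

```python
import math


def lape_scores(act_prob):
    """Entropy of each neuron's activation probability across languages.

    act_prob maps a language code to a list of per-neuron activation
    probabilities in [0, 1] (hypothetical layout, not the repo's format).
    Lower entropy means the neuron fires mostly for a few languages.
    """
    langs = sorted(act_prob)
    n_neurons = len(act_prob[langs[0]])
    scores = []
    for j in range(n_neurons):
        probs = [act_prob[lang][j] for lang in langs]
        total = sum(probs)
        if total == 0.0:
            # A neuron that never fires carries no language signal.
            scores.append(math.inf)
            continue
        entropy = 0.0
        for p in probs:
            q = p / total  # normalise across languages
            if q > 0.0:
                entropy -= q * math.log(q)
        scores.append(entropy)
    return scores


def select_language_neurons(act_prob, top_k, min_prob):
    """Keep the top_k lowest-entropy neurons and assign each one to every
    language whose activation probability is at least min_prob."""
    scores = lape_scores(act_prob)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j])[:top_k]
    selected = {lang: [] for lang in act_prob}
    for j in ranked:
        if scores[j] == math.inf:
            continue
        for lang, probs in act_prob.items():
            if probs[j] >= min_prob:
                selected[lang].append(j)
    return selected


# Toy input: neuron 0 fires mostly on English, neuron 1 on both, neuron 2 never.
toy = {"en": [0.9, 0.5, 0.0], "zh": [0.01, 0.5, 0.0]}
result = select_language_neurons(toy, top_k=1, min_prob=0.5)
```

In this toy input only neuron 0 is kept, and it is assigned to `en`. In practice the activation probabilities would come from the output of step 2, and the thresholds would need tuning per model size.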

## References
- https://github.com/ReML-AI/DCL-CoT
- https://github.com/RUCAIBox/Language-Specific-Neurons

## Note taking
`python3 load_data_oscar.py --languages en,zh,eu,ga --model-id qwen2.5 --tokenizer Qwen/Qwen2.5-0.5B`