---
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
library_name: transformers
---

# ALM-Qwen Model: ALM-Qwen-0.5B-testing

This repository contains an Attention-Linked Memory augmented Qwen model (ALM-Qwen).

## Model Components

* **AttentionLinkedMemory (ALM)**: A custom PyTorch module for two-level attention-based retrieval from structured memory; a minimal illustrative sketch of this retrieval pattern follows this list. (See `ALM.py`)
* **QwenGenerator**: Wraps a Hugging Face Qwen model (e.g., Qwen2.5-0.5B-Instruct or Qwen2.5-7B-Instruct) for text generation.
* **ALMQwenModel_HF**: The main class orchestrating ALM retrieval and Qwen generation. (See `alm_qwen.py`)
* **Saved Weights & Config**:
  * `alm_layer_state_dict.pth`: Trained weights for the ALM layer.
  * `alm_qwen_hf_config.json`: Configuration for the `ALMQwenModel_HF`, including ALM parameters and paths to the Qwen components.
  * `qwen_generator/`: Contains the saved Hugging Face Qwen model and tokenizer.
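
The two-level retrieval pattern can be pictured as: score each memory bucket against the query, score the items inside each bucket, then combine the two attention levels into a single retrieved vector. The sketch below is illustrative only and is **not** the implementation in `ALM.py`; the mean-pooled bucket summaries and all names in it are assumptions.

```python
import torch
import torch.nn as nn

class TwoLevelAttentionSketch(nn.Module):
    """Illustrative two-level attention over bucketed memory (not the real ALM.py)."""

    def __init__(self, query_dim: int, memory_dim: int):
        super().__init__()
        self.q_proj = nn.Linear(query_dim, memory_dim)

    def forward(self, query, memory, mask):
        # query:  (B, query_dim)
        # memory: (B, num_buckets, items_per_bucket, memory_dim)
        # mask:   (B, num_buckets, items_per_bucket) bool, True = valid item
        q = self.q_proj(query)                                    # (B, D)
        # Level 1: weight each bucket by its mean-pooled summary.
        summaries = memory.mean(dim=2)                            # (B, K, D)
        bucket_weights = torch.einsum("bd,bkd->bk", q, summaries).softmax(dim=-1)
        # Level 2: weight items within each bucket, masking out padding.
        # (Assumes every bucket holds at least one valid item.)
        item_scores = torch.einsum("bd,bkid->bki", q, memory)
        item_scores = item_scores.masked_fill(~mask, float("-inf"))
        item_weights = item_scores.softmax(dim=-1)                # (B, K, I)
        # Combine both levels into one retrieved vector per query.
        per_bucket = torch.einsum("bki,bkid->bkd", item_weights, memory)
        return torch.einsum("bk,bkd->bd", bucket_weights, per_bucket)

# Example shapes: batch of 2 queries, 3 buckets of 5 items, 64-dim embeddings
alm = TwoLevelAttentionSketch(query_dim=64, memory_dim=64)
out = alm(torch.randn(2, 64), torch.randn(2, 3, 5, 64), torch.ones(2, 3, 5, dtype=torch.bool))
print(out.shape)  # torch.Size([2, 64])
```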

## How to Use

1. **Prerequisites**:

   ```bash
   pip install torch transformers huggingface_hub sentencepiece accelerate
   # Add other dependencies if any, e.g., bitsandbytes for quantization
   ```

2. **Clone the repository (or download the files manually; a `huggingface_hub` alternative is shown below)**:

   ```bash
   git lfs install  # if large files are used, though typically not for these components directly
   git clone https://huggingface.co/moelanoby/ALM-Qwen-0.5B-testing
   cd ALM-Qwen-0.5B-testing
   ```
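
   Alternatively, fetch the files without git via `huggingface_hub` (installed in step 1); `snapshot_download` returns the local path of the downloaded repository:

   ```python
   from huggingface_hub import snapshot_download

   # Download all repository files into the local Hugging Face cache
   # and return the directory path.
   model_directory = snapshot_download(repo_id="moelanoby/ALM-Qwen-0.5B-testing")
   print(model_directory)
   ```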

3. **Load the model in Python**:

   ```python
   import torch
   from alm_qwen import ALMQwenModel_HF  # make sure alm_qwen.py and ALM.py are on your PYTHONPATH

   # Pick a device
   device = "cuda" if torch.cuda.is_available() else "cpu"

   # Path to the directory where you cloned/downloaded the model
   model_directory = "."  # or the specific path if running from outside the cloned repo

   # Load the composite model
   loaded_model = ALMQwenModel_HF.load_model(model_directory, device=device)
   print("ALM-Qwen model loaded successfully!")

   # --- Prepare dummy input data (mirrors the example in alm_qwen.py) ---
   batch_size = 1
   alm_query_dim = loaded_model.alm_config["query_dim"]
   alm_memory_dim = loaded_model.alm_config["memory_dim"]
   num_kb_buckets = 3           # example
   max_kb_items_per_bucket = 5  # example

   query_texts = ["What is the capital of France?"]
   query_embeddings_for_alm = torch.randn(batch_size, alm_query_dim)
   memory_item_embeddings = torch.randn(
       batch_size, num_kb_buckets, max_kb_items_per_bucket, alm_memory_dim
   )
   memory_text_items = [
       [
           ["Paris is the capital of France." for _ in range(max_kb_items_per_bucket)]
           for _ in range(num_kb_buckets)
       ]
       for _ in range(batch_size)
   ]
   memory_mask = torch.ones(
       batch_size, num_kb_buckets, max_kb_items_per_bucket, dtype=torch.bool
   )
   memory_mask[:, :, -1] = False  # example: treat the last item of each bucket as padding

   # Run inference
   generated_answers, _, _ = loaded_model(
       query_texts,
       query_embeddings_for_alm,
       memory_item_embeddings,
       memory_text_items,
       memory_mask,
   )
   print(f"Query: {query_texts[0]}")
   print(f"Answer: {generated_answers[0]}")
   ```
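
   In a real application you would replace the random tensors above with embeddings of your own knowledge base. Below is a minimal, hypothetical helper (`build_memory_tensors` and `embed_fn` are not part of this repo) for padding variable-size buckets of text into the `(batch, buckets, items, dim)` layout and boolean mask the model expects:

   ```python
   from typing import Callable, List, Tuple
   import torch

   def build_memory_tensors(
       buckets: List[List[str]],
       embed_fn: Callable[[str], torch.Tensor],  # must return a (memory_dim,) tensor
       memory_dim: int,
   ) -> Tuple[torch.Tensor, list, torch.Tensor]:
       """Pad one example's buckets of text items into ALM-shaped inputs.

       Returns embeddings (1, K, I, D), text items (1 x K x I nested lists),
       and a bool mask (1, K, I) where True marks real (non-padding) items.
       """
       num_buckets = len(buckets)
       max_items = max(len(b) for b in buckets)
       emb = torch.zeros(1, num_buckets, max_items, memory_dim)
       mask = torch.zeros(1, num_buckets, max_items, dtype=torch.bool)
       texts = [[["" for _ in range(max_items)] for _ in range(num_buckets)]]
       for k, bucket in enumerate(buckets):
           for i, item in enumerate(bucket):
               emb[0, k, i] = embed_fn(item)
               texts[0][k][i] = item
               mask[0, k, i] = True
       return emb, texts, mask
   ```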

## Training

The ALM layer weights in `alm_layer_state_dict.pth` may have been trained for this checkpoint; the Qwen model inside `qwen_generator/` is typically a pre-trained model from Hugging Face, possibly fine-tuned. A hedged sketch of how one might train the ALM layer follows.
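
A common recipe for composite models like this is to freeze the generator and optimize only the memory module. The sketch below is an assumption-laden outline, not code from this repo: the attribute names (`alm_layer`, `qwen_generator`), the `train_loader`, and `compute_loss` are all hypothetical; check `alm_qwen.py` and `ALM.py` for the real interfaces.

```python
import torch

# Hypothetical attribute names; inspect alm_qwen.py / ALM.py for the real ones.
for p in loaded_model.qwen_generator.parameters():
    p.requires_grad = False  # keep the Qwen generator frozen

optimizer = torch.optim.AdamW(loaded_model.alm_layer.parameters(), lr=1e-4)

for batch in train_loader:                    # your own DataLoader of training batches
    optimizer.zero_grad()
    loss = compute_loss(loaded_model, batch)  # hypothetical task loss
    loss.backward()
    optimizer.step()

# Persist the trained ALM weights in the format this repo uses.
torch.save(loaded_model.alm_layer.state_dict(), "alm_layer_state_dict.pth")
```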

## Notes

* The Qwen model components can be large. Ensure you have sufficient disk space and network bandwidth.
* The `load_model` method in `alm_qwen.py` handles the reconstruction of the composite model.
* If you run into errors with the steps above, import and use the classes in `alm_qwen.py` directly.

---