Files changed (1)
  1. README.md +223 -211
README.md CHANGED
@@ -1,211 +1,223 @@
1
- ---
2
- library_name: transformers
3
- tags:
4
- - siglip
5
- - siglip2
6
- - vision
7
- - text
8
- - clip
9
- - multimodal
10
- - image-text-embeddings
11
- - pet-recognition
12
- model_id: AvitoTech/SigLIP2-giant-e5small-v2-gating-for-animal-identification
13
- pipeline_tag: feature-extraction
14
- ---
15
-
16
- # SigLIP2-Giant + E5-Small-v2 + Gating Fine-tuned for Animal Identification
17
-
18
- Fine-tuned multimodal model combining SigLIP2-Giant vision encoder with E5-Small-v2 text encoder for individual animal identification. This advanced architecture uses a learned gating mechanism to dynamically fuse image and text embeddings, specializing in distinguishing between unique cats and dogs. The model produces robust multimodal embeddings optimized for pet recognition, re-identification, and verification tasks.
19
-
20
-
21
- ## Model Details
22
-
23
- - **Base Vision Model**: google/siglip2-giant-opt-patch16-384
24
- - **Text Encoder**: intfloat/e5-small-v2
25
- - **Image Input**: Images (384x384)
26
- - **Text Input**: Variable length text descriptions
27
- - **Final Output**: Fused embeddings (512-dimensional) via learned gating
28
- - **Task**: Individual animal identification and verification with multimodal inputs
29
-
30
- ## Training Data
31
-
32
- The model was trained on a comprehensive dataset combining multiple sources:
33
-
34
- - **[PetFace Dataset](https://arxiv.org/abs/2407.13555)**: Large-scale animal face dataset with 257,484 unique individuals across 13 animal families
35
- - **[Dogs-World](https://www.kaggle.com/datasets/lextoumbourou/dogs-world)**: Kaggle dataset for dog breed and individual identification
36
- - **[LCW (Labeled Cats in the Wild)](https://www.kaggle.com/datasets/dseidli/lcwlabeled-cats-in-the-wild)**: Cat identification dataset
37
- - **Web-scraped Data**: Additional curated images from various sources
38
-
39
- **Total Dataset Statistics:**
40
- - **1,904,157** total photographs
41
- - **695,091** unique individual animals (cats and dogs)
42
-
43
- ## Training Details
44
-
45
- **Training Configuration:**
46
- - **Batch Size**: 116 samples (58 unique identities × 2 photos each)
47
- - **Optimizer**: Adam with learning rate 1e-4
48
- - **Training Duration**: 10 epochs
49
- - **Transfer Learning**: Final 5 transformer blocks unfrozen, lower layers frozen to preserve pre-trained features
50
-
51
- **Loss Function:**
52
- The model is trained using a combined loss function consisting of:
53
- 1. **Triplet Loss** (margin α=0.45): Encourages separation between different animal identities
54
- 2. **Intra-Pair Variance Regularization** (ε=0.01): Promotes consistency across multiple photos of the same animal
55
-
56
- Combined as: L_total = 1.0 × L_triplet + 0.5 × L_var
57
-
58
- This approach creates compact feature clusters for each individual animal while maintaining large separation between different identities. The gating mechanism learns to dynamically balance image and text features for optimal performance.
59
-
60
- ## Performance Metrics
61
-
62
- The model has been benchmarked against various vision encoders on multiple pet recognition datasets:
63
-
64
- ### [Cat Individual Images Dataset](https://www.kaggle.com/datasets/timost1234/cat-individuals)
65
-
66
- | Model | ROC AUC | EER | Top-1 | Top-5 | Top-10 |
67
- |-------|---------|-----|-------|-------|--------|
68
- | CLIP-ViT-Base | 0.9821 | 0.0604 | 0.8359 | 0.9579 | 0.9711 |
69
- | DINOv2-Small | 0.9904 | 0.0422 | 0.8547 | 0.9660 | 0.9764 |
70
- | SigLIP-Base | 0.9899 | 0.0390 | 0.8649 | 0.9757 | 0.9842 |
71
- | SigLIP2-Base | 0.9894 | 0.0388 | 0.8660 | 0.9772 | 0.9863 |
72
- | Zer0int CLIP-L | 0.9881 | 0.0509 | 0.8768 | 0.9767 | 0.9845 |
73
- | SigLIP2-Giant | 0.9940 | 0.0344 | 0.8899 | 0.9868 | 0.9921 |
74
- | **SigLIP2-Giant + E5-Small-v2 + gating** | **0.9929** | **0.0344** | **0.8952** | **0.9872** | **0.9932** |
75
-
76
- ### [DogFaceNet Dataset](https://www.springerprofessional.de/en/a-deep-learning-approach-for-dog-face-verification-and-recogniti/17094782)
77
-
78
- | Model | ROC AUC | EER | Top-1 | Top-5 | Top-10 |
79
- |-------|---------|-----|-------|-------|--------|
80
- | CLIP-ViT-Base | 0.9739 | 0.0772 | 0.4350 | 0.6417 | 0.7204 |
81
- | DINOv2-Small | 0.9829 | 0.0571 | 0.5581 | 0.7540 | 0.8139 |
82
- | SigLIP-Base | 0.9792 | 0.0606 | 0.5848 | 0.7746 | 0.8319 |
83
- | SigLIP2-Base | 0.9776 | 0.0672 | 0.5925 | 0.7856 | 0.8422 |
84
- | Zer0int CLIP-L | 0.9814 | 0.0625 | 0.6289 | 0.8092 | 0.8597 |
85
- | SigLIP2-Giant | 0.9926 | 0.0326 | 0.7475 | 0.9009 | 0.9316 |
86
- | **SigLIP2-Giant + E5-Small-v2 + gating** | **0.9920** | **0.0314** | **0.7818** | **0.9233** | **0.9482** |
87
-
88
- ### Combined Test Dataset (Overall Performance)
89
-
90
- | Model | ROC AUC | EER | Top-1 | Top-5 | Top-10 |
91
- |-------|---------|-----|-------|-------|--------|
92
- | CLIP-ViT-Base | 0.9752 | 0.0729 | 0.6511 | 0.8122 | 0.8555 |
93
- | DINOv2-Small | 0.9848 | 0.0546 | 0.7180 | 0.8678 | 0.9009 |
94
- | SigLIP-Base | 0.9811 | 0.0572 | 0.7359 | 0.8831 | 0.9140 |
95
- | SigLIP2-Base | 0.9793 | 0.0631 | 0.7400 | 0.8889 | 0.9197 |
96
- | Zer0int CLIP-L | 0.9842 | 0.0565 | 0.7626 | 0.8994 | 0.9267 |
97
- | SigLIP2-Giant | 0.9912 | 0.0378 | 0.8243 | 0.9471 | 0.9641 |
98
- | **SigLIP2-Giant + E5-Small-v2 + gating** | **0.9882** | **0.0422** | **0.8428** | **0.9576** | **0.9722** |
99
-
100
- **Metrics Explanation:**
101
- - **ROC AUC**: Area Under the Receiver Operating Characteristic Curve - measures the model's ability to distinguish between different individuals
102
- - **EER**: Equal Error Rate - the error rate where false acceptance and false rejection rates are equal
103
- - **Top-K**: Accuracy of correct identification within the top K predictions
104
-
105
- **Note:** This multimodal model achieves the best overall Top-K accuracy scores by leveraging both visual and textual information through a learned gating mechanism.
106
-
107
- ## Basic Usage
108
-
109
- ### Installation
110
-
111
- ```bash
112
- pip install transformers torch pillow safetensors huggingface_hub
113
- ```
114
-
115
- ### Load Model and Get Embedding
116
-
117
- ```python
- import torch
- import torch.nn as nn
- import torch.nn.functional as F
- from PIL import Image
- from transformers import SiglipModel, SiglipProcessor, AutoModel, AutoTokenizer
- from safetensors.torch import load_file
- from huggingface_hub import hf_hub_download
-
- # Define the model architecture
- class FaceRecognizer(nn.Module):
-     def __init__(self, embedding_dim=512):
-         super().__init__()
-         ckpt = "google/siglip2-giant-opt-patch16-384"
-         self.clip = SiglipModel.from_pretrained(ckpt)
-         self.processor = SiglipProcessor.from_pretrained(ckpt)
-
-         text_model_name = "intfloat/e5-small-v2"
-         self.text_encoder = AutoModel.from_pretrained(text_model_name)
-         self.tokenizer = AutoTokenizer.from_pretrained(text_model_name)
-
-         img_dim = self.clip.config.vision_config.hidden_size
-         text_dim = self.text_encoder.config.hidden_size
-
-         self.proj_img = nn.Linear(img_dim, embedding_dim)
-         self.proj_text = nn.Linear(text_dim, embedding_dim)
-
-         self.gate = nn.Sequential(
-             nn.Linear(embedding_dim * 2, 128),
-             nn.ReLU(),
-             nn.Linear(128, 2),
-             nn.Softmax(dim=-1)
-         )
-
-     def average_pool(self, last_hidden_states, attention_mask):
-         last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
-         return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]
-
-     def forward(self, images, texts):
-         device = next(self.parameters()).device
-
-         clip_inputs = self.processor(images=images, return_tensors="pt").to(device)
-         img_emb = self.clip.get_image_features(**clip_inputs)
-
-         text_inputs = self.tokenizer(
-             texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
-         ).to(device)
-         text_outputs = self.text_encoder(**text_inputs)
-         text_emb = self.average_pool(text_outputs.last_hidden_state, text_inputs['attention_mask'])
-
-         img_proj = self.proj_img(img_emb)
-         text_proj = self.proj_text(text_emb)
-
-         fused = torch.cat([text_proj, img_proj], dim=-1)
-         w = self.gate(fused)
-         fused_emb = w[:, 0:1] * text_proj + w[:, 1:2] * img_proj
-
-         return F.normalize(fused_emb, dim=1)
-
- # Load model
- model = FaceRecognizer()
-
- # Download and load weights from HuggingFace
- weights_path = hf_hub_download(repo_id="AvitoTech/SigLIP2-giant-e5small-v2-gating-for-animal-identification", filename="model.safetensors")
- state_dict = load_file(weights_path)
- model.load_state_dict(state_dict)
-
- device = "cuda" if torch.cuda.is_available() else "cpu"
- model = model.to(device).eval()
-
- # Get fused embedding
- image = Image.open("your_image.jpg").convert("RGB")
- text = "orange cat"
-
- with torch.no_grad():
-     embedding = model([image], [text])
-
- print(f"Embedding shape: {embedding.shape}") # torch.Size([1, 512])
- ```
196
-
197
- ## Citation
198
-
199
- If you use this model in your research or applications, please cite our work:
200
-
201
- ```
202
- BibTeX citation will be added upon paper publication.
203
- ```
204
-
205
- ## Use Cases
206
-
207
- - Individual pet identification and re-identification with multimodal queries
208
- - Lost and found pet matching systems with text descriptions
209
- - Veterinary record management with combined image and text search
210
- - Animal behavior monitoring with contextual information
211
- - Wildlife conservation and tracking with metadata integration
1
+ ---
2
+ library_name: transformers
3
+ tags:
4
+ - siglip
5
+ - siglip2
6
+ - vision
7
+ - text
8
+ - clip
9
+ - multimodal
10
+ - image-text-embeddings
11
+ - pet-recognition
12
+ model_id: AvitoTech/SigLIP2-giant-e5small-v2-gating-for-animal-identification
13
+ pipeline_tag: feature-extraction
14
+ ---
15
+
16
+ # SigLIP2-Giant + E5-Small-v2 + Gating Fine-tuned for Animal Identification
17
+
18
+ Fine-tuned multimodal model combining SigLIP2-Giant vision encoder with E5-Small-v2 text encoder for individual animal identification. This advanced architecture uses a learned gating mechanism to dynamically fuse image and text embeddings, specializing in distinguishing between unique cats and dogs. The model produces robust multimodal embeddings optimized for pet recognition, re-identification, and verification tasks.
19
+
20
+
21
+ ## Model Details
22
+
23
+ - **Base Vision Model**: google/siglip2-giant-opt-patch16-384
24
+ - **Text Encoder**: intfloat/e5-small-v2
25
+ - **Image Input**: Images (384x384)
26
+ - **Text Input**: Variable length text descriptions
27
+ - **Final Output**: Fused embeddings (512-dimensional) via learned gating
28
+ - **Task**: Individual animal identification and verification with multimodal inputs
29
+
30
+ ## Training Data
31
+
32
+ The model was trained on a comprehensive dataset combining multiple sources:
33
+
34
+ - **[PetFace Dataset](https://arxiv.org/abs/2407.13555)**: Large-scale animal face dataset with 257,484 unique individuals across 13 animal families
35
+ - **[Dogs-World](https://www.kaggle.com/datasets/lextoumbourou/dogs-world)**: Kaggle dataset for dog breed and individual identification
36
+ - **[LCW (Labeled Cats in the Wild)](https://www.kaggle.com/datasets/dseidli/lcwlabeled-cats-in-the-wild)**: Cat identification dataset
37
+ - **Web-scraped Data**: Additional curated images from various sources
38
+
39
+ **Total Dataset Statistics:**
40
+ - **1,904,157** total photographs
41
+ - **695,091** unique individual animals (cats and dogs)
42
+
43
+ ## Training Details
44
+
45
+ **Training Configuration:**
46
+ - **Batch Size**: 116 samples (58 unique identities × 2 photos each)
47
+ - **Optimizer**: Adam with learning rate 1e-4
48
+ - **Training Duration**: 10 epochs
49
+ - **Transfer Learning**: Final 5 transformer blocks unfrozen, lower layers frozen to preserve pre-trained features
50
+
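Reviewer note: the batch layout in the Training Configuration list (58 identities, 2 photos each, 116 samples) can be illustrated with a small sampler. This is only a sketch under assumptions: the function name `sample_identity_batch`, the `photos_by_identity` mapping, and the rule of skipping identities with too few photos are hypothetical and are not taken from the authors' training code.

```python
import random


def sample_identity_batch(photos_by_identity, identities_per_batch=58,
                          photos_per_identity=2, rng=None):
    """Return a list of (identity, photo) pairs for one training batch.

    ASSUMPTION: identities with fewer than `photos_per_identity` photos are
    skipped; the README does not say how such identities are handled.
    """
    rng = rng or random.Random()
    eligible = [
        identity for identity, photos in photos_by_identity.items()
        if len(photos) >= photos_per_identity
    ]
    # Raises ValueError if fewer than `identities_per_batch` identities qualify.
    chosen = rng.sample(eligible, identities_per_batch)
    batch = []
    for identity in chosen:
        for photo in rng.sample(photos_by_identity[identity], photos_per_identity):
            batch.append((identity, photo))
    return batch
```

With the defaults, every batch holds 58 × 2 = 116 samples, so each photo has exactly one same-identity partner in the batch and 114 samples from other identities.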
51
+ **Loss Function:**
52
+ The model is trained using a combined loss function consisting of:
53
+ 1. **Triplet Loss** (margin α=0.45): Encourages separation between different animal identities
54
+ 2. **Intra-Pair Variance Regularization** (ε=0.01): Promotes consistency across multiple photos of the same animal
55
+
56
+ Combined as: L_total = 1.0 × L_triplet + 0.5 × L_var
57
+
58
+ This approach creates compact feature clusters for each individual animal while maintaining large separation between different identities. The gating mechanism learns to dynamically balance image and text features for optimal performance.
59
+
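Reviewer note: the combined objective above can be made concrete with a small worked example. The README gives the margin (0.45), ε (0.01), and the weights (1.0 and 0.5), but it does not define L_var or the distance used. The sketch below therefore assumes Euclidean distance on L2-normalized embeddings and assumes L_var penalizes the squared distance between two photos of the same animal once it exceeds ε. Both choices are illustrative assumptions, not the authors' implementation.

```python
import math

MARGIN = 0.45                # triplet margin from the README
EPSILON = 0.01               # epsilon from the README; its exact role is ASSUMED here
W_TRIPLET, W_VAR = 1.0, 0.5  # loss weights from the README


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=MARGIN):
    # Zero once the negative is at least `margin` farther away than the positive.
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def pair_variance_penalty(anchor, positive, eps=EPSILON):
    # ASSUMPTION: hinge on the squared same-identity distance beyond eps.
    return max(0.0, euclidean(anchor, positive) ** 2 - eps)


def total_loss(anchor, positive, negative):
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return W_TRIPLET * triplet_loss(a, p, n) + W_VAR * pair_variance_penalty(a, p)
```

Under these assumptions, a triplet whose positive coincides with the anchor and whose negative is orthogonal to it contributes zero loss, while swapping the positive and negative yields a large penalty from both terms.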
60
+ ## Performance Metrics
61
+
62
+ The model has been benchmarked against various vision encoders on multiple pet recognition datasets:
63
+
64
+ ### [Cat Individual Images Dataset](https://www.kaggle.com/datasets/timost1234/cat-individuals)
65
+
66
+ | Model | ROC AUC | EER | Top-1 | Top-5 | Top-10 |
67
+ |-------|---------|-----|-------|-------|--------|
68
+ | CLIP-ViT-Base | 0.9821 | 0.0604 | 0.8359 | 0.9579 | 0.9711 |
69
+ | DINOv2-Small | 0.9904 | 0.0422 | 0.8547 | 0.9660 | 0.9764 |
70
+ | SigLIP-Base | 0.9899 | 0.0390 | 0.8649 | 0.9757 | 0.9842 |
71
+ | SigLIP2-Base | 0.9894 | 0.0388 | 0.8660 | 0.9772 | 0.9863 |
72
+ | Zer0int CLIP-L | 0.9881 | 0.0509 | 0.8768 | 0.9767 | 0.9845 |
73
+ | SigLIP2-Giant | 0.9940 | 0.0344 | 0.8899 | 0.9868 | 0.9921 |
74
+ | **SigLIP2-Giant + E5-Small-v2 + gating** | **0.9929** | **0.0344** | **0.8952** | **0.9872** | **0.9932** |
75
+
76
+ ### [DogFaceNet Dataset](https://www.springerprofessional.de/en/a-deep-learning-approach-for-dog-face-verification-and-recogniti/17094782)
77
+
78
+ | Model | ROC AUC | EER | Top-1 | Top-5 | Top-10 |
79
+ |-------|---------|-----|-------|-------|--------|
80
+ | CLIP-ViT-Base | 0.9739 | 0.0772 | 0.4350 | 0.6417 | 0.7204 |
81
+ | DINOv2-Small | 0.9829 | 0.0571 | 0.5581 | 0.7540 | 0.8139 |
82
+ | SigLIP-Base | 0.9792 | 0.0606 | 0.5848 | 0.7746 | 0.8319 |
83
+ | SigLIP2-Base | 0.9776 | 0.0672 | 0.5925 | 0.7856 | 0.8422 |
84
+ | Zer0int CLIP-L | 0.9814 | 0.0625 | 0.6289 | 0.8092 | 0.8597 |
85
+ | SigLIP2-Giant | 0.9926 | 0.0326 | 0.7475 | 0.9009 | 0.9316 |
86
+ | **SigLIP2-Giant + E5-Small-v2 + gating** | **0.9920** | **0.0314** | **0.7818** | **0.9233** | **0.9482** |
87
+
88
+ ### Combined Test Dataset (Overall Performance)
89
+
90
+ | Model | ROC AUC | EER | Top-1 | Top-5 | Top-10 |
91
+ |-------|---------|-----|-------|-------|--------|
92
+ | CLIP-ViT-Base | 0.9752 | 0.0729 | 0.6511 | 0.8122 | 0.8555 |
93
+ | DINOv2-Small | 0.9848 | 0.0546 | 0.7180 | 0.8678 | 0.9009 |
94
+ | SigLIP-Base | 0.9811 | 0.0572 | 0.7359 | 0.8831 | 0.9140 |
95
+ | SigLIP2-Base | 0.9793 | 0.0631 | 0.7400 | 0.8889 | 0.9197 |
96
+ | Zer0int CLIP-L | 0.9842 | 0.0565 | 0.7626 | 0.8994 | 0.9267 |
97
+ | SigLIP2-Giant | 0.9912 | 0.0378 | 0.8243 | 0.9471 | 0.9641 |
98
+ | **SigLIP2-Giant + E5-Small-v2 + gating** | **0.9882** | **0.0422** | **0.8428** | **0.9576** | **0.9722** |
99
+
100
+ **Metrics Explanation:**
101
+ - **ROC AUC**: Area Under the Receiver Operating Characteristic Curve - measures the model's ability to distinguish between different individuals
102
+ - **EER**: Equal Error Rate - the error rate where false acceptance and false rejection rates are equal
103
+ - **Top-K**: Accuracy of correct identification within the top K predictions
104
+
105
+ **Note:** This multimodal model achieves the best overall Top-K accuracy scores by leveraging both visual and textual information through a learned gating mechanism.
106
+
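Reviewer note: for readers unfamiliar with the metrics listed above, the following sketch shows one common way to compute EER and Top-K from similarity scores. It is not the evaluation code behind the tables. The function names and the threshold sweep over observed scores are assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate EER by sweeping a threshold over all observed scores.

    genuine_scores: similarities for same-animal pairs.
    impostor_scores: similarities for different-animal pairs.
    """
    best_gap, eer = float("inf"), 1.0
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        # False acceptance: impostor pair scored at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # False rejection: genuine pair scored below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def top_k_accuracy(ranked_ids_per_query, true_ids, k):
    """Fraction of queries whose true identity appears in the first k results."""
    hits = sum(
        true_id in ranked[:k]
        for ranked, true_id in zip(ranked_ids_per_query, true_ids)
    )
    return hits / len(true_ids)
```

A lower EER and a higher Top-K are better, which is how the table rows should be read.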
107
+ ## Basic Usage
108
+
109
+ ### Installation
110
+
111
+ ```bash
112
+ pip install transformers torch pillow safetensors huggingface_hub
113
+ ```
114
+
115
+ ### Load Model and Get Embedding
116
+
117
+ ```python
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ from PIL import Image
+ from transformers import SiglipModel, SiglipProcessor, AutoModel, AutoTokenizer
+ from safetensors.torch import load_file
+ from huggingface_hub import hf_hub_download
+
+ # Define the model architecture
+ class FaceRecognizer(nn.Module):
+     def __init__(self, embedding_dim=512):
+         super().__init__()
+         ckpt = "google/siglip2-giant-opt-patch16-384"
+         self.clip = SiglipModel.from_pretrained(ckpt)
+         self.processor = SiglipProcessor.from_pretrained(ckpt)
+
+         text_model_name = "intfloat/e5-small-v2"
+         self.text_encoder = AutoModel.from_pretrained(text_model_name)
+         self.tokenizer = AutoTokenizer.from_pretrained(text_model_name)
+
+         img_dim = self.clip.config.vision_config.hidden_size
+         text_dim = self.text_encoder.config.hidden_size
+
+         self.proj_img = nn.Linear(img_dim, embedding_dim)
+         self.proj_text = nn.Linear(text_dim, embedding_dim)
+
+         self.gate = nn.Sequential(
+             nn.Linear(embedding_dim * 2, 128),
+             nn.ReLU(),
+             nn.Linear(128, 2),
+             nn.Softmax(dim=-1)
+         )
+
+     def average_pool(self, last_hidden_states, attention_mask):
+         last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
+         return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]
+
+     def forward(self, images, texts):
+         device = next(self.parameters()).device
+
+         clip_inputs = self.processor(images=images, return_tensors="pt").to(device)
+         img_emb = self.clip.get_image_features(**clip_inputs)
+
+         text_inputs = self.tokenizer(
+             texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
+         ).to(device)
+         text_outputs = self.text_encoder(**text_inputs)
+         text_emb = self.average_pool(text_outputs.last_hidden_state, text_inputs['attention_mask'])
+
+         img_proj = self.proj_img(img_emb)
+         text_proj = self.proj_text(text_emb)
+
+         fused = torch.cat([text_proj, img_proj], dim=-1)
+         w = self.gate(fused)
+         fused_emb = w[:, 0:1] * text_proj + w[:, 1:2] * img_proj
+
+         return F.normalize(fused_emb, dim=1)
+
+ # Load model
+ model = FaceRecognizer()
+
+ # Download and load weights from HuggingFace
+ weights_path = hf_hub_download(repo_id="AvitoTech/SigLIP2-giant-e5small-v2-gating-for-animal-identification", filename="model.safetensors")
+ state_dict = load_file(weights_path)
+ model.load_state_dict(state_dict)
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model = model.to(device).eval()
+
+ # Get fused embedding
+ image = Image.open("your_image.jpg").convert("RGB")
+ text = "orange cat"
+
+ with torch.no_grad():
+     embedding = model([image], [text])
+
+ print(f"Embedding shape: {embedding.shape}") # torch.Size([1, 512])
+ ```
196
+
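Reviewer note: the usage snippet stops at printing the embedding shape, while the stated tasks are verification and re-identification. A minimal sketch of the comparison step follows. It works on plain Python lists (for a PyTorch tensor such as the snippet's `embedding`, `embedding[0].tolist()` produces one). The helper names and the 0.5 threshold are hypothetical; a real threshold would need to be tuned on validation pairs, for example at the EER operating point.

```python
import math


def cosine_similarity(emb_a, emb_b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(x * x for x in emb_a))
    norm_b = math.sqrt(sum(y * y for y in emb_b))
    return dot / (norm_a * norm_b)


def is_same_animal(emb_a, emb_b, threshold=0.5):
    # HYPOTHETICAL threshold: chosen only for illustration, not from the README.
    return cosine_similarity(emb_a, emb_b) >= threshold
```

Because the model's `forward` already L2-normalizes its output, the cosine similarity of two fused embeddings reduces to their dot product; the explicit norms here only keep the helper safe for unnormalized inputs.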
197
+ ## Citation
198
+
199
+ If you use this model in your research or applications, please cite our work:
200
+
201
+ ```
202
+ @Article{jimaging12010030,
203
+ AUTHOR = {Kudryavtsev, Vasiliy and Borodin, Kirill and Berezin, German and Bubenchikov, Kirill and Mkrtchian, Grach and Ryzhkov, Alexander},
204
+ TITLE = {From Visual to Multimodal: Systematic Ablation of Encoders and Fusion Strategies in Animal Identification},
205
+ JOURNAL = {Journal of Imaging},
206
+ VOLUME = {12},
207
+ YEAR = {2026},
208
+ NUMBER = {1},
209
+ ARTICLE-NUMBER = {30},
210
+ URL = {https://www.mdpi.com/2313-433X/12/1/30},
211
+ ISSN = {2313-433X},
212
+ ABSTRACT = {Automated animal identification is a practical task for reuniting lost pets with their owners, yet current systems often struggle due to limited dataset scale and reliance on unimodal visual cues. This study introduces a multimodal verification framework that enhances visual features with semantic identity priors derived from synthetic textual descriptions. We constructed a massive training corpus of 1.9 million photographs covering 695,091 unique animals to support this investigation. Through systematic ablation studies, we identified SigLIP2-Giant and E5-Small-v2 as the optimal vision and text backbones. We further evaluated fusion strategies ranging from simple concatenation to adaptive gating to determine the best method for integrating these modalities. Our proposed approach utilizes a gated fusion mechanism and achieved a Top-1 accuracy of 84.28% and an Equal Error Rate of 0.0422 on a comprehensive test protocol. These results represent an 11% improvement over leading unimodal baselines and demonstrate that integrating synthesized semantic descriptions significantly refines decision boundaries in large-scale pet re-identification.},
213
+ DOI = {10.3390/jimaging12010030}
214
+ }
215
+ ```
216
+
217
+ ## Use Cases
218
+
219
+ - Individual pet identification and re-identification with multimodal queries
220
+ - Lost and found pet matching systems with text descriptions
221
+ - Veterinary record management with combined image and text search
222
+ - Animal behavior monitoring with contextual information
223
+ - Wildlife conservation and tracking with metadata integration