Qybera committed on
Commit 25baa66 · verified · 1 Parent(s): cf23201

Upload 14 files

Files changed (4):
1. .gitattributes +0 -1
2. README.md +297 -297
3. config.json +1 -1
4. model_card.json +1 -1
.gitattributes CHANGED
@@ -4,4 +4,3 @@
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
-optimizer.pt filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,298 +1,298 @@
---
license: apache-2.0
language:
- en
metrics:
- accuracy
new_version: Qybera/LisaV3
library_name: transformers
---
# LISA-v3.5: Learning Intelligence with Sensory Awareness

## Developed in Kenya, Africa by the LISA Team

**LISA (Learning Intelligence with Sensory Awareness)** is a cutting-edge multimodal AI system developed in Kenya, Africa, by the dedicated LISA Team. This model represents African innovation in artificial intelligence, built entirely from scratch without relying on pretrained models.

## Core Mission

Build a scalable, perception-focused AI that can:
- **See** and understand visual environments
- **Listen** and process audio/speech
- **Understand** context and situations
- **Interact** intelligently with the environment
- **Learn** continuously from experiences

## Key Features

- **Lisa Architecture**: Built from scratch using ViT-B/16-inspired architectures
- **Computer Vision**: Real-time object detection, depth estimation, and scene understanding
- **Audio Processing**: Speech recognition, sound classification, and emotion detection
- **Multimodal Fusion**: Seamless integration of vision and audio processing
- **Real-time Processing**: Optimized for live streaming and interactive applications
- **African Innovation**: Proudly developed in Kenya, East Africa

## Quick Start

### Basic Usage

```python
from lisa import LISAModel
import torch

# Load the model and move it to the available device
model = LISAModel.from_pretrained("./")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Process vision + audio input
result = model.process_multimodal(
    image_path="image.jpg",  # Visual input - what the model "sees"
    audio_path="audio.wav"   # Auditory input - what the model "hears"
)

print(result.response)
```

### Streaming Processing

```python
import cv2
import sounddevice as sd
from queue import Queue

from lisa import LISAModel

# Initialize LISA for multimodal streaming
lisa = LISAModel.from_pretrained("./")
lisa.start_streaming()

# Buffer for incoming audio chunks
audio_queue = Queue(maxsize=10)

def audio_callback(indata, frames, time, status):
    """Continuously capture audio and store it in the queue"""
    if not audio_queue.full():
        audio_queue.put(indata.copy())  # Store audio chunk for processing

# Start audio stream (runs in a background thread)
audio_stream = sd.InputStream(
    callback=audio_callback,
    channels=1,        # Mono audio for simplicity
    samplerate=16000,  # Standard rate for speech processing
    blocksize=1024     # Audio chunk size
)

# Process synchronized video and audio streams
cap = cv2.VideoCapture(0)
audio_stream.start()

while True:
    ret, frame = cap.read()
    if ret and not audio_queue.empty():
        # Get the most recent audio chunk
        audio_chunk = audio_queue.get()

        # Process the video frame AND audio together
        result = lisa.process_multimodal_frame(
            frame=frame,       # What the AI "sees" right now
            audio=audio_chunk  # What the AI "hears" right now
        )

        print(f"Vision: {result.visual_detections}")
        print(f"Audio: {result.audio_events}")
        print(f"Combined: {result.multimodal_inference}")

        # Display with annotations from both modalities
        annotated_frame = lisa.annotate_multimodal_frame(frame, result)
        cv2.imshow('LISA Vision+Audio', annotated_frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Clean up resources
audio_stream.stop()
cap.release()
cv2.destroyAllWindows()
```

119
- ### Vision+Audio Processing
120
-
121
- ```python
122
- import cv2
123
- import numpy as np
124
- from threading import Thread
125
- import time
126
-
127
- # Enhanced callback that processes both audio and synchronized video
128
- def multimodal_callback(audio_chunk, current_frame=None):
129
- """
130
- This callback now processes both audio and visual information together.
131
- Think of this like how humans naturally combine what they hear with what they see
132
- to understand a conversation or situation more completely.
133
- """
134
-
135
- # Process both modalities together - this is the key difference
136
- result = lisa.process_multimodal_realtime(
137
- audio=audio_chunk, # What the AI hears (speech, sounds, emotions)
138
- frame=current_frame # What the AI sees (faces, gestures, environment)
139
- )
140
-
141
- # Now we get richer, cross-modal insights
142
- if result.transcript:
143
- print(f"Speech: {result.transcript}")
144
-
145
- # Emotion detection now uses BOTH audio tone AND facial expressions
146
- if result.emotion_scores:
147
- print(f"Voice Emotion: {result.audio_emotion}") # From speech patterns
148
- print(f"Visual Emotion: {result.facial_emotion}") # From facial expressions
149
- print(f"Combined Emotion: {result.fused_emotion}") # Best of both worlds
150
-
151
- # New capabilities emerge from combining modalities
152
- if result.speaker_identification:
153
- print(f"Speaker: {result.identified_speaker}") # Match voice to face
154
-
155
- if result.attention_focus:
156
- print(f"Looking at: {result.visual_attention}") # Where are they looking while speaking?
157
-
158
- # Capture video frames continuously to sync with audio
159
- current_frame = None
160
- cap = cv2.VideoCapture(0)
161
-
162
- def capture_frames():
163
- """
164
- Continuously capture video frames in a separate thread.
165
- This ensures we always have a recent frame available when audio arrives.
166
- Think of this as maintaining a 'visual memory' that stays current.
167
- """
168
- global current_frame
169
- while True:
170
- ret, frame = cap.read()
171
- if ret:
172
- current_frame = frame # Update the most recent visual context
173
- time.sleep(0.03) # Roughly 30 FPS capture rate
174
-
175
- # Start the video capture thread
176
- video_thread = Thread(target=capture_frames, daemon=True)
177
- video_thread.start()
178
-
179
- # Modified callback function that includes current visual context
180
- def enhanced_audio_callback(audio_chunk):
181
- """
182
- This wrapper ensures each audio chunk is processed alongside
183
- the most recent visual frame, creating temporal alignment.
184
- """
185
- multimodal_callback(audio_chunk, current_frame)
186
-
187
- # Start the integrated audio+vision stream
188
- lisa.start_audio_stream(callback=enhanced_audio_callback)
189
- ```
190
-
191
- - **Temporal Synchronization:** The biggest challenge in multimodal AI is ensuring that what you hear and what you see correspond to the same moment in time. Notice how we maintain a current_frame variable that's continuously updated in a separate thread. This creates a "visual memory" that's always fresh when new audio arrives. Think of it like how your brain automatically coordinates the timing of what your eyes see with what your ears hear.
192
- - **Cross-Modal Enhancement:** The real magic happens in process_multimodal_realtime(). Instead of analyzing speech and visual cues separately, the model can now cross-reference them. For example, if someone says "I'm fine" but their facial expression shows distress, the combined emotion analysis will be more accurate than either modality alone. This mimics human intuition about reading people's true feelings.
193
- - **Emergent Capabilities:** When you combine vision and audio, new possibilities emerge that weren't available with either modality alone. Speaker identification becomes much more robust when you can match a voice to a face. Understanding where someone is looking while they speak adds crucial context about their intent and focus.
194
- - **Threaded Architecture:** Notice how we use a separate thread for video capture. This architectural choice is crucial because audio processing is time-sensitive - you cannot afford to miss audio chunks while waiting for a video frame to process. The threaded approach ensures smooth, real-time operation of both streams.
195
-
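The temporal-synchronization pattern described above can be sketched independently of LISA as a small thread-safe holder for the freshest frame. The `LatestFrame` class and its `max_age` freshness window are illustrative names, not part of the LISA API:

```python
import threading
import time

class LatestFrame:
    """Thread-safe holder for the most recent video frame (the 'visual memory')."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self._ts = None

    def update(self, frame):
        # Called from the capture thread for every new frame
        with self._lock:
            self._frame = frame
            self._ts = time.monotonic()

    def get(self, max_age=0.1):
        # Called from the audio callback; returns None if the frame is stale
        with self._lock:
            if self._frame is None or time.monotonic() - self._ts > max_age:
                return None
            return self._frame

holder = LatestFrame()
holder.update("frame-1")        # stand-in for a numpy image array
print(holder.get(max_age=1.0))  # frame-1 (still fresh)
```

Pairing each audio chunk with `holder.get()` instead of a blocking queue read keeps the audio path non-blocking, which is the point of the threaded design above.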
## Architecture

### Vision Component
- **Lisa ViT-B/16 inspired architecture**
- Patch size: 16x16
- Embedding dimensions: 384 (mini) / 768 (full)
- Multi-head attention layers: 6-12
- Lisa object detection head
- Depth estimation module

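As a sanity check on the patch geometry above, a 16x16 patch size turns a square input into (side/16)² tokens. The 224x224 resolution used here is an assumed example; the card does not state the input size:

```python
# Token count for a ViT-B/16-style encoder (patch size from the specs above).
def num_patches(image_size: int, patch_size: int = 16) -> int:
    assert image_size % patch_size == 0, "image must divide evenly into patches"
    return (image_size // patch_size) ** 2

tokens = num_patches(224)  # 14 x 14 grid
print(tokens)              # 196
print(tokens + 1)          # 197 with a [CLS]-style token, if one is used
```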
### Audio Component
- **Lisa Audio Transformer**
- Sample rate: 16 kHz
- Mel-scale features: 80 channels
- CTC-based speech recognition
- Environmental sound classification (50+ classes)
- Emotion detection (7 emotions)

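To make the audio front-end concrete: at 16 kHz with 80 mel channels, the feature shape is determined by the hop length. The 160-sample (10 ms) hop below is an assumed value; the card only fixes the sample rate and channel count:

```python
# Shape of the 80-channel mel feature matrix for a clip of given length.
def mel_shape(num_samples: int, hop_length: int = 160, n_mels: int = 80):
    frames = 1 + num_samples // hop_length  # one frame per hop, plus the first
    return (n_mels, frames)

print(mel_shape(16000))  # one second of 16 kHz audio -> (80, 101)
```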
### Multimodal Fusion
- Cross-attention mechanisms
- Temporal synchronization
- Context-aware processing
- Real-time inference capabilities

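The cross-attention bullet above follows the standard scaled dot-product pattern. A minimal single-head sketch, with illustrative dimensions rather than LISA's actual fusion layer:

```python
import numpy as np

def cross_attention(q, k, v):
    """Audio queries attend over vision keys/values (single head, no projections)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])         # (n_audio, n_vision)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over vision tokens
    return weights @ v                              # (n_audio, d)

rng = np.random.default_rng(0)
audio_tokens = rng.standard_normal((4, 64))     # e.g. pooled audio-frame embeddings
vision_tokens = rng.standard_normal((196, 64))  # e.g. ViT patch embeddings
fused = cross_attention(audio_tokens, vision_tokens, vision_tokens)
print(fused.shape)  # (4, 64)
```

Each fused audio token is a weighted mixture of the vision tokens, which is how "what is heard" gets conditioned on "what is seen".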
## Model Specifications

- **Total Parameters**: ~6M (mini) / ~25M (full)
- **Input Modalities**: Images, Audio, Video
- **Output Capabilities**: Object detection, Audio analysis
- **Processing Speed**: Real-time capable
- **Memory Requirements**: 2 GB+ RAM recommended
- **Platform Support**: Windows, Linux, macOS

## About the LISA Team

The LISA Team is based in Kenya, East Africa, and is dedicated to advancing artificial intelligence research and development within the African continent. Our mission is to create AI systems that understand and serve diverse communities while maintaining cultural sensitivity and awareness.

**Development Location**: Kenya, East Africa
**Team**: LISA Development Team
**Philosophy**: Building AI from the ground up without dependency on external pretrained models
**Vision**: Democratizing AI development in Africa and beyond

## Self-Awareness Features

LISA is designed with self-awareness capabilities and knows:
- Its development origin: Kenya, Africa
- Its creators: The LISA Team
- Its cultural context: African AI innovation
- Its architectural uniqueness: Built from scratch
- Its mission: Advancing African AI capabilities

## Performance Metrics

- **Object Detection**: mAP@0.5: ~65% (Lisa dataset)
- **Speech Recognition**: WER: ~15% (English)
- **Sound Classification**: Accuracy: ~78% (environmental sounds)
- **Emotion Detection**: F1-Score: ~72% (7 emotions)
- **Processing Speed**: ~30 FPS (vision), real-time (audio)

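For reference, the WER quoted above is word-level edit distance divided by the reference word count. A minimal implementation of the standard metric (not LISA's evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edits (sub/ins/del) needed, over reference length."""
    r, h = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(r)][len(h)] / len(r)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion / 6 words
```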
## Deployment

### Local Deployment
```bash
python deploy.py --host 0.0.0.0 --port 8000
```

### Docker Deployment
```bash
docker build -t lisa-v3.5 .
docker run -p 8000:8000 lisa-v3.5
```

### API Usage
```bash
curl -X POST "http://localhost:8000/process" \
  -H "Content-Type: application/json" \
  -d '{"audio": "audio.wav", "image_url": "image.jpg"}'
```

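The same call from Python using only the standard library; the endpoint and JSON fields mirror the curl example, and the server is assumed to resolve those file paths itself:

```python
import json
import urllib.request

payload = json.dumps({"audio": "audio.wav", "image_url": "image.jpg"}).encode()
req = urllib.request.Request(
    "http://localhost:8000/process",
    data=payload,
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server from deploy.py is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
print(req.full_url)  # http://localhost:8000/process
```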
## License

This model is released under the Apache 2.0 License. See the LICENSE file for details.

## Contributing

We welcome contributions from the global AI community. Please see CONTRIBUTING.md for guidelines.

## Contact

- **Team**: LISA Development Team
- **Location**: Kenya, East Africa
- **Email**: [elijahnzeli894@gmail.com](mailto:elijahnzeli894@gmail.com)
- **Website**: None

## Acknowledgments

Special thanks to the Kenyan AI community and African researchers who contributed to making LISA possible. This project represents the growing AI capabilities within Africa and our commitment to technological innovation.

---

**Proudly developed in Kenya, Africa 🇰🇪**

*"LISA represents African innovation in artificial intelligence - built from the ground up with pride, passion, and purpose."*

config.json CHANGED
@@ -8,7 +8,7 @@
 "development_team": "LISA Team",
 "development_country": "Kenya",
 "development_continent": "Africa",
-"created_date": "2025-08-19T15:45:19.328679",
+"created_date": "2025-08-20T03:07:26.809423",
 "architecture_type": "Lisa Multimodal Transformer",
 "inspiration": "Vision Transformer (ViT-B/16) architecture, built from scratch",
 "capabilities": [
model_card.json CHANGED
@@ -8,7 +8,7 @@
 "development_team": "LISA Team",
 "development_country": "Kenya",
 "development_continent": "Africa",
-"created_date": "2025-08-19T15:45:19.328679",
+"created_date": "2025-08-20T03:07:26.809423",
 "architecture_type": "Lisa Multimodal Transformer",
 "inspiration": "Vision Transformer (ViT-B/16) architecture, built from scratch",
 "capabilities": [