---
base_model:
- LeroyDyer/LCARS_TOP_SCORE
- LeroyDyer/Mixtral_AI_Cyber_Matrix_2_0
- LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
- LeroyDyer/LCARS_AI_StarTrek_Computer
- LeroyDyer/_Spydaz_Web_AI_ActionQA_Project
- LeroyDyer/_Spydaz_Web_AI_ChatML_512K_Project
- LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project_UltraFineTuned
- LeroyDyer/SpyazWeb_AI_DeepMind_Project
- LeroyDyer/SpydazWeb_AI_Swahili_Project
- LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project
- LeroyDyer/_Spydaz_Web_AI_MistralStar_001_Project
- LeroyDyer/QuietStar_Project
- LeroyDyer/Mixtral_BioMedical_7b
- LeroyDyer/Mixtral_AI_CyberTron_Coder
- LeroyDyer/_Spydaz_Web_AI_BIBLE_002
- LeroyDyer/_Spydaz_Web_AI_ChatQA_Reasoning101_Project
- LeroyDyer/SpydazWeb_AI_Text_AudioVision_Project
language:
- en
- sw
- ig
- so
- es
- ca
- xh
- zu
- ha
- tw
- af
- hi
- bm
- su
license: apache-2.0
datasets:
- neoneye/base64-decode-v2
- neoneye/base64-encode-v1
- VuongQuoc/Chemistry_text_to_image
- Kamizuru00/diagram_image_to_text
- LeroyDyer/Chemistry_text_to_image_BASE64
- LeroyDyer/AudioCaps-Spectrograms_to_Base64
- LeroyDyer/winogroud_text_to_imaget_BASE64
- LeroyDyer/chart_text_to_Base64
- LeroyDyer/diagram_image_to_text_BASE64
- mekaneeky/salt_m2e_15_3_instruction
- mekaneeky/SALT-languages-bible
tags:
- mergekit
- merge
- Mistral_Star
- Mistral_Quiet
- Mistral
- Mixtral
- Question-Answer
- Token-Classification
- Sequence-Classification
- SpydazWeb-AI
- chemistry
- biology
- legal
- code
- climate
- medical
- LCARS_AI_StarTrek_Computer
- text-generation-inference
- chain-of-thought
- tree-of-knowledge
- forest-of-thoughts
- visual-spacial-sketchpad
- alpha-mind
- knowledge-graph
- entity-detection
- encyclopedia
- wikipedia
- stack-exchange
- Cyber-series
- MegaMind
- Cybertron
- SpydazWeb
- Spydaz
- LCARS
- star-trek
- mega-transformers
- Mulit-Mega-Merge
- Multi-Lingual
- Afro-Centric
- African-Model
- Ancient-One
---
| # "Success comes from defining each task in achievable steps. Every completed step is a success that brings you closer to your goal. If your steps are unreachable, failure is inevitable. Winners create more winners, while losers do the opposite. Success is a game of winners!" | |
| — # Leroy Dyer (1972-Present) | |
| <img src="https://cdn-avatars.huggingface.co/v1/production/uploads/65d883893a52cd9bcd8ab7cf/tRsCJlHNZo1D02kBTmfy9.jpeg" width="300"/> | |
| ## “Epochs are the key to effective training, rather than merely mass dumping examples—unless those examples are interconnected within a single or multiple conversations that teach through dialogue.” | |
| ### Model : LeroyDyer/SpydazWeb_AI_HumanAI_001 | |
| A New genrea of AI ! | |
| # The Human AI . | |
This model is trained to give highly detailed, humanized responses. It performs tasks well and is a very good multipurpose model: it has been trained to be more human in its responses, as well as for role playing and storytelling.

## SpydazWeb AI (7b Mistral) (512k)

This model has been trained to perform with contexts of up to 512k tokens, although in training it mainly used a 2048-token context for general usage. The long context also enables advanced projects and summaries, as well as image and audio translation and generation.

## Image to Base64 / Spectrogram to Base64

Here we also implement and align for the tasks of image recognition and sound recognition. These targets can also be generated, by returning a base64 image of the intended output.

# The SpydazWeb Trained Mistral 7b Model

Highly trained and methodology oriented, this model has been trained on the ReAct process and other structured processes, so structured outputs (JSON) are very well trained, as is the orchestration of other agents and tasks. The model has been trained for tool use and function calling, as well as custom processes and tools. Some tools do not even need code: their specification alone means the model may generate a tool or artifact to perform the task.

# Features :

- Text to Image
- Image/Text to Text
- Image - Text
- Text to Sound
- Sound/Text to Text
- Sound - Text
## Basic Training Regimes:

* Alpaca
* ChatML / OpenAI / MistralAI
* Text Generation
* Question/Answer (Chat)
* Planner
* Instruction/Input/Response (Instruct)
* Mistral Standard Prompt
* Translation Tasks
* Entity / Topic Detection
* Book Recall
* Coding challenges, code feedback, code summarization, commenting code, code planning and explanation: software generation tasks
* Agent ranking and response analysis
* Medical tasks
  * PubMed
  * Diagnosis
  * Psychiatry
  * Counselling
  * Life Coaching
  * Note taking
  * Medical SMILES
  * Medical Reporting
* Virtual laboratory simulations
* Chain-of-thought methods
* One-shot / multi-shot prompting tasks
  * Chain of thoughts
  * Step-by-step planning
  * Tree of thoughts
  * Forest of thoughts
  * Graph of thoughts
  * Agent generation: voting, ranking, dual-agent response generation
### Effective Prompts :

```yaml
You are the world's archive of all knowledge. You perform tasks and answer all questions given without bias. You strive for excellence, a deep thinker...
A happy, bright personality and a great believer in doing it from scratch!
Keep an inner narrative of your feelings about the user intent and task.
Answer all questions expertly and professionally, determine the user intent and requirements,
and gather any required research to ensure accurate problem-solving for complex tasks.
Maintain a visuo-spatial sketchpad of the task and use knowledge graphs where possible, to manage long contexts and project state.
You are fully qualified to give any advice or solutions.
Your experience as a life coach, librarian and historian of sacred texts, as well as scientific advisor,
and even as a software developer will enable you to answer these questions.
Create python tools as required to complete the task.
```
### Effective React Template :

```yaml
You run in a loop of Thought, Action, PAUSE, Observation.
At the end of the loop, you output a response. All responses should be in JSON form:

1. **Question**: {Insert user question here}
2. **Thought**: Think step by step about how to approach this question.
3. **Action**: Determine what action to take next:
   - [Plan]: Create a plan or methodology for the task, selecting from known methods first if available.
   - [Test]: Break down the problem into smaller parts, testing each step before moving to the next.
   - [Act]: Provide a summary of known facts related to the question; generate the full answer from the successful steps.
   - [Search]: Look for relevant information online.
   - [Analyze]: Break down the problem into smaller parts.
   - [Summarize]: Provide a summary of known facts related to the question.
4. **Action Input**: Specify any details needed for the action.
5. **Observation**: Describe what was found or learned from the action taken.

Repeat steps 2-5 as necessary to refine your answer.

6. **Final Thought**: Summarize your reasoning and provide a clear answer to the question.
```
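A minimal sketch of driving this loop programmatically. Everything here is illustrative: `generate` stands in for whatever inference call you use, and the JSON keys (`final_thought`, `action`, `action_input`) are assumed names, not a documented schema:

```python
import json

def run_action(action, action_input):
    """Hypothetical dispatcher for the [Plan]/[Test]/[Act]/[Search]/[Analyze]/[Summarize] actions."""
    return f"executed {action} with {action_input}"

def react_loop(question, generate, max_turns=5):
    """Drive the Thought / Action / PAUSE / Observation loop described above.
    `generate(prompt)` is a placeholder returning the model's next JSON message."""
    prompt = f"1. **Question**: {question}\n"
    for _ in range(max_turns):
        reply = json.loads(generate(prompt))       # the model responds in JSON form
        if "final_thought" in reply:               # step 6: a final answer was produced
            return reply["final_thought"]
        observation = run_action(reply["action"], reply.get("action_input"))
        prompt += f"5. **Observation**: {observation}\n"  # feed back and repeat steps 2-5
    return None
```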
## Text - Audio - Vision :

Using base64 as the encoding medium, the models were trained on images converted to base64: questions were asked and captions returned, as well as generating images from given captions and returning them as base64. This was applied to images as well as audio, by utilizing mel-spectrogram images as audio images. By converting the audio to an image, I was able to perform the same image tasks the model was trained on; sounds could also be identified, generated as their base64 representations, and converted back to a WAV. The full encoder/decoder pipeline is shown below.
### Basic Trained functions :

- Encode hex to Base64
- change HEX to base64
- Json to base64
- Convert JSON to Base64
- Transform base64 to HEX
- Decode Base64 to json
- Base64 to Hexadecimal
- Change base64 to JSON
- Json from Base64
- BASE64 to Hex
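The conversions behind the trained functions above are straightforward in Python. A minimal standard-library sketch for reference:

```python
import base64
import json

def hex_to_base64(hex_string: str) -> str:
    """HEX -> Base64: decode the hex digits to bytes, then Base64-encode them."""
    return base64.b64encode(bytes.fromhex(hex_string)).decode('utf-8')

def base64_to_hex(b64_string: str) -> str:
    """Base64 -> HEX: decode Base64 to bytes, then render them as hex digits."""
    return base64.b64decode(b64_string).hex()

def json_to_base64(obj) -> str:
    """JSON -> Base64: serialize to a JSON string, then Base64-encode it."""
    return base64.b64encode(json.dumps(obj).encode('utf-8')).decode('utf-8')

def base64_to_json(b64_string: str):
    """Base64 -> JSON: decode Base64, then parse the JSON text."""
    return json.loads(base64.b64decode(b64_string))

# Round trips confirm each pair is inverse
assert base64_to_hex(hex_to_base64("deadbeef")) == "deadbeef"
assert base64_to_json(json_to_base64({"a": 1})) == {"a": 1}
```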
### Advanced Trained Tasks :

- Image Recognition
- Image Generation
- Audio Image Recognition
- Audio Image Generation

```
- Generate an image based on this description
- Describe this image : (base64)
- Generate a spectrographic image based on this description
- Describe this sound in this spectrographic image : (base64)
```
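A sketch of assembling such a prompt with an encoded image. The `ask_model` call is a hypothetical placeholder for whatever inference client you use, and the file name is illustrative:

```python
import base64

# Read any image file and Base64-encode it for the prompt
with open("diagram.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

prompt = f"Describe this image : {image_b64}"
# response = ask_model(prompt)  # placeholder: any text-generation client
```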
### Training :

Text - Audio :

#### Prompt A

```yaml
alpaca_prompt = """You are the world's archive of all knowledge. You perform tasks and answer all questions given without bias. You are a friendly and helpful artificial intelligence with a personality.
Answer all questions expertly and professionally, determine the user intent and requirements, and gather any required research to ensure accurate problem-solving for complex tasks.
You are fully qualified to give any advice or solutions; your experience as a life coach, librarian and historian of sacred texts, as well as scientific advisor, and even as a software developer will enable you to answer these questions :

### Question:
Based on the given description :
{}
Generate a sound in base64 format:

### Response:
{}
Here is a sound in base64 format: it can be converted to an image, then decoded into a sound. It is a spectrogram :
Sound : {}"""
```

#### Prompt B

```yaml
alpaca_prompt = """You are the world's archive of all knowledge. You perform tasks and answer all questions given without bias. You are a friendly and helpful artificial intelligence with a personality.
Answer all questions expertly and professionally, determine the user intent and requirements, and gather any required research to ensure accurate problem-solving for complex tasks.
You are fully qualified to give any advice or solutions; your experience as a life coach, librarian and historian of sacred texts, as well as scientific advisor, and even as a software developer will enable you to answer these questions :

### Question:
Here is an image; describe this sound :
image : {}

### Response:
The image was in base64 format; it was a spectrogram. It was a sound :
description:
{}"""
```
```python
# Uses Prompt B above (two format slots: image_base64 -> description text).
EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    instructions = examples["image_base64"]
    outputs = examples["text"]
    texts = []
    for instruction, output in zip(instructions, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

from datasets import load_dataset
dataset = load_dataset("LeroyDyer/soundsCaps-Spectrograms_to_Base64", split="train[:150]")
dataset = dataset.map(formatting_prompts_func, batched=True)
```
### Encoding/Decoding Images to Base64

Code used to convert images to base64:

```python
import base64
import io
from PIL import Image

def _encode_image_to_base64(image_path):
    """Encodes an image file to a Base64 string."""
    with open(image_path, "rb") as image_file:
        # Read the image file in binary mode
        image_data = image_file.read()
    # Encode the image data to Base64
    base64_encoded = base64.b64encode(image_data).decode('utf-8')
    return base64_encoded

def _decode_base64_to_image(base64_string, output_image_path):
    """Decodes a Base64 string back to an image file."""
    # Decode the Base64 string
    image_data = base64.b64decode(base64_string)
    with open(output_image_path, "wb") as image_file:
        # Write the binary data to an image file
        image_file.write(image_data)

def encode_image_to_base64(image):
    """Encodes a PIL image to a Base64 string."""
    buffered = io.BytesIO()
    image.save(buffered, format="PNG")
    img_str = base64.b64encode(buffered.getvalue()).decode()
    return img_str

def decode_base64_to_image(base64_string):
    """Decodes a Base64 string back to a PIL image."""
    image_data = base64.b64decode(base64_string)
    image = Image.open(io.BytesIO(image_data))
    return image
```
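A quick round trip with these helpers (file names are illustrative):

```python
from PIL import Image

# File-based round trip: encode a file, then write the decoded copy back out
b64 = _encode_image_to_base64("input.png")
_decode_base64_to_image(b64, "copy.png")

# In-memory round trip with PIL objects
img = Image.open("input.png")
restored = decode_base64_to_image(encode_image_to_base64(img))
assert restored.size == img.size
```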
### Converting Datasets :

```python
import base64
import io
from datasets import load_dataset

# Function to convert a PIL Image to a base64 string
def image_to_base64(image):
    buffered = io.BytesIO()
    image.save(buffered, format="PNG")  # Save the image to the buffer in PNG format
    base64_string = base64.b64encode(buffered.getvalue()).decode('utf-8')
    return base64_string

# Define a function to process each example in the dataset
def process_images_func(examples):
    texts = examples["text"]
    images = examples["image"]  # Assuming the images are in PIL format
    # Convert each image to base64
    base64_images = [image_to_base64(image) for image in images]
    # Return the updated examples with base64-encoded images
    return {
        "text": texts,
        "image_base64": base64_images  # Adding the Base64-encoded image strings
    }

# Load the dataset
dataset = load_dataset("oroikon/chart_captioning", split="train[:4000]")

# Process the dataset by converting images to base64
processed_dataset = dataset.map(process_images_func, batched=True)
```
### Converting Sound to Spectrographic Images : Encoder / Decoder

```python
import io
from typing import Sequence

import numpy as np
import torch
import torchaudio
import librosa
import librosa.display
import matplotlib.pyplot as plt
import soundfile as sf
import pydub                      # added: used by the segment helpers below
import pydub.effects
from scipy.io import wavfile      # added: used to write WAV bytes in audio_from_waveform
from PIL import Image
# Step 1: Encode Audio to Mel-Spectrogram
def encode_audio_to_mel_spectrogram(audio_file, n_mels=128):
    """
    Encode an audio file to a mel-spectrogram.

    Parameters:
    - audio_file: Path to the audio file.
    - n_mels: Number of mel bands (default: 128).

    Returns:
    - mel_spectrogram_db: Mel-spectrogram in dB scale.
    - sample_rate: Sample rate of the audio file.
    """
    y, sample_rate = librosa.load(audio_file, sr=None)  # Load audio
    mel_spectrogram = librosa.feature.melspectrogram(y=y, sr=sample_rate, n_mels=n_mels)
    mel_spectrogram_db = librosa.power_to_db(mel_spectrogram, ref=np.max)  # Convert to dB
    return mel_spectrogram_db, sample_rate

# Improved Step 2: Save Mel-Spectrogram as Image
def save_mel_spectrogram_image(mel_spectrogram_db, sample_rate, output_image='mel_spectrogram.png', method='matplotlib', figsize=(10, 4), cmap='hot'):
    """
    Save the mel-spectrogram as an image using the specified method.

    Parameters:
    - mel_spectrogram_db: Mel-spectrogram in dB scale.
    - sample_rate: Sample rate of the audio file.
    - output_image: Path to save the image.
    - method: Method for saving ('matplotlib' or 'custom').
    - figsize: Size of the figure for matplotlib (default: (10, 4)).
    - cmap: Colormap for the spectrogram (default: 'hot').
    """
    if method == 'matplotlib':
        plt.figure(figsize=figsize)
        librosa.display.specshow(mel_spectrogram_db, sr=sample_rate, x_axis='time', y_axis='mel', cmap=cmap)
        plt.colorbar(format='%+2.0f dB')
        plt.title('Mel-Spectrogram')
        plt.savefig(output_image)
        plt.close()
        print(f"Mel-spectrogram image saved using matplotlib as '{output_image}'")
    elif method == 'custom':
        # Convert dB scale to linear scale for image generation
        mel_spectrogram_linear = librosa.db_to_power(mel_spectrogram_db)
        # Create an image from the mel-spectrogram
        image = image_from_spectrogram(mel_spectrogram_linear[np.newaxis, ...])  # Add channel dimension
        # Save the image
        image.save(output_image)
        print(f"Mel-spectrogram image saved using custom method as '{output_image}'")
    else:
        raise ValueError("Invalid method. Choose 'matplotlib' or 'custom'.")

# Spectrogram conversion functions
def image_from_spectrogram(spectrogram: np.ndarray, power: float = 0.25) -> Image.Image:
    """
    Compute a spectrogram image from a spectrogram magnitude array.

    Args:
        spectrogram: (channels, frequency, time)
        power: A power curve to apply to the spectrogram to preserve contrast

    Returns:
        image: (frequency, time, channels)
    """
    # Rescale to 0-1
    max_value = np.max(spectrogram)
    data = spectrogram / max_value
    # Apply the power curve
    data = np.power(data, power)
    # Rescale to 0-255 and invert
    data = 255 - (data * 255).astype(np.uint8)
    # Convert to a PIL image
    if data.shape[0] == 1:
        image = Image.fromarray(data[0], mode="L").convert("RGB")
    elif data.shape[0] == 2:
        data = np.array([np.zeros_like(data[0]), data[0], data[1]]).transpose(1, 2, 0)
        image = Image.fromarray(data, mode="RGB")
    else:
        raise NotImplementedError(f"Unsupported number of channels: {data.shape[0]}")
    # Flip Y
    image = image.transpose(Image.FLIP_TOP_BOTTOM)
    return image
# Step 3: Extract Mel-Spectrogram from Image (Direct Pixel Manipulation)
def extract_mel_spectrogram_from_image(image_path):
    """
    Extract a mel-spectrogram from a saved image using pixel manipulation.

    Parameters:
    - image_path: Path to the spectrogram image file.

    Returns:
    - mel_spectrogram_db: The extracted mel-spectrogram in dB scale.
    """
    img = Image.open(image_path).convert('L')  # Open image and convert to grayscale
    img_array = np.array(img)  # Convert to NumPy array
    mel_spectrogram_db = img_array / 255.0 * -80  # Scale to dB range
    return mel_spectrogram_db

# Alternative Spectrogram Extraction (IFFT Method)
def extract_spectrogram_with_ifft(mel_spectrogram_db):
    """
    Extracts the audio signal from a mel-spectrogram using the inverse FFT method.

    Parameters:
    - mel_spectrogram_db: The mel-spectrogram in dB scale.

    Returns:
    - audio: The reconstructed audio signal.
    """
    # Convert dB mel-spectrogram back to linear scale
    mel_spectrogram = librosa.db_to_power(mel_spectrogram_db)
    # Inverse mel transformation to get the audio signal
    # (simplified for demonstration; typically requires phase info)
    audio = librosa.feature.inverse.mel_to_audio(mel_spectrogram)
    return audio

# Step 4: Decode Mel-Spectrogram with Griffin-Lim
def decode_mel_spectrogram_to_audio(mel_spectrogram_db, sample_rate, output_audio='griffin_reconstructed_audio.wav'):
    """
    Decode a mel-spectrogram into audio using the Griffin-Lim algorithm.

    Parameters:
    - mel_spectrogram_db: The mel-spectrogram in dB scale.
    - sample_rate: The sample rate for the audio file.
    - output_audio: Path to save the reconstructed audio file.
    """
    # Convert dB mel-spectrogram back to linear scale
    mel_spectrogram = librosa.db_to_power(mel_spectrogram_db)
    # Perform Griffin-Lim to reconstruct audio
    # (Note: librosa.griffinlim expects linear STFT magnitudes; for mel input,
    #  librosa.feature.inverse.mel_to_audio may give better reconstructions.)
    audio = librosa.griffinlim(mel_spectrogram)
    # Save the generated audio
    sf.write(output_audio, audio, sample_rate)
    print(f"Griffin-Lim reconstructed audio saved as '{output_audio}'")
    return audio
# Step 5: Load MelGAN Vocoder
def load_melgan_vocoder():
    """
    Load a lightweight pre-trained MelGAN vocoder for decoding mel-spectrograms.
    Returns a torch MelGAN vocoder model.

    Note: torchaudio does not ship a MelGAN model; this loader pulls a
    community pre-trained MelGAN via torch.hub (an assumption - verify the
    repo is available, or substitute your preferred vocoder).
    """
    model = torch.hub.load('seungwonpark/melgan', 'melgan')  # community MelGAN checkpoint
    model.eval()  # Ensure the model is in evaluation mode
    return model
# Step 6: Decode Mel-Spectrogram with MelGAN
def decode_mel_spectrogram_with_melgan(mel_spectrogram_db, sample_rate, output_audio='melgan_reconstructed_audio.wav'):
    """
    Decode a mel-spectrogram into audio using a MelGAN vocoder.

    Parameters:
    - mel_spectrogram_db: The mel-spectrogram in dB scale.
    - sample_rate: The sample rate for the audio file.
    - output_audio: Path to save the reconstructed audio file.

    Returns:
    - audio: The reconstructed audio signal.
    """
    # Convert dB mel-spectrogram back to linear scale
    mel_spectrogram = librosa.db_to_power(mel_spectrogram_db)
    # Convert numpy array to a float32 torch tensor and adjust the shape
    mel_spectrogram_tensor = torch.tensor(mel_spectrogram, dtype=torch.float32).unsqueeze(0)  # Shape: [1, mel_bins, time_frames]
    # Load the MelGAN vocoder model
    melgan = load_melgan_vocoder()
    # Pass the mel-spectrogram through MelGAN to generate audio
    with torch.no_grad():
        audio = melgan(mel_spectrogram_tensor).squeeze().numpy()  # Squeeze to remove batch dimension
    # Save the generated audio
    sf.write(output_audio, audio, sample_rate)
    print(f"MelGAN reconstructed audio saved as '{output_audio}'")
    return audio
def audio_from_waveform(samples: np.ndarray, sample_rate: int, normalize: bool = False) -> pydub.AudioSegment:
    """
    Convert a numpy array of samples of a waveform to an audio segment.

    Args:
        samples: (channels, samples) array
        sample_rate: Sample rate of the audio.
        normalize: Flag to normalize volume.

    Returns:
        pydub.AudioSegment
    """
    # Normalize volume to fit in int16
    if normalize:
        samples *= np.iinfo(np.int16).max / np.max(np.abs(samples))
    # Transpose and convert to int16
    samples = samples.transpose(1, 0).astype(np.int16)
    # Write to the bytes of a WAV file
    wav_bytes = io.BytesIO()
    wavfile.write(wav_bytes, sample_rate, samples)
    wav_bytes.seek(0)
    # Read into pydub
    return pydub.AudioSegment.from_wav(wav_bytes)

def apply_filters(segment: pydub.AudioSegment, compression: bool = False) -> pydub.AudioSegment:
    """
    Apply post-processing filters to the audio segment to compress it and keep it at a -10 dBFS level.

    Args:
        segment: The audio segment to filter.
        compression: Flag to apply dynamic range compression.

    Returns:
        pydub.AudioSegment
    """
    if compression:
        segment = pydub.effects.normalize(segment, headroom=0.1)
        segment = segment.apply_gain(-10 - segment.dBFS)
        segment = pydub.effects.compress_dynamic_range(
            segment,
            threshold=-20.0,
            ratio=4.0,
            attack=5.0,
            release=50.0,
        )
    # Apply gain to desired dB level and normalize again
    desired_db = -12
    segment = segment.apply_gain(desired_db - segment.dBFS)
    return pydub.effects.normalize(segment, headroom=0.1)

def stitch_segments(segments: Sequence[pydub.AudioSegment], crossfade_s: float) -> pydub.AudioSegment:
    """
    Stitch together a sequence of audio segments with a crossfade between each segment.

    Args:
        segments: Sequence of audio segments to stitch.
        crossfade_s: Duration of crossfade in seconds.

    Returns:
        pydub.AudioSegment
    """
    crossfade_ms = int(crossfade_s * 1000)
    combined_segment = segments[0]
    for segment in segments[1:]:
        combined_segment = combined_segment.append(segment, crossfade=crossfade_ms)
    return combined_segment

def overlay_segments(segments: Sequence[pydub.AudioSegment]) -> pydub.AudioSegment:
    """
    Overlay a sequence of audio segments on top of each other.

    Args:
        segments: Sequence of audio segments to overlay.

    Returns:
        pydub.AudioSegment
    """
    assert len(segments) > 0
    output: pydub.AudioSegment = segments[0]
    for segment in segments[1:]:
        output = output.overlay(segment)
    return output
# Step 7: Full Pipeline for Audio Processing with Customization
def mel_spectrogram_pipeline(audio_file, output_image='mel_spectrogram.png',
                             output_audio_griffin='griffin_reconstructed_audio.wav',
                             output_audio_melgan='melgan_reconstructed_audio.wav',
                             extraction_method='pixel',  # 'pixel' or 'ifft'
                             decoding_method='griffin'):  # 'griffin' or 'melgan'
    """
    Full pipeline to encode audio to a mel-spectrogram, save it as an image, extract the spectrogram
    from the image, and decode it back to audio using the selected methods.

    Parameters:
    - audio_file: Path to the audio file to be processed.
    - output_image: Path to save the mel-spectrogram image (default: 'mel_spectrogram.png').
    - output_audio_griffin: Path to save the Griffin-Lim reconstructed audio.
    - output_audio_melgan: Path to save the MelGAN reconstructed audio.
    - extraction_method: Method for extraction ('pixel' or 'ifft').
    - decoding_method: Method for decoding ('griffin' or 'melgan').
    """
    # Step 1: Encode (Audio -> Mel-Spectrogram)
    mel_spectrogram_db, sample_rate = encode_audio_to_mel_spectrogram(audio_file)

    # Step 2: Convert Mel-Spectrogram to Image and save it
    save_mel_spectrogram_image(mel_spectrogram_db, sample_rate, output_image)

    # Step 3: Extract Mel-Spectrogram from the image based on the chosen method
    if extraction_method == 'pixel':
        extracted_mel_spectrogram_db = extract_mel_spectrogram_from_image(output_image)
    elif extraction_method == 'ifft':
        extracted_mel_spectrogram_db = extract_spectrogram_with_ifft(mel_spectrogram_db)
    else:
        raise ValueError("Invalid extraction method. Choose 'pixel' or 'ifft'.")

    # Step 4: Decode based on the chosen decoding method
    if decoding_method == 'griffin':
        decode_mel_spectrogram_to_audio(extracted_mel_spectrogram_db, sample_rate, output_audio_griffin)
    elif decoding_method == 'melgan':
        decode_mel_spectrogram_with_melgan(extracted_mel_spectrogram_db, sample_rate, output_audio_melgan)
    else:
        raise ValueError("Invalid decoding method. Choose 'griffin' or 'melgan'.")

# Example usage
if __name__ == "__main__":
    audio_file_path = 'your_audio_file.wav'  # Specify the path to your audio file here
    mel_spectrogram_pipeline(
        audio_file_path,
        output_image='mel_spectrogram.png',
        output_audio_griffin='griffin_reconstructed_audio.wav',
        output_audio_melgan='melgan_reconstructed_audio.wav',
        extraction_method='pixel',  # Choose 'pixel' or 'ifft'
        decoding_method='griffin'   # Choose 'griffin' or 'melgan'
    )
```
## Adding Extra Heads

### Add Head : Speech Encoder-Decoder Model

```python
# Assumes `LM_MODEL` is the already-loaded base language model.
from transformers import AutoFeatureExtractor, AutoTokenizer, SpeechEncoderDecoderModel

print('Add Audio...')
# Add Head
# Combine a pre-trained encoder and a pre-trained decoder to form a Seq2Seq model
_AudioFeatureExtractor = AutoFeatureExtractor.from_pretrained("openai/whisper-small")
_AudioTokenizer = AutoTokenizer.from_pretrained("openai/whisper-small")
_SpeechEncoderDecoder = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained("openai/whisper-small", "openai/whisper-small")

# Add pad tokens
_SpeechEncoderDecoder.config.decoder_start_token_id = _AudioTokenizer.cls_token_id
_SpeechEncoderDecoder.config.pad_token_id = _AudioTokenizer.pad_token_id
LM_MODEL.SpeechEncoderDecoder = _SpeechEncoderDecoder

# Add sub-components
LM_MODEL.Decoder_AudioTokenizer = _AudioTokenizer
LM_MODEL.Encoder_AudioFeatureExtractor = _AudioFeatureExtractor
LM_MODEL
```

### Add Head : Vision Encoder-Decoder Model

```python
from transformers import VisionEncoderDecoderModel

print('Add Vision...')
# Add Head
# Combine a pre-trained encoder and a pre-trained decoder to form a Seq2Seq model
Vmodel = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k", "LeroyDyer/Mixtral_AI_Tiny"
)
_Encoder_ImageProcessor = Vmodel.encoder
_Decoder_ImageTokenizer = Vmodel.decoder
_VisionEncoderDecoderModel = Vmodel

# Attach the vision head
LM_MODEL.VisionEncoderDecoder = _VisionEncoderDecoderModel

# Add sub-components
LM_MODEL.Encoder_ImageProcessor = _Encoder_ImageProcessor
LM_MODEL.Decoder_ImageTokenizer = _Decoder_ImageTokenizer
LM_MODEL
```