Upload config

Files changed (3)

README.md +199 -0
config.json +39 -0
configuration_bestrq_conformer.py +131 -0

README.md ADDED

	@@ -0,0 +1,199 @@

+---
+library_name: transformers
+tags: []
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]

config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "activation_dropout": 0.0,
+  "architectures": [
+    "MeralionBestRqModel"
+  ],
+  "attention_dropout": 0.0,
+  "auto_map": {
+    "AutoConfig": "configuration_bestrq_conformer.MeralionBestRqConformerConfig",
+    "AutoModel": "modeling_bestrq_conformer.MeralionBestRqModel"
+  },
+  "conformer_conv_dropout": 0.0,
+  "conv_depthwise_kernel_size": 5,
+  "ctc_loss_reduction": "sum",
+  "ctc_zero_infinity": false,
+  "feat_proj_dropout": 0.0,
+  "ffn_dim": 4096,
+  "final_dropout": 0.0,
+  "hidden_act": "swish",
+  "hidden_dropout": 0.0,
+  "hidden_size": 1024,
+  "input_channels": 1,
+  "input_dim": 80,
+  "layerdrop": 0.0,
+  "lstm_dim": 768,
+  "lstm_dropout_prob": 0.0,
+  "lstm_num_layers": 2,
+  "max_source_positions": 3000,
+  "model_type": "meralion_bestrq",
+  "no_scale_embedding": false,
+  "num_attention_heads": 8,
+  "num_hidden_layers": 24,
+  "position_embeddings_type": "relative",
+  "rotary_embedding_base": 10000,
+  "self_condition_layers": [],
+  "torch_dtype": "float32",
+  "transformers_version": "4.51.3",
+  "use_weighted_sum": true,
+  "vocab_size": null
+}
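
The values in `config.json` above imply a few derived quantities. The sketch below is a minimal check, assuming the hidden size is split evenly across attention heads as in standard multi-head attention (the modeling file is not part of this commit, so that is an assumption). `CONFIG_JSON` is a hand-copied excerpt of the file, not a download from the Hub.

```python
import json

# Excerpt of the config.json values shown above (only the fields used here).
CONFIG_JSON = """
{
  "hidden_size": 1024,
  "num_attention_heads": 8,
  "ffn_dim": 4096,
  "num_hidden_layers": 24,
  "vocab_size": null
}
"""

cfg = json.loads(CONFIG_JSON)

# Assumption: heads share hidden_size evenly, so it must divide cleanly.
if cfg["hidden_size"] % cfg["num_attention_heads"] != 0:
    raise ValueError("hidden_size must be divisible by num_attention_heads")

head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]  # 1024 // 8
ffn_ratio = cfg["ffn_dim"] / cfg["hidden_size"]              # 4096 / 1024

# JSON null becomes Python None: no CTC vocabulary is set in this config.
has_ctc_vocab = cfg["vocab_size"] is not None

print(f"head_dim={head_dim}, ffn_ratio={ffn_ratio}, has_ctc_vocab={has_ctc_vocab}")
```

Because `vocab_size` is `null`, a CTC head built from this config would need a vocabulary size supplied separately.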

configuration_bestrq_conformer.py ADDED

	@@ -0,0 +1,131 @@

+from transformers.configuration_utils import PretrainedConfig
+from typing import List
+class MeralionBestRqConformerConfig(PretrainedConfig):
+    """
+    This is the configuration class to store the configuration of a [`MeralionBestRqModel`]. It is used to
+    instantiate a BEST-RQ Conformer model according to the specified arguments, defining the model architecture.
+    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+    documentation from [`PretrainedConfig`] for more information.
+    Args:
+        input_dim (`int`, *optional*, defaults to 80):
+            The number of input features in the mel-frequency spectrogram.
+        input_channels (`int`, *optional*, defaults to 1):
+            The number of input channels of the convolutional subsampling layers.
+        num_attention_heads (`int`, *optional*, defaults to 8):
+            Number of attention heads for each attention layer in the Transformer encoder.
+        hidden_size (`int`, *optional*, defaults to 1024):
+            Dimensionality of the encoder layers and the pooler layer.
+        ffn_dim (`int`, *optional*, defaults to 4096):
+            Dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
+        num_hidden_layers (`int`, *optional*, defaults to 24):
+            Number of hidden layers in the Transformer encoder.
+        conv_depthwise_kernel_size (`int`, *optional*, defaults to 5):
+            Kernel size of the depthwise convolution in the Conformer convolution module.
+        feat_proj_dropout (`float`, *optional*, defaults to 0.0):
+            The dropout probability for the input projection layer.
+        activation_dropout (`float`, *optional*, defaults to 0.0):
+            The dropout probability for the activation functions in the feed-forward layers.
+        hidden_dropout (`float`, *optional*, defaults to 0.0):
+            The dropout probability for the hidden layers.
+        max_source_positions (`int`, *optional*, defaults to 3000):
+            The maximum sequence length that this model might ever be used with.
+        no_scale_embedding (`bool`, *optional*, defaults to `False`):
+            If `True`, the embeddings are not scaled by the square root of the hidden size.
+        hidden_act (`str`, *optional*, defaults to `"swish"`):
+            The non-linear activation function (function or string) in the encoder and pooler.
+        conformer_conv_dropout (`float`, *optional*, defaults to 0.0):
+            The dropout probability for the Conformer convolution module.
+        position_embeddings_type (`str`, *optional*, defaults to `"relative"`):
+            The type of position embeddings to use. Can be `"relative"` or `"rotary"`.
+        attention_dropout (`float`, *optional*, defaults to 0.0):
+            The dropout probability for the attention layers.
+        rotary_embedding_base (`int`, *optional*, defaults to 10000):
+            The base for the rotary position embeddings.
+        layerdrop (`float`, *optional*, defaults to 0.0):
+            The LayerDrop probability.
+        self_condition_layers (`List`, *optional*, defaults to `[]`):
+            A list of layer indices where self-conditioning should be applied.
+        use_weighted_sum (`bool`, *optional*, defaults to `True`):
+            Whether to use a weighted sum of all hidden states for the final output of the LSTM-CTC model.
+        lstm_dim (`int`, *optional*, defaults to 768):
+            The hidden size of the LSTM layers in the LSTM-CTC head.
+        lstm_num_layers (`int`, *optional*, defaults to 2):
+            The number of layers in the LSTM of the LSTM-CTC head.
+        lstm_dropout_prob (`float`, *optional*, defaults to 0.0):
+            The dropout probability for the LSTM layers in the LSTM-CTC head.
+        final_dropout (`float`, *optional*, defaults to 0.0):
+            The dropout probability for the final layer before the CTC loss.
+        vocab_size (`int`, *optional*):
+            Vocabulary size of the model. Defines the number of different tokens that can be represented by the
+            `input_ids` passed when calling [`MeralionBestRqModelForCTC`].
+        ctc_loss_reduction (`str`, *optional*, defaults to `"sum"`):
+            The reduction to apply to the output of `torch.nn.functional.ctc_loss`.
+        ctc_zero_infinity (`bool`, *optional*, defaults to `False`):
+            Whether to zero infinite losses and gradients in `torch.nn.functional.ctc_loss`.
+    """
+    model_type = "meralion_bestrq"
+    def __init__(
+        self,
+        input_dim: int = 80,
+        input_channels: int = 1,
+        num_attention_heads: int = 8,
+        hidden_size: int = 1024, #embed_dim
+        ffn_dim: int = 4096,
+        num_hidden_layers: int = 24,
+        conv_depthwise_kernel_size: int = 5,
+        feat_proj_dropout: float = 0., #for input_projection
+        activation_dropout: float = 0.,
+        hidden_dropout: float = 0.,
+        max_source_positions: int = 3000,
+        no_scale_embedding: bool = False,
+        hidden_act: str = "swish",
+        conformer_conv_dropout: float = 0.,
+        position_embeddings_type: str = "relative",
+        attention_dropout: float = 0.,
+        rotary_embedding_base: int = 10000,
+        layerdrop: float = 0.,
+        self_condition_layers: List = [], # asr
+        use_weighted_sum: bool = True, #lstm
+        lstm_dim: int = 768, #lstm
+        lstm_num_layers: int = 2, #lstm
+        lstm_dropout_prob: float = 0., #lstm
+        final_dropout: float = 0., #ctc
+        vocab_size = None, #ctc
+        ctc_loss_reduction: str = "sum", #ctc
+        ctc_zero_infinity: bool = False, #ctc
+        **kwargs,
+    ):
+        self.input_dim = input_dim
+        self.input_channels = input_channels
+        self.num_attention_heads = num_attention_heads
+        self.hidden_size = hidden_size
+        self.ffn_dim = ffn_dim
+        self.num_hidden_layers = num_hidden_layers
+        self.conv_depthwise_kernel_size = conv_depthwise_kernel_size
+        self.feat_proj_dropout = feat_proj_dropout
+        self.activation_dropout = activation_dropout
+        self.hidden_dropout = hidden_dropout
+        self.max_source_positions = max_source_positions
+        self.no_scale_embedding = no_scale_embedding
+        self.hidden_act = hidden_act
+        self.conformer_conv_dropout = conformer_conv_dropout
+        self.position_embeddings_type = position_embeddings_type
+        self.attention_dropout = attention_dropout
+        self.rotary_embedding_base = rotary_embedding_base
+        self.layerdrop = layerdrop
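+        # Copy the argument so instances never share the mutable default list.
+        self.self_condition_layers = list(self_condition_layers) if self_condition_layers is not None else []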
+        self.use_weighted_sum = use_weighted_sum
+        self.lstm_dim = lstm_dim
+        self.lstm_num_layers = lstm_num_layers
+        self.lstm_dropout_prob = lstm_dropout_prob
+        self.final_dropout = final_dropout
+        self.vocab_size = vocab_size
+        self.ctc_loss_reduction = ctc_loss_reduction
+        self.ctc_zero_infinity = ctc_zero_infinity
+        super().__init__(**kwargs)
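
The `self_condition_layers: List = []` default in the signature above is a single list object created once, when the function is defined. The sketch below shows why storing such a default directly can leak state between instances, and how copying the argument avoids it. `SharedDefault` and `CopiedDefault` are hypothetical stand-ins for illustration, not classes from this repository.

```python
from typing import List


class SharedDefault:
    # Stores the default list object itself, so every instance built
    # without an explicit argument ends up holding the same list.
    def __init__(self, layers: List[int] = []):
        self.layers = layers


class CopiedDefault:
    # Copies the argument, so each instance owns its own list.
    def __init__(self, layers: List[int] = []):
        self.layers = list(layers)


a, b = SharedDefault(), SharedDefault()
a.layers.append(3)
shared_leak = b.layers  # b sees the value appended through a

c, d = CopiedDefault(), CopiedDefault()
c.layers.append(3)
copied_clean = d.layers  # d is unaffected by the change to c

print(shared_leak, copied_clean)
```

The other common fix is a `None` default that is replaced by a fresh list inside `__init__`; copying has the advantage that callers who pass their own list are also protected from later aliasing.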