---
license: apache-2.0
---

# Seq2Seq Transformer for Function Call Generation
This repository hosts a custom-trained Seq2Seq Transformer that converts natural language queries into corresponding function call representations. The model uses an encoder-decoder Transformer architecture built from scratch in PyTorch and supports model versioning to facilitate continuous improvement and updates.
## Model Description
- **Architecture:**
  A full Transformer-based encoder-decoder with multi-head attention and position-wise feed-forward layers. Sinusoidal positional encoding captures token order (see the architecture sketch after this list).

- **Tokenization & Vocabulary:**
  The model uses a custom vocabulary built from the training data (a vocabulary sketch follows the list). Special tokens include:
  - `<pad>` for padding,
  - `<bos>` to mark the beginning of a sequence,
  - `<eos>` to mark the end of a sequence, and
  - `<unk>` for unknown tokens.

- **Training:**
  Trained on paired examples of natural language inputs and function call outputs with a cross-entropy loss. Each training run increments the model version, and every version is stored for reproducibility and comparison (a minimal training step is sketched below).

- **Inference:**
  Greedy decoding generates the output sequence token by token. Users can specify a model version to load for inference (see the decoding sketch below).
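
The sketch below shows the general shape of such a model. It is illustrative only: the hyperparameters and class names are assumptions, and it leans on `torch.nn.Transformer` for brevity, whereas the repository implements the layers itself.

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding, added to the token embeddings."""
    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        return x + self.pe[: x.size(1)]

class Seq2SeqTransformer(nn.Module):
    """Encoder-decoder Transformer mapping query tokens to call tokens."""
    def __init__(self, src_vocab: int, tgt_vocab: int, d_model: int = 256,
                 nhead: int = 8, num_layers: int = 3):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.pos_enc = PositionalEncoding(d_model)
        self.transformer = nn.Transformer(d_model=d_model, nhead=nhead,
                                          num_encoder_layers=num_layers,
                                          num_decoder_layers=num_layers,
                                          batch_first=True)
        self.out_proj = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt, tgt_mask=None, src_pad_mask=None, tgt_pad_mask=None):
        src = self.pos_enc(self.src_emb(src))
        tgt = self.pos_enc(self.tgt_emb(tgt))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask,
                                  src_key_padding_mask=src_pad_mask,
                                  tgt_key_padding_mask=tgt_pad_mask)
        return self.out_proj(hidden)  # (batch, tgt_len, tgt_vocab)
```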
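
The exact tokenizer lives in the training code; a plausible minimal version of the vocabulary construction (the helper names and whitespace tokenization are assumptions) looks like this:

```python
SPECIALS = ["<pad>", "<bos>", "<eos>", "<unk>"]  # ids 0..3 by construction

def build_vocab(token_lists):
    """Assign each token an integer id; special tokens take the first ids."""
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(tokens, vocab):
    """Wrap a token sequence in <bos>/<eos> and map it to ids."""
    unk = vocab["<unk>"]
    return [vocab["<bos>"]] + [vocab.get(t, unk) for t in tokens] + [vocab["<eos>"]]
```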
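
A minimal teacher-forced training step, assuming the `Seq2SeqTransformer` sketch above and `<pad>` at id 0; the real training loop (batching, checkpointing, version bookkeeping) is more involved.

```python
import torch
import torch.nn as nn

PAD_ID = 0  # assumes <pad> is id 0, as in the vocabulary sketch

def train_step(model, optimizer, src, tgt):
    """One teacher-forced step: predict tgt[1:] from tgt[:-1]."""
    tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]
    tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt_in.size(1))
    logits = model(src, tgt_in, tgt_mask=tgt_mask,
                   src_pad_mask=src == PAD_ID, tgt_pad_mask=tgt_in == PAD_ID)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1),
        ignore_index=PAD_ID)  # padded positions contribute no loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```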
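
Greedy decoding can be sketched as follows, assuming the `<bos>`/`<eos>` ids from the vocabulary sketch and batch size 1 for simplicity:

```python
import torch

BOS_ID, EOS_ID = 1, 2  # assumed special-token ids

@torch.no_grad()
def greedy_decode(model, src, max_len: int = 64):
    """Generate tokens one at a time, always taking the argmax."""
    model.eval()
    tgt = torch.tensor([[BOS_ID]])
    for _ in range(max_len):
        logits = model(src, tgt)              # (1, tgt_len, vocab)
        next_id = logits[0, -1].argmax().item()
        tgt = torch.cat([tgt, torch.tensor([[next_id]])], dim=1)
        if next_id == EOS_ID:                 # stop once the model closes the sequence
            break
    return tgt.squeeze(0).tolist()
```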
## Intended Use
This model is primarily intended for:

- Automated function call generation from natural language instructions.
- Enhancing natural language interfaces for code generation and task automation.
- Integration into virtual assistants and chatbots that execute backend function calls (see the dispatch sketch below).
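
For the assistant/chatbot use case, generated calls have to be mapped onto real backend functions. Assuming the model emits Python-like calls such as `book_flight(origin="London", destination="NYC")` (the actual surface form depends on your training data), a minimal whitelist-based dispatcher might look like:

```python
import re

def book_flight(origin: str, destination: str) -> str:
    return f"Booking flight {origin} -> {destination}"

REGISTRY = {"book_flight": book_flight}  # explicit whitelist of backends

def dispatch(call: str):
    """Parse 'name(key="value", ...)' and invoke the registered function."""
    match = re.fullmatch(r"(\w+)\((.*)\)", call.strip())
    if not match or match.group(1) not in REGISTRY:
        raise ValueError(f"Unrecognized call: {call!r}")
    kwargs = dict(re.findall(r'(\w+)="([^"]*)"', match.group(2)))
    return REGISTRY[match.group(1)](**kwargs)

print(dispatch('book_flight(origin="London", destination="NYC")'))
```

Keeping an explicit registry avoids calling `eval` on model output.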
## Limitations
- **Data Dependency:**
  The model's performance depends on the quality and representativeness of the training data. Out-of-distribution inputs may yield suboptimal or erroneous outputs.

- **Decoding Strategy:**
  Greedy decoding does not always produce the most diverse or highest-scoring outputs. Alternative strategies such as beam search may yield better results (a sketch follows this list).

- **Generalization:**
  While the model works well on data similar to its training examples, performance may degrade on substantially different domains or complex instructions.
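
A minimal sketch of how beam search might replace the greedy loop, reusing the assumed `<bos>`/`<eos>` ids from the decoding sketch above; it makes one forward pass per live hypothesis per step and applies no length normalization:

```python
import torch

BOS_ID, EOS_ID = 1, 2  # assumed special-token ids

@torch.no_grad()
def beam_search(model, src, beam_size: int = 4, max_len: int = 64):
    """Keep the beam_size best prefixes by cumulative log-probability."""
    model.eval()
    beams = [([BOS_ID], 0.0)]  # (token ids, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == EOS_ID:        # finished hypotheses carry over
                candidates.append((tokens, score))
                continue
            logits = model(src, torch.tensor([tokens]))
            log_probs = logits[0, -1].log_softmax(dim=-1)
            top = log_probs.topk(beam_size)
            for lp, idx in zip(top.values.tolist(), top.indices.tolist()):
                candidates.append((tokens + [idx], score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(tokens[-1] == EOS_ID for tokens, _ in beams):
            break
    return beams[0][0]
```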
## Training Data
The model is trained on custom datasets of natural language inputs paired with function call outputs, as in the example below. Users are encouraged to fine-tune the model on domain-specific data to maximize its utility in real-world applications.
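
The pairing is simply (query, call); the field layout and call syntax below are illustrative, not a fixed schema:

```python
# Illustrative (input, output) pairs; the call syntax is whatever your
# training data uses, not a format the model imposes.
train_pairs = [
    ("Book me a flight from London to NYC",
     'book_flight(origin="London", destination="NYC")'),
    ("What is the weather in Paris tomorrow",
     'get_weather(city="Paris", date="tomorrow")'),
]
```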
## How to Use
1. **Loading a Specific Version:**
   The system supports multiple versions. Specify the model version when performing inference to load the desired weights (see the loading sketch below).

2. **Inference:**
   Provide an input text (e.g., "Book me a flight from London to NYC") and the model generates the corresponding function call output (see the inference sketch below).

3. **Publishing:**
   The model can be published to the Hugging Face Hub with version-specific details for reproducibility and community sharing (see the publishing sketch below).
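
A sketch of version-aware loading, reusing the `Seq2SeqTransformer` sketch above and assuming checkpoints are stored as `checkpoints/model_v<N>.pt` with vocabulary sizes bundled in (both the path convention and the checkpoint keys are assumptions):

```python
import torch

def load_model(version: int, checkpoint_dir: str = "checkpoints"):
    """Rebuild the model and load the weights saved for a given version."""
    state = torch.load(f"{checkpoint_dir}/model_v{version}.pt", map_location="cpu")
    model = Seq2SeqTransformer(src_vocab=state["src_vocab_size"],
                               tgt_vocab=state["tgt_vocab_size"])
    model.load_state_dict(state["model_state"])
    model.eval()
    return model

model = load_model(version=3)
```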
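
End-to-end inference then ties the earlier sketches together (again with assumed helper names: `encode`, `greedy_decode`, and the source/target vocabularies):

```python
import torch

query = "Book me a flight from London to NYC"
src = torch.tensor([encode(query.lower().split(), src_vocab)])

out_ids = greedy_decode(model, src)
inv_tgt_vocab = {i: tok for tok, i in tgt_vocab.items()}
tokens = [inv_tgt_vocab[i] for i in out_ids[1:-1]]  # strip <bos>/<eos>
print(" ".join(tokens))  # a book_flight(...)-style call, per the training data
```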
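
Publishing can be done with the official `huggingface_hub` client; the repo id below is a placeholder:

```python
from huggingface_hub import HfApi

api = HfApi()  # uses your cached token from `huggingface-cli login`
api.create_repo("your-username/seq2seq-function-calls", exist_ok=True)
api.upload_folder(
    folder_path="checkpoints",                        # local weights + vocab files
    repo_id="your-username/seq2seq-function-calls",
    commit_message="Upload model v3",
)
```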