---
language: en
tags:
- Explain code
- Code Summarization
- Summarization
license: mit
---

# Gemini

For an in-depth understanding of our model and methods, please see our blog [here](https://www.describe-ai.com/gemini).

## Model description

Gemini is a transformer based on Google's T5 model. The model is pre-trained on approximately 800k code/description pairs and then fine-tuned on 10k synthetically generated, higher-level explanations. Gemini can summarize and explain short to medium-length code snippets in the following languages, producing a description in English:

- Python
- JavaScript (mostly vanilla JS, although it can handle frameworks like React as well)
- Java
- Ruby
- Go

## Intended uses & limitations

Without any additional fine-tuning, Gemini can explain code in a sentence or two and typically performs best on Python and JavaScript. We recommend using Gemini for simple code explanation, documentation, or generating more synthetic data to improve its explanations.

### How to use

You can use this model directly with a pipeline for Text2Text generation, as shown below:

```python
from transformers import pipeline

# Load the model into a text2text-generation pipeline.
summarizer = pipeline('text2text-generation', model='describeai/gemini-small')
code = "print('hello world!')"

# Generate a description using beam search with 3 beams.
response = summarizer(code, max_length=100, num_beams=3)
print("Summarized code: " + response[0]['generated_text'])
```

Which should yield something along the lines of:

```
Summarized code: The following code is greeting the world.
```
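
For documentation workflows, the same call can be wrapped in a small helper that explains several snippets in turn. The following is a minimal sketch, not part of the Gemini release: `explain_snippets` is a hypothetical name, and it only assumes that the summarizer returns a list of dicts with a `generated_text` key, as in the example output above.

```python
from typing import Callable, Dict, Iterable, List


def explain_snippets(
    snippets: Iterable[str],
    summarizer: Callable[..., List[Dict[str, str]]],
    max_length: int = 100,
    num_beams: int = 3,
) -> Dict[str, str]:
    """Map each code snippet to the description returned by `summarizer`."""
    explanations = {}
    for code in snippets:
        # Same call shape as the pipeline example: one snippet in, and the
        # text is read from the first result's 'generated_text' field.
        response = summarizer(code, max_length=max_length, num_beams=num_beams)
        explanations[code] = response[0]["generated_text"].strip()
    return explanations
```

Passing a `text2text-generation` pipeline as the second argument would run the model on each snippet. Any callable with the same return shape, such as a stub, works for testing without downloading the model.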

### Model sizes

- Gemini: 770 Million Parameters
- Gemini-Small (this repo): 220 Million Parameters

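
As a rough guide to what these sizes mean in practice, the memory needed just to hold the weights is the parameter count times the bytes per parameter. The sketch below assumes 4 bytes per parameter (float32) and 2 bytes (float16). These precisions are assumptions for illustration, not something stated for this model, and activations and framework overhead come on top.

```python
# Parameter counts taken from the model size list.
MODEL_PARAMS = {
    "Gemini": 770_000_000,
    "Gemini-Small": 220_000_000,
}

# Assumed storage precisions (bytes per parameter); illustrative only.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(num_params: int, dtype: str = "float32") -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in MODEL_PARAMS.items():
    fp32 = weight_memory_gb(params, "float32")
    fp16 = weight_memory_gb(params, "float16")
    print(f"{name}: ~{fp32:.2f} GB (float32), ~{fp16:.2f} GB (float16)")
```

Under these assumptions, the Gemini-Small weights take roughly 0.88 GB in float32, and the Gemini weights roughly 3.08 GB.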

### Limitations

Gemini may produce overly simplistic descriptions that do not cover the entire code snippet. We suspect that more training data would mitigate this and produce better results.
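
Because the model targets short to medium snippets, one possible workaround for longer Python files is to explain each top-level function separately. This is a sketch of an assumed workflow, not a documented feature of Gemini: `split_functions` is a hypothetical helper built on Python's standard `ast` module (Python 3.8 or later).

```python
import ast
from typing import Dict


def split_functions(source: str) -> Dict[str, str]:
    """Return {function name: source text} for each top-level function."""
    tree = ast.parse(source)
    functions = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # get_source_segment recovers the original text of the node.
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                functions[node.name] = segment
    return functions


example = '''
def add(a, b):
    return a + b

def greet(name):
    print("hello " + name)
'''

for name, code in split_functions(example).items():
    print(name, "->", len(code.splitlines()), "lines")
```

Each extracted function could then be passed to the summarizer on its own, so that every description only has to cover a small unit of code.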

### About Us

At Describe.ai, we are focused on building Artificial Intelligence systems that can understand language as well as humans. While this is a long path, we plan to contribute our findings to our API and to the open source community.