---
license: other
license_name: deepnight-responsible-ai
license_link: LICENSE
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- 600B
- Python
- Code
- Logical Understanding
- Relation Establishment
- Translation
- ai1
- DEEPNIGHT
---
<div style="display: flex; justify-content: center; align-items: center;">
  <img src="./cover.jpg" style="width: 100%; max-width: 350px; height: auto;"/>
</div>

# DEEPNIGHT ai1
The 600 Billion+ Parameter Model.
Yes! We did this!

The second largest model in the world, right after GPT-4.

---

We at [DEEPNIGHT](https://deepnight.tech) have been working on this for quite some time.
We have successfully built the second-largest model, ai1, which comes with more than 600 billion parameters.

`ai1` can perform as well as GPT-4 and has a context window of 8k tokens.
ai1 was trained with a new approach: after training the model on a corpus of text from various sources, including but not limited to:
- RefinedWeb
- open-source code from GitHub
- Common Crawl

we fine-tuned the model on a huge dataset (generated manually and with automation) for logical understanding and reasoning.
We also trained the model for function-calling capabilities.
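
The exact schema format ai1 expects is not documented in this card, so the snippet below is only an illustrative sketch in the JSON-schema style that function-calling models commonly consume; the `get_weather` function and every field name in it are assumptions, not ai1's actual API.

```python
# Hypothetical function schema in the common JSON-schema style.
# ai1's real function-calling format is not published; every name
# below (get_weather, "parameters", ...) is illustrative only.
get_weather_schema = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Berlin'"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}
```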

---

## What is special about ai1?
ai1 works on a built-in chaining methodology. When it receives an input from the user, it tries to understand the input before it begins generating: internally it produces an instruction-based prompt, and then generates the response from that prompt.
The benefit? <b>We'll just say the jobs of Prompt Engineering are over.</b>

Unlike ChatGPT, GPT-4, Llama, and other models, ai1 doesn't require heavy prompt engineering to provide answers.
The understanding-development phase in the model takes care of that.
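
Since the model is not yet publicly downloadable, the following is only a hypothetical sketch of what usage could look like through the standard `transformers` text-generation pipeline; the repository id `deepnight-research/ai1` is a placeholder assumption. The point it illustrates is the claim above: a plain prompt, with no prompt-engineering scaffolding.

```python
from transformers import pipeline

# Hypothetical sketch: ai1's weights are not public, and the repo id
# "deepnight-research/ai1" is a placeholder assumption.
generator = pipeline("text-generation", model="deepnight-research/ai1")

# A plain, unengineered prompt; the card claims the model's internal
# understanding phase makes elaborate prompt templates unnecessary.
result = generator(
    "Summarize why transformers replaced RNNs for long sequences.",
    max_new_tokens=200,
)
print(result[0]["generated_text"])
```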

What else?
- performs as well as GPT-4
- excels at automation tasks
- can predict the user's emotions from the conversation (while understanding the input in Phase-1), resulting in better, more curated generations
- has an understanding of human emotions, which helps the model curate content accordingly
- excels at roleplay
- excels at writing code
- has a few global memory units that store data away from the context window; these are mostly used to store function schemas, but in the end the model decides for itself what to store in them
- costs, on average, about $0.005 per 1,000 tokens (a quick estimate follows this list)
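
A quick back-of-the-envelope check on that average rate, as a minimal sketch (the $0.005 per 1,000 tokens figure is the only input; everything else is arithmetic):

```python
# Cost estimate from the quoted average of $0.005 per 1,000 tokens.
PRICE_PER_1K_TOKENS = 0.005  # USD

def estimated_cost(tokens: int) -> float:
    """Estimated USD cost for a given token count at the quoted rate."""
    return tokens / 1000 * PRICE_PER_1K_TOKENS

print(estimated_cost(8_000))      # a full 8k context: $0.04
print(estimated_cost(1_000_000))  # one million tokens: $5.00
```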

---

## Future goals
We don't discuss those. Especially after seeing how SOME AI COMPANY ON THEIR DEV DAY just used open-source research and publications to profit themselves... Hah.

---

## Are we going to allow access?
Not for some time. We are still running evaluations and have a lot to learn about how this model can be made better.

---

Feel free to reach out to us at research@deepnight.tech.

- Team [DEEPNIGHT](https://deepnight.tech)