Hi, my dear friend. (Egyptian colloquial dialect, unleashed iMatrix)
Is it possible to provide more versions, better and more complex, that can create very long stories in the Egyptian colloquial dialect, written 100% in clear and consistent Arabic letters? I mean models of no less than 70B, and larger/more complex/stronger, integrating all the latest/most complex/strongest versions such as LLaMA, DeepSeek, Gemini, Qwen, and ChatGPT, and more, provided they are (unleashed iMatrix). Most versions currently are weak/small, or they write the Egyptian dialect in Latin letters rather than clear and consistent Arabic letters. Please help me.
It is possible, and your desires are totally valid, but unfortunately we only do quants; we don't create models. So you will have to find somebody to make such a model first (which, since your specs are quite high, is not easy). Once there is such a model on Hugging Face and it is supported by llama.cpp, we will happily quantize it.
Thank you very much, my dear friend. Could you try using the newest versions of LLaMA, DeepSeek, Qwen, Gemini, or ChatGPT to create very long stories in the Egyptian colloquial dialect, written 100% in clear and consistent Arabic letters, at no less than 70B (or larger/more complex/stronger), provided the "unleashed iMatrix" is used? Please.
There are versions that merge two families, (Qwen + DeepSeek) or (DeepSeek + LLaMA), and they are good, but they do not support the Egyptian colloquial dialect written in clear Arabic letters; they are English-only, in the Latin alphabet. I have tried generating the story in English and then translating it, but there are problems: the spirit of the language is gone, it reads like machine translation, the intended meanings are corrupted, the beautiful meanings and poetic imagery die, the flirtation is lost, and so on. The resulting texts are distorted and lack the pulse of life that is typical of the Egyptian colloquial dialect.
@bdbaiarabai If you give me a dataset with 1000 questions and answers in that language, I will create a finetune for you. I don't speak this language, so I unfortunately can't help you create such a dataset, but if you find one or create one on your own, I will create a finetune using it. Ideally the dataset would be formatted like https://huggingface.co/datasets/Guilherme34/uncensor
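For reference, here is a minimal sketch of a common question/answer JSONL layout (the schema, file name, and Arabic sample strings are illustrative placeholders; the exact format of the linked dataset may differ):

```python
# Minimal sketch of a question/answer JSONL dataset.
# Assumption: a simple instruction/response schema; the linked
# uncensor dataset's exact fields may differ.
import json

examples = [
    {
        # Placeholder Egyptian Arabic question/answer pair.
        "instruction": "احكيلي حكاية قصيرة عن حارة قديمة في إسكندرية.",
        "response": "كان يا ما كان، في حارة ضيقة جنب البحر...",
    },
]

with open("egyptian_arabic_qa.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        # ensure_ascii=False keeps Arabic letters readable instead of \uXXXX escapes.
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```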
Thank you very much, my dear friend. I will do it in the Egyptian colloquial dialect, written in clear Arabic letters, and put it all in a txt file, but I can't format it like https://huggingface.co/datasets/Guilherme34/uncensor
Ideally it should be formatted in a question/answer, prompt/response, or task/solution format, or we will likely lose instruction tuning. We can still try using unformatted text by just finetuning token prediction in a text-completion setting, but keep in mind that the resulting model will only be as good as the training data we use, so if we only train it using text completion, that might be all the finetuned model can do well. There is some hope it keeps the instruction tuning from the base model, but this is not always the case. If you can only give me raw unformatted text, at least give me a few million characters of it, or it will not be enough. You also need to tell me what base model you want me to train. Whatever base model we use should already be able to understand this language, or a language as close to it as possible. I saw you also asked this under https://huggingface.co/nicoboss/OpenThinker2-32B-Uncensored/discussions/1 but that model is based on Qwen/Qwen2.5-32B-Instruct. Are you sure you don't want to use Qwen3 32B as the base model instead? Qwen3 is already far more multilingual than Qwen2.5 and so should likely give better results.
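By the way, if your numbered txt file marks questions and answers consistently, converting it into the question/answer format is easy. A rough sketch (the "Q:"/"A:" markers and file names are assumptions about your file; adjust them to whatever markers you actually use, e.g. Arabic "س:" and "ج:"):

```python
# Sketch: convert a plain txt file of alternating Q/A lines into JSONL.
# Assumption (hypothetical format): lines start with "Q:" or "A:".
import json

pairs, question = [], None
with open("dataset.txt", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            pairs.append({"instruction": question, "response": line[2:].strip()})
            question = None

with open("dataset.jsonl", "w", encoding="utf-8") as f:
    for pair in pairs:
        f.write(json.dumps(pair, ensure_ascii=False) + "\n")
```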
Please keep in mind that I only understand Swiss German, German, English, and some limited Italian and French, so I have no way of checking whether your dataset or the finetuned model is any good. All I can do is finetune with whatever dataset you provide and hope for the best.
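And for completeness: if we only get raw text, this is roughly what the text-completion setting looks like. The text is tokenized and cut into fixed-length blocks, and each block becomes one next-token-prediction training sample (the Qwen3 tokenizer and the 2048-token block size below are illustrative assumptions):

```python
# Sketch of preparing raw text for text-completion finetuning.
# Assumptions: Qwen/Qwen3-32B as base model and a 2048-token block size.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
raw_text = open("dataset.txt", encoding="utf-8").read()

ids = tokenizer(raw_text)["input_ids"]
block_size = 2048
# Drop the trailing partial block; each block is one training sample.
blocks = [ids[i:i + block_size] for i in range(0, len(ids) - block_size + 1, block_size)]
print(f"{len(ids)} tokens -> {len(blocks)} training blocks")
```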
Thank you very much, my dear friend. Could you try using the newest/latest versions of LLaMA, DeepSeek, Qwen, Gemini, or ChatGPT to create very long stories in the Egyptian colloquial dialect, written 100% in clear and consistent Arabic letters, at no less than 70B (or larger/more complex/stronger), provided the "unleashed iMatrix" is used? Please.
I can't, as I don't understand the language. You could do so yourself. I'm fine with synthetic data, but you would obviously need to verify that the training data is any good. Generally, less but higher-quality data is preferable.
I will give you approximately 3,000 questions and answers of varying complexity, depth, and layers, covering all aspects of public, personal, and professional life in various fields and across different places and times, in a natural, realistic, popular Egyptian colloquial dialect, as if you were in a coffee shop, a neighborhood, or the old houses of very old popular areas (Alexandria + Cairo), numbered.
That would be the perfect amount for finetuning. Even 1,000 would likely be enough, but with 3,000 high-quality questions/answers we will definitely have enough data for finetuning.
unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF: yes, there is a modern/latest version that supports Arabic in Arabic letters + "iMatrix", but it is not unleashed. I want a fully "unleashed iMatrix" with the Egyptian colloquial dialect written in clear Arabic letters, if you can, with (mergekit + imatrix quants + plot generation + sub-plot generation + vivid prose + vivid writing).
https://huggingface.co/unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF
Nobody ever made an uncensored or abliterated version of Llama-4-Scout-17B-16E-Instruct?!? That is crazy for such a popular model that has already been out for almost a month. What a missed opportunity for me to create one. Yes, I can try to create an uncensored version and then finetune that, but I'm not sure it's possible: I think the largest I can finetune locally is 70B, and that model is 109B. Maybe axolotl has become more memory-efficient, so I will give it a try, but if it doesn't fit, I'm not sure I'm willing to spend money on RunPod to train it, as Llama 4 is somewhat underwhelming. Maybe I will, but I can't promise anything.
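For a rough sense of why a 109B model might not fit locally, here is a back-of-envelope estimate (all numbers are rough assumptions, not measurements of this specific model):

```python
# Back-of-envelope memory estimate for finetuning a 109B-parameter model.
# Assumption: weight storage only; gradients, activations, LoRA adapters
# and optimizer state come on top.
params = 109e9
gib = 1024**3

bf16_weights = params * 2 / gib    # 2 bytes per parameter in bf16
int4_weights = params * 0.5 / gib  # ~0.5 bytes per parameter at 4-bit (QLoRA-style)

print(f"bf16 weights:  ~{bf16_weights:.0f} GiB")
print(f"4-bit weights: ~{int4_weights:.0f} GiB (plus adapters, activations, optimizer state)")
```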
Thank you very much, my dear brother, for your cooperation, interest, and responses. I will continue preparing more than 3,000 questions and answers.