The dataset is just a zip of files.
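A minimal sketch of unpacking such a zip dataset with Python's standard library; the archive name and target directory are hypothetical placeholders:

    # Unpack a zip-of-files dataset; archive name and target directory are placeholders.
    import zipfile

    with zipfile.ZipFile("dataset.zip") as zf:
        print(zf.namelist()[:5])    # peek at the first few member names
        zf.extractall("dataset/")   # extract every file in the archive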
The research community is developing various code models, small and large. Note that the models may not be instruction-tuned.
They have the 1.3B version!!! This may be the best one to start with for Newspeak. Training should work even on Hugging Face.
Another possible model. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.
With the bitsandbytes optimizers (like 8-bit AdamW), you would need 2 bytes per parameter, i.e. about 14 GB of GPU memory for a 7B-parameter model (7B × 2 bytes ≈ 14 GB).
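A minimal sketch of plugging bitsandbytes' 8-bit AdamW into a PyTorch training step; the tiny model, dummy loss, and learning rate are placeholder assumptions (bitsandbytes requires a CUDA GPU):

    # Sketch: 8-bit AdamW keeps optimizer state in 8 bits, cutting its memory footprint.
    import torch
    import bitsandbytes as bnb

    model = torch.nn.Linear(1024, 1024).cuda()   # stand-in for a real code LLM
    optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-5)

    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).pow(2).mean()                # dummy loss for illustration
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()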
Another potential model to use for Newspeak, but it is NOT open source. Advantage: 2.5B params, so it should be usable on small GPUs.
The Mixtral model is new and seems to be good. Click on "Demo" to test it.
The article includes a comparison with other code-LLM models.
Chat models. Not open source, but instruction-tuned and relatively small (3B). The 3B instruct model may be the best one to try on Newspeak.
Look for models that could be used for Newspeak.
Serve files from another directory
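One way to do this with Python's built-in HTTP server (Python 3.7+); the directory path is a placeholder. The command-line equivalent is: python -m http.server 8000 --directory /path/to/files

    # Serve files from a different directory on localhost:8000; the path is a placeholder.
    import functools
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    handler = functools.partial(SimpleHTTPRequestHandler, directory="/path/to/files")
    HTTPServer(("localhost", 8000), handler).serve_forever()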
Newspeak here means Orwell's totalitarian language, not Bracha's computer language :)