Optimum

[https://huggingface.co/docs/optimum/index] - 2024-03-11 19:44:39 - public:mzimmerm

ai, doc, huggingface, llm, model, optimum, repo, small, transformer - 9 | id:1489894 -

Optimum is an extension of Transformers that provides a set of performance optimization tools to train and run models on targeted hardware with maximum efficiency. It is also the repository of small, mini, tiny models.

google-research/bert: TensorFlow code and pre-trained models for BERT

[https://github.com/google-research/bert/] - 2024-03-11 04:44:09 - public:mzimmerm

ai, bert, github, home, llm, mini, model, tiny, transformer - 9 | id:1489883 -

BERT model home on github

google/bert_uncased_L-4_H-256_A-4 · Hugging Face

[https://huggingface.co/google/bert_uncased_L-4_H-256_A-4] - 2024-03-11 04:19:21 - public:mzimmerm

ai, bert, huggingface, llm, model, parameter, small, todo - 8 | id:1489880 -

Repository of BERT models, including the small ones. This one is BERT-Mini (L=4 layers, H=256 hidden size, A=4 attention heads). Start by using this model for testing.
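
The model id encodes the architecture, so the size of any small BERT can be estimated before downloading it. A minimal sketch, assuming the uncased WordPiece vocabulary (30522 tokens), 512 positions, 2 token types and a feed-forward size of 4*H as in the small-BERT release. It counts the bare encoder plus pooler, without the masked-LM head. The helper names are hypothetical, not a library API.

```python
import re

def parse_small_bert_name(model_id: str) -> tuple[int, int, int]:
    """Extract (layers L, hidden size H, attention heads A) from a small-BERT model id."""
    match = re.search(r"L-(\d+)_H-(\d+)_A-(\d+)", model_id)
    if match is None:
        raise ValueError(f"not a small-BERT model id: {model_id}")
    layers, hidden, heads = (int(g) for g in match.groups())
    return layers, hidden, heads

def bert_param_count(layers: int, hidden: int, vocab_size: int = 30522,
                     max_positions: int = 512, type_vocab: int = 2) -> int:
    """Approximate parameter count of a bare BERT encoder plus pooler (no MLM head).

    Assumption: the feed-forward size is 4 * hidden.
    """
    h, ff = hidden, 4 * hidden
    # Word, position and token-type embeddings plus one LayerNorm (weight + bias).
    embeddings = (vocab_size + max_positions + type_vocab) * h + 2 * h
    # Self-attention: Q, K, V and output projections, each h x h plus bias.
    attention = 4 * (h * h + h)
    # Feed-forward: h -> ff -> h, each with bias.
    feed_forward = (h * ff + ff) + (ff * h + h)
    # Two LayerNorms per layer (after attention and after feed-forward).
    layer_norms = 2 * (2 * h)
    per_layer = attention + feed_forward + layer_norms
    # Pooler: dense h x h plus bias applied to the [CLS] token.
    pooler = h * h + h
    return embeddings + layers * per_layer + pooler

layers, hidden, heads = parse_small_bert_name("google/bert_uncased_L-4_H-256_A-4")
print(layers, hidden, heads)             # 4 256 4
print(bert_param_count(layers, hidden))  # 11170560
```

The number of heads A does not appear in the count: the heads split the hidden size H between them, so changing A leaves the parameter count unchanged. With L=12, H=768 the same formula gives the familiar BERT-Base encoder size of about 109M.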

A Step-by-Step Guide to Model Evaluation in Python | by Shreya Singh | Medium

[https://medium.com/@jscvcds/a-step-by-step-guide-to-model-evaluation-in-python-3a72dee92560] - 2024-03-09 07:22:53 - public:mzimmerm

ai, doc, evaluate, model, todo - 5 | id:1489866 -

Open LLM Leaderboard - a Hugging Face Space by HuggingFaceH4

[https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard] - 2024-03-05 23:50:45 - public:mzimmerm

ai, compare, huggingface, llm, model - 5 | id:1489821 -

Comparison of open LLM models on Hugging Face by benchmark scores

Training Bert on Yelp - Copy of training.ipynb - Colaboratory

[https://colab.research.google.com/drive/1FhwrZ05umMvj4cshnEMUOLxjD9ynvCy9#scrollTo=nCFiAJ55LcLt] - 2024-03-05 07:57:07 - public:mzimmerm

ai, bert, huggingface, model, notebook, progress, yelp - 7 | id:1489813 -

Most cost effective GPU for local LLMs? : LocalLLaMA

[https://www.reddit.com/r/LocalLLaMA/comments/12vxxze/most_cost_effective_gpu_for_local_llms/] - 2024-03-05 00:49:23 - public:mzimmerm

ai, doc, llm, model, optimize, perform - 6 | id:1489804 -

Consider GGML quantized models. They let you use the CPU and system RAM instead of relying on a GPU's VRAM. This could save you a fortune, especially if you go for a used AMD Epyc platform. This could be more viable for the larger models, especially the 30B/65B-parameter models, which would still strain or exceed the 24 GB of VRAM on the P40.
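
The "press or exceed the VRAM" claim can be checked with back-of-the-envelope arithmetic: weight memory is roughly parameters x bits per weight / 8. A rough sketch only: it counts the weights alone (no KV cache, activations or runtime overhead) and treats the bit width as nominal, while real quantized formats also store per-block scales, so actual files are somewhat larger.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Memory needed just to hold the weights, in decimal GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives GB directly,
    because the 1e9 factors cancel.
    """
    return params_billions * bits_per_weight / 8

P40_VRAM_GB = 24  # memory on a Tesla P40

for params in (7, 13, 30, 65):
    for bits in (16, 8, 4):
        need = weight_memory_gb(params, bits)
        verdict = "fit in" if need <= P40_VRAM_GB else "exceed"
        print(f"{params}B @ {bits}-bit: weights {verdict} 24 GB ({need:.1f} GB)")
```

At 4 bits a 30B model needs about 15 GB for its weights, which leaves little headroom once the context cache is added, and a 65B model needs about 32.5 GB, which no longer fits on one P40.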

Optimizing LLMs for Speed and Memory

[https://huggingface.co/docs/transformers/v4.35.2/en/llm_tutorial_optimization] - 2024-03-05 00:46:21 - public:mzimmerm

ai, doc, huggingface, llm, model, optimize, perform - 7 | id:1489803 -

7 steps to master large language models (LLMs) | Data Science Dojo

[https://datasciencedojo.com/blog/master-large-language-models/#] - 2024-03-04 19:25:57 - public:mzimmerm

ai, doc, highlevel, llm, model, train - 6 | id:1489796 -

LLM for a new language : MachineLearning

[https://www.reddit.com/r/MachineLearning/comments/12xu5ls/p_llm_for_a_new_language/] - 2024-03-04 19:15:48 - public:mzimmerm

ai, highlevel, llm, model, train - 5 | id:1489794 -

High-level overview of how to train a model

Up to date List of LLM Models

[https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYpb63e1ZR3aePczz3zlbJW-Y4/edit#gid=741531996] - 2024-03-04 19:13:58 - public:mzimmerm

ai, doc, list, llm, model - 5 | id:1489793 -

Are there any tiny (1-3b) models finetuned for coding available in GGUF format? : LocalLLaMA

[https://www.reddit.com/r/LocalLLaMA/comments/16csdq6/are_there_any_tiny_13b_models_finetuned_for/] - 2024-03-04 10:56:19 - public:mzimmerm

ai, code, generate, llm, model, newspeak, small - 7 | id:1489789 -

bigcode (BigCode)

[https://huggingface.co/bigcode] - 2024-03-04 10:50:02 - public:mzimmerm

ai, code, generate, huggingface, llm, model, newspeak, santacoder, small, starcoder - 10 | id:1489788 -

Research community developing various code models, small and big. The models may not be instruction-tuned.

WizardLM (WizardLM)

[https://huggingface.co/WizardLM] - 2024-03-04 10:42:44 - public:mzimmerm

ai, code, generate, huggingface, llm, model, newspeak, small, wizardcoder - 9 | id:1489787 -

Another open source small (1B) model.

deepseek-ai (DeepSeek)

[https://huggingface.co/deepseek-ai] - 2024-03-04 10:24:32 - public:mzimmerm

ai, best, code, deepseek, good, huggingface, instruct, llm, model, newspeak, small - 11 | id:1489786 -

They have a 1.3B version! This may be the best model to start with for Newspeak. It should be trainable even on Hugging Face.

deepseek-ai/deepseek-coder-6.7b-instruct · Hugging Face

[https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct] - 2024-03-04 10:13:20 - public:mzimmerm

ai, code, generate, good, llm, model, newspeak, opensource - 8 | id:1489783 -

Another possible model. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.

LLaMA 7B GPU Memory Requirement - Transformers - Hugging Face Forums

[https://discuss.huggingface.co/t/llama-7b-gpu-memory-requirement/34323/6] - 2024-03-04 10:10:38 - public:mzimmerm

ai, code, generate, llama, llm, model, newspeak, train - 8 | id:1489782 -

With the optimizers of bitsandbytes (like 8-bit AdamW), you would need 2 bytes per parameter for the optimizer states, or 14 GB of GPU memory for a 7B model.
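
The 14 GB figure is just parameters x bytes per parameter. A small sketch of that arithmetic, assuming Adam-style optimizers that keep two moments per parameter: 4 bytes each in standard 32-bit AdamW, 1 byte each in an 8-bit optimizer. It counts optimizer states only; weights, gradients and activations come on top, and the small per-block quantization constants of 8-bit optimizers are ignored. The dictionary keys are illustrative labels, not bitsandbytes identifiers.

```python
# Bytes of optimizer state per trainable parameter (two Adam moments each).
OPTIMIZER_BYTES_PER_PARAM = {
    "adamw_fp32": 8,  # two 32-bit moments: 2 x 4 bytes
    "adamw_8bit": 2,  # two 8-bit moments: 2 x 1 byte
}

def optimizer_state_gb(num_params: int, optimizer: str) -> float:
    """Optimizer-state memory in decimal GB for the given optimizer label."""
    return num_params * OPTIMIZER_BYTES_PER_PARAM[optimizer] / 1e9

nominal_7b = 7_000_000_000  # nominal "7B" parameter count
for name in OPTIMIZER_BYTES_PER_PARAM:
    print(name, optimizer_state_gb(nominal_7b, name), "GB")
```

Switching from 32-bit to 8-bit AdamW cuts the optimizer states of a 7B model from 56 GB to 14 GB, which is why 8-bit optimizers matter for fine-tuning on a single GPU.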

stabilityai/stable-code-3b · Hugging Face

[https://huggingface.co/stabilityai/stable-code-3b] - 2024-03-04 10:05:36 - public:mzimmerm

ai, code, generate, llm, model, newspeak - 6 | id:1489781 -

Another potential model to use for Newspeak, but it is NOT open source. Advantage: 2.7B parameters, so it should be usable on small GPUs.

Can Ai Code Results - a Hugging Face Space by mike-ravkine

[https://huggingface.co/spaces/mike-ravkine/can-ai-code-results] - 2024-03-04 09:38:45 - public:mzimmerm

ai, code, generate, huggingface, llm, model, summary - 7 | id:1489779 -

Comparison of LLM models for coding

openchat/openchat-3.5-0106 · Hugging Face

[https://huggingface.co/openchat/openchat-3.5-0106] - 2024-03-04 08:41:50 - public:mzimmerm

ai, code, generate, huggingface, llm, model, openchat - 7 | id:1489775 -

Open source, with lots of information. Uses multiple underlying models. Not sure how I would train it.

Welcome Mixtral - a SOTA Mixture of Experts on Hugging Face

[https://huggingface.co/blog/mixtral] - 2024-03-04 08:24:33 - public:mzimmerm

ai, code, generate, huggingface, llm, mixtral, model, newspeak - 8 | id:1489774 -

The Mixtral model is new and seems to be good. Click “Demo” to test it.

StarCoder: A State-of-the-Art LLM for Code

[https://huggingface.co/blog/starcoder] - 2024-03-04 07:43:17 - public:mzimmerm

ai, code, generate, good, huggingface, llm, model, newspeak - 8 | id:1489773 -

Article has comparison with other code-LLM models

huybery/Awesome-Code-LLM: An awesome and curated list of best code-LLM for research.

[https://github.com/huybery/Awesome-Code-LLM] - 2024-03-04 07:33:15 - public:mzimmerm

ai, code, generate, list, llm, model - 6 | id:1489772 -

Hannibal046/Awesome-LLM: Awesome-LLM: a curated list of Large Language Model

[https://github.com/Hannibal046/Awesome-LLM] - 2024-03-04 07:31:48 - public:mzimmerm

ai, list, llm, model - 4 | id:1489771 -

Includes code generation models

Large language models and the rise of the AI code generators | InfoWorld

[https://www.infoworld.com/article/3696970/llms-and-the-rise-of-the-ai-code-generators.html] - 2024-03-04 07:14:23 - public:mzimmerm

ai, code, generate, language, model, program, review - 7 | id:1489770 -

Review of LLMs specialized for code generation

Large language model - Wikipedia

[https://en.wikipedia.org/wiki/Large_language_model#List] - 2024-03-04 07:08:48 - public:mzimmerm

ai, license, list, llm, model - 5 | id:1489769 -

List of LLM models on Wikipedia

stabilityai (Stability AI) - Stable Diffusion running on Huggingface

[https://huggingface.co/stabilityai] - 2024-03-04 06:24:17 - public:mzimmerm

ai, chat, good, home, huggingface, image, instruct, model, newspeak, small, stabilityai, stablecode - 12 | id:1489767 -

Chat and other models. Not open source, but instruction-tuned and relatively small (3B). The 3B instruct model may be the best one to try for Newspeak.

OpenAI Codex - Wikipedia

[https://en.wikipedia.org/wiki/OpenAI_Codex] - 2024-03-04 04:38:12 - public:mzimmerm

ai, code, codex, generate, language, model, program - 7 | id:1489759 -

Model that generates code in Python, JavaScript, Go, Shell, Perl, Swift, Ruby and PHP

codellama (Code Llama)

[https://huggingface.co/codellama] - 2024-03-03 08:48:06 - public:mzimmerm

ai, code, generate, huggingface, language, llama, model, newspeak, program - 9 | id:1489750 -

Hugging Face models for generating programs. Maybe they can be used for Newspeak?

AI Code Tools: The Ultimate Guide in 2024

[https://codesubmit.io/blog/ai-code-tools/] - 2024-03-03 08:19:57 - public:mzimmerm

ai, code, generate, good, model, tool - 6 | id:1489745 -

AI code tools: a good summary. It does not say which pre-trained models the tools use. One is Gemini (Bard) -> AlphaCode 2.

BERT 101 - State Of The Art NLP Model Explained

[https://huggingface.co/blog/bert-101] - 2024-03-03 06:50:18 - public:mzimmerm

ai, bert, best, good, model, progress, summary, transform - 8 | id:1489741 -

Best summary of natural language processing and its terms: model (a language model, e.g. BertModel, which defines the encoder and/or decoder and their properties), transformer (a neural network architecture based on the attention paper, "Attention Is All You Need"), encoder (a stack of transformer layers applied to the input), decoder (a stack of transformer layers that produces the output). BERT does NOT use a decoder. TensorFlow and PyTorch are possible backends for the Transformers library. Summary: BERT is a highly complex and advanced language model that helps people automate language understanding.
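
The core operation inside each transformer layer is scaled dot-product attention, softmax(Q K^T / sqrt(d)) V. A minimal single-head sketch on plain Python lists, without the learned projections, multiple heads or feed-forward part of a real layer. The `causal` flag shows the practical difference behind "BERT does NOT use a decoder": an encoder lets every token see the whole input, while a GPT-style decoder masks out future tokens.

```python
import math

def attention(q, k, v, causal=False):
    """Single-head scaled dot-product attention: softmax(q k^T / sqrt(d)) v.

    causal=False: every position attends to every position (encoder, BERT-style).
    causal=True: position i only attends to positions <= i (decoder, GPT-style).
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Similarity between query i and every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        if causal:
            # Future positions get -inf, i.e. zero weight after the softmax.
            scores = [s if j <= i else -math.inf for j, s in enumerate(scores)]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output i is the attention-weighted sum of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out

q = k = [[0.0, 0.0], [0.0, 0.0]]  # equal scores everywhere
v = [[1.0, 0.0], [0.0, 1.0]]
print(attention(q, k, v))               # [[0.5, 0.5], [0.5, 0.5]]
print(attention(q, k, v, causal=True))  # [[1.0, 0.0], [0.5, 0.5]]
```

With the causal mask, the first token can only look at itself, which is what lets a decoder generate text left to right; BERT's unmasked attention uses context on both sides instead.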

BERT vs GPT: A Tale of Two Transformers That Revolutionized NLP | by Tavva Prudhvith | Medium

[https://medium.com/@prudhvithtavva/bert-vs-gpt-a-tale-of-two-transformers-that-revolutionized-nlp-11fff8e61984] - 2024-03-03 06:41:37 - public:mzimmerm

ai, bert, good, gpt, model, transform - 6 | id:1489740 -

BigCode - Open and responsible development of LLMs for code

[https://www.bigcode-project.org/] - 2024-03-02 10:21:57 - public:mzimmerm

account, ai, computer, language, model, train - 6 | id:1489729 -

BigCode is an open scientific collaboration working on the responsible development and use of large language models for code

Replit — How to train your own Large Language Models

[https://blog.replit.com/llm-training] - 2024-03-02 10:18:28 - public:mzimmerm

ai, doc, language, llm, model, train - 6 | id:1489728 -

High-level only; talks about training a model for a language.

How to train a new language model from scratch using Transformers and Tokenizers

[https://huggingface.co/blog/how-to-train] - 2024-03-02 09:48:13 - public:mzimmerm

ai, best, doc, good, language, llm, model, todo, train - 9 | id:1489725 -

Describes how to train a new language model (for Esperanto).
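
The first step in that guide is training a tokenizer (a byte-level BPE, the same kind GPT-2 uses) on the new corpus. The idea behind BPE can be sketched in a few lines: repeatedly merge the most frequent adjacent pair of symbols. This is a simplified character-level version for illustration, not the `tokenizers` library's implementation, and the toy corpus is hypothetical.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    vocab = Counter(tuple(w) for w in words)  # word as tuple of characters -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair; ties broken alphabetically so results are deterministic.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges

def encode(word, merges):
    """Split a new word into subword units by replaying the merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

corpus = ["low", "low", "lower", "lowest"]  # hypothetical toy corpus
merges = learn_bpe(corpus, 3)
print(merges)                   # [('l', 'o'), ('lo', 'w'), ('low', 'e')]
print(encode("lowly", merges))  # ['low', 'l', 'y']
```

Unseen words still get encoded, just into smaller pieces, which is why a tokenizer trained on the target language keeps sequences short for text in that language.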

BigCode - Playground - a Hugging Face Space by bigcode

[https://huggingface.co/spaces/bigcode/bigcode-playground] - 2023-12-10 00:38:55 - public:mzimmerm

ai, bigcode, code, generate, good, model, newspeak, playground, software, starcoder - 10 | id:1485780 -

Look for models that could be used in Newspeak

Flutter layouts guide: Margins and padding - LogRocket Blog

[https://blog.logrocket.com/flutter-layouts-guide-margins-padding/] - 2022-10-24 23:23:13 - public:mzimmerm

box, flutter, layout, model - 4 | id:1287305 -

Viewing mzimmerm's Bookmarks