Friday, Sept 20 2024

What's in the RedPajama-Data-1T LLM training set

By A Mystery Man Writer

RedPajama is “a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens”. It’s a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, …
