Post by @rajeevk02642739 • Hey

The full RedPajama dataset contains 1.2 trillion tokens. Above is an Atlas data map 🗺️ of a random 1B-token subsample, colored by the is-Wi
