The Ultra Alignment project brings together a series of datasets, models, and methods for aligning large language models.

Datasets

UltraChat

We introduce UltraChat, a large-scale dataset designed for training AI assistants, featuring 1.5 million high-quality, diverse multi-turn dialogues constructed without any human queries. Leveraging UltraChat, we fine-tuned a LLaMA model to create UltraLM, a powerful conversational model.

Created: Apr 3rd, 2023
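
As a quick illustration, the dialogues can be loaded with the Hugging Face `datasets` library. A minimal sketch, assuming the public dataset id `stingning/ultrachat` and a `data` field holding alternating user/assistant turns (check the dataset card for the authoritative schema):

```python
from datasets import load_dataset

# Stream the corpus so the 1.5M dialogues need not be downloaded up front.
ds = load_dataset("stingning/ultrachat", split="train", streaming=True)

example = next(iter(ds))
turns = example["data"]  # assumed layout: alternating user/assistant utterances

for i, utterance in enumerate(turns[:4]):
    speaker = "User" if i % 2 == 0 else "Assistant"
    print(f"{speaker}: {utterance[:80]}")
```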

UltraFeedback

We introduce UltraFeedback, a large-scale, diversified preference dataset that enhances Reinforcement Learning from Human Feedback (RLHF) for language models. To address the scarcity of diverse preference data, it samples completions from a wide range of instructions and models and annotates them for comparison. We used it to train models such as UltraRM, UltraLM-13B-PPO, and UltraCM.

Created: Sep 7th, 2023
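
A common way to consume UltraFeedback is to reduce each instruction's completions to (chosen, rejected) pairs for reward-model or DPO training. A minimal sketch, assuming the dataset id `openbmb/UltraFeedback` and per-completion `response`/`overall_score` fields (verify against the dataset card):

```python
from datasets import load_dataset

ds = load_dataset("openbmb/UltraFeedback", split="train", streaming=True)

pairs = []
for example in ds:
    completions = example["completions"]  # assumed: one entry per sampled model
    if len(completions) < 2:
        continue
    # Rank completions by their annotated overall score; keep best vs. worst.
    ranked = sorted(completions, key=lambda c: c["overall_score"], reverse=True)
    pairs.append({
        "prompt": example["instruction"],
        "chosen": ranked[0]["response"],
        "rejected": ranked[-1]["response"],
    })
    if len(pairs) >= 100:  # small demo slice
        break

print(pairs[0]["prompt"])
```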

UltraEval

UltraEval is an open-source framework for evaluating the capabilities of foundation models, providing a suite of lightweight, easy-to-use evaluation systems that support the performance assessment of mainstream LLMs.

Created: Nov 24th, 2023
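
UltraEval's own configuration files and entry points are documented in its repository and are not reproduced here. As a generic, hypothetical illustration of what such an evaluation system does at its core, the sketch below scores each multiple-choice option by log-likelihood under a causal LM and picks the best one (`gpt2` is only a stand-in model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def option_logprob(question: str, option: str) -> float:
    """Sum of token log-probs of `option` conditioned on `question`."""
    # Assumes the question's tokenization is unchanged when the option is appended.
    q_ids = tok(question, return_tensors="pt").input_ids
    full_ids = tok(question + " " + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-prob of each token given its preceding context.
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the positions belonging to the option continuation.
    return token_lp[:, q_ids.shape[1] - 1 :].sum().item()

question = "The capital of France is"
options = ["Paris", "Berlin", "Madrid"]
print(max(options, key=lambda o: option_logprob(question, o)))
```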

Models

UltraLM-13B

The first version of UltraLM, fine-tuned from LLaMA-13B on the UltraChat dataset. Ranked #1 on AlpacaEval on Jun 28.

Created: Jun 3rd, 2023
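
A minimal generation sketch, assuming the hub id `openbmb/UltraLM-13b` and a plain `User:`/`Assistant:` chat template; note that the original release shipped as delta weights over LLaMA-13B, so consult the model card for recovering full weights first:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/UltraLM-13b"  # assumed hub id; released as LLaMA delta weights
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "User: Explain RLHF in one paragraph.\nAssistant:"  # assumed chat template
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)
print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```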

UltraLM-13B-v2.0

The second version of UltraLM, fine-tuned from Llama2-13B on the UltraChat dataset and further aligned with RL using UltraFeedback. Ranked #1 on AlpacaEval on Nov 28.

Created: Sep 18th, 2023
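
The RL alignment step can be approximated with the `trl` library. A compressed, hypothetical sketch using the legacy PPOTrainer API (trl <= 0.11; the API changed in later releases), with `gpt2` standing in for the Llama2-13B policy and a constant reward standing in for UltraRM scores:

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# "gpt2" stands in for the Llama2-13B policy used for UltraLM-13B-v2.0.
config = PPOConfig(model_name="gpt2", batch_size=1, mini_batch_size=1)
tok = AutoTokenizer.from_pretrained(config.model_name)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ppo_trainer = PPOTrainer(config=config, model=model, tokenizer=tok)

query = tok("User: How do I stay focused?\nAssistant:", return_tensors="pt").input_ids[0]
response = model.generate(query.unsqueeze(0), max_new_tokens=32,
                          pad_token_id=tok.eos_token_id)[0]
response_only = response[query.shape[0]:]

# In practice the scalar reward would come from a reward model such as UltraRM.
reward = torch.tensor(1.0)  # placeholder score
ppo_trainer.step([query], [response_only], [reward])
```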

UltraLM-65B

Fine-tuned from LLaMA-65B on the UltraChat dataset.

Created: Aug 3rd, 2023

Community Works

Falcon-180b-Chat

Falcon-180B-Chat is a 180B-parameter causal decoder-only model built by TII, based on Falcon-180B and fine-tuned on a mixture of UltraChat, Platypus, and Airoboros.

Created: Sep 4th, 2023

Zephyr-7B

Zephyr is a series of language models trained to act as helpful assistants. Zephyr-7B-β is the second model in the series: a fine-tuned version of mistralai/Mistral-7B-v0.1 trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO).

Created: Nov 22nd, 2023
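
For reference, the DPO objective behind Zephyr is simple to state: push the policy to prefer the chosen response over the rejected one by a larger margin than a frozen reference model does. A minimal PyTorch sketch (illustrative, not Zephyr's actual training code):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Each argument is the summed log-prob of a response under a model."""
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # -log sigmoid(beta * (policy margin - reference margin))
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Toy usage with made-up log-probabilities.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-13.5]))
print(loss.item())
```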