The Ultra Alignment project comprises a collection of datasets, models, and methods for aligning large language models.
UltraChat
We introduce UltraChat, a large-scale dataset designed for training AI assistants, featuring 1.5 million high-quality, diverse multi-turn dialogues generated without any human queries. Leveraging UltraChat, we fine-tuned a LLaMA model to create UltraLM, a powerful conversational model.
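To make the notion of a multi-turn dialogue record concrete, here is a minimal sketch in Python. The record shown is invented for illustration and its field names (`id`, `data`) are assumptions, not the released schema; the point is only that each record carries alternating user/assistant turns that can be paired into exchanges for fine-tuning.

```python
# Hypothetical illustration of a multi-turn dialogue record of the kind a
# dataset like UltraChat contains; field names and schema are assumed.
record = {
    "id": "0",
    "data": [  # alternating user / assistant turns
        "How do I keep a sourdough starter alive?",
        "Feed it regularly with equal parts flour and water by weight.",
        "Can I store it in the fridge?",
        "Yes; refrigeration slows fermentation, so weekly feeding suffices.",
    ],
}

# Pair turns into (user, assistant) exchanges for supervised fine-tuning.
turns = record["data"]
exchanges = list(zip(turns[0::2], turns[1::2]))
print(len(exchanges))  # two user/assistant exchanges
```

In practice the released data files define the exact format; this snippet only shows the alternating-turn structure the description refers to.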
UltraFeedback
We introduce UltraFeedback, a large-scale, diversified preference dataset for Reinforcement Learning from Human Feedback (RLHF) on language models. To address the scarcity of diverse preference data, it pairs a broad set of instructions with completions from many models and annotates them comparatively. Using UltraFeedback, we trained UltraRM (a reward model), UltraLM-13B-PPO, and UltraCM (a critique model).
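Preference pairs of this kind are commonly used to train a reward model with a Bradley-Terry objective: the loss is low when the model scores the chosen response above the rejected one. A minimal sketch in plain Python, with placeholder scalar scores standing in for a neural reward model's outputs (this is the generic objective, not the project's exact training code):

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry preference loss: -log sigmoid(r_chosen - r_rejected).

    Shrinks as the reward model's margin between the chosen and
    rejected responses grows.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Placeholder scores: a positive margin gives a small loss,
# a negative margin a large one.
print(preference_loss(2.0, 0.0))   # small: model prefers the chosen response
print(preference_loss(0.0, 2.0))   # large: model prefers the rejected one
```

Summing this loss over a dataset of comparative annotations is what turns preference data into a trainable reward signal for RLHF.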
UltraLM-13B
The first version of UltraLM, fine-tuned from LLaMA-13B on the UltraChat dataset. Ranked #1 on AlpacaEval on Jun 28.
UltraLM-13B-v2.0
The second version of UltraLM, fine-tuned from Llama2-13B on the UltraChat dataset and further aligned via reinforcement learning with UltraFeedback. Ranked #1 on AlpacaEval on Nov 28.
Falcon-180b-Chat
Falcon-180B-Chat is a 180B-parameter causal decoder-only model built by TII, based on Falcon-180B and fine-tuned on a mixture of UltraChat, Platypus, and Airoboros.
Zephyr-7B
Zephyr is a series of language models trained to act as helpful assistants. Zephyr-7B-β is the second model in the series: a fine-tuned version of mistralai/Mistral-7B-v0.1 trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO).
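DPO optimizes the policy directly on preference pairs, without a separate reward model: the per-pair loss is -log sigmoid(beta * (Δ_policy - Δ_ref)), where each Δ is the log-probability gap between the chosen and rejected response under the policy and under a frozen reference model. A minimal sketch in plain Python, with placeholder log-probabilities standing in for real model outputs (not Zephyr's actual training code):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (delta_policy - delta_ref)),
    where delta = logp(chosen) - logp(rejected)."""
    delta_policy = policy_chosen_logp - policy_rejected_logp
    delta_ref = ref_chosen_logp - ref_rejected_logp
    logits = beta * (delta_policy - delta_ref)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Placeholder log-probs: the loss falls when the policy prefers the chosen
# response more strongly than the reference model does.
print(dpo_loss(-10.0, -20.0, -15.0, -15.0, beta=0.1))
```

The `beta` value is a hyperparameter controlling how far the policy may drift from the reference model; averaging this loss over a preference dataset is the whole training objective.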