Aligning to Thousands of Preferences via System Message Generalization

Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages

Seongyun Lee*

KAIST AI

Sue Hyun Park*

KAIST AI

Seungone Kim

KAIST AI, CMU

Minjoon Seo

KAIST AI

* denotes equal contribution


(Figure: Janus overview)

Abstract

We propose the following:


Training on various system messages for alignment

Prior LLMs are typically trained with homogeneous system messages reflecting general helpfulness and harmlessness. We propose training LLMs with diverse system messages, each representing an individual's multifaceted preferences, so that they generalize to unseen system messages. Janus 7B, the model we train in this direction, is adept at generating personalized responses conditioned on personalized system messages.

Role of system messages

People hold diverse preferences that vary subtly across contexts, and it is often unclear what makes one response preferred over another. To reduce this ambiguity, we conceptualize a preference as a detailed textual description of a quality that a desirable response should possess from an individual's perspective. Based on this definition, we identify two critical requirements for a model to reflect the diversity of human preferences and devise a strategy for each.

Multifaceted preference dataset

(Figure: Multifaceted-Collection data construction pipeline)

The Multifaceted-Collection is a dataset for aligning LLMs to diverse human preferences, built using a novel construction approach to make preferences multifaceted and explicit. We acquire 65k instructions from five existing datasets (Nectar, OpenHermesPreferences, UltraFeedback-binarized-clean, Chatbot Arena Conversations, Domain-Specific Preference dataset (DSP)). For each instruction, preference descriptions are augmented from general to specific, allowing multiple facets to branch out. Then, we combine preferences from various dimensions into a system message to materialize these preferences as model input. Following the system message and instruction, a gold response is generated. We use GPT-4-Turbo for preference augmentation, system message generation, and gold response generation.
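The combination step described above can be sketched as follows. The four dimension names mirror our preference hierarchy, but the helper function and the sampled descriptions here are hand-written illustrations; the actual pipeline generates and combines preference descriptions with GPT-4-Turbo:

```python
import random

# Hypothetical example preference descriptions, one pool per dimension.
# In the actual pipeline these are augmented from general to specific
# by GPT-4-Turbo; the entries below are illustrative only.
PREFERENCE_POOLS = {
    "style": [
        "Use straight-to-the-point code examples.",
        "Adopt a playful and imaginative tone.",
    ],
    "background_knowledge": [
        "Assume familiarity with basic SQL.",
        "Presume no prior programming knowledge.",
    ],
    "informativeness": [
        "Highlight real-world applications.",
        "Focus only on the most critical points.",
    ],
    "harmlessness": [
        "Avoid potential misinformation; flag uncertainty.",
        "Ensure best programming practices.",
    ],
}

def build_system_message(rng: random.Random) -> str:
    """Combine one sampled preference per dimension into a system message."""
    picked = {dim: rng.choice(pool) for dim, pool in PREFERENCE_POOLS.items()}
    lines = [f"- {dim.replace('_', ' ').title()}: {desc}"
             for dim, desc in picked.items()]
    return ("You are a helpful assistant. Tailor your response to the "
            "following preferences:\n" + "\n".join(lines))

print(build_system_message(random.Random(0)))
```

The resulting string is used as the system message that precedes the instruction, and the gold response is then generated conditioned on both.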

Below is a summary of the hierarchical structure of example preferences (shown as an interactive visualization on the project page). Each top-level dimension branches into subdimensions, which in turn branch into specific preference descriptions:

Style: Conciseness, Vividness, Clarity, Format, Tone (e.g., "straight-to-the-point code examples", "playful and imaginative language", "step-by-step instructions")

Background knowledge: Novice, Basic, Intermediate, Advanced, Expert (e.g., "presumes no prior knowledge about comic strips", "familiarity with quantum mechanics principles", "intermediate PHP knowledge")

Harmlessness: Accuracy, Morality, Safety, Sensitivity, Trustworthiness (e.g., "up-to-date and verified", "promote inclusiveness and accessibility", "avoidance of data loss")

Informativeness: Creativity, Depth, Efficiency, Practicality, Relevance (e.g., "imaginative ingredient combinations", "detailed explanation with algorithmic complexity", "highlight real-world applications")

Janus models

Using Mistral-7B-v0.2 as its base model, we train Janus models on Multifaceted-Collection using instruction tuning and preference optimization methods like DPO and ORPO. Visit our HuggingFace collection for the complete list of resources.
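To make the preference-optimization stage concrete, here is a minimal sketch of the DPO objective for a single (chosen, rejected) pair. The log-probability arguments are placeholders for sequence log-likelihoods under the policy and a frozen reference model; this is an illustration of the objective, not the training code used for Janus:

```python
import math

def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair:
    -log sigmoid(beta * [(log pi(y_w|x) - log ref(y_w|x))
                         - (log pi(y_l|x) - log ref(y_l|x))])
    """
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # Numerically this is -log(sigmoid(margin)); the loss shrinks as the
    # policy prefers the chosen response more strongly than the reference
    # does, relative to the rejected response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In practice this is computed over batches of sequence log-probabilities, with the reference model typically initialized from the instruction-tuned checkpoint.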

Performance

Multifacetedness

(Figure: Human evaluation comparing Janus 7B against other models)

(Figure: Multifacetedness benchmark results)

On benchmarks that pair instructions with synthetic system messages and reference answers (validated through human annotation), human evaluators confirm that Janus 7B outperforms Mistral 7B Instruct v0.2 and GPT models. With LLMs as evaluators, Janus models likewise consistently surpass other models.

Helpfulness

(Figure: Helpfulness benchmark results)

On benchmarks that evaluate general response helpfulness, Janus 7B excels relative to other models. This suggests that system message generalization not only supports the creation of personalizable LLMs but also serves as an effective method for improving alignment with what humans generally perceive as helpful.

Harmlessness

(Figure: Harmlessness benchmark results)

When evaluated on RealToxicityPrompts, Janus 7B shows significantly lower toxicity while achieving high fluency and diversity. These findings underscore the effectiveness of training an LLM with diverse system messages to balance diversity, helpfulness, and safety, making it robust and versatile.


Examples

(Figures: Example responses from Janus 7B)

Bibtex

If you find our work useful, please consider citing our paper:

@article{lee2024aligning,
  title={Aligning to Thousands of Preferences via System Message Generalization},
  author={Lee, Seongyun and Park, Sue Hyun and Kim, Seungone and Seo, Minjoon},
  journal={arXiv preprint arXiv:2405.17977},
  year={2024}
}

Logo of KAIST Logo of LKLab Logo of CMU