Ethical Guidance of LLMs

From FDHwiki
Revision as of 14:32, 9 December 2023 by Arundhati.balasubramaniam (talk | contribs)

Abstract

Constitutional AI is a framework for building AI systems that align with human values and preferences without violating ethical principles. However, most existing approaches rely on human feedback, which can be costly, biased, and inconsistent. In this exploratory project, we replicate and extend the constitutional AI pipeline proposed by Anthropic, using Meta's Llama 2, a large language model with 7 billion parameters. We fine-tune a quantised Llama 2 on a set of ethical principles and their inverted, unethical counterparts, using a supervised critique-revision loop: the model generates answers to ethical dilemmas, critiques them against a principle, and revises them, and the revised answers are used for fine-tuning. From the resulting dataset of ideal answers we build a preference dataset to train a reward model, and we then train a reinforcement learning policy against this reward model using RLAIF (Reinforcement Learning from AI Feedback), in which Llama 2's own feedback is used to improve its behaviour and alignment. Finally, we explore the ethical spectrum of LLMs by inverting the constitutional values and measuring the impact on the model's outputs.
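The critique-revision loop at the heart of the supervised stage can be sketched as follows. This is a minimal illustration, not the exact prompts or code used in the project: `generate` stands in for any text-completion call (e.g. a wrapped quantised Llama 2), and the prompt templates and the `toy_model` stand-in are hypothetical.

```python
def critique_revision_loop(generate, question, principle, n_rounds=2):
    """Produce a revised answer by repeatedly critiquing it against a principle.

    `generate` is any prompt-to-text function (e.g. a wrapped Llama 2 call);
    the prompt templates below are illustrative placeholders, not the exact
    prompts from the Constitutional AI paper.
    """
    # Initial, unrevised answer to the ethical dilemma.
    answer = generate(f"Question: {question}\nAnswer:")
    for _ in range(n_rounds):
        # Ask the model to critique its own answer against the principle.
        critique = generate(
            f"Critique this answer against the principle '{principle}':\n"
            f"Q: {question}\nA: {answer}\nCritique:"
        )
        # Ask the model to revise the answer in light of the critique.
        answer = generate(
            f"Rewrite the answer to address the critique.\n"
            f"Q: {question}\nA: {answer}\nCritique: {critique}\nRevision:"
        )
    return answer


# A trivial stand-in model for demonstration only: returns canned strings
# depending on which stage of the loop produced the prompt.
def toy_model(prompt):
    if prompt.rstrip().endswith("Critique:"):
        return "The answer could be more cautious."
    if "Revision:" in prompt:
        return "A more cautious, principle-aligned answer."
    return "An initial, unfiltered answer."


revised = critique_revision_loop(toy_model, "Is it OK to lie?", "honesty")
```

The revised answers collected this way form the fine-tuning dataset; inverting the principle passed to the loop yields the "unethical" counterpart dataset.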

[ADD A SENTENCE ABOUT RESULTS]


Introduction

“Success in creating AI would be the biggest event in human history. Unfortunately, it might also be the last, unless we learn how to avoid the risks.” - Stephen Hawking

Large language models (LLMs) pose ethical challenges and risks for human society and values. How can we align them with human interests and norms? How can we prevent or mitigate their misuse, bias, manipulation, or deception? How can we foster trust, accountability, and transparency in their development and deployment?

In this project, we test the limits of LLMs across the ethical spectrum. We replicate the pipeline from Anthropic's 'Constitutional AI' paper and fine-tune a Llama 2 model with a set of pre-defined values. We then generate texts on various ethical dilemmas and compare the outputs of differently aligned models. Our goal is to explore the feasibility and desirability of embedding ethical values into LLMs, and to identify the benefits and challenges of doing so. By inverting the constitution during fine-tuning, we aim to build exactly what makes AI ethicists uncomfortable: an Unconstitutional AI.
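The reward-model stage of the pipeline described in the abstract is trained on AI-generated preference pairs. A typical pairwise objective for such a reward model is the Bradley-Terry style loss below; this is a sketch of the standard formulation, not necessarily the exact loss used in this project.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """Pairwise preference loss for reward-model training:
    -log(sigmoid(r_chosen - r_rejected)).

    The loss is minimised when the reward model scores the preferred
    (more principle-aligned) response above the rejected one.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the two rewards are equal the loss is log 2 (about 0.693), and it shrinks as the reward gap in favour of the chosen response grows, pushing the model to separate preferred from rejected completions.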

Project Plan and Milestones

Methodology

Motivation and Deliverables

Results

Conclusion

Acknowledgments

Bibliography