Generative AI: 1. Ethics 2.CLIP

Motivation

The rise of Large Language Models (LLMs) such as GPT-3 and LLaMA has evoked a mix of fascination and apprehension. These models show remarkable capabilities in generating human-like text and performing complex tasks, while also raising profound ethical questions.

The integration of ethics into AI systems faces numerous challenges. Firstly, there is the challenge of modelling reasoning about obligations and permissions. Secondly, complexities arise from the persistent conflicts between different forms of ethical reasoning. Lastly, comprehending and assessing the consequences of actions remains an intricate undertaking for both humans and machines.^[1]

Researchers have experimented with various techniques to address these challenges. Some have turned to deontic logics^[2] and formalisms inspired by them to handle the particular nature of duty rules. Others propose logic-based non-monotonic formalisms from AI^[3], such as default logics or answer set programming, which are closely aligned with common-sense reasoning, to mitigate logical contradictions. Additionally, there are proposals to employ action languages or causal models^[4], which provide a mathematical foundation for understanding and computing the consequences of actions.

The remaining technical hurdle lies in merging these three approaches into a unified framework: one that is non-monotonic, can manage norm conflicts, and employs causal models to evaluate the consequences of actions. Moreover, these approaches adopt differing normative frameworks, including utilitarianism, deontology, and virtue ethics, and philosophers note a persistent lack of precision in how these frameworks are simulated. Consequently, universally accepted "common approaches" within applied ethics remain elusive.^[1]

Motivated by these discussions, our project aims to delve into this multifaceted ethical landscape surrounding AI from both technical and philosophical perspectives. We want to explore how AI systems deal with ethical dilemmas in the light of these diverging ethical priorities and seek methods to align these systems more closely with human ethical values. Additionally, we aim to investigate whether and how these AI systems could maintain a form of consistency in their ethical considerations.

Project Plan and Milestones

Weekly Plan

Week	Task	Completion
Week 4	Paper reading. Exploring existing RLHF and RLAIF approaches. Exploring red-teaming datasets.	√
Week 5	Familiarizing ourselves with the Dromedary, SALMON, and Llama base models.	√
Week 6	Evaluation of different base models. Choice of the Llama 2 model as our baseline.	√
Week 7	Further red-teaming dataset exploration. Reading about ethical theories.	√
Week 8	Exploring the ETHICS dataset.	√
Week 9	Formatting the ETHICS dataset for Llama fine-tuning and evaluation. Supervised fine-tuning of the Llama model.	√
Week 10	Evaluation of the Llama model before and after fine-tuning with the ETHICS dataset. Model tuning. Mid-term presentation. Start writing the Wikipedia page with the plan.	√
Week 11	Reading about reinforcement learning using PPO. Re-formatting the deontology dataset. Creation of the preference model.	√
Week 12	Examine preference learning models and learn how they work and their applications. Start a simple reinforcement learning model setup. Run preliminary tests and evaluate results.	√
Week 13	In-depth analysis of model performance. Drafting Wikipedia pages, including outline and structure.	√
Week 14	Completing the Wikipedia page, including proofreading and ensuring technical accuracy. Writing the GitHub page and preparing for the final presentation.	√

Milestone 1

Define Research Questions: Establish clear, focused questions to guide the project.
Literature Review: Conduct a comprehensive review of existing studies in AI ethics.
Ethical Theory Exploration: Investigate various ethical theories to ground the research in a solid theoretical framework.
Ethical Dataset Identification: Locate datasets for quantitative AI ethics evaluation, such as red teaming datasets.

Milestone 2

Refine Research Goals: Sharpen the focus and scope of the research based on initial findings.
Dataset Finalization: Select the most appropriate dataset after exploration and evaluation.
Model Selection and Fine-Tuning: Settle on the Llama 2 model and fine-tune it using GPU resources.
Model Evaluation: Conduct a thorough evaluation of the model, focusing on its ethical implications and performance.

Milestone 3

Develop Advanced Models: Implement preference and reinforcement learning models, integrating them with the fine-tuned Llama 2 model.
In-Depth Analysis: Analyze the models' outcomes, assessing performance, identifying defects, and investigating specific issues like coherence and degeneration.
Documentation and Dissemination: Create a comprehensive Wikipedia page summarizing the project's findings.
Final Deliverables: Compile all project materials, including a well-documented GitHub repository.

Deliverables

Methodology

Data

For model training and evaluation, we use the ETHICS dataset^[5], which was curated to assess a model's understanding of basic shared human values. The dataset covers scenarios from five ethical perspectives: justice, virtue ethics, deontology, utilitarianism, and commonsense morality.

The ETHICS dataset is built from natural language scenarios, which makes it possible to describe diverse situations involving interpersonal relationships and everyday events. To perform well on this dataset, a model has to identify the morally relevant factors that each ethical framework emphasizes.

Justice theories emphasize concepts such as impartiality and what individuals rightly deserve. Deontological theories centre on our duties to others, prioritizing adherence to rules and obligations. Virtue ethics revolves around character traits such as honesty, empathy, benevolence, or truthfulness. Utilitarianism emphasizes the consequences of actions, particularly their impact on happiness or well-being. Commonsense morality, on the other hand, evaluates the moral status of actions based on intuitions and emotional responses.

Comprising over 130,000 daily-life scenario examples categorized across these five ethical theories, the dataset includes distinct training and test sets. The data was collected from English speakers residing in the United States, Canada, and Great Britain.
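
To illustrate how a labelled scenario of this kind might be turned into text for a language model, the sketch below formats one record into a prompt and completion pair and scores predicted answers. The field names (`scenario`, `excuse`, `label`), the prompt wording, the label-to-text mapping, and the helper names are all assumptions made for this example; they are not the dataset's documented schema or the formatting actually used in the project.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EthicsExample:
    """One labelled scenario. Field names are hypothetical."""
    scenario: str
    excuse: str
    label: int  # assumed convention: 1 = reasonable, 0 = unreasonable


# Assumed mapping from the binary label to the text the model should produce.
LABEL_TEXT = {1: "reasonable", 0: "unreasonable"}


def to_prompt_completion(example: EthicsExample) -> dict[str, str]:
    """Turn one record into a prompt/completion pair for supervised fine-tuning."""
    prompt = (
        f"Request: {example.scenario.strip()}\n"
        f"Response: {example.excuse.strip()}\n"
        "Is the response reasonable or unreasonable?\n"
        "Answer:"
    )
    # The leading space keeps the completion separate from "Answer:".
    return {"prompt": prompt, "completion": " " + LABEL_TEXT[example.label]}


def accuracy(predictions: list[str], examples: list[EthicsExample]) -> float:
    """Fraction of predicted answers that match the gold label text."""
    if not examples:
        return 0.0
    correct = sum(
        pred.strip().lower() == LABEL_TEXT[ex.label]
        for pred, ex in zip(predictions, examples)
    )
    return correct / len(examples)
```

Keeping the formatting in one small function means the same template can be reused for fine-tuning and for evaluation, so accuracy before and after fine-tuning is measured on identically worded prompts.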

Model Selection

Our aim is to find a balance between high performance and alignment with human preferences. The Llama 2 model is a suitable choice for our project because of its advanced training, computational efficiency, open-source availability, fine-tuning capabilities, and strong performance on key benchmarks.

The capabilities of LLMs are remarkable considering the seemingly straightforward nature of the training methodology. Auto-regressive transformers are pretrained on an extensive corpus of self-supervised data and then aligned with human preferences via techniques such as Reinforcement Learning from Human Feedback (RLHF). Although the training methodology is simple, its high computational requirements have limited the development of LLMs to a few players.
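
The preference-modelling step that RLHF commonly relies on can be illustrated with a pairwise loss. The sketch below assumes a Bradley-Terry style objective, in which a reward model should score the human-preferred response above the rejected one. The function names and the scalar reward inputs are hypothetical, and this is not presented as the exact procedure used to train any particular model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Loss for one comparison: -log sigmoid(r_chosen - r_rejected).

    The loss shrinks as the reward model scores the preferred response
    further above the rejected one.
    """
    return -log_sigmoid(reward_chosen - reward_rejected)


def mean_preference_loss(pairs: list) -> float:
    """Average the pairwise loss over (reward_chosen, reward_rejected) tuples."""
    if not pairs:
        raise ValueError("at least one preference pair is required")
    return sum(pairwise_preference_loss(c, r) for c, r in pairs) / len(pairs)
```

When both responses receive the same reward, the loss equals log 2, which corresponds to the reward model being undecided between them. In a full RLHF pipeline, a reward model trained with an objective of this kind would then supply the reward signal for a policy optimization algorithm such as PPO.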

There have been public releases of pretrained LLMs (such as BLOOM (Scao et al., 2022), LLaMA (Touvron et al., 2023), and Falcon (Penedo et al., 2023)) that match the performance of closed pretrained competitors like GPT-3 (Brown et al., 2020) and Chinchilla (Hoffmann et al., 2022), but none of these models is a suitable substitute for closed “product” LLMs such as ChatGPT, Bard, and Claude.

These closed product LLMs are heavily fine-tuned to align with human preferences, which greatly enhances their usability and safety. This step can require significant costs in compute and human annotation, and is often not transparent or easily reproducible, which limits the community's progress in AI alignment research. Llama 2 addresses this gap: it is a family of pretrained and fine-tuned LLMs, Llama 2 and Llama 2-Chat, at scales up to 70B parameters. According to its authors, Llama 2-Chat models generally perform better than existing open-source models on a series of helpfulness and safety benchmarks, and they appear to be on par with some closed-source models, at least in human evaluations.

Model Fine-Tuning

Performance Evaluation

Quality Assessment

Limitations

Credits

Course: Foundation of Digital Humanities (DH-405), EPFL

Professor: Frédéric Kaplan

Supervisor: Alexander Rusnak

Authors: Yiren Cao, Xi Lei, Cindy Tang

References

[1] Powers, T. M., & Ganascia, J.-G. (2020). The ethics of the ethics of AI. In M. D. Dubber, F. Pasquale, & S. Das (Eds.), The Oxford Handbook of Ethics of AI. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190067397.013.2
[2] Horty, J. F. (2001). Agency and deontic logic. Oxford University Press.
[3] Ganascia, J. G. (2015). Non-monotonic resolution of conflicts for ethical reasoning. A Construction Manual for Robots' Ethical Systems: Requirements, Methods, Implementations, 101-118.
[4] Mueller, E. T. (2014). Commonsense reasoning: An event calculus based approach. Morgan Kaufmann.
[5] Hendrycks, D., Burns, C., Basart, S., Critch, A., Li, J., Song, D., & Steinhardt, J. (2020). Aligning AI with shared human values. arXiv preprint arXiv:2008.02275.
