In 2023, Aylin Caliskan was recognized as one of the 100 Brilliant Women in AI Ethics. At this year’s International Joint Conference on Artificial Intelligence (IJCAI 2023) she gave an IJCAI Early Career Spotlight talk about her work. I met with Aylin at the conference and chatted to her about AI ethics. We spoke about bias in generative AI tools and the associated research and societal challenges.
Andrea Rafai: We’ve seen generative AI tools become mainstream recently. What are the implications of this, and could you talk a bit about bias in this context?
Aylin Caliskan: Certainly, generative AI tools (such as Stable Diffusion, DALL-E, ChatGPT) have become prevalent over the last couple of years, particularly because neither programming nor machine learning expertise is required to use them; language prompts are sufficient. This makes it simpler for middle school pupils and those who are not tech-savvy to use these tools. Using these models appears to make life simpler in some way, for example, in information retrieval or image generation. However, it is not clear how this is impacting the individuals that are using these models, because it is known that they perpetuate representational harms and these lead to slow, gradual, long-term change that ends up shaping human cognition and society. For example, when instructed to translate the gender-neutral Turkish sentences “O bir doktor. O bir hemşire” to English, ChatGPT’s output is biased: “He is a doctor. She is a nurse.” Being exposed to, or processing, such content forms a view, a perception of the issue at hand. Biased association of who pertains to which function in society is an example of representational harm. However, we do not fully comprehend the impact generative AI models are having on a global and individual scale. Also, when these models are applied in contexts that contain signals from underrepresented groups, the generated data is not as accurate or as high in quality as the outcomes associated with historically advantaged social groups. This causes a different type of harm, and we don’t necessarily have effective methods to address these issues at the moment, so public awareness is helpful. There is a need for regulations and technical guarantees with these models so that, prior to deployment, they can demonstrate that they do not cause as much damage or harm.
Andrea: What do you believe is the best approach to strike a compromise between removing negative biases and preserving the valuable elements of the AI?
Aylin: In my opinion, or scientifically, in order to manage these biases, the first thing we need to do is detect, measure, and evaluate them. Because this is a socio-technical problem, this is the necessary initial step towards taking any action, whether technical, policy-based, or related to raising public awareness. We require holistic approaches that combine technical interventions with the curation of high-quality data sets that are suitable for the context in which they will be used. We need to evaluate the data sets, then evaluate the models, and then try to mitigate bias at the model level. Technical mitigations at the model level might be able to balance the representation of social groups, although this is not trivial. Intervention at the output or decision-making level with fairness notions can help, but there are so many fairness notions that we need to figure out how bias corresponds to fairness and which setting needs which kind of fairness notion. These models are being used for all kinds of purposes, and they perpetuate representational harms as well as allocative harms depending on the context, so tailored strategies to manage bias are needed.
Andrea: Is it possible for AI to be totally impartial, or will there always be some amount of bias that must be regulated?
Aylin: We will continue to have bias in all of these systems, because society is biased and these are sociotechnical systems. It is necessary to raise public awareness and develop regulations, policies and laws to address these issues. With respect to representational harms, for example, in generative AI, bias manifests directly at the output level, which calls for representational interventions for the model. Groups should be balanced and, outside of that, public awareness should be raised so that people know that these models are biased and that their use amplifies bias, as we have shown. In this particular domain, we have a great many unanswered questions. Unless society is perfectly impartial, everything will be biased. We developed the Word Embedding Association Test (WEAT) as a way to examine the associations in word embeddings between concepts. This test detects and quantifies implicit associations, which are not necessarily linked to social groups. They can be related to seemingly innocuous concepts, such as the perception of flowers versus insects, or musical instruments versus weapons. We tend to perceive instruments as more pleasant than weapons, and these associations can be measured. However, when it comes to social groups, biased associations can become problematic depending on the context and, given the differences among social groups, it seems theoretically and practically impossible to have a data set that is not biased, ergo we will always have to manage this somehow. This is a novel area that we do not fully comprehend. What should be managed and when? As we gain a deeper understanding of biases, i.e. their impact on humans, society, decision-making, the information sphere, and AI, we will be able to devise more effective standards for dealing with them, which takes time. I’ve been working on this topic since 2015 and I must say we haven’t made much headway.
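Editor's note: the kind of measurement WEAT performs can be illustrated with a minimal sketch. It assumes the effect-size formulation commonly used for the test: each target word's association is the difference between its mean cosine similarity to two attribute sets, and the effect size is the difference of the mean associations of the two target sets, divided by the standard deviation of associations over all target words. The two-dimensional vectors and the variable names below are hypothetical toy values chosen for illustration, not real word embeddings, and the use of the sample standard deviation is an assumption.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """How much closer word vector w is to attribute set A than to set B."""
    sim_a = [cosine(w, a) for a in attrs_a]
    sim_b = [cosine(w, b) for b in attrs_b]
    return mean(sim_a) - mean(sim_b)


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Standardized difference in association between two target sets."""
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    # Assumption: sample standard deviation over all target words.
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy "embeddings": axis 0 leans pleasant, axis 1 leans unpleasant.
flowers = [(1.0, 0.1), (0.9, 0.2)]
insects = [(0.1, 1.0), (0.2, 0.9)]
pleasant = [(1.0, 0.0), (0.9, 0.1)]
unpleasant = [(0.0, 1.0), (0.1, 0.9)]

effect = weat_effect_size(flowers, insects, pleasant, unpleasant)
print(f"WEAT effect size (toy data): {effect:.3f}")
```

With these toy vectors the effect size is positive, meaning the "flower" vectors sit closer to the "pleasant" vectors than the "insect" vectors do. Swapping the two target sets flips the sign. With real embeddings the same computation would be run over curated word lists, usually alongside a permutation test for significance, which this sketch omits.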
Andrea: Can anyone become an expert in AI, or join the field? Perhaps specialists from various professions, such as psychology, law, finance – could this help to mitigate these biases?
Aylin: Yes – we need multidisciplinary approaches and perspectives. I have collaborated with psychologists, who introduced concepts such as implicit cognition, social cognition and bias, and who have been investigating these concepts scientifically for decades, using principled, grounded methodologies. Sociology is pertinent. Law and policy are pertinent. Philosophy is relevant, as are cognitive science theories such as theory of mind, many aspects of rights, justice, and ethics, as well as linguistics, computer science, and information science. All disciplines are interested in this topic, because the widespread use of artificial intelligence, in particular generative AI, right now means everyone needs to comprehend what is going on. When the systems are employed for decision making, they are used in a wide variety of settings, which means that everyone has the opportunity to have their voice heard about this matter.
Andrea: What would your advice be for non-AI researchers who are interested in getting into the field of AI?
Aylin: I believe that everyone should follow their passion and, if they’re interested in AI, I’m confident that they’ll be able to figure out which part they want to work on and what strengths they have. They should be able to catch up on any knowledge gaps they might have, because we now have so many resources for programming, machine learning and AI. PhD students, in particular, focus primarily on research and learning to learn independently. Therefore, if individuals are interested in these topics, they are free to pursue them. If they are truly motivated to contribute in a particular area, there is nothing they cannot accomplish. There are so many unanswered research questions in this area, spanning numerous disciplines and topics, that we need more contributions.
Andrea: How do you approach teaching AI ethics to students?
Aylin: I’ve been teaching AI ethics for close to five years at this point. I teach undergraduate and graduate machine learning and data science courses as well as scaling, applications, and ethics in AI, but we incorporate ethics into all machine learning and data science courses. I primarily focus on natural language processing and vision-language, with a focus on generative AI. In some cases, ethics consisted of just one or two lectures covering fairness, bias, explainability, accountability, and transparency. They were separate lectures, but now I have a much more holistic approach where ethics is incorporated into every step of the artificial intelligence pipeline. For example, this pipeline might look like this:
design -> data collection -> model training -> output generation -> fine-tuning -> output analysis
Ethical implications at each step are complex, because decisions are made at every step. When there is transparency about certain decisions at each step, students get a better understanding of how their design choices and their processes might impact society as well as the machine learning model because, for example, when something is biased, it doesn’t work well for certain social groups. Accordingly, nowadays we focus on the typical traditional machine learning algorithms as well as language models, generative AI etc. We discuss the acquisition of data sets because data sets contain the strongest signals of bias. These topics are covered in lectures and then, in an assignment, students start with projects that involve, for example, generative AI or training neural networks and, as they do that, they analyze the ethical implications with respect to social groups and other things, as it is not just about bias. We have sustainability issues, transparency, accountability, interpretability and explainability. Safety is now another concern of ours, since it is multidimensional.
Aylin Caliskan is an Assistant Professor at the University of Washington Information School and an Adjunct Assistant Professor at the Paul G. Allen School of Computer Science and Engineering. Aylin’s research interests lie in AI ethics, AI bias, computer vision, natural language processing, and machine learning. To investigate the reasoning behind AI representations and decisions, she develops evaluation methods and transparency enhancing approaches that detect, quantify, and characterize human-like associations and biases learned by machines. She studies how machines that automatically learn implicit associations impact humans and society. As AI is co-evolving with society, her goal is to ensure that AI is developed and deployed responsibly, with consideration given to societal implications.