Nedjma Ousidhoum is a PhD candidate at Hong Kong University of Science and Technology. She also serves as an AIhub ambassador and has written a number of articles for us. In this interview we talk about her PhD, her research into hate speech detection, and the importance of considering AI ethics.
I’ve been in Hong Kong for more than six years now. I came for a postgraduate internship and then stayed for a PhD; I wanted to experience living and working in Asia. I started the PhD after a few months, and I’ve been working on my current topic of text classification, multilinguality and hate speech detection since 2018. I actually started out working on something else (NLP and machine translation) but I changed supervisor and topic. I expect to graduate in a few months, early 2021.
Whenever someone asks what I would advise them, I tell them the most important thing is to find the right supervisor. The lab environment is important too; it’s not just about the topic. A PhD is a long journey and it is important to work with a nice and competent supervisor. At the moment, I’m working on a topic I like, I’m really benefiting from my supervisors’ skills, and I am learning a lot.
Our work focussed on a new multilingual multi-aspect hate speech analysis dataset that we designed. We collected the data and the annotations. The dataset is in French, English and Arabic – the reason for that choice is that those are the languages I speak, so I can do the data analysis myself. I would love to work on other languages, but at the moment I don’t have the ability to analyse them. I think it’s very important to speak the language when you are constructing a dataset: that enables you to look at the data and understand it. If you can understand the data, you can then understand how and why the annotators annotated the text in a particular way, especially when you tackle a subjective task like hate speech.
We also did multi-task learning on the data. The work is challenging; as I mentioned before, the annotations are very subjective. We have some ideas to improve the performance.
In the paper we also discuss how to use our annotations in order to improve hate speech detection and classification in general.
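For readers less familiar with the set-up, here is a minimal sketch of what multi-task learning over multi-aspect annotations can look like: a shared text encoder with one classification head per aspect, trained jointly. This is not the code or labelling scheme from the paper; the aspect names, label counts and model sizes below are purely illustrative assumptions.

```python
# Minimal sketch (not the authors' code): a shared encoder with one
# classification head per annotation aspect, trained with a summed loss.
import torch
import torch.nn as nn

# Hypothetical aspects and label counts, for illustration only.
ASPECTS = {"directness": 2, "hostility": 4, "target_group": 6}

class MultiTaskClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # One linear head per aspect, all sharing the same encoder.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden_dim, n) for name, n in ASPECTS.items()}
        )

    def forward(self, token_ids):
        x = self.embed(token_ids)
        _, (h, _) = self.encoder(x)       # h: (1, batch, hidden_dim)
        shared = h.squeeze(0)             # shared representation of the post
        return {name: head(shared) for name, head in self.heads.items()}

# One joint training step on a dummy batch of 8 "tweets" of 20 tokens each.
model = MultiTaskClassifier()
loss_fn = nn.CrossEntropyLoss()
tokens = torch.randint(1, 10000, (8, 20))
labels = {name: torch.randint(0, n, (8,)) for name, n in ASPECTS.items()}
logits = model(tokens)
loss = sum(loss_fn(logits[name], labels[name]) for name in ASPECTS)
loss.backward()
```

The design choice this illustrates is simply that all aspects share one representation of the post, so signal from one annotation task can help the others.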
Yes, we should definitely do NLP on other languages as well. In my master’s degree I was working on Arabic because, especially back then, there wasn’t much research on Arabic. Right now, the more languages we work on the better. We could benefit from the insights that native speakers can give us. One of the challenges in NLP at the moment is around low-resource languages. It is definitely important to develop something for languages where very few resources exist.
Yes, and when I was starting out it was even harder. I really wanted to work in this field, but it was really challenging. The data isn’t just benign data: it is real, and there are people who actually think the way they write. They direct their hate on social media, and probably in reality too. After a while, when you are working intensely on a project with a lot of data, you build up an immunity so that you don’t focus on the harm the people writing such posts are doing. But it isn’t just data, so, yes, it’s harsh.
Something I saw recently was a corpus about the targets of hate speech directed towards Asian people during the current pandemic. Since the beginning of the COVID-19 pandemic, Asian people have been particularly targeted. This kind of data is awful to deal with but I think that we have to.
What I’m interested in is dealing with all aspects of hate speech in different languages (and not just the languages that I speak). I’m interested in comparing languages and finding a way to benefit from each of the data sets that are out there. I’m also looking to improve the annotations, the classifiers, the data and to provide more insights to the community about all these aspects.
There has been work on different features of tweets, for example, whether they are targeting a community or just one person, or whether they are offensive or hateful. I’m interested in all these characteristics of a social media post. I want to investigate the correlations and how those correlations can inform my research, in terms of the classification and machine learning. For future work, this information could be used to define new tasks, and for bias mitigation. I think there is a lot to do.
Yes, I think that it’s important for moderation. It’s also important to think about the impact in the long term. In the long term, I think our work could also benefit experts. I’m not a linguist, I’m a computer scientist, but I am interested in how computer scientists can help linguists and vice versa, which would help with detection, classification and moderation.
There is a research group who have been analysing hate speech from white supremacists and well-established hate groups. They detected patterns and codes, and it was very interesting work. In the long term, it would be interesting to do that cross-lingually. We could also connect that to cyberbullying. There are many related tasks.
Besides NLP, I’m very interested in ethics, fairness and accountability. I’m interested in the machine learning side and I’m also interested in getting insights from the experts. There is a new community that has been growing recently, and I really like the fact that there is a connection between computer scientists and people working in social science and the humanities. Personally, I’m more interested in how computer science could help to solve the practical problems we are dealing with right now, and I think that’s why I wanted to be a researcher. When you hear from other specialists, it gives you another perspective on the problem.
I think as a researcher it’s important to ask yourself: “why am I doing this?” Part of being an engineer is solving practical problems, and a researcher in computer science is also an engineer. In my opinion, doing research requires a more complete mindset than “I’m going to build this practical thing but forget about what it is actually going to become in reality, and that the outcomes may be a problem”.
It kind of hits hard when you think about the fact that you are building a system, and maybe, through that system, you are harming someone somewhere. What we really have to do is at least think about whether this is a possibility or not, and at least check the possible ethical issues before tackling the problem itself, even if at some point we change our minds and say “I’m not going to work on this anymore” because it’s not doing a good thing. Many researchers have been discussing this recently and I have truly benefited from their discussions.
I would like to be optimistic. At least now we are thinking about certain problems and we are having important discussions about the impact of our work.
More people are also actively working on demystifying AI, and on the fact that we can’t compare AI with “human intelligence” just because we throw a bunch of data at a sophisticated system and it becomes very good at a specific task. You see a lot of experiments where the results are impressive, but you wonder why that task is being done.
Hopefully we will have ethical issues discussed in the syllabuses of undergraduate computer science programmes all over the world. Students would then have to think about the consequences of their work, not just all the cool stuff they can program. I remember back in my undergraduate days we were all excited about releasing code that was going to run really fast, or do some cool stuff. I think it’s important to question why we are doing things and how they will impact other people.
We also need to be upfront about our systems and be clear that they have limitations. We can also benefit from the insights of people who are studying other subjects. My mentor in Algeria used to say that we should do more humanities, which I agree with. We should move away from the binary thinking that science/technology and the humanities are separate: they are complementary. Inviting someone from the humanities to give a talk is as important and interesting as inviting a brilliant researcher in the field.
I think that is the case given our obsession nowadays with specialising quite early in life. Each one of us researchers is working on a very narrow problem. Even within the field itself, we are quite compartmentalised. It’s difficult to find out about different aspects of the field without making a real effort. What’s been helping me is our team’s reading group and group meetings where students present their work. Since we are a big group working on very different problems, it has been very interesting. That’s a good way to keep up with what’s been going on in the community. What could also be interesting would be to listen to people who don’t come from a computer science background.