New voices in AI: machine learning, causality, and founding Lanfrica with Chris Emezue

Technical University of Munich | Lanfrica

by Joe Daly

06 July 2022


Welcome to episode 7 of New voices in AI

Chris Emezue shares his journey into AI, his work in machine learning, causality, and founding Lanfrica.

You can see more of his work on his Twitter @ChrisEmezue and on LinkedIn.

You can also find out more about Lanfrica here and hear more from the founders here.

All episodes of New voices in AI are available here

The music used is ‘Wholesome’ by Kevin MacLeod, licensed under Creative Commons.

transcript

Daly: Hello and welcome to New voices in AI, the series from AIhub where we celebrate the voices of masters and PhD students, early career researchers, and those with a new perspective on AI. I am Joe Daly, engagement manager for AIhub, and this week I am talking to Chris Emezue about some of his work. Without further ado, let’s begin.

—

Daly: So first of all, thank you so much for joining us today.

Emezue: Thank you for having me.

Daly: If you could just tell us who you are, where you are, and what you’re working on, very briefly.

Emezue: Awesome. I am Chris Emezue, and I’m currently based at the Technical University of Munich, Germany, where I’m doing my master’s. I’m working on machine learning, natural language processing and causality; those are my areas of interest.

Daly: Nice. We’ll get back into that a little bit more, but do you have any other fun facts about yourself that aren’t directly AI-related?

Emezue: Well, a non-AI thing I do is play badminton. I love playing badminton a lot. It’s not football, as many people would often go for, but badminton; I love that game so much.

Daly: Haha, it’s a great sport. I imagine it’s quite tactical in some ways. And so how did you get into AI?

Emezue: It’s very interesting. I didn’t originally, from a young age, set foot in AI; I started by trying to find myself. I remember that in the final years of high school I was trying to figure out where I fit, where to go and what to study for my bachelor’s, and it was quite hard because I had so many interests and talents and it was hard to choose one. But I did love two things. I loved the idea of solving problems and thinking about them abstractly, which is mathematics, and I also loved computers and working with computers. So I got wonderful advice from my mom to study mathematics, because from there it would be easy for me to go into any scientific field I wanted in the future. So I did that. I got a scholarship to do my undergraduate degree in mathematics in Russia. While studying mathematics in Russia, I realized that I don’t just want to study theory and prove theorems; I want to apply them to real life and see practical implementations or products being developed from them. So I got into AI by reading François Chollet’s ‘Deep Learning with Python.’ That was the first time I saw that there’s a thing called deep learning: you can write a bunch of programs and voilà, your model is training, it’s doing something called learning, there’s the learning rate going down, and you can recognize images. It was so fascinating, and finally I saw that the mathematics I’m studying is actually being used for something very powerful, not just teaching mathematics at secondary schools or universities. That’s how my interest started.

Daly: Oh, that’s really cool. It’s kind of amazing how it starts from just one thing and spirals out from there.

Emezue: Yeah true.

Daly: And you said a little bit there about discovering deep learning, and how you could apply this maths and it can actually do things. I guess that’s part of what excites you about AI? Or is there something in particular that you find exciting?

Emezue: The major thing that excites me about AI is simply this: you have a bunch of functions and equations, and back in secondary school and university you’re writing and proving theorems, and you don’t always understand why you’re proving this guy’s theorem. But in AI you can really see how these numbers and functions play out in the real world. You have a function approximation and suddenly you have a model that can predict handwritten digits. It’s just so amazing. Another thing that excites me about AI is that you don’t have to explicitly write the code for what you want, right? You just give it a bunch of data and the model can sort of write its own functions to get the output that you desire. So those are the two things that really excite me; they’re quite fascinating.

Daly: Speaking of being at school, I think we’ve all heard that thing of “why am I bothering learning this maths? What is it ever going to be used for?” and it turns out it looks like this.

Emezue: Yes, true. Fun fact: during my undergraduate studies I wasn’t very serious about probability, but now I really cherish my probability lecturer, who took the time to explain the details, and I wish I had taken probability more seriously, because it’s kind of the basis of most things in artificial intelligence and machine learning. So yeah, math is very important.

Daly: Yeah, don’t underestimate the importance of maths. And what do you think might be some of the biggest challenges in AI in the next 5 to 10 years?

Emezue: Right now we’re in a world of machine learning and deep learning where our models keep getting deeper and deeper. We have so much computing power and so much data, although we really don’t have enough data. There’s this idea that if you just get more and more computing power and your models get bigger and bigger, then you can solve most of the problems, but a lot of research and effort has shown that this is not the future, or may not be the only future, of AI. One very interesting thing is out-of-distribution generalization; in natural language processing I think it’s called something like few-shot learning or zero-shot learning.
Basically, the idea is a model being able to generalize to data, or to a scenario, that it has not seen before. Unfortunately, this is not solved in AI right now, and I feel it is the main, if not one of the major, challenges of the next 5 to 10 years of AI.

Daly: It’s kind of like generalizability without it getting so out of hand with giant models.

Emezue: Yeah, exactly. Yeah true.

Daly: And this touches very much on this big challenge. Do you think that’s one of the biggest challenges? Or do you think there are any other big challenges, or I guess opportunities, in AI?

Emezue: So there is generalization. Something I would see as a subset of generalization is training or building a model with little to no data. There are some scenarios, or some places, where you cannot get enough data. For example, in the work I’m doing with African languages: there are 2,000 African languages, but many of them don’t have even, I’d say, 5,000 sentences. Some are oral-based, so you don’t have enough data for these languages, and you cannot just use large pre-trained models, because these models require enormous amounts of data. So working with little data, or what we call low-resource languages, is a huge opportunity.
There’s also the part of implementing and deploying these models in the real world, because in research you can make a lot of assumptions about compute power and a lot of other things, but when you come to the real world, things get really tricky because there are more things to contend with. So that’s another very interesting area.

Daly: Yeah, we have these lovely idealized models, and then as soon as they hit the real world it’s all kinds of messy.

Emezue: It’s a different story, yeah.

Daly: And so you touched a bit on your research there. Could you tell us a little bit more about what you’re working on at the moment and some of its implications?

Emezue: Cool, I am working on a number of things. First on my list is what we call AfricaNLP: natural language processing with a focus on African languages, because, as I mentioned, they are low-resource. They also have very interesting morphological and linguistic properties that English and other high-resource languages don’t have. So I work with amazing researchers at Masakhane and around the world on research connected with African languages. Still on African languages, I am the founder of an effort called Lanfrica, which is basically about showcasing African language resources and making them discoverable: the papers, datasets and projects on African languages. We’re creating a central hub so that anyone looking for any resource for any African language can easily find it. So that’s the NLP world. Then, about a year ago, I got very interested in something called causality and structure learning. That was during my internship at Mila, where I’m still working as a visiting researcher. I am very passionate about causal structure learning, which is basically a way to tackle the problem of out-of-distribution generalization by training models to do more than just learn the correlations in the data. That’s how we approach it.

Daly: Yeah, there’s a lot to cover in there. The discoverability of low-resource language work is so important to make sure that everyone benefits from all these advances in AI. I’m sure that’s very exciting, but is there anything in particular that excites you about these areas of work?

Emezue: In the area of AfricaNLP, I’m very excited about the work I’m doing at Lanfrica because it has proven to be very, very useful. I’ve had a lot of very positive feedback on how easy it now is to find resources. It has also opened up other areas to work on, around finding information in, or relating to, African languages and making that process very easy. So I’m very passionate about the work I’m doing with Lanfrica. I’m also very passionate about my work in causality. Structure learning is a pretty new field for me because I just got into it, but I’m very excited about the areas it opens up, because causality, or structure learning, cuts across different fields: it’s used a lot in drug discovery, and you can also find it in natural language processing and representation learning. So it really cuts across different fields, and I’m quite fascinated by that.

Daly: Yeah, figuring out exactly what’s going on, it’s quite cool to dig down into that. And what do you think will be some of the biggest impacts of your work on people’s everyday lives?

Emezue: Yeah, so for African languages, I imagine a world where, when you name an African language, like ‘Igbo’ or ‘Fon’, people are not going to look at you like, “What’s that?”
I imagine a world where African languages get the same level of discoverability and attention as high-resource languages, think of English, French, German; discoverability and attention in terms of research. One thing I say a lot is that I really want a world where I can talk to Google or Siri in my native African language and it can reply to me and get things done.
So that’s the world I imagine. I feel that the work we’re doing at Masakhane, and the work all researchers are doing on African languages, from dataset creation to research and model implementation, along with Lanfrica and other organizations that I don’t know of, all help push this dream of representing African languages. And to complement that, the work I’m doing on structure learning, which has to do with out-of-distribution modeling, would really help take AI and machine learning to the next stage: machine learning when very little data is available, so you don’t have to rely on so much data. So they complement each other.

Daly: Yeah, it’s that lovely synergy of one really helping the other. And it’s such important work. I’ve said this already, but making sure that everyone can benefit from AI, and that it’s not just localized to the high-resource languages, is so important, so that everyone can benefit and make the most of AI. And I guess this links a little to a question from the previous new voice in AI, Oumaima Hajri, whose work is largely about ethical and responsible AI. Her question was: how do you engage with responsible and ethical AI in research, and why do you think that could be important?

Emezue: That’s a very good question. I like this particular question about the field of ethical AI because, with the current trend of AI, which is just huge, deep models, there’s hype about performance, and very few people ask questions like: what data did they use? The data owners, do they have the rights? Especially when big tech comes in and shows all these amazing things, people are asking these questions. For example, in the work I’m doing on African languages at Masakhane, a very important topic we always touch on is data governance. The idea is to give the data owners, the people who contributed this data, ownership and governance of how their data is used, so that they don’t create this data and dump it somewhere in the name of the public domain, only for a big tech company to swoop in, take the data, train a model, release a product based on that model, and then they have to pay for it. This is a trend that’s happening, and it will keep happening. So ethical AI, and what I think is a subset of ethical AI, data governance, is something I’m very passionate about. It’s about giving the language community governance of their data. We’ve had a lot of discussions on this at Masakhane; there’s actually Kate Stevens from surface data collective who is working on that, trying to build platforms that are more pro data governance. The work at Lanfrica, which tries to make these African language resources visible, is also pushing in that area: when these resources get attention, they are still governed by the licenses of the dataset owners, so people can learn about them and use them on those terms.
So, in summary, I am really passionate about ethical AI, especially data governance, and this is what I have focused on quite a lot with Masakhane and African languages.

Daly: Yeah, the point about creating these datasets and then having to pay for the very thing you made, I imagine that’s quite a big challenge.

Emezue: Yeah, unfortunately it’s not something that can be done immediately. I think it takes planning, which Masakhane and a group of other people are working on; I think the African Union is also involved. It takes planning, it takes pushing, it takes showing the importance of this. But I also like that some small communities are taking decisive action. There have been talks about the Māori language community, who refused to make their work open source precisely because of this. So a few communities here and there are taking action, and Masakhane too, with the AfricaNLP workshop and other workshops trying to promote data governance. They’re really good efforts, in my opinion.

Daly: And it definitely sounds, as you said about communities, like a real community effort, with all these people working together to achieve the best possible outcomes. OK, and the penultimate question: what would your question be for the next new voice in AI?

Emezue: OK, so with the current trend of AI, it’s like every month you have an amazing update, maybe a very big model with some really amazing results. For example, DALL·E 2 came out recently, and you have PaLM, very large language models coming out with amazing results. One might think that we’re really getting close to artificial general intelligence, where you have super smart machines doing really amazing things. Sometimes that notion is met with worry and fear, because you think of Terminator. Sometimes it’s met with excitement: there are cool things like Jarvis in Iron Man, and there’s a Netflix movie where people speak in their own native languages and everyone can understand them, so you have a device or something and you can speak fluently in your own language. These are cool things in the future with artificial intelligence, and my question is: do you believe in that future? Looking at, let’s say, 20 years from now, do you think that, at the current pace, we can get to artificial general intelligence? I’m not talking of Terminator, OK, don’t go there. Do you think we can get there? What’s your take on that? That’s my question.

Daly: Everyone has such excellent questions, and as always I’m going to be very intrigued to see what people think of this one.

Emezue: Yeah, I would also like to hear the next person’s reply.

Daly: Yeah, I think artificial general intelligence can sometimes be, I wouldn’t say controversial, but it is.

Emezue: It is very, very controversial. Some think we’re not even near there, and some go, wow, we’re really making great progress.

Daly: Yeah, it’ll be intriguing to see what they say. Brilliant, and very finally, where can we find out more about your work? Where can we follow it online?

Emezue: I’m not really a fan of social media, but I’m on Twitter and I post a lot about my work there. There’s my Twitter, and there’s a Twitter for Lanfrica. I also use LinkedIn, so those are the places where one can find out about my work.

Daly: Brilliant, and as always we will have the links on AIhub. Brilliant answers, and it’s been really lovely to chat. Thank you so much for joining us today.

Emezue: Thank you so much for having me. I’m very grateful.

–

Daly: And finally, I would like to thank you for joining us today. If you would like to know more about the series or catch up on previous episodes, do join us on AIhub.org. Until next time, goodbye for now.
