mulangonando
- Dec 29, 2023

The Power and Opportunities of Large Language Models for Africa Part I: Understanding LLMs

Updated: Jan 2

Dr. Mulang' Onando presenting the keynote address at ICAIR, UniLAG, Lagos, Nigeria, on 6 December 2023

About three weeks ago, I had the opportunity to deliver a keynote talk and to join a panel discussion on GenAI for SMEs in Africa at the 3rd International Conference on AI and Robotics (ICAIR). Held on the 6th and 7th of December in the warm, humid coastal city of Lagos, Nigeria, the conference was organized and hosted by NITHUB of UniLAG under the stewardship of Dr. Victor Odumuyiwa. I had a stellar opportunity to interact with professionals from different sectors, including founders, business professionals, university administrators, government officials, NGOs, and students. It was both a knowledge-sharing and a learning experience. Thanks to Prof. Ojo and Dr. Odumuyiwa for the invitation.

Themed 'Artificial Intelligence and Blockchain for the Sustainable Development of MSMEs,' the third iteration of the conference sought to explore the transformative potential of AI and blockchain technologies in strengthening the sustainability of Micro, Small, and Medium-sized Enterprises in Nigeria. I enjoyed inspirational talks from numerous great speakers. Opening remarks and speeches came from Prof. Folasade Tolulope Ogunsola (Vice-Chancellor of the University of Lagos), Prof. Ayodele Atsenuwa (Deputy Vice-Chancellor, Development Services), Prof. Bola Oboh (Deputy Vice-Chancellor, Academics & Research), Prof. Elijah Oyeyemi (Dean of Science), Prof. Adetunji Philips Adewole (Head of Department, Computer Science), Dr. Victor Odumuyiwa (Director of NITHUB, University of Lagos), and Mr. Kashifu Abdullahi (Director-General of NITDA Nigeria). These were followed by great presentations and keynotes from Dr. Thuweba Diwani of the Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) GmbH; Dr. Bayo (Olubayo) Adekanmbi of DSNai (Data Science Nigeria), whose talk was truly a highlight for me, as it mapped my own conceptualisation of AI into simple, digestible concepts for business professionals; Mrs. Funke Opeke, CEO of MainOne; and Prof. Bruce W. Watson, all showcasing the innovative role of AI in empowering the business ecosystem. Prof. Adegboyega Ojo discussed safety and governance issues and their implications for emerging AI policies and regulations, while Dr. Russel, along with Prof. Bruce Watson, considered the ethical implications of implementing AI.

My keynote talk at the 3rd International Conference on AI and Robotics in Lagos, Nigeria, was titled "The Frontier of Large Language Models (LLMs) or Large Multimodal Models: Opportunities and Challenges for Africa." It surveyed the landscape of LLMs and the potential these models offer for sectors such as small business, agriculture, healthcare, governance, and general domains.

The Address

Hello! This looks like a lecture hall; among my other duties, I teach at a university, the Jomo Kenyatta University of Agriculture and Technology. When I was invited, I asked myself who I was going to meet here, and Dr. Odumuyiwa told me there would be people who are new to technology or AI, people from government, and people from business. I see mostly students here, so before I start, let me do a small roll call. How many people in this gathering are students? About 95%, all of you interested in AI, mathematics and the like, right? Or do you not like that? How many are business people? I was expecting about 50. Awesome. And the administration and people from government? My name is Dr. Mulang' Onando, and I work at IBM Research Africa. I research in AI; that is, "I sort of sit around, read papers by other people, and once in a while I write down my own thoughts." The specific area I work on is using generative models for healthcare. My topic today, generative AI, is the biggest and newest child on the block. These are very powerful models. Mathematically speaking, they take samples of data (be it text, images, or anything else) and learn what we call a density, that is, a probability distribution over the data, or some other representation of it, so that they can then map it into new samples. In healthcare there are many applications. COVID-19 came up recently, and one of the problems in healthcare is that we don't know how to quickly find drugs to treat new diseases, right? So, using generative AI, we can learn from samples of drugs that existed, for example for SARS-CoV, and generate novel molecules to bootstrap the discovery process for COVID-19 drugs. That is my space, but it is not exactly what I am going to talk about today.
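To make "learn a density, then sample from it" concrete, here is a deliberately tiny Python sketch. It is a toy under stated assumptions, not how drug-discovery models work: the "known molecules" are hypothetical 2-D feature vectors, and the learned density is a single multivariate Gaussian (a mean and a covariance) rather than a deep generative model. The names `fit_gaussian` and `generate` are illustrative.

```python
import numpy as np

def fit_gaussian(samples: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Learn a (very simple) density: estimate a Gaussian's mean and covariance."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)  # columns are features, rows are samples
    return mean, cov

def generate(mean: np.ndarray, cov: np.ndarray, n: int, seed: int = 0) -> np.ndarray:
    """Draw n new samples from the learned density."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n)

# Hypothetical training data: 500 two-dimensional feature vectors
# standing in for "known molecules" (assumption, for illustration only).
rng = np.random.default_rng(42)
known = rng.normal(loc=[1.0, -2.0], scale=[0.5, 0.3], size=(500, 2))

mean, cov = fit_gaussian(known)
novel = generate(mean, cov, n=5)  # five "new" samples from the learned density
print(novel.shape)  # (5, 2)
```

Real generative models replace the Gaussian with a neural network that can represent far more complex densities, but the two steps, estimating from samples and then drawing new samples, are the same in spirit.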
So I have entitled my talk "Large Language Models are awesome, but they don't know your governor." As we go through this talk today, we are going to look at how awesome LLMs, or foundation models, are and where they are limited. Why do they not know our governor? That is exactly the space where I want you, as students and as SMEs, to think: how can we make them understand our governor? Next slide, please. I am a computer scientist through and through, and recently I was teaching my younger brother how to program in Python. He's not that young, he's probably 29, but he is more of a mathematician than a programmer. I was writing code that fetches data online while keeping track of time, so that every 30 minutes it fetches new data, and the Python package for handling time gave me a bit of trouble. My first thought was to go to Stack Overflow; that's where we all go, those of us who claim to be programmers. I got some code from Stack Overflow, pasted it in, and it never worked. My brother, the one who is learning Python and whom I am meant to be teaching, asked me, "Hey, what is the trouble?" I told him this thing would not work, and he said, "Send me that code." I sent it, and within two minutes he gave me the exact code, which worked. I asked him, "How did you do that? You're the one who is learning!" And he told me ChatGPT had given him the answer. This was not so recent, about three months ago, and it was the first time I truly understood how much ChatGPT has influenced students and practitioners. As practitioners, we tend to ignore it; we tend to believe it is just another natural language model. But the output you can see on the right-hand side is code generated from my errors.
I thought that was a good motivation to show you just how powerful these things have become, and they are not getting weaker or smaller; they are only going to get bigger. These models have been researched over time, they are becoming more powerful, and they can do many things, for example generate code for you. They are not going to replace programmers, but they are going to help us achieve a lot. Now that brings me to a disclaimer: some of you might be thinking that I generated this presentation with ChatGPT. The disclaimer is that I took time to curate a presentation that captures at least three groups of people: students, so there is some mathematics in the discussion; government officials; and professionals, so there is something to take away for SMEs, and so on.

So now to my presentation. Having seen how powerful these things are, there is always the question of how long they will last. Are they here to stay, or is it just hype? And it is true, it is a big hype. Tomorrow you might wake up and find that ChatGPT is not that good. But what this hype has done is bring business people, executives, and government officials to play with something. Right now, students can look up help with assignments online, government people can quickly prototype something, and they now understand how powerful AI is. For us practitioners, that is good news. Previously, software developers and AI professionals spent a lot of time just convincing executives that "we need this particular technology to help us move forward." Now we don't have to spend so much time, because they already understand from ChatGPT that this technology, generative AI, is going to help us. So how do we get past the hype? How do we maintain the momentum? How do we move forward? As you can see, this is Gartner's Hype Cycle for technology, and at the top we have foundation models, the new name for generative AI. It is almost at the peak, where it will start going down, and we should expect that it will either drop or, if the research is good enough, settle on the plateau. So in terms of hype, yes, it's a big hype now, but there are a lot of positives coming. Now to foundation models which, as I told you, is the new name for large language models and generative AI. By definition, foundation models are a set of technologies that capture representations of data, be it text, images, or anything else, and because they learn from large amounts of data, they are able to transfer that knowledge to other tasks.
Here, for example, you have tasks such as information extraction, image captioning, object recognition, and so on. GPT-3, which powers the back end of what we call ChatGPT, is a textual foundation model, and it can be used, for example, for chat; ChatGPT is one application of it. Foundation models help us capture rich representations from huge amounts of data, which we can then adapt to a given domain. That's what foundation models are all about.

A bit of background; this is where the students, or the mathematicians, come in. Initially, we were just doing software and trying to understand text, and many models came up for understanding text. One of them is TF-IDF (term frequency-inverse document frequency), which counts words and weights each count by how rare the word is across documents, to understand which words matter in a document and how documents relate to each other.
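A minimal TF-IDF sketch in plain Python, assuming whitespace-tokenized documents and the unsmoothed textbook weighting tf(t, d) × log(N / df(t)), where N is the number of documents and df(t) the number of documents containing term t. Libraries such as scikit-learn use smoothed variants, so their numbers will differ. The example sentences and the `tf_idf` helper are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # term frequency (share of the document) times inverse document frequency
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "the university of lagos hosted the conference".split(),
    "the conference discussed ai for small businesses".split(),
    "the vice chancellor opened the conference".split(),
]
for w in tf_idf(docs):
    # Show the two highest-weighted terms in each document.
    print(sorted(w.items(), key=lambda kv: -kv[1])[:2])
```

Words that appear in every document, such as "the" and "conference" here, get an IDF of log(1) = 0, which is how TF-IDF down-weights words that carry little distinguishing information.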

Then, thanks to the power of computers and the amount of data that exists online, somebody thought of using neural networks to understand text. Students have probably come across Long Short-Term Memory networks (LSTMs), the older sibling that immediately preceded what I am about to talk about. Because text comes as a sequence, and the sequence can be very long, LSTMs try to represent the whole sequence and understand the entire context of the data. But somebody thought: if the text is very long, the model will forget where it started. So how can we build a better representation that focuses only on the most important parts of the text? Suppose somebody is talking about the University of Lagos and describes how the Vice-Chancellor was represented by the Deputy Vice-Chancellor at the conference, but the only thing you want to understand is the University of Lagos; all the other text becomes useless. So researchers at Google asked: why don't we come up with an architecture that focuses on only part of the text? In figurative language, we say it is "paying attention": you concentrate on the part of the sequence that is most relevant. That is the Transformer architecture by Vaswani et al. I don't want to go into the details, because some people may get lost, but the idea is that the attention mechanism represents and concentrates on the parts of the sentence that are most vital. In the architecture, you can see that attention takes as input Q, K, and V, which stand for query, key, and value. Each query is compared against all the keys, and the resulting scores are used to take a weighted combination of the values, so the parts of the input most relevant to the query carry the most weight in the representation.
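The attention idea can be sketched in a few lines of NumPy as scaled dot-product attention, softmax(Q Kᵀ / sqrt(d_k)) V, the form given by Vaswani et al. This is a simplified sketch: in a real Transformer, Q, K, and V are learned linear projections of the input and several attention heads run in parallel, whereas here the same random matrix `X` (a hypothetical stand-in for 4 token vectors) is fed in as all three.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> (output (n_q, d_v), weights (n_q, n_k))."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # how well each query matches each key
    weights = softmax(scores, axis=-1)  # each row sums to 1: where to "pay attention"
    return weights @ V, weights         # weighted combination of the values

# Toy self-attention: 4 tokens with 8-dimensional random vectors (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out, attn = scaled_dot_product_attention(X, X, X)
print(out.shape, attn.shape)  # (4, 8) (4, 4)
```

Each row of `attn` says how much one token attends to every other token; a query that closely matches a single key puts almost all of its weight on that key's value.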

When Vaswani et al. released this (and again, these Google researchers are crazy, because they have a lot of hardware, they are very rich, and they can experiment with so many things), somebody else at Google thought: why don't we take this attention architecture and blow it up, so that it can capture what we now call the foundations, that is, general representations? That's when BERT came. Devlin and colleagues, again from Google, came up with an architecture that stacks these attention layers into something more powerful that can be trained at scale. This was 2018, and we are now at the end of 2023. So in 2018, Devlin says: let's combine all these attention heads and represent a huge amount of historical textual information in vector spaces. From there, researchers asked how we could go further. Again for the mathematicians, there are two major training objectives here. One says: let me predict the next word given the previous ones; this is what we call autoregressive (AR) modelling. The other says: let me mask, hide, or add some noise to the text, and train the model to recover what was hidden; this is called autoencoding (AE).
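The difference between the two objectives is easiest to see in how training examples are built from a sentence. The sketch below is illustrative only: `autoregressive_pairs` and `masked_example` are hypothetical helpers, and the masking is simplified. BERT selects about 15% of tokens and, of those, replaces 80% with `[MASK]`, 10% with a random token, and leaves 10% unchanged, whereas this sketch always uses `[MASK]`.

```python
import random

MASK = "[MASK]"

def autoregressive_pairs(tokens: list[str]) -> list[tuple[list[str], str]]:
    """AR objective: (context, next token) pairs; predict each word from the words before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_example(tokens: list[str], mask_prob: float = 0.15, seed: int = 0):
    """AE (masked-LM) objective: hide some tokens; the model must recover them."""
    rng = random.Random(seed)
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets[i] = tok  # position -> original token the model should reconstruct
        else:
            corrupted.append(tok)
    return corrupted, targets

sentence = "large language models do not know your governor".split()

# Autoregressive: every prefix predicts the following word.
for context, target in autoregressive_pairs(sentence)[:3]:
    print(context, "->", target)

# Autoencoding: a corrupted copy plus the positions to reconstruct.
corrupted, targets = masked_example(sentence, mask_prob=0.3, seed=1)
print(corrupted, targets)
```

An AR model only ever sees the words to the left of the one it predicts, which suits generating text; a masked model sees context on both sides, which suits building representations for understanding tasks.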

Again, I have put references for those who are interested in going deeper. Those two objectives have been powerful in giving us what we have today as LLMs, or foundation models. Finally, away from the mathematics, we get to GPT. GPT builds on these ideas; specifically, ChatGPT follows the first, autoregressive objective. What they have done is take text from across the internet and train a model that understands all that text, meaning it can represent that text as vectors and then do calculations on them. If you can't get your input into a numerical form, you are not able to do anything. The beauty is that it grew so fast that the world was shocked, and it continues to grow: we now have GPT-4, and GPT-4 is no longer open. OpenAI calls itself open, but GPT-4 is no longer open, and that's where we start wondering. So people are looking for ways to build open initiatives: my CEO, Arvind Krishna, announced yesterday a consortium called the AI Alliance, which is going to foster open initiatives for AI, because it is no longer open for some of us. But the point with ChatGPT is that the models are very large and, trained on such a large amount of data, they were able to capture very succinct information about what they have learned.
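The step "represent text as vectors, then calculate" can be sketched as: split text into tokens, map each token to an integer ID, and look up a vector for each ID. This is a toy under stated assumptions: real LLMs use subword tokenizers (GPT models use byte-pair encoding) rather than whitespace splitting, and their embedding matrix is learned during training, whereas here it is random. The corpus, `d_model = 16`, and the helpers `build_vocab` and `encode` are hypothetical.

```python
import numpy as np

def build_vocab(texts: list[str]) -> dict[str, int]:
    """Assign an integer ID to every distinct lower-cased word (a crude stand-in for a tokenizer)."""
    vocab = {"<unk>": 0}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(text: str, vocab: dict[str, int]) -> list[int]:
    # Words never seen while building the vocabulary map to the <unk> ID.
    return [vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]

corpus = ["AI for small businesses in Africa", "AI for healthcare in Africa"]
vocab = build_vocab(corpus)

d_model = 16
rng = np.random.default_rng(0)
# In a trained LLM this matrix is learned; here it is random, purely for illustration.
embeddings = rng.normal(size=(len(vocab), d_model))

ids = encode("AI for agriculture in Africa", vocab)
vectors = embeddings[ids]  # one d_model-dimensional vector per token
print(ids)            # [1, 2, 0, 5, 6]
print(vectors.shape)  # (5, 16)
```

Once text is in this numerical form, everything else in the model (attention, the feed-forward layers, next-word prediction) is arithmetic on these vectors.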

Look out for Part 2.
