This article is a continuation of the first part of my post on the Keynote at ICAIR 2023. In my previous post, we explored the part of my address that introduced Large Language Models (LLMs) and Foundation Models (FMs). We discussed the historical background of Large Language Models, starting from the basic principles of Long Short-Term Memory networks (LSTMs) and the Attention mechanism, and how the introduction of architectures built on complex combinations of bidirectional and multi-head attention, such as BERT, XLNet, GPT, and ELMo, changed the landscape of Natural Language Processing (Machine Translation, Entity Recognition and Disambiguation, Relation Extraction and Linking, Information Retrieval, Text Classification, Topic Modelling, etc.). We landed on how research has advanced these frameworks into products that are now transforming several fields, including healthcare, where Foundation Models are used for novel molecule and drug discovery, and human-machine communication through chatbots, with many more to come. In this short post, let us dive into the next part of the talk, which begins by extending LLMs into image models and then discusses the limitations of LLMs.
The next question is: what about images and vision? Text and language are now very well represented, so the natural question is whether we can have something similar for images and computer vision. There are models in this space too; one family is the GANs (Generative Adversarial Networks), but the newest kid on the block is what we call Diffusion Models. What diffusion models do is take images, and I think students and probably even professionals have played with something called DALL·E online, where you generate images of people standing on the moon or somebody riding a horse or whatever it is. That model is very powerful in that it is able to capture image representations and produce new images that are similar to the inputs, and again, I have written out the objective for those who are interested; you could look that up. But this is my space now, this is what I work with to be able to capture molecules. You can imagine a 3D molecule: you pass this type of molecule into the diffusion model. What it does is add noise until we have white noise, and then it reverses the process; when it reverses the process it gets back to the original space where the image was, and then it draws new samples. For example, the images on the left, just before the line, show the samples that were input into the model, and on the right are examples of samples drawn by the model after it learned the structure of the images. What does this portend for us? The researcher in the lab trying to come up with a new drug can give us a drug structure, or some sample proteins they used for some drug solutions, and we learn from them new structures that they can use.
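To make that forward/reverse idea concrete, here is a minimal NumPy sketch (my own illustration, not code from the talk) of the forward, noising half of a DDPM-style diffusion model. The noise schedule, shapes, and step count are arbitrary assumptions; the reverse, generative half needs a trained denoising network, which is only indicated in a comment.

```python
import numpy as np

def forward_diffusion(x0, betas, seed=0):
    """Gradually corrupt x0 with Gaussian noise until it is close to white noise."""
    rng = np.random.default_rng(seed)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # cumulative signal retention at each step
    trajectory = [x0]
    for t in range(len(betas)):
        noise = rng.standard_normal(x0.shape)
        # Closed-form sample of x_t given x_0: sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps
        x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
        trajectory.append(x_t)
    return trajectory

x0 = np.ones((4, 4))                # stand-in for an image or a molecule's coordinate tensor
betas = np.linspace(1e-4, 0.2, 50)  # a simple linear noise schedule (illustrative)
xs = forward_diffusion(x0, betas)
print(xs[0].std(), xs[-1].std())    # the final step is dominated by noise

# The reverse (generative) process starts from pure noise and repeatedly applies a
# *learned* denoiser to walk back to the data space; that is how new samples are drawn.
```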
So we have established that these models are powerful, huge, and so on, but what are some of their limitations? The limitations I am going to discuss are motivated by things from ChatGPT. I asked ChatGPT a few questions about Africa and I saw the problem, and where there is a problem we have an opportunity: there are opportunities for our researchers and practitioners to go and pursue. So I asked ChatGPT last night: who was the first African to receive a Nobel Prize? ChatGPT first gave me a long answer, which I have not included here, saying that Albert Schweitzer was the first African to receive a Nobel Prize. I was shocked, because that name sounds German, so I asked again, and now ChatGPT apologised: sorry for my previous answer, it is actually somebody called Albert Luthuli, a South African. This is a problem, because if you are going to trust ChatGPT and it gives you answers and apologises later, that is a problem. It is called hallucination: the model is not sure, but it answers confidently. If I did not know that the name sounds German, I would just have taken it as the answer. Then I asked the same question in a different format, "first African to be a Nobel laureate", and it went back to Albert Schweitzer. The fact that Large Language Models take in so much text gives them so much information that sometimes the actual, accurate answer is lost in that mix.

The next limitation is that large language models are often stuck in their past. GPT-3, which you know as ChatGPT, was last trained in 2022 (rather, December 2021). To train it again, they would need resources and all these kinds of things, and nobody has the resources. Well, OpenAI has resources, Google has resources, and so on, but for us we may want to assume we do not have such resources. So I asked ChatGPT: who is the governor of Rivers State in Nigeria? There is a state called Rivers State. It told me that, according to its knowledge, which was true in 2022, Nyesom Wike was the governor. He is no longer the governor; he is out, and ChatGPT lets me down gently: hey, these things change, maybe you need to confirm somewhere else. So the question is: how can we update large language models in a way that does not waste resources and keeps them always up to date? That is a gap we have in research.
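On the "stuck in the past" problem, one direction people explore (this is my own hedged sketch, not something prescribed in the talk) is retrieval-augmented prompting: rather than retraining the model, up-to-date facts are fetched at query time and placed in the prompt. The function names below are hypothetical placeholders, not a real API.

```python
from datetime import date

def search_recent_sources(question: str) -> list[str]:
    # Hypothetical retrieval step: in practice this would hit a news API,
    # a search engine, or a knowledge graph for current passages.
    return [f"As of {date.today()}, the current governor of Rivers State is "
            "<name taken from an up-to-date source>."]

def ask_llm(prompt: str) -> str:
    # Placeholder for whatever LLM endpoint you actually call.
    return f"[model answer grounded in the prompt]\n{prompt}"

def answer_with_fresh_context(question: str) -> str:
    passages = search_recent_sources(question)
    context = "\n".join(passages[:3])  # keep only the top few passages
    prompt = (
        "Answer the question using ONLY the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return ask_llm(prompt)

print(answer_with_fresh_context("Who is the governor of Rivers State in Nigeria?"))
```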
Now the final problem is that these models can give you answers because they understand a lot, having been trained on such huge amounts of information, but they are not able to tell you where an answer came from. For example, I asked which is the largest large language model created in Africa, and it had no idea. Well, even I do not know, but it said something to the effect that there has not been a widely recognised Large Language Model from Africa. That could be true; there may be no Large Language Model from Africa, but can you tell me the source of that information? We cannot verify it, because that knowledge is contained in the internal representation of the weights of the model. So, with all these problems and limitations, how can we move forward?
The other challenge with Large Language Models has to do with cost considerations. These models are so large that they require a huge amount of data and specialised hardware to train. For example, GPT-3 from OpenAI has 175 billion parameters. If you know what parameters mean in statistics, these are, to put it in a simple form, the elements of the model that actually retain the information, so that you are able to predict or classify the next data point. That is very large: it will take you months to train, and even curating the data can take years. The source of data for OpenAI's GPT-3 was, for example, the open web, books, and so on, but they have withheld some information about proprietary data they used to go the extra mile. Google came up with another model called Bard, with about 137 billion parameters, a bit smaller than GPT-3 but still a big model; they also claim to have trained it on Common Crawl, some news sources, Wikipedia, and so on, but there is always some hidden hand under the table that they used to make it more powerful. The last model is from Facebook, and their model is completely open source. It is called LLaMA (I think I forgot to add it to the slide, apologies for that). The aim was to reduce the parameter requirements, with models ranging from 7 billion to 65 billion parameters. What they did was train several models incrementally until they were able to match the others, and they claim that the 13-billion-parameter model is able to match GPT-3. So if you are looking for inspiration on working in this space, you might want to look at the LLaMA model to get insights.
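To make "parameters" a little more concrete, here is a tiny back-of-the-envelope calculation (my own illustration, not from the talk) of how counts like 175 billion add up. A single fully connected layer from d_in inputs to d_out outputs stores d_in * d_out weights plus d_out biases; with GPT-3's reported hidden size of 12,288, one feed-forward block alone holds over a billion numbers, and the full model stacks 96 such layers plus attention and embeddings.

```python
def dense_layer_params(d_in: int, d_out: int) -> int:
    """Parameters in one fully connected layer: a weight matrix plus a bias vector."""
    return d_in * d_out + d_out

hidden = 12_288  # GPT-3's reported hidden size; adjust for other models
# A transformer feed-forward block expands the hidden size by 4x, then projects back down.
ffn_block = dense_layer_params(hidden, 4 * hidden) + dense_layer_params(4 * hidden, hidden)
print(f"One feed-forward block: {ffn_block:,} parameters")   # roughly 1.2 billion
print(f"Across 96 layers: {96 * ffn_block:,} parameters")    # already well over 100 billion
```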
Some more on cost considerations: how long do these models take to run? Fine, it takes a long time and heavy hardware to train them, but once they are trained and deployed, how long do they take to give answers? For a question like "who founded the University of Lagos?", ChatGPT gives a very beautiful answer, and I think it is correct. I do not know, you would know, but I think it is correct. It gave a good answer explaining how the university was founded and how the government was involved, but it took approximately 2.5 seconds. Then I tried Google. Google is even more beautiful; it even shows the image of the person who founded it, and it takes around 0.5 seconds. But now I want to introduce a third technology that is able to do this in a much shorter time: Knowledge Graphs. Knowledge Graphs are curated constructs in which data is represented in a graph format so that you can query the data directly. One of the common open Knowledge Graphs is Wikidata. Wikidata contains a lot of information in graph format, and you can query it using a query language called SPARQL. ChatGPT was able to generate such a query for me so that I could answer the question using Wikidata; I had to change a few things, but ChatGPT can generate the query for you. The good thing with Knowledge Graphs is that Wikidata answered in 0.26 seconds, faster than the other two. So, looking at the cost of answering queries: OpenAI's GPT-3 runs on some of the most powerful hardware you will ever think about and takes around 2.5 seconds; Google runs on hardware so powerful you cannot even imagine it and takes 0.5 seconds; but Wikidata runs on comparatively modest hardware and is able to answer in 0.26 seconds.
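For readers who want to reproduce the Knowledge Graph side of that comparison, here is a small Python sketch that sends a SPARQL query to Wikidata's public endpoint and times the answer. The query itself (matching the university by its English label and using the "founded by" property P112) is my own assumption of what a suitable query looks like and may need adjusting; timings will of course vary with network and load.

```python
import time
import requests

# Ask Wikidata who founded the University of Lagos and time the round trip.
QUERY = """
SELECT ?founderLabel WHERE {
  ?university rdfs:label "University of Lagos"@en ;
              wdt:P112 ?founder .                    # P112 = "founded by"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

start = time.perf_counter()
response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "kg-latency-demo/0.1 (example)"},  # Wikidata asks clients to identify themselves
    timeout=30,
)
elapsed = time.perf_counter() - start

for row in response.json()["results"]["bindings"]:
    print(row["founderLabel"]["value"])
print(f"Answered in {elapsed:.2f} seconds")
```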
So what am I trying to say? Am I saying that Large Language Models or Foundation Models are wrong or bad to use? I did not make such a claim. I said they are very powerful, and we should all think about how to use large language models, but ....