The rise of the LLM

Author

Peter McBurney is professor of computer science in the Department of Informatics at King’s College London

CPD: 1 unit

Studying this article and answering the related questions can count towards your verifiable CPD if you are following the unit route to CPD, and the content is relevant to your learning and development needs. One hour of learning equates to one unit of CPD.

As many readers will know, the past year has seen a lot of public attention paid to a new type of artificial intelligence (AI) model, known as a large language model (LLM) or foundation model. LLMs originated in the automated translation of one human language into another, from English to French, say.

In translation, the context in which words appear can be crucial to assessing their meaning, so methods were developed to incorporate the context of words or phrases, or even whole paragraphs. The more examples these models were given and the larger the models themselves were (in terms of the number of parameters they contained), the better they performed.

Sufficiently large models trained on very large amounts of text also turned out to be effective at answering queries of a more general nature, not just at language translation. Since OpenAI publicly released ChatGPT in November 2022, leading tech companies have competed to create larger and better LLMs, with Amazon, Google, IBM and Microsoft, among others, all active in the field.

Such has been the level of interest that the British government convened an international conference in November 2023 to discuss the potential long-term risks of the use (and misuse) of these models.

Training the bots

Public LLMs have been trained on information available on the web – for example, blog posts, wiki sites and social media (such as Twitter and Reddit). As this happened, people realised that similar models could be trained on specialised data. Google, for instance, is developing an LLM specifically for medical applications, trained on medical terminology and on knowledge about causes and consequences of illnesses.

People also realised that these models could be trained on the internal, proprietary data of companies and organisations. For example, training an LLM on the internal policy documents of an airline would enable the model to answer questions from customers about cancellations and refunds, or other company policies. A suitably trained model could answer the substantive part of a customer query with facts from the internal documents of the airline, while presenting the answer in appropriate conversational text derived from its earlier training on publicly available data.

How best to combine these two capabilities is something that tech companies are still learning. Doing so effectively and efficiently requires expertise in the domains of application, so major tech companies have engaged with their largest customers to jointly develop prototype applications.

Facts plus narrative

Here I will outline a simplified version of what is emerging as a common approach to this combination challenge.

First, an electronic document collection is created comprising the internal documents appropriate for a particular AI application or use-case – for example, an application to handle customer queries or summarise legal documents.

The documents in this internal database may be in many different formats – for example, as text in policy documents and case histories, as numbers and formulae in spreadsheets, as diagrams and flowcharts showing decision processes, or as audio, image or video files.

To enable these different documents to be combined with the LLM, the documents need to be converted to a format that can be read by the LLM. This process is commonly called vectorisation: converting original formats into vectors of numbers.
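
As a rough illustration of vectorisation, the Python sketch below turns a piece of text into a fixed-length vector of word counts. It is a deliberately simple stand-in: production systems typically use learned embeddings rather than word counts, and the function name `vectorise`, the vector length of 16 and the sample policy sentence are all invented for this example.

```python
import hashlib
import re


def vectorise(text: str, dims: int = 16) -> list[float]:
    """Map text to a fixed-length vector of word counts (the hashing trick)."""
    vector = [0.0] * dims
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # A stable hash decides which slot of the vector counts this word.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        slot = int.from_bytes(digest[:4], "big") % dims
        vector[slot] += 1.0
    return vector


# Invented sample text standing in for an internal policy document.
policy = "Refunds are issued within 14 days of a cancellation."
print(vectorise(policy))
```

Whatever the original text, the result is a list of 16 numbers, so documents of very different lengths can be compared with one another in the same numerical space.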

Once vectorised, the documents in the internal database can be queried to obtain the factual part of the response to a customer query or request. When the document or documents in the internal database relevant to the question or request have been identified, these (vectorised) documents can be combined with a request, called a prompt, made to the large language model.
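
The retrieval and combination steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any vendor's system: documents are vectorised as simple word counts, relevance is measured by cosine similarity, and the text of the best-matching document (rather than its vector) is placed in the prompt. The function names, the prompt wording and the two airline documents are invented for the example.

```python
import math
import re
from collections import Counter


def vectorise(text: str) -> Counter:
    """Represent text as a sparse vector of word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str]) -> str:
    """Return the document whose vector is closest to the query's vector."""
    query_vector = vectorise(query)
    return max(documents, key=lambda doc: cosine(query_vector, vectorise(doc)))


def build_prompt(query: str, documents: list[str]) -> str:
    """Combine the customer query with the most relevant document."""
    context = retrieve(query, documents)
    return (
        "Answer the question using only the document below.\n\n"
        f"Document: {context}\n\n"
        f"Question: {query}"
    )


# Invented sample documents standing in for an airline's internal database.
DOCUMENTS = [
    "Cancellation policy: customers may request a refund after a cancelled flight.",
    "Baggage policy: each passenger may check one bag.",
]

QUERY = "Can I get a refund for my cancelled flight?"
print(build_prompt(QUERY, DOCUMENTS))
```

The string returned by `build_prompt` is what would be sent to the LLM: the facts come from the retrieved document, while the conversational wording of the reply is left to the model.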

Applications and hallucinations

Large language models were initially developed for automated translation, so the applications for which they are best suited are those involving text or text-like objects, such as computer code.

For text, key applications are in summarising documents and document collections, summarising transcripts of meetings and email threads, and in generating text for chatbots or for marketing copy. LLMs have also been found useful to software developers in suggesting computer code for specific functions or tasks, or for creating an initial draft of larger software programs.

All such applications, however, are prone to so-called hallucinations (the generation of text that is wrong, nonsensical or detached from reality). The model output still requires checking by an expert before deployment.

In effect, we are putting the query to the LLM while saying that the factual answer is to be found in the attached documents. It emerges that for this to work, LLMs need to be fine-tuned and/or further trained with the documents from the internal database specific to the particular use-case under development.

Prompt engineering involves giving the model examples of expected questions along with their correct answers, so that new questions can be answered correctly. How many examples are used depends on the application domain and the extent to which the particular use-case arose in the training of the model. Commonly, 200 or more examples of question-answer pairs are needed. Prompt engineering does not change the parameters of the original large language model.
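
One way to picture this is as plain text assembly: the example question-answer pairs are written out ahead of the new question, and the whole string is sent to the model. The Python sketch below assumes a simple "Question:/Answer:" layout; the function name, the instruction line and the two sample pairs are invented for illustration, and a real application would use many more examples.

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], question: str) -> str:
    """Place worked question-answer pairs ahead of the new question."""
    parts = ["Answer the customer's question in the style of the examples."]
    for example_question, example_answer in examples:
        parts.append(f"Question: {example_question}\nAnswer: {example_answer}")
    # The final answer is left blank for the model to complete.
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


# Invented sample pairs standing in for an airline's curated examples.
EXAMPLES = [
    ("Can I change my flight date?", "Yes, changes can be made online before departure."),
    ("Is a cancelled flight refundable?", "Yes, a refund can be requested after a cancellation."),
]

print(build_few_shot_prompt(EXAMPLES, "Can I get a refund if my flight is cancelled?"))
```

Nothing in the model itself is modified here: only the text of the prompt changes from one request to the next.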

Fine-tuning

Another approach is to fine-tune the LLM, which will alter some of the parameters of the model. In this approach, the prompt is combined with documents that provide a context for the answer and/or specific instructions on how to proceed in finding and presenting the answer.
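
To show what altering some of the parameters means, here is a toy Python sketch. It is not how an LLM is fine-tuned in practice, which involves billions of parameters and specialised software. The "model" here is a single straight line with two parameters, a weight and a bias, and fine-tuning adjusts only the bias while the weight stays frozen. All names and numbers are invented for illustration.

```python
# Invented domain examples that follow the rule y = 2x + 1.
EXAMPLES = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]


def fine_tune(weight: float, bias: float, examples, steps: int = 50, lr: float = 0.1):
    """Adjust only the bias by gradient descent; the weight stays frozen."""
    for _ in range(steps):
        # Gradient of the mean squared error with respect to the bias.
        errors = [weight * x + bias - y for x, y in examples]
        gradient = 2.0 * sum(errors) / len(errors)
        bias -= lr * gradient
    return weight, bias


# The "pre-trained" model has the right weight but the wrong bias for this domain.
pretrained_weight, pretrained_bias = 2.0, 0.0
weight, bias = fine_tune(pretrained_weight, pretrained_bias, EXAMPLES)
print(weight, round(bias, 3))
```

After fine-tuning, the frozen weight is unchanged while the bias has moved towards the value that fits the new examples, which is the sense in which fine-tuning, unlike prompt engineering, changes the model itself.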

These techniques are still emerging, and the skills needed to deploy them are still scarce. Just as web developers were in short supply in the mid-1990s after the creation of the World Wide Web, there is currently a great shortage of prompt engineers. Because the technologies are only just appearing, no company can estimate with any accuracy the time or resources needed to create customised AI applications.

Although current developments are mostly confined to large organisations and companies, smaller companies will soon have access to the technologies involved too, as the major tech companies develop the required expertise and provide customised AI application development more widely.

With AI applications using large language models likely to be deployed across many different domains, this is an exciting time to be in business.