Large Language Model (LLM)

A large language model (LLM) is a deep learning algorithm that can perform a variety of natural language processing (NLP) tasks. Large language models use transformer models and are trained on massive datasets, which enables them to recognize, translate, predict, or generate text and other content. Many of them are presented as “chatbots” that mimic human conversation, for example by using first-person pronouns. At bottom, though, they are “stochastic parrots”: mindless systems that reproduce statistical patterns from their training data. Nonetheless, they project a convincing social presence.
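As an illustration of text generation with a pre-trained transformer, here is a minimal sketch using the Hugging Face `transformers` library and the small GPT-2 model; the library and model choice are assumptions for the example, not something the article prescribes:

```python
# A minimal sketch, assuming the Hugging Face `transformers` library
# and the small pre-trained GPT-2 checkpoint are available.
from transformers import pipeline

# Build a text-generation pipeline around a pre-trained transformer model.
generator = pipeline("text-generation", model="gpt2")

# Ask the model to predict a continuation of the prompt.
result = generator("A large language model is", max_new_tokens=20)
print(result[0]["generated_text"])
```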

Large language models are built on neural networks (NNs), computing systems loosely inspired by the human brain. A neural network consists of layers of interconnected nodes, much like neurons; the “deep” in deep learning refers to the number of layers involved.
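To make “layers” concrete, the sketch below builds a tiny feed-forward network in PyTorch (the library choice is an assumption); each `nn.Linear` stage is one layer of nodes, and stacking more of them is what makes a network “deep”:

```python
# A minimal sketch of a layered neural network, assuming PyTorch.
import torch
import torch.nn as nn

# Three stacked layers of nodes; adding more layers makes the network "deeper".
model = nn.Sequential(
    nn.Linear(16, 32),  # input layer -> hidden layer of 32 nodes
    nn.ReLU(),          # non-linear activation between layers
    nn.Linear(32, 32),  # second hidden layer
    nn.ReLU(),
    nn.Linear(32, 4),   # output layer with 4 nodes
)

# A forward pass: a batch of 8 input vectors flows through every layer.
x = torch.randn(8, 16)
print(model(x).shape)  # torch.Size([8, 4])
```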

In addition to processing human language, large language models can be trained to perform a variety of other tasks, such as understanding protein structures and writing software code. Like other deep learning models, they must first be pre-trained on broad data and then fine-tuned so that they can solve specific problems such as text classification, question answering, document summarization, and text generation. These capabilities are applied in fields like healthcare, finance, and entertainment, where large language models power a variety of NLP applications, such as translation, chatbots, and AI assistants.
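As a sketch of the pre-train-then-fine-tune pattern for text classification, the snippet below loads a pre-trained model and attaches a fresh classification head; the model name and the two-label setup are illustrative assumptions:

```python
# A minimal sketch, assuming Hugging Face `transformers` with PyTorch;
# "bert-base-uncased" and the two labels are illustrative choices.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Loads pre-trained weights and adds an untrained 2-class classification head;
# fine-tuning would then update these weights on labeled examples.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

inputs = tokenizer("This movie was great!", return_tensors="pt")
logits = model(**inputs).logits  # one score per class, before any fine-tuning
print(logits.shape)  # torch.Size([1, 2])
```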

Large language models also have large numbers of parameters, which are akin to memories the model accumulates as it learns during training. One can think of these parameters as the model’s knowledge bank. (source: ElasticCo)
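To get a feel for what “large numbers of parameters” means, the learned weights of a model can simply be counted; the sketch below does this for the GPT-2 checkpoint used earlier (Hugging Face `transformers` with a PyTorch backend assumed):

```python
# A minimal sketch, assuming Hugging Face `transformers` with PyTorch.
from transformers import AutoModel

model = AutoModel.from_pretrained("gpt2")

# Every trainable weight is one parameter; summing them gives the model size.
total = sum(p.numel() for p in model.parameters())
print(f"gpt2 has about {total / 1e6:.0f} million parameters")  # ~124 million
```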