If you are a software developer striving to keep up with the latest buzz about large language models, you may feel overwhelmed or confused, as I did. It seems like every day we see the release of a new open source model or the announcement of a significant new feature by a commercial model provider.
LLMs are quickly becoming an integral part of the modern software stack. However, whether you want to consume a model API offered by a provider like OpenAI or embed an open source model into your app, building LLM-powered applications entails more than just sending a prompt and waiting for a response. There are numerous elements to consider, ranging from tweaking the parameters to augmenting the prompt to moderating the response.
LLMs are stateless, meaning they don’t remember the previous messages in the conversation. It’s the developer’s responsibility to maintain the history and feed the context to the LLM. These conversations may have to be stored in a persistent database to bring back the context into a new conversation. So, adding short-term and long-term memory to LLMs is one of the key responsibilities of the developers.
The other challenge is that there is no one-size-fits-all rule for LLMs. You may have to use multiple models that are specialized for different scenarios such as sentiment analysis, classification, question answering, and summarization. Dealing with multiple LLMs is complex and requires quite a bit of plumbing.
A unified API layer for building LLM apps
LangChain is an SDK designed to simplify the integration of LLMs and applications. It solves most of the challenges that we discussed above. LangChain is similar to an ODBC or JDBC driver, which abstracts the underlying database by letting you focus on standard SQL statements. LangChain abstracts the implementation details of the underlying LLMs by exposing a simple and unified API. This API makes it easy for developers to swap in and swap out models without significant changes to the code.
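To make the driver analogy concrete, here is a minimal sketch of what a unified model interface looks like in plain Python. It does not use LangChain itself: `TextModel`, `UppercaseModel`, `ReverseModel`, and `answer` are hypothetical names invented for illustration, and the two "models" are trivial stand-ins for real provider clients.

```python
from typing import Protocol


class TextModel(Protocol):
    """Hypothetical common interface; not LangChain's actual API."""

    def generate(self, prompt: str) -> str: ...


class UppercaseModel:
    """Stand-in for one provider's client."""

    def generate(self, prompt: str) -> str:
        return prompt.upper()


class ReverseModel:
    """Stand-in for a different provider's client."""

    def generate(self, prompt: str) -> str:
        return prompt[::-1]


def answer(model: TextModel, question: str) -> str:
    # Application code depends only on the shared interface,
    # so the concrete model can be swapped without changes here.
    return model.generate(question)


if __name__ == "__main__":
    for model in (UppercaseModel(), ReverseModel()):
        print(answer(model, "hello"))
```

Because `answer` only knows about `generate`, switching providers means changing the object passed in and nothing else, which is the property the unified API is meant to give you.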
LangChain appeared around the same time as ChatGPT. Harrison Chase, its creator, made the first commit in late October 2022, just before the LLM wave hit full force. The community has been actively contributing since then, making LangChain one of the best tools for interacting with LLMs.
LangChain is a powerful framework that integrates with external tools to form an ecosystem. Let’s understand how it orchestrates the flow involved in getting the desired outcome from an LLM.
Applications need to retrieve data from external sources such as PDFs, web pages, CSVs, and relational databases to build the context for the LLM. LangChain seamlessly integrates with modules that can access and retrieve data from disparate sources.
The data retrieved from some of the external sources must be converted into vectors. This is done by passing the text to an embedding model that is typically paired with the LLM. For example, OpenAI offers a text embedding model that is commonly used alongside its GPT-3.5 model to prepare the context. LangChain provides integrations for the embedding models offered by each provider, which reduces the guesswork in pairing the models.
The generated embeddings are stored in a vector database to perform a similarity search. LangChain makes it easy to store and retrieve vectors from various sources ranging from in-memory arrays to hosted vector databases such as Pinecone.
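The embed, store, and search flow can be sketched without any external service. The example below is an assumption-laden toy, not LangChain code: `embed` uses a bag-of-words count instead of a real embedding model, and `InMemoryVectorStore` is a hypothetical class that ranks stored texts by cosine similarity.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector. Real embedding models
    # return dense float vectors; this is only for illustration.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical in-memory store; hosted databases expose similar operations."""

    def __init__(self) -> None:
        self._items = []  # list of (text, vector) pairs

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Rank every stored text by similarity to the query vector.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add("Pinecone is a hosted vector database")
    store.add("Redis can persist chat history")
    print(store.search("hosted vector database"))
```

A production setup would replace `embed` with a call to a real embedding model and the list scan with an indexed vector database, but the add and search operations keep the same shape.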
Large language models
LangChain supports mainstream LLMs offered by OpenAI, Cohere, and AI21 and open source LLMs available on Hugging Face. The list of supported models and API endpoints is rapidly growing.
The Model I/O module deals with the interaction with the LLM. It essentially helps in creating effective prompts, invoking the model API, and parsing the output. Prompt engineering, which is the core of generative AI, is handled well by LangChain. This module abstracts the authentication, API parameters, and endpoint exposed by LLM providers. Finally, it can parse the response sent by the model in the desired format that the application can consume.
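The three steps described here (build the prompt, call the model, parse the output) can be sketched as follows. This is an illustration in plain Python rather than LangChain's API: `PROMPT`, `fake_model`, `parse_sentiment`, and `classify` are hypothetical names, and `fake_model` returns a canned reply instead of calling a provider.

```python
import json
from string import Template

# Prompt template with a single placeholder, $review.
PROMPT = Template(
    "Classify the sentiment of the review below. "
    'Reply with JSON like {"sentiment": "positive"}.\n\nReview: $review'
)


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a provider call; returns a canned reply
    # wrapped in extra chatter, as real models sometimes do.
    label = "positive" if "love" in prompt.lower() else "negative"
    return 'Sure! {"sentiment": "%s"}' % label


def parse_sentiment(raw: str) -> str:
    # Extract the JSON object from the reply, ignoring surrounding text.
    start, end = raw.index("{"), raw.rindex("}") + 1
    return json.loads(raw[start:end])["sentiment"]


def classify(review: str) -> str:
    prompt = PROMPT.substitute(review=review)
    return parse_sentiment(fake_model(prompt))


if __name__ == "__main__":
    print(classify("I love this phone"))
```

Keeping the template, the call, and the parser as separate pieces is what lets a framework swap any one of them, for example a different provider or a stricter output parser, without touching the others.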
Think of the data connection module as the ETL pipeline of your LLM application. It deals with loading external documents such as PDF or Excel files, splitting them into chunks, converting the chunks into embeddings in batches, storing the embeddings in a vector database, and finally retrieving them through queries. As we discussed earlier, this is the most important building block of LangChain.
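The chunking and batching steps of such a pipeline might look like the sketch below. The function names `split_text` and `batched`, and the default sizes, are assumptions made for illustration; LangChain ships its own text splitters with more sophisticated rules.

```python
def split_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    # Split text into fixed-size character chunks. Consecutive chunks share
    # `overlap` characters so context is not lost at the boundaries.
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def batched(items: list, size: int):
    # Yield successive batches, e.g. to send chunks to an embedding model.
    for i in range(0, len(items), size):
        yield items[i:i + size]


if __name__ == "__main__":
    chunks = split_text("abcdefghij", chunk_size=4, overlap=1)
    for batch in batched(chunks, 2):
        print(batch)
```

The overlap is a common design choice: it costs some duplicated text but makes it less likely that a sentence relevant to a query is cut in half between two chunks.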
In many ways, interacting with LLMs is like using Unix pipelines. The output of one module is sent as an input to the other. We often must rely on the LLM to clarify and distill the response until we get the desired outcome. Chains in LangChain are designed to build efficient pipelines that leverage the building blocks and LLMs to get an expected response. A simple chain may have a prompt and an LLM, but it’s also possible to build highly complex chains that invoke the LLM multiple times, feeding each response into the next call, to achieve an outcome. For example, a chain may include a prompt to summarize a document and then perform sentiment analysis on the summary.
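The summarize-then-analyze example can be sketched as a simple function pipeline. This is plain Python, not LangChain's chain classes: `chain`, `summarize`, and `sentiment` are hypothetical helpers, and the two steps use naive string rules where a real chain would call an LLM.

```python
from functools import reduce
from typing import Callable

Step = Callable[[str], str]


def chain(*steps: Step) -> Step:
    # Compose steps left to right, like a Unix pipeline:
    # the output of each step becomes the input of the next.
    return lambda text: reduce(lambda value, step: step(value), steps, text)


def summarize(document: str) -> str:
    # Stand-in for an LLM summarization call: keep the first sentence.
    return document.split(".")[0].strip() + "."


def sentiment(summary: str) -> str:
    # Stand-in for an LLM sentiment call: naive keyword check.
    return "positive" if "great" in summary.lower() else "negative"


summarize_then_rate = chain(summarize, sentiment)

if __name__ == "__main__":
    print(summarize_then_rate("The battery life is great. Shipping was slow."))
```

Note that the order of the steps is fixed when the chain is built, which is the key difference from the agents discussed later.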
LLMs are stateless but need context to respond accurately. LangChain’s memory module makes it easy to add both short-term and long-term memory to models. Short-term memory maintains the history of a conversation through a simple mechanism. Message history can be persisted to external sources such as Redis, representing long-term memory.
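A rough sketch of the two kinds of memory follows. It is not LangChain's memory module: `ConversationMemory` is a hypothetical class, and a local JSON file stands in for an external store such as Redis so that the example runs without any services.

```python
import json
from collections import deque
from pathlib import Path


class ConversationMemory:
    """Hypothetical memory helper: windowed short-term plus persisted long-term."""

    def __init__(self, path: Path, window: int = 4) -> None:
        self.path = path
        # Long-term memory: the full history, reloaded from disk if present.
        self.history = json.loads(path.read_text()) if path.exists() else []
        # Short-term memory: only the last `window` messages go into the prompt.
        self.recent = deque(self.history, maxlen=window)

    def add(self, role: str, content: str) -> None:
        message = {"role": role, "content": content}
        self.history.append(message)
        self.recent.append(message)
        # Persist after every message so a new session can restore the context.
        self.path.write_text(json.dumps(self.history))

    def as_prompt(self) -> str:
        # Render the short-term window as text to prepend to the next prompt.
        return "\n".join(f"{m['role']}: {m['content']}" for m in self.recent)


if __name__ == "__main__":
    memory = ConversationMemory(Path("chat_history.json"), window=2)
    memory.add("user", "What is LangChain?")
    print(memory.as_prompt())
```

Since the model itself remembers nothing, the string returned by `as_prompt` has to be sent along with every new request; the window keeps that string from growing without bound.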
LangChain provides developers with a callback system that allows them to hook into the various stages of an LLM application. This is useful for logging, monitoring, streaming, and other tasks. It is possible to write custom callback handlers that are invoked when a specific event takes place within the pipeline. LangChain’s default callback points to stdout, which simply prints the output of every stage to the console.
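The hook mechanism can be illustrated with a small, self-contained sketch. The class and method names here (`CallbackHandler`, `on_start`, `on_end`, `run_pipeline`) are invented for this example and are not LangChain's callback API; they only show the general pattern of handlers being invoked at each stage.

```python
class CallbackHandler:
    """Hypothetical base handler; each hook is a no-op by default."""

    def on_start(self, stage: str, payload: str) -> None: ...

    def on_end(self, stage: str, result: str) -> None: ...


class StdoutHandler(CallbackHandler):
    # Prints the output of every stage to the console.
    def on_end(self, stage: str, result: str) -> None:
        print(f"[{stage}] {result}")


class RecordingHandler(CallbackHandler):
    # Collects events in memory, e.g. for logging or monitoring.
    def __init__(self) -> None:
        self.events = []

    def on_start(self, stage: str, payload: str) -> None:
        self.events.append(("start", stage))

    def on_end(self, stage: str, result: str) -> None:
        self.events.append(("end", stage))


def run_pipeline(text: str, stages, handlers) -> str:
    # `stages` is a list of (name, function) pairs applied in order.
    for name, fn in stages:
        for handler in handlers:
            handler.on_start(name, text)
        text = fn(text)
        for handler in handlers:
            handler.on_end(name, text)
    return text


if __name__ == "__main__":
    stages = [("strip", str.strip), ("upper", str.upper)]
    run_pipeline("  hello  ", stages, [StdoutHandler()])
```

The pipeline code never needs to know what the handlers do, so logging, monitoring, or streaming can be added by passing in another handler.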
Agents are by far the most powerful module of LangChain. LLMs are capable of both reasoning and acting, and the ReAct prompting technique takes advantage of this. LangChain’s agents simplify crafting ReAct prompts that use the LLM to distill the prompt into a plan of action. Agents can be thought of as dynamic chains. In a chain, the sequence of actions is hard-coded. In an agent, a language model is used as a reasoning engine to determine which actions to take and in what order.
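The difference between a fixed chain and an agent loop can be sketched as follows. This is a simplified illustration, not LangChain's agent implementation: `TOOLS`, `fake_reasoner`, and `run_agent` are hypothetical names, and `fake_reasoner` is a scripted stand-in for the LLM's reasoning step.

```python
from typing import Callable

# Tools the agent is allowed to call, keyed by name.
TOOLS: dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
    "upper": lambda text: text.upper(),
}


def fake_reasoner(question: str, observations: list[str]) -> tuple[str, str]:
    # Hypothetical stand-in for the LLM: given the question and what has been
    # observed so far, return the next (action, input) pair.
    if not observations:
        return "word_count", question
    return "finish", f"The question has {observations[-1]} words."


def run_agent(question: str, reasoner=fake_reasoner, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        # Reason: ask the model what to do next.
        action, action_input = reasoner(question, observations)
        if action == "finish":
            return action_input
        # Act: run the chosen tool and record the observation.
        observations.append(TOOLS[action](action_input))
    raise RuntimeError("agent did not finish within max_steps")


if __name__ == "__main__":
    print(run_agent("How many words are here"))
```

Nothing in `run_agent` fixes which tool runs or when; that decision is delegated to the reasoner on every iteration, and the `max_steps` cap guards against a model that never decides to finish.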
LangChain is rapidly becoming the most important component of GenAI-powered applications. Thanks to its thriving ecosystem, which is continually expanding, it can support a wide variety of building blocks. Support for open source and commercial LLMs, vector databases, data sources, and embeddings makes LangChain an indispensable tool for developers.
The objective of this article was to introduce developers to LangChain. In the next article of this series, we will use LangChain with Google’s PaLM 2 API. Stay tuned.
Copyright © 2023 IDG Communications, Inc.