In Context Learning Overview

Nowadays a lot of people talk about GPT, Claude, and many other large language models. So how do we adapt them to our personal use?

At the beginning it's mainly finetuning, which we will dive into specifically in a later post. Before doing any in-context learning, we first pre-train our language models on a very large text dataset. The data at this stage is generally clean, preprocessed text from sources like Reddit or Wikipedia. This stage is the most expensive and time-consuming step: even with GPU acceleration it takes days of training, and these days language models have a HUGE number of parameters.

Finetuning

We'll get started with finetuning. This means further training the model, starting from its pretrained weights, on a dataset tailored to your task, whether that is giving instructions as a customer-service agent, helping students with their questions as a teacher, or serving as emotional support. This stage is normally less pricey and takes much less time, since we are building on the pre-trained weights, where the heavy lifting has already been done.

There are many approaches to modifying the weights, from replacing part of them to attaching a small set of extra weights to the original model. The adapter is a typical example of adding newly trained weights. Another widely adopted finetuning method is LoRA (Low-Rank Adaptation), which we will discuss in detail in a later post.

In short, this method decomposes the weight update into matrices of the smallest dimension (or rank, as it is called here) that still achieves roughly the same result as training the full parameters. There are two main benefits. First, it is very flexible: the additive nature lets people switch between different sets of LoRA weights. Second, it is less time-consuming and uses much less memory, thanks to the low-rank decomposition.
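
To make the idea concrete, here is a minimal sketch of a LoRA-style linear layer in PyTorch. This is only an illustration: the layer sizes, rank, and alpha are made-up example values, and the frozen random weight just stands in for a pretrained layer.

```python
# A minimal LoRA sketch: the full weight is frozen, and only the two small
# low-rank matrices A and B are trained. Sizes here are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        # Frozen "pretrained" weight (random here, just a stand-in).
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        # Trainable low-rank factors: effective weight = W + (alpha / rank) * B @ A
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        base = x @ self.weight.T                      # frozen path
        update = (x @ self.lora_A.T) @ self.lora_B.T  # low-rank path
        return base + self.scale * update

layer = LoRALinear(768, 768, rank=8)
out = layer(torch.randn(2, 768))  # gradients only flow to lora_A and lora_B
print(out.shape)  # torch.Size([2, 768])
```

Because lora_B starts at zero, the layer initially behaves exactly like the frozen base weight, and swapping in a different pair of A and B matrices gives you a different "personality" without retraining the base model.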

Retrieval Augmented Generation

Another widely adopted method, RAG for short, is even more cost-friendly since we are not touching the model weights at all this time. Instead, we introduce an external source to our LLM to augment its knowledge for a specific task.

Chunk?

To get this external source prepared, there is an important step: chunking your text. Why do we have to do that?

To explain this, we have to introduce an important term: the context window. This is the maximum number of tokens (or characters) an LLM can take in and process at a time. Research has shown that with longer inputs, the LLM is more likely to get lost and slower to respond. Since whatever we retrieve from our external source has to be fed into the LLM, it's necessary to chunk our text.
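
As a small illustration, here is one way to check whether a piece of text fits in a context window, assuming the tiktoken tokenizer library is installed; the 8192-token limit is just a placeholder, not any particular model's real limit.

```python
# Count tokens (not characters) to see if text fits a model's context window.
# Assumes the `tiktoken` package; the limit below is an illustrative value.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
CONTEXT_WINDOW = 8192  # placeholder limit

def fits_in_context(text: str) -> bool:
    return len(enc.encode(text)) <= CONTEXT_WINDOW

print(fits_in_context("Hello, world!"))  # True
```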

You'll want to do your chunking smartly and carefully, since what normally happens is that a sentence gets split across two chunks and the LLM gets confused by the half-sentence. Later we'll post about different "chunkers" and my experiments with them.
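
Here is a minimal sketch of a naive chunker with overlap; the chunk size and overlap are arbitrary example values. The overlap is one simple way to soften the half-sentence problem, since a sentence cut at one boundary usually appears whole in the neighbouring chunk.

```python
# Naive fixed-size chunking with overlap; sizes are illustrative, not tuned.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step back a little so boundaries overlap
    return chunks

document = "Your long source document goes here. " * 100
print(len(chunk_text(document)))  # number of chunks produced
```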

Back to external source

This external source can be anything: a SQL database, a vector database (more commonly used at first), a knowledge graph, etc. For a simple RAG pipeline, a vector database is the most common choice since you only need to perform a vector similarity search with your query and directly take what is returned.
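
Here is a rough sketch of that similarity-search step. The `embed` function is a toy stand-in (a character-count vector) for a real embedding model, and the chunks are made-up examples; in practice you would use a proper embedder and a vector database instead of a NumPy array.

```python
# Toy vector search: embed the chunks, embed the query, return the closest chunks.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in embedding: normalized character counts (replace with a real model).
    vec = np.zeros(256)
    for ch in text.lower():
        vec[ord(ch) % 256] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    chunk_vecs = np.stack([embed(c) for c in chunks])
    scores = chunk_vecs @ embed(query)       # cosine similarity (vectors are unit length)
    best = np.argsort(scores)[::-1][:top_k]  # indices of the highest scores
    return [chunks[i] for i in best]

chunks = [
    "LoRA decomposes the weight update into two small low-rank matrices.",
    "RAG augments an LLM with an external knowledge source.",
    "Chunking splits long documents so they fit in the context window.",
]
print(retrieve("What is LoRA?", chunks, top_k=1))
```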

For SQL or a knowledge graph, you'll have to split your workflow into a chain of LLM calls. First, you prompt the model to paraphrase the input into a query, for example a SELECT xxx FROM xxx statement in SQL. Paraphrasing the query also helps a lot beyond working with these two sources; we'll come back to this in later posts.
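
As a sketch of such a chain, the snippet below assumes two hypothetical helpers, `llm(prompt)` for calling whatever model you use and `run_sql(query)` for executing against your database, plus an invented `orders` table; none of these names come from a real library.

```python
# Two-step chain: (1) the LLM turns the question into SQL, (2) the SQL result is
# fed back to the LLM as context. `llm` and `run_sql` are hypothetical stubs.
def llm(prompt: str) -> str:
    raise NotImplementedError("call your LLM API here")

def run_sql(query: str) -> str:
    raise NotImplementedError("run the query against your database here")

def sql_rag(question: str) -> str:
    # Step 1: paraphrase the natural-language question into a SQL query.
    query = llm(
        "Translate this question into one SQL query over the `orders` table "
        f"(columns: id, customer, total, created_at):\n{question}"
    )
    # Step 2: execute it and answer using the returned rows as context.
    rows = run_sql(query)
    return llm(
        f"Question: {question}\nSQL result:\n{rows}\n"
        "Answer the question using only the result above."
    )
```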

Conclusion

Even so, this post on in-context learning is just the tip of the iceberg in the world of large language models. There are many more concepts that we will discuss later.

References

Nice video on LoRA to look at

