A way to evaluate context windows of LLMs


I came across a competition on Kaggle that encourages users to stress-test the context window of a newly released LLM that advertises a 1M-token context. One idea that came to me was to take long passages of text in different, easily distinguishable languages, mix them up, and scatter random alphanumeric strings throughout the text. I would then ask the LLM to detect these alphanumeric words that are out of context. I call this 'the document prankster problem'.
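The setup above can be sketched in code. This is a minimal, hypothetical illustration (the helper name `build_prankster_doc`, the word-level insertion strategy, and the 10-character intruder length are my assumptions, not part of the original competition):

```python
import random
import string

def build_prankster_doc(passages, n_intruders=5, seed=0):
    """Interleave passages and plant random alphanumeric 'intruder' tokens.

    passages: list of text snippets, ideally in different languages.
    Returns the combined document plus the list of intruder tokens,
    which serve as ground truth when scoring an LLM's answer.
    """
    rng = random.Random(seed)
    # Flatten all passages into a single word stream.
    words = []
    for passage in passages:
        words.extend(passage.split())
    # Generate out-of-context alphanumeric tokens (hypothetical length: 10).
    intruders = [
        "".join(rng.choices(string.ascii_letters + string.digits, k=10))
        for _ in range(n_intruders)
    ]
    # Insert each intruder at a random position in the word stream.
    for token in intruders:
        pos = rng.randrange(len(words) + 1)
        words.insert(pos, token)
    return " ".join(words), intruders

doc, truth = build_prankster_doc(
    ["El rápido zorro marrón salta sobre el perro perezoso.",
     "Der schnelle braune Fuchs springt über den faulen Hund."],
    n_intruders=3,
)
```

Scoring is then just set comparison: ask the model to list the out-of-context tokens and check its answer against `truth`.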

For a human, such a task would be easy whether or not one knows the languages involved. LLMs, however, tend to lose detail as their context window fills up. This test was designed to see whether an LLM can really make use of its full advertised window without losing track of the content. My recent tests show mixed results: LLMs still have difficulty completing tasks that are simple for a human. I'll be posting my results in the coming days, so keep an eye on this page.

