Preventing errors by ensuring prompts do not exceed an LLM's context window is a critical task. Text truncation is the most direct way to achieve this: shortening content to fit within a specific token limit. While simple, this technique is a fundamental part of managing LLM inputs, especially when dealing with long documents or extensive conversation histories.

The main trade-off with truncation is information loss. By cutting off parts of your text, you risk removing important context that the model needs. Choosing the right truncation strategy is therefore essential for maintaining the quality of your application's outputs.

## Preserving the Beginning of Text

The most common truncation strategy is to keep the beginning of the text and cut off the end. This is often the default because introductions, abstracts, and opening paragraphs typically contain the most important, high-level information.

The `truncate_to_token_limit` function provides a straightforward way to implement this. You specify the text, the maximum number of tokens, and the tokenizer you're using.

```python
from kerb.tokenizer import truncate_to_token_limit, count_tokens, Tokenizer

long_text = (
    "The quick brown fox jumps over the lazy dog. "
    "This is a long piece of text that needs to be truncated to fit within "
    "a specific token limit. We want to preserve the beginning of the text "
    "because it usually contains the most important information in many contexts."
)

original_tokens = count_tokens(long_text, tokenizer=Tokenizer.CL100K_BASE)
print(f"Original tokens: {original_tokens}")

# Truncate to a maximum of 20 tokens
truncated_text = truncate_to_token_limit(
    long_text,
    max_tokens=20,
    tokenizer=Tokenizer.CL100K_BASE,
)

truncated_tokens = count_tokens(truncated_text, tokenizer=Tokenizer.CL100K_BASE)
print(f"Truncated text: '{truncated_text}'")
print(f"Truncated tokens: {truncated_tokens}")
```

As you can see, the function shortens the text to meet the `max_tokens` limit, appending an ellipsis (`...`) by default to indicate that content has been removed. The resulting text is guaranteed to have a token count less than or equal to the specified limit.

## Preserving the End of Text

Sometimes the most critical information is at the end of a document. Consider conversation logs, where the most recent messages are most relevant, or error logs, where the final lines often contain the root cause of a problem. In these cases, you'll want to truncate from the beginning of the text, preserving the end.

You can achieve this by setting the `preserve_end` parameter to `True`.

```python
log_entry = (
    "2024-10-15 14:30:22 INFO Processing started for batch_id=12345 "
    "with 150 items. System load: 45%. Memory usage: 2.3GB. "
    "Previous batches completed successfully. "
    "ERROR: Failed to process item_id=67890 due to invalid format. "
    "Status: FAILED. Error code: E404."
)

original_tokens = count_tokens(log_entry, tokenizer=Tokenizer.CL100K_BASE)
print(f"Original log tokens: {original_tokens}")

# Truncate to 25 tokens, preserving the end
truncated_log = truncate_to_token_limit(
    log_entry,
    max_tokens=25,
    tokenizer=Tokenizer.CL100K_BASE,
    preserve_end=True,
)

truncated_tokens = count_tokens(truncated_log, tokenizer=Tokenizer.CL100K_BASE)
print(f"Truncated log: '{truncated_log}'")
print(f"Truncated tokens: {truncated_tokens}")
```

This approach preserves the critical error message at the end of the log, giving the model the most important context. A default, beginning-preserving truncation would have lost it entirely.
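To make the conversation-history use case concrete, here is a minimal sketch that keeps only the most recent turns of a chat within a token budget. The message list and its formatting are invented for illustration; the truncation call itself is the same `preserve_end` pattern shown above.

```python
from kerb.tokenizer import truncate_to_token_limit, count_tokens, Tokenizer

# Hypothetical chat history; in a real application this would come from storage.
messages = [
    "User: How do I reset my password?",
    "Assistant: Go to Settings and click 'Reset Password'.",
    "User: I did that, but the email never arrived.",
    "Assistant: Check your spam folder and verify the address on file.",
    "User: Found it in spam, thanks!",
]
history = "\n".join(messages)
print(f"Full history tokens: {count_tokens(history, tokenizer=Tokenizer.CL100K_BASE)}")

# Keep the most recent turns within a 30-token budget.
trimmed_history = truncate_to_token_limit(
    history,
    max_tokens=30,
    tokenizer=Tokenizer.CL100K_BASE,
    preserve_end=True,
)
print(f"Trimmed history:\n{trimmed_history}")
```

Because the end of the string is preserved, the oldest messages are dropped first, which matches how most chat applications prioritize recent context.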
## Customizing the Truncation Indicator

The default ellipsis (`...`) is a clear signal of truncation, but you can customize it for different needs. For example, in a user-facing application, a more descriptive indicator like `[...content truncated...]` might be better. In a code summarization tool, you might use a comment like `# ... rest of code`.

The `ellipsis` parameter lets you define a custom string to use as the truncation marker.

```python
documentation = (
    "This function takes a list of integers as input and returns the sum. "
    "The implementation uses a simple loop to iterate through all elements. "
    "Time complexity is O(n) where n is the length of the input list."
)

# Truncate with a custom indicator
truncated_docs = truncate_to_token_limit(
    documentation,
    max_tokens=15,
    tokenizer=Tokenizer.CL100K_BASE,
    ellipsis=" [truncated for brevity]",
)

print(f"Original text: '{documentation}'")
print(f"Custom ellipsis: '{truncated_docs}'")
```

Keep in mind that the ellipsis string itself consumes tokens; the function accounts for this so that the final output, marker included, respects the `max_tokens` limit.

## Limitations of Simple Truncation

While effective and easy to implement, simple truncation has a major drawback: it is indiscriminate. It doesn't understand the content it removes. Important information located in the middle of a document will always be lost, regardless of whether you preserve the beginning or the end.

For example, if a document contains an introduction, a critical finding in the middle, and a conclusion, simple truncation forces you to choose between the introduction and the conclusion; the finding is discarded either way.

More advanced techniques can address this. For instance, some systems implement a "middle-out" truncation that preserves both the start and end of a document while removing content from the middle (a minimal sketch follows below). Another approach is to use an LLM to summarize the text, which is a form of intelligent, context-aware truncation. These methods are more complex but can yield better results when dealing with structured documents. In the following chapters on data preparation and retrieval, we will look at chunking, which provides another way to manage large texts by breaking them into smaller, coherent pieces rather than simply cutting them off.
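To illustrate the middle-out idea, here is a minimal sketch built from the primitives introduced above. The helper name, the even split of the budget, and the `[...]` joining marker are all assumptions made for this example; `truncate_to_token_limit` itself does not implement middle-out truncation.

```python
from kerb.tokenizer import truncate_to_token_limit, count_tokens, Tokenizer

def middle_out_truncate(text: str, max_tokens: int) -> str:
    """Hypothetical helper: keep the start and end of text, drop the middle."""
    if count_tokens(text, tokenizer=Tokenizer.CL100K_BASE) <= max_tokens:
        return text  # Already fits; nothing to remove.

    # Reserve tokens for the joining marker, then split the remaining
    # budget evenly between the head and the tail of the document.
    marker = " [...] "
    budget = max_tokens - count_tokens(marker, tokenizer=Tokenizer.CL100K_BASE)
    half = budget // 2

    # Suppress the built-in ellipsis on both pieces (assuming an empty
    # marker is accepted); the explicit joining marker replaces it.
    head = truncate_to_token_limit(
        text, max_tokens=half, tokenizer=Tokenizer.CL100K_BASE, ellipsis=""
    )
    tail = truncate_to_token_limit(
        text, max_tokens=half, tokenizer=Tokenizer.CL100K_BASE,
        preserve_end=True, ellipsis=""
    )
    return f"{head}{marker}{tail}"
```

In practice you might weight the split toward whichever end of the document matters more, or fall back to summarization when the middle of the document cannot safely be dropped.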