Building a bespoke component is essential when standard LangChain building blocks do not perfectly match specific requirements in production systems. These requirements often include integrating with proprietary internal services, enforcing unique validation logic, or performing complex data transformations not covered by default loaders or parsers. Custom Runnable components are the standard way to extend LangChain's capabilities for these situations, allowing for tailored solutions.This practical exercise guides you through creating a custom component that performs a specific task: validating input data against a pattern and enriching it with metadata before passing it along the chain. This reinforces the concepts of component structure, asynchronous execution, and integration within LCEL pipelines.Defining the Custom Logic: Input Validation and EnrichmentImagine a scenario where user input needs to adhere to a specific format (e.g., an order ID) before being processed by an LLM for status lookup. Additionally, we want to automatically add a timestamp to the request data for logging or auditing purposes. A standard prompt template or parser isn't designed for this combined validation and enrichment logic. This is an ideal case for a custom component.Our custom component, let's call it InputValidatorEnricher, will:Accept a dictionary as input, expecting a specific key (e.g., user_query).Validate the value associated with user_query against a predefined regular expression.If validation fails, raise an informative error.If validation succeeds, add a new key-value pair to the dictionary, timestamp, containing the current UTC timestamp.Return the enriched dictionary.Implementing the Custom Runnable ComponentLangChain components fundamentally adhere to the Runnable interface. For creating custom, potentially stateful, or serializable components, inheriting from RunnableSerializable (found in langchain_core.runnables) is often a good choice. It provides a solid foundation and integrates well with the broader LangChain ecosystem, including LangSmith tracing. We'll also use Pydantic models to define clear input and output schemas for our component, enhancing type safety and clarity.import re import datetime from typing import Dict, Any from pydantic import BaseModel, Field, validator from langchain_core.runnables import RunnableSerializable from langchain_core.runnables.config import RunnableConfig # Define Input and Output Schemas using Pydantic class InputSchema(BaseModel): user_query: str = Field(..., description="The user's input query, expected to match a pattern.") class OutputSchema(BaseModel): user_query: str timestamp: datetime.datetime = Field(description="UTC timestamp when the input was processed.") is_valid: bool = Field(default=True, description="Flag indicating successful validation.") # Define the Custom Component class InputValidatorEnricher(RunnableSerializable[InputSchema, OutputSchema]): """ A custom Runnable that validates the 'user_query' against a regex pattern and enriches the input with a timestamp. """ pattern: str # Store the regex pattern class Config: # Allows 'pattern' to be set during initialization arbitrary_types_allowed = True def __init__(self, pattern: str, **kwargs): super().__init__(**kwargs) self.pattern = pattern # Pre-compile the regex for efficiency self._compiled_pattern = re.compile(pattern) @validator('pattern') def validate_regex_pattern(cls, v): try: re.compile(v) except re.error: raise ValueError("Invalid regex pattern provided.") return v def _validate_and_enrich(self, input_data: InputSchema) -> OutputSchema: """Synchronous validation and enrichment logic.""" if not self._compiled_pattern.match(input_data.user_query): # In a real application, you might raise a custom exception # or return a specific error structure. Here we raise ValueError. raise ValueError(f"Input query '{input_data.user_query}' does not match pattern '{self.pattern}'") now_utc = datetime.datetime.now(datetime.timezone.utc) enriched_data = OutputSchema( user_query=input_data.user_query, timestamp=now_utc, is_valid=True ) return enriched_data def invoke(self, input: Dict[str, Any], config: RunnableConfig | None = None) -> OutputSchema: """Synchronous execution method.""" # Validate input against the schema validated_input = InputSchema(**input) # Perform the core logic result = self._validate_and_enrich(validated_input) return result async def ainvoke(self, input: Dict[str, Any], config: RunnableConfig | None = None) -> OutputSchema: """Asynchronous execution method.""" # For this specific component, the logic is inherently synchronous. " # In practical scenarios involving I/O (like API calls)," # you would use async libraries (e.g., httpx, aiohttp). # Here, we simply wrap the synchronous call. # In more complex cases, you might use asyncio.to_thread # or implement native async logic. validated_input = InputSchema(**input) result = self._validate_and_enrich(validated_input) # Simulate async operation if needed, otherwise just return sync result # await asyncio.sleep(0) # Example placeholder for actual async work return result # Define input and output types for better introspection and validation @property def InputType(self): return InputSchema @property def OutputType(self): return OutputSchema In this implementation:We define InputSchema and OutputSchema using Pydantic for clear data contracts.InputValidatorEnricher inherits from RunnableSerializable.The __init__ method accepts the regex pattern and pre-compiles it. A Pydantic validator ensures the provided pattern is valid regex.The core logic is encapsulated in _validate_and_enrich.invoke handles synchronous calls, first validating the input dictionary against InputSchema, then calling the core logic.ainvoke provides the asynchronous interface. Since our current logic is CPU-bound, we reuse the synchronous method. For I/O-bound tasks, you would implement genuinely asynchronous logic here.InputType and OutputType properties expose the Pydantic models, aiding LangChain's internal mechanisms and potentially LangSmith tracing.Integrating the Component into an LCEL ChainNow, let's integrate our InputValidatorEnricher into a simple LCEL chain. We'll pipe the output of our custom component into a prompt template and then to a language model.# Assume necessary imports: ChatOpenAI, PromptTemplate, StrOutputParser from langchain_openai import ChatOpenAI from langchain_core.prompts import ChatPromptTemplate from langchain_core.output_parsers import StrOutputParser from langchain_core.runnables import RunnablePassthrough # To pass input through # --- Define the Chain --- # 1. Instantiate our custom component with a specific pattern # Example: Match order IDs like 'ORD-12345' validator_enricher = InputValidatorEnricher(pattern=r"^ORD-\d{5}$") # 2. Define the prompt template # It expects the output dictionary from our component prompt = ChatPromptTemplate.from_template( "User query '{user_query}' received at {timestamp}. Please look up the status." ) # 3. Instantiate the LLM # Ensure you have OPENAI_API_KEY set in your environment llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0) # 4. Define the output parser parser = StrOutputParser() # 5. Construct the chain using LCEL pipe syntax # We use RunnablePassthrough to make the original input available if needed, # but here we mainly pipe the output of one step to the next. # The input to the chain should be a dictionary matching InputSchema: {"user_query": "..."} # Option A: Directly pipe the custom component output to the prompt chain_a = validator_enricher | prompt | llm | parser # Option B: If you needed the original input *and* the enriched output later # (less common for this specific prompt, but shows the pattern) # chain_b = RunnablePassthrough.assign(enriched=validator_enricher) | prompt | llm | parser # Input: {"user_query": "ORD-12345"} # Output of RunnablePassthrough.assign: {'user_query': 'ORD-12345', 'enriched': {'user_query': 'ORD-12345', 'timestamp': ..., 'is_valid': True}} # The prompt needs adjustment if using chain_b to access {'enriched']['user_query'], etc. # We will use chain_a for simplicity here. # --- Testing the Chain --- # Test with valid input valid_input = {"user_query": "ORD-98765"} print(f"Testing with valid input: {valid_input}") try: # Use invoke for synchronous execution result_valid = chain_a.invoke(valid_input) print("Result (Valid Input):") print(result_valid) except Exception as e: print(f"Error (Valid Input): {e}") print("\n" + "="*20 + "\n") # Test with invalid input invalid_input = {"user_query": "lookup order 123"} print(f"Testing with invalid input: {invalid_input}") try: # Use invoke for synchronous execution result_invalid = chain_a.invoke(invalid_input) print("Result (Invalid Input):") print(result_invalid) except ValueError as e: # Catch the specific validation error we expect print(f"Caught Expected Error (Invalid Input): {e}") except Exception as e: print(f"Caught Unexpected Error (Invalid Input): {e}") # Example of using ainvoke (requires an async environment) # import asyncio # async def run_async(): # result_async = await chain_a.ainvoke(valid_input) # print("\nAsync Result (Valid Input):") # print(result_async) # # # To run the async function: # # asyncio.run(run_async())In this integration:We create an instance of InputValidatorEnricher with our desired regex pattern.The ChatPromptTemplate is designed to accept the OutputSchema dictionary from our custom component.We use the standard LCEL | operator to pipe the output of validator_enricher directly into the prompt.The tests demonstrate how valid input flows through the entire chain, while invalid input correctly raises the ValueError from our custom component before reaching the LLM, preventing unnecessary API calls.Verification and DebuggingAs shown in the test cases, direct execution helps verify the component's behavior for both valid and invalid inputs. In more complex scenarios, remember the debugging techniques discussed earlier:Verbose Mode: Set langchain.debug = True for detailed execution logs.LangSmith: If configured, LangSmith automatically traces the execution, allowing you to inspect the inputs and outputs of each step, including your custom component. This is invaluable for understanding failures or unexpected behavior in production. You can clearly see the data transformation performed by InputValidatorEnricher.This practical exercise demonstrates the fundamental process of extending LangChain with custom logic. By mastering the Runnable interface and LCEL integration, you gain the flexibility to adapt LangChain to nearly any task, building sophisticated, production-ready applications tailored to your specific needs. This ability to blend standard components with custom code is a core strength of the framework.