Init Lookup Driver Just once - part 1 #139406
Draft
+2,619
−419
This PR is a refactoring step towards Init Lookup Driver Just once.
Currently, AbstractLookupService::doLookup() creates operators directly from the request and input page. Since operators are tightly coupled with the driver and input pages, they cannot be reused across multiple pages. We plan to add local logical and physical planning, but doing that per page would add too much overhead, so planning needs to happen once during session initialization rather than for every page. This PR takes a step in that direction by first generating a physical plan that can be shared across multiple pages. Main changes include:
1. Refactor AbstractLookupService::doLookup(). Instead of creating operators directly, we now create a PhysicalPlan and then convert PhysicalPlan -> Operator Factories -> Operators. This separation allows the PhysicalPlan to be generated once and cached in a future PR, since it doesn't depend on the input page data.
2. QueryLists no longer depend on a particular page and are stateless with respect to page contents. They use channelOffset instead of blocks. QueryLists are created during planning (before we have input pages), so they can no longer store blocks directly. Instead, each QueryList stores a channelOffset (the index of the block within a page). Since the page structure is consistent across all pages in a session, the channelOffset remains the same. At runtime, when getQuery() is called, the QueryList extracts the appropriate block from the current page using inputPage.getBlock(channelOffset).
3. New Physical Plan Nodes - LookupDropMergeExec and ParameterizedQueryExec
4. New LookupExecutionMapper - converts a Physical Plan to Operators for the lookup node and handles the dictionary encoding optimization for Enrich (and possibly lookup join in the future).
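
To illustrate the idea behind changes 1 and 2, here is a minimal, self-contained sketch. It is not the code in this PR: `Page`, `QueryList`, `ParameterizedQueryPlan`, `Operator`, `OperatorFactory`, and `toFactory` are simplified, hypothetical stand-ins, and blocks are modelled as plain lists of strings. It only shows how a plan that stores a channelOffset rather than a block can be built once and then applied to any page with the same layout.

```java
import java.util.ArrayList;
import java.util.List;

public class LookupPlanSketch {
    // Hypothetical stand-in for a page: a list of blocks, each block a list of values.
    record Page(List<List<String>> blocks) {
        List<String> getBlock(int channel) {
            return blocks.get(channel);
        }
    }

    // Stateless with respect to page contents: it stores only the channel index,
    // and resolves the block from whichever page is passed in at runtime.
    record QueryList(int channelOffset) {
        String getQuery(Page inputPage, int position) {
            return inputPage.getBlock(channelOffset).get(position);
        }
    }

    // Hypothetical plan node: it holds no page data, so it can be created once per session.
    record ParameterizedQueryPlan(String matchField, QueryList queryList) {}

    interface Operator {
        List<String> process(Page page);
    }

    interface OperatorFactory {
        Operator get();
    }

    // Plan -> operator factory -> operator. The factory captures only the plan.
    static OperatorFactory toFactory(ParameterizedQueryPlan plan) {
        return () -> page -> {
            List<String> queries = new ArrayList<>();
            int positions = page.getBlock(plan.queryList().channelOffset()).size();
            for (int p = 0; p < positions; p++) {
                queries.add(plan.matchField() + "=" + plan.queryList().getQuery(page, p));
            }
            return queries;
        };
    }

    public static void main(String[] args) {
        // Planning happens once, before any input page exists.
        ParameterizedQueryPlan plan = new ParameterizedQueryPlan("host", new QueryList(1));
        OperatorFactory factory = toFactory(plan);

        // Two pages with the same layout: the match values live in channel 1.
        Page first = new Page(List.of(List.of("1", "2"), List.of("a", "b")));
        Page second = new Page(List.of(List.of("3"), List.of("c")));

        System.out.println(factory.get().process(first));  // [host=a, host=b]
        System.out.println(factory.get().process(second)); // [host=c]
    }
}
```

Because the plan and factory never hold a reference to a block, the same factory can serve every page in the session. That property is what would make caching the plan possible in a follow-up.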