Add support for multiple InferencePool backends #4439
Conversation
Still doing some testing, just wanted to run the pipeline; will promote to a ready-to-review PR when cleaned up.
Force-pushed from e611f14 to a8cbd36.
Codecov Report

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #4439      +/-   ##
==========================================
+ Coverage   86.03%   86.20%   +0.16%
==========================================
  Files         132      132
  Lines       14382    14566     +184
  Branches       35       35
==========================================
+ Hits        12374    12557     +183
- Misses       1793     1794       +1
  Partials      215      215
==========================================
```

View full report in Codecov by Sentry.
Pull request overview
This PR adds support for multiple InferencePool backends on a Route, enabling weighted traffic distribution across inference backends. Previously, routes were limited to a single InferencePool backend per rule.
Key Changes:
- Removed restriction preventing multiple InferencePool backends in a single rule
- Added validation to prevent mixing InferencePool and non-InferencePool backends
- Implemented deduplication of inference maps to handle multiple backends efficiently
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `tests/Makefile` | Enabled the `GatewayWeightedAcrossTwoInferencePools` conformance test and added `--ignore-not-found` flags to cleanup commands |
| `internal/controller/state/graph/httproute_test.go` | Added comprehensive test cases for multiple weighted InferencePool backends with and without HTTP matches |
| `internal/controller/state/graph/httproute.go` | Replaced the single-backend restriction with validation for mixed backend types and added a `checkForMixedBackendTypes` function |
| `internal/controller/nginx/config/split_clients_test.go` | Added test cases for inference backends with endpoint picker configs and split-client value generation |
| `internal/controller/nginx/config/split_clients.go` | Updated split-client generation to support inference backend groups with specialized variable naming |
| `internal/controller/nginx/config/servers_test.go` | Added extensive test coverage for multiple inference backend scenarios with various match conditions |
| `internal/controller/nginx/config/servers.go` | Refactored location generation to support multiple inference backends with proper EPP and proxy-pass locations |
| `internal/controller/nginx/config/maps_test.go` | Added test cases for unique backend deduplication and failure mode verification |
| `internal/controller/nginx/config/maps.go` | Implemented deduplication logic using a map to prevent duplicate inference backend entries |
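The `split_clients.go` change above concerns generating weighted distribution across inference backend groups. A rough config fragment of what such generation could produce; the `split_clients` directive shape is standard nginx, but the variable and location names here are invented for illustration:

```nginx
# Hypothetical sketch (http context): distribute requests for one route
# rule across two InferencePool backends by weight. The chosen value is
# an internal EPP location to jump to later.
split_clients $request_id $inference_rule0 {
    60%     /_ngf-internal-epp-pool-a;
    40%     /_ngf-internal-epp-pool-b;
}
```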
sjberman left a comment:
Great work on this. It really is complex to build all of these locations, and I'm hopeful we can improve it in the future, potentially with NGINX improvements that remove the need for the NJS matching module, and potentially with the inference Rust module so we can skip the nested inference locations.
Can you verify that if a ClientSettingsPolicy with maxSize is set, that it gets propagated into every location down the chain?
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.
salonichf5 left a comment:
Great work on this, Ben. Thank you for the detailed comments; very helpful. I think I have the gist of what's happening. Just want to confirm with you:
Location generation has two cases: regular and inference.
- Regular backends:
  - If the route has only a path match, we generate external location(s) that `proxy_pass` directly to the backend.
  - If the route has HTTP match conditions (method/headers/query) or multiple matches, the external location runs `httpmatches.match` (NJS) and internally redirects to a per-match internal location that does the `proxy_pass`.
- Inference backends:
  - The final hop always `proxy_pass`es to an inference backend variable (`http://$inference_backend_*`).
  - With a single inference backend, the external (or internal) EPP location calls `epp.getEndpoint` and redirects to the final internal proxy-pass location.
  - With multiple inference backends, `split_clients` chooses an internal EPP location (per backend); we use `rewrite ... last` to jump to that internal EPP location, which then calls `epp.getEndpoint` and redirects to the final internal proxy-pass location.
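The multi-backend chain described above could look roughly like the following config fragment. This is a hand-written sketch under the summary's assumptions, not the config NGINX Gateway Fabric actually emits; all location and variable names are illustrative.

```nginx
# External location: split_clients has already stored the chosen internal
# EPP location for this request in $inference_rule0 (see split_clients
# sketch); jump to it with an internal rewrite.
location /chat {
    rewrite ^ $inference_rule0 last;
}

# Internal EPP location for one backend: calls the endpoint picker via
# NJS, which redirects to the final proxy-pass location once an endpoint
# is chosen.
location /_ngf-internal-epp-pool-a {
    internal;
    js_content epp.getEndpoint;
}

# Final hop: proxy_pass to the inference backend variable populated by
# the endpoint picker.
location /_ngf-internal-proxy-pool-a {
    internal;
    proxy_pass http://$inference_backend_pool_a;
}
```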
Just have a small edit recommendation, but I think it looks good overall.
Verified that when a ClientSettingsPolicy with maxSize is set on a route, every generated internal location and external path location has the policy included.
Yep, these are all correct.
Force-pushed from 84046a2 to ebae18d.
Proposed changes
Add support for multiple InferencePool backends on a Route.
Problem: A route should be able to have multiple InferencePools in its backendRefs.
Solution: Add support for multiple InferencePool backends. Added logic to remove duplicated inference maps.
Testing: Added unit tests and enabled the correlating `GatewayWeightedAcrossTwoInferencePools` conformance test. Manually tested scenarios with multiple InferencePool backends, with and without HTTP matches.
Closes #4192
Checklist
Before creating a PR, run through this checklist and mark each as complete.
Release notes
If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.