
Conversation

@bjee19
Contributor

@bjee19 bjee19 commented Dec 10, 2025

Proposed changes

Add support for multiple InferencePool backends on a Route.

Problem: A route should be able to have multiple InferencePools in its backendRefs.

Solution: Add support for multiple InferencePool backends, including logic to deduplicate inference maps.
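
For illustration, the weighted choice across pools conceptually lands in an NGINX split_clients block that selects a per-pool internal Endpoint Picker (EPP) location. This is a minimal sketch only; the variable and location names are hypothetical, not the exact config NGINX Gateway Fabric generates:

```nginx
# Hypothetical sketch: split_clients picks an internal EPP location per
# InferencePool, proportional to the route's backendRef weights.
split_clients $request_id $inference_epp_location {
    50%     /_internal_epp_pool_a;   # first InferencePool, weight 50
    *       /_internal_epp_pool_b;   # second InferencePool, remaining weight
}
```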

Testing: Added unit tests and enabled the corresponding GatewayWeightedAcrossTwoInferencePools conformance test. Manually tested multiple InferencePool backends with and without HTTP matches.

Closes #4192

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my own fork's branch

Release notes

If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.

Add support for multiple InferencePool backends on a Route. 

@github-actions github-actions bot added enhancement New feature or request tests Pull requests that update tests labels Dec 10, 2025
@bjee19
Contributor Author

bjee19 commented Dec 10, 2025

Still doing some testing; just wanted to run the pipeline. I'll promote this to a ready-to-review PR once it's cleaned up.

@bjee19 bjee19 force-pushed the enh/inference-extension-multiple-backendrefs branch from e611f14 to a8cbd36 on December 10, 2025 23:37
@bjee19 bjee19 changed the title Add support for multiple inferencepool backends Add support for multiple InferencePool backends Dec 10, 2025
@codecov

codecov bot commented Dec 10, 2025

Codecov Report

❌ Patch coverage is 98.53480% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.20%. Comparing base (1481231) to head (ebae18d).
⚠️ Report is 1 commit behind head on main.

Files with missing lines                             Patch %   Lines
internal/controller/nginx/config/servers.go          99.05%    1 Missing and 1 partial ⚠️
internal/controller/nginx/config/split_clients.go    87.50%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4439      +/-   ##
==========================================
+ Coverage   86.03%   86.20%   +0.16%     
==========================================
  Files         132      132              
  Lines       14382    14566     +184     
  Branches       35       35              
==========================================
+ Hits        12374    12557     +183     
- Misses       1793     1794       +1     
  Partials      215      215              


@bjee19 bjee19 marked this pull request as ready for review December 11, 2025 18:25
@bjee19 bjee19 requested a review from a team as a code owner December 11, 2025 18:25
@bjee19 bjee19 requested a review from Copilot December 11, 2025 18:42

Copilot AI left a comment


Pull request overview

This PR adds support for multiple InferencePool backends on a Route, enabling weighted traffic distribution across inference backends. Previously, routes were limited to a single InferencePool backend per rule.

Key Changes:

  • Removed restriction preventing multiple InferencePool backends in a single rule
  • Added validation to prevent mixing InferencePool and non-InferencePool backends
  • Implemented deduplication of inference maps to handle multiple backends efficiently (see the sketch after this list)
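
As a hedged illustration of the deduplication point (the map shape and all names here are hypothetical, not NGF's actual generated config): route rules that reference the same InferencePool share a single map block instead of emitting duplicates.

```nginx
# Illustrative only: one map per unique InferencePool, reused by every
# location that targets that pool. $epp_chosen_endpoint and "unavailable"
# are hypothetical stand-ins for the EPP result and failure-mode fallback.
map $epp_chosen_endpoint $inference_backend_pool_a {
    default unavailable;
    ~.+     $epp_chosen_endpoint;
}
```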

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

Summary per file:

tests/Makefile: Enabled GatewayWeightedAcrossTwoInferencePools conformance test and added --ignore-not-found flags for cleanup commands
internal/controller/state/graph/httproute_test.go: Added comprehensive test cases for multiple weighted InferencePool backends with and without HTTP matches
internal/controller/state/graph/httproute.go: Replaced single-backend restriction with validation for mixed backend types and added checkForMixedBackendTypes function
internal/controller/nginx/config/split_clients_test.go: Added test cases for inference backends with endpoint picker configs and split client value generation
internal/controller/nginx/config/split_clients.go: Updated split client generation to support inference backend groups with specialized variable naming
internal/controller/nginx/config/servers_test.go: Added extensive test coverage for multiple inference backend scenarios with various match conditions
internal/controller/nginx/config/servers.go: Refactored location generation to support multiple inference backends with proper EPP and proxy pass locations
internal/controller/nginx/config/maps_test.go: Added test cases for unique backend deduplication and failure mode verification
internal/controller/nginx/config/maps.go: Implemented deduplication logic using a map to prevent duplicate inference backend entries


Collaborator

@sjberman sjberman left a comment


Great work on this. It really is a complex mess to build all of these locations, and I'm hopeful we can improve it in the future, potentially with improvements in NGINX so that we don't need the NJS matching module, and potentially with the inference Rust module so that we can skip the nested inference locations.

Can you verify that if a ClientSettingsPolicy with maxSize is set, that it gets propagated into every location down the chain?

@bjee19 bjee19 requested a review from Copilot December 12, 2025 08:03

Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.



Contributor

@salonichf5 salonichf5 left a comment


Great work on this, Ben, and thank you for the detailed comments. Very helpful. I think I have the gist of what's happening.

Just want to confirm my understanding with you:

Location generation has two cases, Regular and Inference:

  1. Regular backends:
  • If the route has only a path match, we generate external location(s) that proxy_pass directly to the backend.
  • If the route has HTTP match conditions (method/headers/query) or multiple matches, the external location runs httpmatches.match (NJS) and internally redirects to a per-match internal location that does the proxy_pass.
  2. Inference backends:
  • The final hop always proxy_passes to an inference backend variable (http://$inference_backend_*).
  • With a single inference backend, the external (or internal) EPP location calls epp.getEndpoint and redirects to the final internal proxy-pass location.
  • With multiple inference backends, split_clients chooses an internal EPP location (per backend); we use rewrite ... last to jump to that internal EPP location, which then calls epp.getEndpoint and redirects to the final internal proxy-pass location (see the sketch after this list).
I just have a small edit recommendation, but I think it looks good overall.

@bjee19
Contributor Author

bjee19 commented Dec 12, 2025

> Can you verify that if a ClientSettingsPolicy with maxSize is set, that it gets propagated into every location down the chain?

Verified that when a ClientSettingsPolicy with maxSize is set on a route, and that route has

  • a non-HTTP-match single InferencePool backend
  • a non-HTTP-match multiple InferencePool backend
  • an HTTP-match single InferencePool backend
  • an HTTP-match multiple InferencePool backend

every generated internal location and external path location has the policy included.
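
For example, with a hypothetical maxSize of 8m (and the same illustrative names as the sketches above), the policy's maxSize surfaces as NGINX's client_max_body_size in every hop:

```nginx
# Hypothetical sketch: the policy's maxSize repeated in the external
# location and in each internal location along the inference chain.
location /chat {
    client_max_body_size 8m;
    rewrite ^ $inference_epp_location last;
}

location /_internal_epp_pool_a {
    internal;
    client_max_body_size 8m;
    js_content epp.getEndpoint;
}

location /_internal_pass_pool_a {
    internal;
    client_max_body_size 8m;
    proxy_pass http://$inference_backend_pool_a;
}
```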

@bjee19
Contributor Author

bjee19 commented Dec 12, 2025

> Location generation has two cases, Regular and Inference:
>
> Regular backends:
> If the route has only a path match, we generate external location(s) that proxy_pass directly to the backend.
> If the route has HTTP match conditions (method/headers/query) or multiple matches, the external location runs httpmatches.match (NJS) and internally redirects to a per-match internal location that does the proxy_pass.
>
> Inference backends:
> The final hop always proxy_passes to an inference backend variable (http://$inference_backend_*).
> With a single inference backend, the external (or internal) EPP location calls epp.getEndpoint and redirects to the final internal proxy-pass location.
> With multiple inference backends, split_clients chooses an internal EPP location (per backend); we use rewrite ... last to jump to that internal EPP location, which then calls epp.getEndpoint and redirects to the final internal proxy-pass location.

Yep these are all correct.

@bjee19 bjee19 force-pushed the enh/inference-extension-multiple-backendrefs branch from 84046a2 to ebae18d on December 12, 2025 18:57
@bjee19 bjee19 enabled auto-merge (squash) December 12, 2025 19:14
@bjee19 bjee19 merged commit b389cdd into main Dec 12, 2025
61 checks passed
@bjee19 bjee19 deleted the enh/inference-extension-multiple-backendrefs branch December 12, 2025 19:32
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in NGINX Gateway Fabric Dec 12, 2025

Labels

enhancement (New feature or request), release-notes, tests (Pull requests that update tests)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Support multiple backend refs when ref is an InferencePool

4 participants