Add support for multiple InferencePool backends #4439
Conversation
Still doing some testing, just wanted to run the pipeline; will promote to a ready-to-review PR when cleaned up.
Force-pushed from e611f14 to a8cbd36.
Codecov Report

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #4439      +/-   ##
==========================================
+ Coverage   86.03%   86.20%   +0.16%
==========================================
  Files         132      132
  Lines       14382    14566     +184
  Branches       35       35
==========================================
+ Hits        12374    12557     +183
- Misses       1793     1794       +1
  Partials      215      215
==========================================
```

View full report in Codecov by Sentry.
Pull request overview
This PR adds support for multiple InferencePool backends on a Route, enabling weighted traffic distribution across inference backends. Previously, routes were limited to a single InferencePool backend per rule.
Key Changes:
- Removed restriction preventing multiple InferencePool backends in a single rule
- Added validation to prevent mixing InferencePool and non-InferencePool backends
- Implemented deduplication of inference maps to handle multiple backends efficiently
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `tests/Makefile` | Enabled the `GatewayWeightedAcrossTwoInferencePools` conformance test and added `--ignore-not-found` flags to cleanup commands |
| `internal/controller/state/graph/httproute_test.go` | Added comprehensive test cases for multiple weighted InferencePool backends with and without HTTP matches |
| `internal/controller/state/graph/httproute.go` | Replaced the single-backend restriction with validation for mixed backend types and added a `checkForMixedBackendTypes` function |
| `internal/controller/nginx/config/split_clients_test.go` | Added test cases for inference backends with endpoint picker configs and split-client value generation |
| `internal/controller/nginx/config/split_clients.go` | Updated split-client generation to support inference backend groups with specialized variable naming |
| `internal/controller/nginx/config/servers_test.go` | Added extensive test coverage for multiple inference backend scenarios with various match conditions |
| `internal/controller/nginx/config/servers.go` | Refactored location generation to support multiple inference backends with proper EPP and proxy-pass locations |
| `internal/controller/nginx/config/maps_test.go` | Added test cases for unique backend deduplication and failure mode verification |
| `internal/controller/nginx/config/maps.go` | Implemented deduplication logic using a map to prevent duplicate inference backend entries |
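The `split_clients.go` change above concerns generating weighted distribution across inference backend groups. A rough config fragment of what such generation could produce; the `split_clients` directive shape is standard nginx, but the variable and location names here are invented for illustration:

```nginx
# Hypothetical sketch (http context): distribute requests for one route
# rule across two InferencePool backends by weight. The chosen value is
# an internal EPP location to jump to later.
split_clients $request_id $inference_rule0 {
    60%     /_ngf-internal-epp-pool-a;
    40%     /_ngf-internal-epp-pool-b;
}
```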
sjberman left a comment:
Great work on this. It really is complex to build all of these locations, and I'm hopeful we can improve it in the future, potentially with NGINX improvements that remove the need for the NJS matching module, and potentially with the inference Rust module so we can skip the nested inference locations.
Can you verify that if a ClientSettingsPolicy with maxSize is set, that it gets propagated into every location down the chain?
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.
salonichf5 left a comment:
Great work on this, Ben. Thank you for the detailed comments; very helpful. I think I have the gist of what's happening. Just want to confirm with you:
Location generation has two cases: regular and inference.
- Regular backends:
  - If the route has only a path match, we generate external location(s) that `proxy_pass` directly to the backend.
  - If the route has HTTP match conditions (method/headers/query) or multiple matches, the external location runs `httpmatches.match` (NJS) and internally redirects to a per-match internal location that does the `proxy_pass`.
- Inference backends:
  - The final hop always `proxy_pass`es to an inference backend variable (`http://$inference_backend_*`).
  - With a single inference backend, the external (or internal) EPP location calls `epp.getEndpoint` and redirects to the final internal proxy-pass location.
  - With multiple inference backends, `split_clients` chooses an internal EPP location (per backend); we use `rewrite ... last` to jump to that internal EPP location, which then calls `epp.getEndpoint` and redirects to the final internal proxy-pass location.
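The multi-backend chain described above could look roughly like the following config fragment. This is a hand-written sketch under the summary's assumptions, not the config NGINX Gateway Fabric actually emits; all location and variable names are illustrative.

```nginx
# External location: split_clients has already stored the chosen internal
# EPP location for this request in $inference_rule0 (see split_clients
# sketch); jump to it with an internal rewrite.
location /chat {
    rewrite ^ $inference_rule0 last;
}

# Internal EPP location for one backend: calls the endpoint picker via
# NJS, which redirects to the final proxy-pass location once an endpoint
# is chosen.
location /_ngf-internal-epp-pool-a {
    internal;
    js_content epp.getEndpoint;
}

# Final hop: proxy_pass to the inference backend variable populated by
# the endpoint picker.
location /_ngf-internal-proxy-pool-a {
    internal;
    proxy_pass http://$inference_backend_pool_a;
}
```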
Just have a small edit recommendation, but I think it looks good overall.
Verified that when a ClientSettingsPolicy with maxSize is set on a route, every generated internal location and external path location has the policy included.
Yep, these are all correct.
Force-pushed from 84046a2 to ebae18d.
Proposed changes
Add support for multiple InferencePool backends on a Route.
Problem: A route should be able to have multiple InferencePools in its backendRefs.
Solution: Add support for multiple InferencePool backends. Added logic to remove duplicated inference maps.
Testing: Added unit tests and enabled the correlating `GatewayWeightedAcrossTwoInferencePools` conformance test. Manually tested scenarios with multiple InferencePool backends, with and without HTTP matches.
Closes #4192
Checklist
Before creating a PR, run through this checklist and mark each as complete.
Release notes
If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.