fix: drop N+1 db query on template ACL available#25465
Conversation
ca446e3 to
38f8fd6
Compare
| if err := q.authorizeContext(ctx, policy.ActionRead, rbac.ResourceGroup); err != nil { | ||
| // Ideally we would check read access on each group ID, but that would be N queries. | ||
| // So this function is really only usable by admins. | ||
| return nil, err | ||
| } |
There was a problem hiding this comment.
I don't have a good method around doing this broad rbac check.
This is to avoid and n+1 query to fetch the groups
| // template perms. The optional req controls the q/limit/offset query | ||
| // parameters applied server-side; pass codersdk.UsersRequest{} when no | ||
| // filtering is desired. | ||
| func (c *Client) TemplateACLAvailable(ctx context.Context, templateID uuid.UUID, req UsersRequest) (ACLAvailable, error) { |
There was a problem hiding this comment.
Is it intended to accept UsersRequest which has a bunch of other fields that aren't used in TemplateACLAvailable() like Offset?
There was a problem hiding this comment.
It's so weird, but the query does get parsed by the serverside endpoint for users.
I don't really want to change things up right now.
Part of the user query applies to groups, but the entire query applies to users in the same list 🤷
| return | ||
| } | ||
|
|
||
| // Apply the same q/limit semantics to groups as the users half of this response. |
There was a problem hiding this comment.
Do we need to update the swagger docs for templateAvailablePermissions()?
There was a problem hiding this comment.
I don't think so. This behavior feels more intuitive. Before the groups ignored the search query entirely, which made no sense
| {GroupID: g2.ID, MemberCount: 2}, | ||
| } | ||
| dbm.EXPECT().GetGroupByID(gomock.Any(), g1.ID).Return(g1, nil).AnyTimes() | ||
| dbm.EXPECT().GetGroupByID(gomock.Any(), g2.ID).Return(g2, nil).AnyTimes() |
There was a problem hiding this comment.
I don't see what these GetGroupByID() expectations are adding. I think we can remove these two.
PLAT-149: The /api/v2/templates/{template}/acl/available endpoint made
1 + 2*N database queries (one per group for members + one per group for
the count), causing ~2 minute responses on deployments with several
thousand groups. The frontend only renders Group.TotalMemberCount on
this endpoint and never displays per-group members.
Replace the per-group lookups with a single aggregate query
(GetGroupMembersCountByGroupIDs) and stop fetching the full member rows
for each group. The endpoint now makes a constant 2 DB queries:
1. GetGroups (unchanged)
2. GetGroupMembersCountByGroupIDs (new bulk count)
The Group.Members field is left empty in this response; consumers
should use Group.TotalMemberCount, which is already documented as the
canonical value.
…vailable
The /api/v2/templates/{template}/acl/available endpoint already honored
q and limit for the users half of its response, but silently ignored
them for groups. The autocomplete on the template permissions page
sends q=<input>&limit=25 on every keystroke; without server-side
filtering, deployments with thousands of groups would still ship every
group on each keystroke even after the N+1 fix.
- Add 'search' and 'limit_opt' params to GetGroups. Search applies a
case-insensitive substring filter against groups.name and
groups.display_name. A limit of 0 keeps the existing 'no limit'
behavior so other callers of GetGroups (idpsync, telemetry,
provisionerd, dynamicparameters, workspaces) are unaffected.
- Add ORDER BY so pagination is stable across calls. Sort by the value
the UI displays (display_name, falling back to name).
- Parse q via searchquery.Users so search semantics match the users
half exactly. Parse limit via ParsePagination.
- Bump the codersdk TemplateACLAvailable signature to accept a
UsersRequest (parity with the JS SDK and the other paginated
endpoints). Update both existing call sites.
Companion test TestTemplateACL/AvailableHonorsGroupSearchAndLimit
covers both the q filter and the limit cap.
5ca5a15 to
b06490c
Compare
Fixes PLAT-149.
/acl/availableran a db query per group. A deployment with >5,000 groups made this route extremely slow.This solves the n+1, adds searching to the group name, and limits the number of groups returned by default.
AI description of PR
Problem
GET /api/v2/templates/{template}/acl/availablewas extremely slow on deployments with many groups (~2 minutes reported by a customer with 5,401 groups; ~8 seconds on internal reproduction). Two issues compounded:1 + 2*NDB queries per request (oneGetGroupMembersByGroupIDand oneGetGroupMembersCountByGroupIDper group). With 5,401 groups that is ~10,803 queries on every autocomplete keystroke.qandlimitwere silently ignored for groups: the frontend autocomplete (UserOrGroupAutocomplete.tsx) sends?q=<input>&limit=25on every keystroke. The users half of the response honored both; the groups half ignored them and shipped every group in the org back on each call.Change
Commit 1 — fix the N+1. Replace the per-group lookups with a single aggregate query and stop fetching member rows for each group. New SQL
GetGroupMembersCountByGroupIDs(group_ids[], include_system)returns(group_id, count)pairs in one query.Group.Membersis left empty on this endpoint; consumers should useGroup.TotalMemberCount, which is already documented as the canonical value ("May be greater thanlen(Group.Members)").Commit 2 — apply
qandlimitto groups server-side. AddSearchandLimitOptparams toGetGroups(defaults are no-op, so other callers are unaffected). The handler now parsesqviasearchquery.Users(same semantics as the users half) andlimitviaParsePagination, then passes both toGetGroups.GetGroupsalso gains a deterministicORDER BYso the limit operates over a stable sort.The Go SDK
TemplateACLAvailablenow takes acodersdk.UsersRequestfor parity with the JS SDK and the rest of the paginated endpoints.Net effect on the customer environment (5,401 groups,
?q=foo&limit=25):~10,803→2(constant).5,401→ up to25(and filtered to substring matches).membersarrays → empty (usestotal_member_count).Out of scope (follow-ups)
groups/groupsByOrganization(/api/v2/groups,/api/v2/organizations/{org}/groups). Same SQL change toGetGroupscould be wired in, but the handlers there have other API-shape considerations and aren't on the urgent path./workspacespage slowness — different code path, separate root cause.Test plan
New regression tests in
enterprise/coderd/templates_test.go:TestTemplateACL/AvailableReturnsGroupMemberCounts— creates groups with varying member counts (0,1,3) and asserts the available endpoint returns correcttotal_member_countvalues and an emptymembersarray per group.TestTemplateACL/AvailableHonorsGroupSearchAndLimit— creates four named groups, asserts?q=engineeringreturns only theengineering-*groups, and asserts?limit=2caps the response to 2 groups.New
dbauthztest caseTestGroup/GetGroupMembersCountByGroupIDscoveringNotAuthorized,Canceled,AsRemoveActor, andSuccess.Existing
TestTemplateACL,TestUpdateTemplateACL,coderd/database/...,coderd/idpsync/..., andcoderd/dynamicparameters/...suites still pass.make pre-commitis green locally.Implementation plan
GetGroupMembersCountByGroupIDs(group_ids uuid[], include_system bool) :manyincoderd/database/queries/groupmembers.sql.GetGroupswith optionalsearch(substring onname/display_name),ORDER BY LOWER(COALESCE(NULLIF(display_name, ''), name)), andLIMIT NULLIF(@limit_opt, 0).queries.sql.go,querier.go,dbmock, anddbmetricsviamake gen.dbauthzwrapper forGetGroupMembersCountByGroupIDs. AuthZ mirrors the existingGetGroupMembersCountByGroupID: read access on each requested group viaGetGroupByID.templateAvailablePermissionsto:qviasearchquery.UsersandlimitviaParsePagination;GetGroupswith the filter / limit;GetGroupMembersCountByGroupIDs;nilfor the members slice todb2sdk.Group.codersdk.TemplateACLAvailableto acceptcodersdk.UsersRequest. Update existing call sites.Generated by the Coder Agent on behalf of @Emyrk.