Skip to content

fix: drop N+1 db query on template ACL available#25465

Merged
Emyrk merged 7 commits into
mainfrom
steven/plat-149-template-acl-n1
May 20, 2026
Merged

fix: drop N+1 db query on template ACL available#25465
Emyrk merged 7 commits into
mainfrom
steven/plat-149-template-acl-n1

Conversation

@Emyrk
Copy link
Copy Markdown
Member

@Emyrk Emyrk commented May 18, 2026

Fixes PLAT-149.

/acl/available ran a db query per group. A deployment with >5,000 groups made this route extremely slow.

This solves the n+1, adds searching to the group name, and limits the number of groups returned by default.

AI description of PR

Problem

GET /api/v2/templates/{template}/acl/available was extremely slow on deployments with many groups (~2 minutes reported by a customer with 5,401 groups; ~8 seconds on internal reproduction). Two issues compounded:

  1. N+1 lookups: the handler ran 1 + 2*N DB queries per request (one GetGroupMembersByGroupID and one GetGroupMembersCountByGroupID per group). With 5,401 groups that is ~10,803 queries on every autocomplete keystroke.
  2. q and limit were silently ignored for groups: the frontend autocomplete (UserOrGroupAutocomplete.tsx) sends ?q=<input>&limit=25 on every keystroke. The users half of the response honored both; the groups half ignored them and shipped every group in the org back on each call.

Change

Commit 1 — fix the N+1. Replace the per-group lookups with a single aggregate query and stop fetching member rows for each group. New SQL GetGroupMembersCountByGroupIDs(group_ids[], include_system) returns (group_id, count) pairs in one query. Group.Members is left empty on this endpoint; consumers should use Group.TotalMemberCount, which is already documented as the canonical value ("May be greater than len(Group.Members)").

Commit 2 — apply q and limit to groups server-side. Add Search and LimitOpt params to GetGroups (defaults are no-op, so other callers are unaffected). The handler now parses q via searchquery.Users (same semantics as the users half) and limit via ParsePagination, then passes both to GetGroups. GetGroups also gains a deterministic ORDER BY so the limit operates over a stable sort.

The Go SDK TemplateACLAvailable now takes a codersdk.UsersRequest for parity with the JS SDK and the rest of the paginated endpoints.

Net effect on the customer environment (5,401 groups, ?q=foo&limit=25):

  • DB queries: ~10,8032 (constant).
  • Groups in response: 5,401 → up to 25 (and filtered to substring matches).
  • Per-group payload: full members arrays → empty (uses total_member_count).

Out of scope (follow-ups)

  • Same N+1 pattern in groups / groupsByOrganization (/api/v2/groups, /api/v2/organizations/{org}/groups). Same SQL change to GetGroups could be wired in, but the handlers there have other API-shape considerations and aren't on the urgent path.
  • Reported /workspaces page slowness — different code path, separate root cause.

Test plan

New regression tests in enterprise/coderd/templates_test.go:

  • TestTemplateACL/AvailableReturnsGroupMemberCounts — creates groups with varying member counts (0, 1, 3) and asserts the available endpoint returns correct total_member_count values and an empty members array per group.
  • TestTemplateACL/AvailableHonorsGroupSearchAndLimit — creates four named groups, asserts ?q=engineering returns only the engineering-* groups, and asserts ?limit=2 caps the response to 2 groups.

New dbauthz test case TestGroup/GetGroupMembersCountByGroupIDs covering NotAuthorized, Canceled, AsRemoveActor, and Success.

Existing TestTemplateACL, TestUpdateTemplateACL, coderd/database/..., coderd/idpsync/..., and coderd/dynamicparameters/... suites still pass. make pre-commit is green locally.

Implementation plan
  1. Add SQL query GetGroupMembersCountByGroupIDs(group_ids uuid[], include_system bool) :many in coderd/database/queries/groupmembers.sql.
  2. Extend GetGroups with optional search (substring on name / display_name), ORDER BY LOWER(COALESCE(NULLIF(display_name, ''), name)), and LIMIT NULLIF(@limit_opt, 0).
  3. Regenerate queries.sql.go, querier.go, dbmock, and dbmetrics via make gen.
  4. Implement the dbauthz wrapper for GetGroupMembersCountByGroupIDs. AuthZ mirrors the existing GetGroupMembersCountByGroupID: read access on each requested group via GetGroupByID.
  5. Rewrite templateAvailablePermissions to:
    • parse q via searchquery.Users and limit via ParsePagination;
    • call GetGroups with the filter / limit;
    • bulk-fetch counts via GetGroupMembersCountByGroupIDs;
    • pass nil for the members slice to db2sdk.Group.
  6. Update codersdk.TemplateACLAvailable to accept codersdk.UsersRequest. Update existing call sites.
  7. Add the two regression tests described above.

Generated by the Coder Agent on behalf of @Emyrk.

@Emyrk Emyrk changed the title fix(enterprise/coderd): drop N+1 in template ACL available endpoint fix(enterprise/coderd): drop N+1 and apply q/limit to template ACL available May 18, 2026
@Emyrk Emyrk changed the title fix(enterprise/coderd): drop N+1 and apply q/limit to template ACL available fix(enterprise/coderd): drop N+1 db query on template ACL available May 19, 2026
@Emyrk Emyrk force-pushed the steven/plat-149-template-acl-n1 branch from ca446e3 to 38f8fd6 Compare May 19, 2026 16:52
@Emyrk Emyrk changed the title fix(enterprise/coderd): drop N+1 db query on template ACL available fix: drop N+1 db query on template ACL available May 19, 2026
@Emyrk Emyrk marked this pull request as ready for review May 19, 2026 17:28
Comment on lines +3472 to +3476
if err := q.authorizeContext(ctx, policy.ActionRead, rbac.ResourceGroup); err != nil {
// Ideally we would check read access on each group ID, but that would be N queries.
// So this function is really only usable by admins.
return nil, err
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a good method around doing this broad rbac check.
This is to avoid and n+1 query to fetch the groups

Comment thread codersdk/templates.go
// template perms. The optional req controls the q/limit/offset query
// parameters applied server-side; pass codersdk.UsersRequest{} when no
// filtering is desired.
func (c *Client) TemplateACLAvailable(ctx context.Context, templateID uuid.UUID, req UsersRequest) (ACLAvailable, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intended to accept UsersRequest which has a bunch of other fields that aren't used in TemplateACLAvailable() like Offset?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's so weird, but the query does get parsed by the serverside endpoint for users.
I don't really want to change things up right now.

Part of the user query applies to groups, but the entire query applies to users in the same list 🤷

Comment thread enterprise/coderd/templates.go
return
}

// Apply the same q/limit semantics to groups as the users half of this response.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to update the swagger docs for templateAvailablePermissions()?

Copy link
Copy Markdown
Member Author

@Emyrk Emyrk May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. This behavior feels more intuitive. Before the groups ignored the search query entirely, which made no sense

Comment thread coderd/database/dbauthz/dbauthz_test.go Outdated
{GroupID: g2.ID, MemberCount: 2},
}
dbm.EXPECT().GetGroupByID(gomock.Any(), g1.ID).Return(g1, nil).AnyTimes()
dbm.EXPECT().GetGroupByID(gomock.Any(), g2.ID).Return(g2, nil).AnyTimes()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see what these GetGroupByID() expectations are adding. I think we can remove these two.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

@Emyrk Emyrk enabled auto-merge (squash) May 20, 2026 20:30
Emyrk added 7 commits May 20, 2026 16:54
PLAT-149: The /api/v2/templates/{template}/acl/available endpoint made
1 + 2*N database queries (one per group for members + one per group for
the count), causing ~2 minute responses on deployments with several
thousand groups. The frontend only renders Group.TotalMemberCount on
this endpoint and never displays per-group members.

Replace the per-group lookups with a single aggregate query
(GetGroupMembersCountByGroupIDs) and stop fetching the full member rows
for each group. The endpoint now makes a constant 2 DB queries:

  1. GetGroups (unchanged)
  2. GetGroupMembersCountByGroupIDs (new bulk count)

The Group.Members field is left empty in this response; consumers
should use Group.TotalMemberCount, which is already documented as the
canonical value.
…vailable

The /api/v2/templates/{template}/acl/available endpoint already honored
q and limit for the users half of its response, but silently ignored
them for groups. The autocomplete on the template permissions page
sends q=<input>&limit=25 on every keystroke; without server-side
filtering, deployments with thousands of groups would still ship every
group on each keystroke even after the N+1 fix.

- Add 'search' and 'limit_opt' params to GetGroups. Search applies a
  case-insensitive substring filter against groups.name and
  groups.display_name. A limit of 0 keeps the existing 'no limit'
  behavior so other callers of GetGroups (idpsync, telemetry,
  provisionerd, dynamicparameters, workspaces) are unaffected.
- Add ORDER BY so pagination is stable across calls. Sort by the value
  the UI displays (display_name, falling back to name).
- Parse q via searchquery.Users so search semantics match the users
  half exactly. Parse limit via ParsePagination.
- Bump the codersdk TemplateACLAvailable signature to accept a
  UsersRequest (parity with the JS SDK and the other paginated
  endpoints). Update both existing call sites.

Companion test TestTemplateACL/AvailableHonorsGroupSearchAndLimit
covers both the q filter and the limit cap.
@Emyrk Emyrk force-pushed the steven/plat-149-template-acl-n1 branch from 5ca5a15 to b06490c Compare May 20, 2026 21:54
@Emyrk Emyrk merged commit 9b6eada into main May 20, 2026
64 of 68 checks passed
@Emyrk Emyrk deleted the steven/plat-149-template-acl-n1 branch May 20, 2026 22:40
@github-actions github-actions Bot locked and limited conversation to collaborators May 20, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants