Enterprise RAG search
One gateway for retrieval, RBAC, and model routing at bank scale.
How a regulated team unified vector stores, audit trails, and LLM calls behind a single dex28 surface.
A global risk team needed semantic search over contracts and policies without duplicating integrations for each model provider. dex28 became the control plane: embeddings, retrieval policies, and inference share the same scopes and quotas.
Outcome: faster iteration on retrieval quality, with governance that legal could review and one API contract from pilot to production.
The architecture centers on a single ingress (the gateway) with RBAC-scoped keys per workspace. Vector shards sit behind named routes, and the LLM path can be swapped without changing client SDKs.
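As a rough illustration of named routes with a swappable LLM path, here is a minimal Python sketch. It does not use the dex28 API; `Route`, `Gateway`, `swap_llm`, the route and shard names, and the two stand-in providers are all hypothetical. The point is only that clients address a route ID, so the model behind it can change without touching the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical type: a completion backend is any function from prompt to text.
Completion = Callable[[str], str]


@dataclass
class Route:
    route_id: str       # stable name that clients call
    vector_shard: str   # shard that serves retrieval for this route
    llm: Completion     # swappable inference backend


class Gateway:
    """Single ingress: clients only ever see route IDs."""

    def __init__(self) -> None:
        self._routes: Dict[str, Route] = {}

    def register(self, route: Route) -> None:
        self._routes[route.route_id] = route

    def swap_llm(self, route_id: str, llm: Completion) -> None:
        # Replace the backend in place; the route ID stays the same.
        self._routes[route_id].llm = llm

    def complete(self, route_id: str, prompt: str) -> str:
        return self._routes[route_id].llm(prompt)


# Stand-in providers (assumptions, not real model clients).
def provider_a(prompt: str) -> str:
    return f"[A] {prompt}"


def provider_b(prompt: str) -> str:
    return f"[B] {prompt}"


gateway = Gateway()
gateway.register(Route("contracts-search", "shard-contracts", provider_a))

before = gateway.complete("contracts-search", "summarize clause 4")
gateway.swap_llm("contracts-search", provider_b)
after = gateway.complete("contracts-search", "summarize clause 4")
```

The client call is identical before and after the swap; only the gateway-side registration changed.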
Design principle: separate “who can configure retrieval” from “who can call completion” so audit and cost controls stay orthogonal.
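One way to picture this separation is as two independent scope bits on a workspace-bound key. The sketch below is an assumption about how such a check could look, not dex28's implementation; `Scope`, `ApiKey`, `is_allowed`, and the key and workspace names are hypothetical.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Scope(Flag):
    # Hypothetical scope names; the source only states the two concerns are separate.
    CONFIGURE_RETRIEVAL = auto()
    CALL_COMPLETION = auto()


@dataclass(frozen=True)
class ApiKey:
    key_id: str
    workspace: str
    scopes: Scope


def is_allowed(key: ApiKey, workspace: str, needed: Scope) -> bool:
    """A key works only inside its own workspace and only for scopes it holds."""
    return key.workspace == workspace and needed in key.scopes


# A retrieval administrator and an application client hold disjoint scopes.
retrieval_admin = ApiKey("key-admin", "risk", Scope.CONFIGURE_RETRIEVAL)
app_client = ApiKey("key-app", "risk", Scope.CALL_COMPLETION)
```

Because the scopes are independent flags, an auditor can review who may change retrieval policy without reading the list of callers that drive inference cost, and vice versa.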
Key capabilities: scoped API keys with environment parity (dev / staging / prod), streaming responses that carry trace IDs, and quota alerts wired to FinOps dashboards.
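The interaction between streamed responses, trace IDs, and quota alerts might look roughly like the following. This is a sketch under stated assumptions: `Quota`, `stream_with_trace`, the 80% alert threshold, and counting whitespace-separated words as tokens are all illustrative choices, not details from the deployment.

```python
import uuid
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Quota:
    limit_tokens: int
    alert_percent: int = 80              # assumed threshold
    used_tokens: int = 0
    alerts: List[str] = field(default_factory=list)

    def record(self, tokens: int) -> None:
        threshold = self.limit_tokens * self.alert_percent // 100
        was_below = self.used_tokens < threshold
        self.used_tokens += tokens
        # Alert once, at the moment usage crosses the threshold.
        if was_below and self.used_tokens >= threshold:
            self.alerts.append(
                f"quota at {self.used_tokens}/{self.limit_tokens} tokens"
            )


def stream_with_trace(
    chunks: Iterable[str], quota: Quota
) -> Iterator[Tuple[str, str]]:
    """Yield (trace_id, chunk) pairs; one trace ID covers the whole response."""
    trace_id = uuid.uuid4().hex
    for chunk in chunks:
        quota.record(len(chunk.split()))  # crude word count as a token proxy
        yield trace_id, chunk


quota = Quota(limit_tokens=10)
events = list(stream_with_trace(["a b c d", "e f g h", "i j"], quota))
trace_ids = {trace_id for trace_id, _ in events}
```

Every chunk of a single response shares one trace ID, so a dashboard alert can be traced back to the request that crossed the threshold.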
Custom routes map to internal document classifiers before RAG so irrelevant chunks never enter the context window.
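A classifier gate of this kind can be sketched as a filter applied to retrieved candidates before the prompt is assembled. Everything named here is hypothetical: `Chunk`, `keyword_classifier` (a trivial stand-in for the internal classifiers), `ROUTE_LABELS`, and `build_context`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Chunk:
    doc_id: str
    text: str


Classifier = Callable[[Chunk], str]


def keyword_classifier(chunk: Chunk) -> str:
    """Toy stand-in for an internal document classifier."""
    text = chunk.text.lower()
    if "indemnif" in text or "termination" in text:
        return "contract"
    if "policy" in text:
        return "policy"
    return "other"


# Assumed mapping from route ID to the document labels it may use.
ROUTE_LABELS: Dict[str, Set[str]] = {
    "contracts-search": {"contract"},
    "policy-search": {"policy"},
}


def build_context(
    route_id: str,
    candidates: List[Chunk],
    classify: Classifier,
    max_chunks: int = 3,
) -> List[Chunk]:
    """Keep only chunks whose label is allowed for the route, then truncate."""
    allowed = ROUTE_LABELS[route_id]
    return [c for c in candidates if classify(c) in allowed][:max_chunks]


candidates = [
    Chunk("c1", "Indemnification applies to both parties."),
    Chunk("p1", "Travel policy: economy class only."),
    Chunk("x1", "Cafeteria menu for Friday."),
    Chunk("c2", "Termination requires 30 days notice."),
]
context = build_context("contracts-search", candidates, keyword_classifier)
context_ids = [c.doc_id for c in context]
```

Filtering before truncation matters: if the cap were applied first, off-topic chunks could crowd relevant ones out of the context window.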
Operators use the dashboard for spend and error budgets; builders work in the playground on prompt and retrieval tuning. Both views reference the same route IDs, so promote and rollback are each a single action.
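To make the "one action" idea concrete, here is a minimal sketch of a registry keyed by route ID and environment, where promote appends a version and rollback restores the previous one. `RouteRegistry`, its methods, and the version labels are hypothetical and only illustrate the pattern.

```python
from typing import Dict, List, Tuple


class RouteRegistry:
    """Tracks the version history of each (route_id, environment) pair."""

    def __init__(self) -> None:
        self._history: Dict[Tuple[str, str], List[str]] = {}

    def promote(self, route_id: str, env: str, version: str) -> None:
        # The newest entry in the history is the active version.
        self._history.setdefault((route_id, env), []).append(version)

    def rollback(self, route_id: str, env: str) -> str:
        history = self._history[(route_id, env)]
        if len(history) < 2:
            raise ValueError("no earlier version to roll back to")
        history.pop()
        return history[-1]

    def active(self, route_id: str, env: str) -> str:
        return self._history[(route_id, env)][-1]


registry = RouteRegistry()
registry.promote("contracts-search", "prod", "v1")
registry.promote("contracts-search", "prod", "v2")
active_after_promote = registry.active("contracts-search", "prod")
active_after_rollback = registry.rollback("contracts-search", "prod")
```

Since both the dashboard and the playground would read the same key, an operator rolling back a route and a builder inspecting it see the same active version.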
The UI keeps dense telemetry readable: monospace cues for IDs, headline type for status, and no decorative chrome, consistent with a control-plane aesthetic.