AI Infrastructure Decision Dashboard
Build AI where your organization already lives
Identity, compliance, and data boundaries are already solved — don't duplicate them.
Organizations adopting AI capabilities face an infrastructure question that precedes any discussion of models, prompts, or use cases: where should these services live? The answer, when examined through the lens of identity management, compliance boundaries, and operational continuity, is almost always determined by decisions the organization has already made.
Most mid-market companies already govern user identity, file access, and collaboration through an established platform. Introducing AI services on separate infrastructure creates parallel governance: new credentials to manage, new access policies to maintain, new audit streams to monitor, and new vendor agreements for every compliance regime in play. This complexity is avoidable.
We build custom AI services inside the cloud boundary our clients already operate. For organizations running Microsoft 365, that means Azure. For organizations on Google Workspace, that means Google Cloud. For software companies building AI-powered products, the decision shifts from "where does your team live" to "where does your product need to scale," which often points to AWS and its multi-model AI ecosystem. In each case, six architecture principles remain constant: identity inheritance, boundary containment, permission-aware retrieval, human-in-the-loop through existing collaboration tools, native observability, and a single compliance agreement chain.
Where AI Services Live
AI adoption conversations tend to start with capabilities. What can the AI model do? Which use cases are most valuable? How quickly can we deploy something? These are reasonable questions, and they deserve attention. But underneath all of them sits a more foundational decision: where will these services run?
The infrastructure question governs more than anyone expects. It determines how user identity is managed for the AI system: whether staff authenticate with credentials they already have or whether new accounts must be created and maintained. It determines how data is governed: whether the organization's existing security controls extend to AI processing or whether a new set of policies must be written and enforced. It determines how compliance is maintained: whether audit trails flow through systems the compliance team already monitors or whether a parallel monitoring apparatus must be built.
(Diagram: AI services on separate infrastructure produce two systems with two parallel streams, doubling the audit surface; AI services inside the existing environment route identity and audit events through one audit stream.)
When AI services run on infrastructure separate from the organization's existing environment, every one of these concerns becomes a cross-platform exercise. Access audits must span two systems. Security reviews must evaluate two sets of controls. Vendor agreements multiply: the cloud provider agreement that covers email and file storage does not automatically cover a separate AI service running elsewhere. Each of these costs is individually manageable. Taken together, they compound quietly over the life of the deployment.
The pattern we have observed across engagements is that organizations that choose AI infrastructure aligned with their existing environment spend less time on governance and more time on capability. The security review is simpler because the AI services are already within the established perimeter. The compliance review is simpler because the audit trail already exists. The operational burden is lighter because the IT team already knows the tools.
Where does your organization already govern identity, access, and collaboration? The answer to that question narrows the infrastructure decision substantially, often before any technical evaluation begins.
Six Principles for AI Infrastructure
Across every engagement, regardless of cloud platform or industry, six architecture principles remain constant. These principles are independent of any specific vendor. They describe properties that any well-architected AI deployment should have. The platform choices come later; the principles come first.
Identity Inheritance
Authenticate through the identity system the organization already manages.
Boundary Containment
Keep data processing within the existing cloud boundary.
Permission-Aware Retrieval
Respect existing access controls when retrieving organizational knowledge.
Human Approval
Surface review requests inside existing collaboration tools.
Native Observability
Monitor and audit through the cloud platform's own tools.
Single Agreement Chain
Cover all components under one set of compliance agreements.
Identity Inheritance
AI services should authenticate users through the identity system the organization already manages. When staff sign in with the credentials they use for email and file access, no new accounts are created, no new passwords are issued, and no parallel directory needs to be maintained. The identity provider the organization has already invested in (including group memberships, department assignments, and access policies) extends to the AI layer without duplication.
For product companies building for external users, this principle shifts from inheritance to architecture. These organizations do not inherit an existing identity system; they design one. The foundational commitment remains the same: identity is a first-class concern, and the platform should support federation so that customers can authenticate with their own organizational credentials rather than creating yet another account.
The operational benefit compounds over the life of the deployment. Every employee onboarding, role change, and departure is already reflected in the organization's identity provider. When the AI layer inherits from that source, access changes propagate automatically. There is no secondary system to update, no synchronization lag to worry about, and no orphaned AI accounts to audit.
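To make identity inheritance concrete, here is a minimal sketch of token validation against an organization's existing OIDC identity provider, using the PyJWT library. The issuer, audience, JWKS URL, and the name of the groups claim are placeholders that vary by provider; the point is that the AI service verifies a credential the user already holds rather than managing accounts of its own.

```python
import jwt  # pip install pyjwt[crypto]

# Placeholder values; substitute your tenant's issuer, keys URL, and the
# audience of the AI service's app registration.
ISSUER = "https://login.example-idp.com/your-tenant/v2.0"
AUDIENCE = "api://your-ai-service"
JWKS_URL = "https://login.example-idp.com/your-tenant/discovery/v2.0/keys"

jwks_client = jwt.PyJWKClient(JWKS_URL)

def authenticate(bearer_token: str) -> dict:
    """Validate a token issued by the organization's existing IdP.

    No new accounts, no parallel directory: the same credential that
    opens email and file storage opens the AI service, and the group
    memberships maintained in the IdP drive authorization downstream.
    """
    signing_key = jwks_client.get_signing_key_from_jwt(bearer_token)
    claims = jwt.decode(
        bearer_token,
        signing_key.key,
        algorithms=["RS256"],
        audience=AUDIENCE,
        issuer=ISSUER,
    )
    return {
        "user": claims.get("preferred_username"),
        "groups": claims.get("groups", []),  # claim name varies by IdP
    }
```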
Boundary Containment
Data processed by AI services should remain within the organization's existing cloud boundary. When AI models are deployed inside the same environment that hosts the organization's other services, the security perimeter does not expand. The same network controls, the same encryption standards, the same monitoring tools, and the same vendor agreements cover everything.
This applies to model inference specifically. The question is where the AI model reads and processes your data. If inference happens outside your cloud boundary, the compliance surface area grows because data is now being processed under a different set of controls, potentially under a different vendor agreement. Keeping inference inside the boundary is the simplest way to avoid that expansion.
Boundary containment also simplifies network architecture. When AI services run inside the same cloud environment, communication between the AI layer and the organization's data stores happens over internal networks. There is no need to configure cross-cloud connectivity, manage external API endpoints for sensitive data, or evaluate the security of data in transit between two different providers.
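As an illustration of what containment looks like at the code level, here is a minimal sketch of inference against a model deployed inside the organization's own tenant, using the official openai Python SDK's Azure client. The endpoint, API version, and deployment name are placeholders. The significant detail is that the endpoint resolves to a resource the organization owns, reachable over its internal network when a private endpoint is configured, so prompts and completions never cross the boundary.

```python
from openai import AzureOpenAI  # pip install openai

# The endpoint points at a resource inside the organization's own tenant;
# with a private endpoint configured, it is reachable only from the
# internal network. All values below are placeholders.
client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",
    api_version="2024-06-01",
    api_key="your-key",  # or Entra ID auth via azure_ad_token_provider
)

response = client.chat.completions.create(
    model="your-deployment-name",  # a tenant-local deployment, not a public endpoint
    messages=[{"role": "user", "content": "Summarize the Q3 compliance memo."}],
)
print(response.choices[0].message.content)
```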
Permission-Aware Retrieval
When AI services draw on organizational knowledge to answer questions, the retrieval layer must respect the same access controls that govern the source documents. Filtering happens at the retrieval layer, before the AI model sees any content: the model never receives documents the user is not authorized to access. Enforcing this before generation, not after, is non-negotiable.
The access rules already exist. File systems, document repositories, and collaboration platforms already define who can see what. The AI retrieval layer inherits those rules rather than rebuilding them. An advisory team member asking a question should receive answers drawn from advisory documents, not from compliance-only materials or executive correspondence. The permissions that already govern document access become the permissions that govern AI-assisted retrieval.
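A deliberately simplified sketch of the pattern follows, with a toy corpus and toy relevance scoring standing in for a real vector index. What it demonstrates is the ordering: the permission filter bounds the candidate set before any ranking or generation occurs.

```python
from dataclasses import dataclass

@dataclass
class Document:
    text: str
    allowed_groups: frozenset  # mirrors the ACL on the source document

def retrieve(query: str, user_groups: frozenset,
             corpus: list[Document], k: int = 3) -> list[Document]:
    """Filter by permission first, rank second.

    Documents outside the user's groups are never candidates, so the
    model cannot leak what it never receives.
    """
    visible = [d for d in corpus if d.allowed_groups & user_groups]
    terms = set(query.lower().split())
    # Toy relevance: term overlap. A real system would use a search index.
    visible.sort(key=lambda d: -len(terms & set(d.text.lower().split())))
    return visible[:k]

corpus = [
    Document("Advisory research note on portfolio rebalancing.", frozenset({"advisory"})),
    Document("Compliance memo on SEC filing deadlines.", frozenset({"compliance"})),
]

# An advisory user querying about deadlines never sees the compliance memo.
print(retrieve("filing deadlines", frozenset({"advisory"}), corpus))
```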
Human Approval in Existing Workflows
AI services that take consequential actions or produce externally facing outputs should include human review. That review should happen inside the tools people already use for communication and collaboration. When an approval request appears in the same interface where a person manages the rest of their work, response times improve and the AI capability integrates into existing operational rhythm rather than creating a separate workflow.
This principle applies selectively. Internal research queries may not need approval. A draft email to a regulator almost certainly does. The architecture should make it straightforward to add approval checkpoints at any step, surfacing them where people already work.
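As one minimal illustration, here is a notification posted to a collaboration channel through an incoming webhook. The URL and approval link are placeholders, and a production integration would typically use the platform's interactive card format or a bot so the reviewer can approve without leaving the channel.

```python
import requests  # pip install requests

# Placeholder: an incoming-webhook URL configured on the reviewers' channel.
WEBHOOK_URL = "https://example.webhook.office.com/webhookb2/your-connector"

def request_review(draft_id: str, summary: str) -> None:
    """Surface an approval request where the reviewer already works."""
    payload = {
        "text": (
            f"AI draft {draft_id} is ready for review.\n\n"
            f"{summary}\n\n"
            f"Approve or request changes: https://approvals.example.internal/{draft_id}"
        )
    }
    resp = requests.post(WEBHOOK_URL, json=payload, timeout=10)
    resp.raise_for_status()  # surface delivery failures rather than losing reviews
```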
Native Observability
Monitoring, logging, and audit trails for AI services should run through the cloud platform's own tools. This eliminates the need for external observability vendors in the production data path. Audit logs, access records, cost tracking, and performance monitoring all flow through the same system the IT team already manages.
For regulated industries, this matters significantly. A single audit stream covering AI services and all other infrastructure simplifies every compliance review. When the auditor asks how AI service access is logged, the answer is the same system that logs access to everything else. One dashboard, one set of alert policies, one retention configuration.
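One platform-neutral way to honor this principle is to emit a structured log record per AI interaction to standard output, where the platform's native logging agent (Azure Monitor, Cloud Logging, or CloudWatch Logs) picks it up with no third-party vendor in the path. A minimal sketch, with the field set offered as an assumption rather than a prescribed schema:

```python
import hashlib
import json
import logging
import sys
from datetime import datetime, timezone

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
audit = logging.getLogger("ai.audit")

def log_interaction(user: str, query: str, doc_ids: list[str],
                    approved: bool | None = None) -> None:
    """One structured record per interaction: who, what, and the outcome."""
    audit.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": "ai_interaction",
        "user": user,                                                # who queried
        "query_sha256": hashlib.sha256(query.encode()).hexdigest(),  # hashed if sensitive
        "documents": doc_ids,                                        # what was retrieved
        "approved": approved,                                        # review outcome, if any
    }))
```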
Single Agreement Chain
All components of the AI architecture should be covered by one set of compliance agreements with one cloud provider. Each additional vendor in the data path adds a Business Associate Agreement (for HIPAA), a Data Processing Agreement (for GDPR), or a vendor security addendum (for SOC2). Consolidating onto a single provider's boundary reduces the agreement chain to one relationship.
This is particularly relevant during procurement and compliance reviews. When every component of the AI system is covered by the same enterprise agreement, the legal and compliance teams evaluate one vendor relationship instead of three or four. The operational cost of managing vendor agreements is real, and reducing it frees time for the work that actually matters.
Mapping Principles to Platforms
The six principles are platform-independent, but their implementation is platform-specific. The question that determines which platform to use is diagnostic: where does your organization already govern identity, access, and collaboration?
For organizations running Microsoft 365, the answer points to Azure. Entra ID already manages user identity. SharePoint and Teams already host documents and collaboration. Azure OpenAI provides AI model inference within the Azure boundary. For organizations on Google Workspace, the answer points to Google Cloud. Cloud Identity manages users and groups. Google Drive hosts documents with granular sharing permissions. Vertex AI handles model inference within the GCP boundary. For software companies building AI-powered products, the decision driver shifts. AWS, with Bedrock providing multi-model access under a single agreement and a deep ecosystem for multi-tenant SaaS infrastructure, often fits this scenario well.
| Principle | Azure | Google Cloud | AWS |
|---|---|---|---|
| Identity Inheritance | Entra ID | Cloud Identity | Cognito + federation |
| Boundary Containment | Azure OpenAI | Vertex AI | Bedrock |
| Permission-Aware Retrieval | Azure AI Search + Entra groups | Vertex AI Search + Drive ACLs | Tenant-scoped search index |
| Human Approval | Teams | Google Chat | Product UI + webhooks |
| Native Observability | Azure Monitor | Cloud Logging | CloudWatch + X-Ray |
| Single Agreement Chain | Azure EA + BAA | GCP EA + DPA | AWS EA + BAA |
The principles are constant. The implementation adapts to the environment the client already operates. The sections that follow walk through three client scenarios to show how this works in practice.
Regulated Financial Services Firm
Microsoft 365 / Azure
A 300-person financial services firm operating under SEC reporting requirements and SOC2 compliance. The firm manages client portfolios and produces regulatory filings across four departments: advisory, compliance, operations, and finance. The firm runs Microsoft 365 across the entire organization. Entra ID governs user identity with department-level security groups and conditional access policies. Documents live in SharePoint, organized by department with granular permissions.
The firm wanted an AI-powered knowledge system that would allow staff to ask questions and receive answers drawn from internal documents. A compliance officer should be able to query the system about a regulatory requirement and receive an answer grounded in the firm's own compliance memos and regulatory filings. The system needed to be accurate, auditable, and respectful of the access boundaries that already govern who sees which documents.
The Design
Identity
Staff authenticate to the AI system with their existing Entra ID credentials. The same single sign-on that opens Outlook and Teams opens the knowledge system. Security group memberships determine what the user can access. No separate user directory. No additional credentials.
Boundary
AI model inference runs through Azure OpenAI, deployed within the firm's Azure tenant. When a staff member submits a query, the processing happens inside the same cloud boundary that hosts their email and documents. Client portfolio data, regulatory filings, and internal correspondence never leave that boundary for AI processing.
Permissions
SharePoint document permissions map directly to the retrieval layer. When a user queries the system, the retrieval engine filters results based on that user's Entra ID group memberships before the AI model sees any content. Advisory staff receive answers drawn from research reports and client documents. Compliance staff receive answers drawn from regulatory filings and audit materials. A compliance officer querying the system will never receive content from advisory client correspondence they are not authorized to view, because the AI model never receives that content in the first place.
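A sketch of how that filter might look using Azure AI Search's documented security-trimming pattern, assuming an index whose documents carry a filterable group_ids field written from their SharePoint ACLs at indexing time; the endpoint and index name are placeholders:

```python
from azure.identity import DefaultAzureCredential  # pip install azure-identity
from azure.search.documents import SearchClient    # pip install azure-search-documents

# Placeholders: the search endpoint and index name are assumptions.
client = SearchClient(
    endpoint="https://your-search.search.windows.net",
    index_name="firm-knowledge",
    credential=DefaultAzureCredential(),
)

def search_as_user(query: str, user_group_ids: list[str]):
    """Only documents whose ACL intersects the caller's Entra ID groups
    are candidates; everything else is trimmed before the model sees it."""
    group_filter = ",".join(user_group_ids)
    return client.search(
        search_text=query,
        filter=f"group_ids/any(g: search.in(g, '{group_filter}'))",
        top=5,
    )
```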
Human Approval
When the AI system generates a client-facing summary or a draft communication that will leave the firm, it routes for review through a Teams notification. The assigned reviewer receives the draft in Teams, reviews it, and approves or requests changes within the same interface they use for all other collaboration. The approval is logged with timestamp, reviewer identity, and the content that was approved.
Observability
All AI interactions are logged through Azure Monitor. The IT team can see who queried the system, what documents were retrieved, what the AI model generated, and whether the output was approved. Access audit trails, query logs, and cost tracking all flow through the same monitoring infrastructure the IT team already uses.
Agreement Chain
One Azure enterprise agreement with BAA addendum covers compute, AI models, storage, and monitoring. When the compliance team prepares for their SOC2 audit, they point to the same vendor relationship and the same set of controls that cover their email and file storage.
What the firm's compliance team sees is an AI capability that lives inside the same security boundary as their email. Same audit stream. Same vendor relationship. Same set of controls they have already documented and had audited. When a new regulatory requirement emerges or an auditor requests evidence of AI governance, the firm's response draws on the same documentation, the same dashboards, and the same vendor contacts they have used for years. The AI capability is new; the governance infrastructure is not.
Technology Company
Google Workspace / GCP
A 150-person technology company building data analytics tools. Engineering-heavy, with product, design, and customer success teams rounding out the organization. The company operates entirely on Google Workspace. Drive hosts engineering documentation, product specifications, and customer feedback repositories. Google Chat and Meet handle daily collaboration.
The company wanted an internal AI assistant that helps teams find information across engineering documentation, product specs, customer feedback logs, and internal knowledge bases. The goal was reducing the time teams spend searching for information that exists somewhere in the organization but is hard to locate.
The Design
Identity
Staff authenticate with their Google Workspace credentials. Google Cloud Identity maps group memberships from Google Groups into the AI system's access layer. No separate accounts.
Boundary
Vertex AI handles model inference within the company's GCP project. Queries are processed inside the same cloud environment that hosts their other development infrastructure.
Permissions
Google Drive sharing permissions flow into the retrieval layer. Engineering documents shared with engineering groups are retrievable by engineers. Customer feedback shared with product and customer success groups is retrievable by those teams. A product manager querying the system receives answers drawn from product specs and customer feedback, not from engineering-only architecture documents they do not have Drive access to.
Human Approval
Approvals and notifications route through Google Chat via Chat apps integration. When the AI assistant produces a customer-facing summary or a response that needs review, the notification appears in Chat alongside the team's other conversations.
Observability
Cloud Logging and Cloud Monitoring handle audit trails, access logs, and performance metrics. The engineering team, already proficient with GCP's monitoring tools, manages AI observability through the same dashboards they use for their product infrastructure.
Agreement Chain
One GCP enterprise agreement with DPA covers the entire stack.
The structural parallel to the financial services scenario is the point. Different cloud, different identity provider, different collaboration surface. Identical principles, identical architecture pattern, equivalent security and governance posture. The methodology adapts to the client's existing environment; the principles do not change. The principles produce the right answer for each context precisely because they are independent of any particular platform.
Health Tech SaaS Platform
AWS / Bedrock
A health tech company building a clinical documentation platform for small and mid-size medical clinics. The platform ingests clinical notes, lab results, and care plans, then provides an AI-powered answer engine for clinical staff. Each clinic is a separate tenant on the platform. The company serves over 200 clinic customers, each requiring HIPAA compliance and a Business Associate Agreement. Data isolation between tenants is non-negotiable.
This scenario differs fundamentally from the previous two. This company is building a product for others. The infrastructure decision is driven by product requirements (multi-model capability, multi-tenant architecture, and scale economics) rather than by an existing internal collaboration environment.
Why AWS Fits This Scenario
The platform uses different AI models for different capabilities: one model for clinical reasoning over complex patient histories, another for lightweight summarization of lab results, another for document embedding and search. Bedrock provides access to multiple model providers through a single service, all within the AWS boundary, all covered by one BAA. This model flexibility is a product-level strategic advantage.
AWS also provides the deepest ecosystem for multi-tenant SaaS infrastructure. Container orchestration, event-driven processing, per-tenant data partitioning: these patterns are well-established on AWS with extensive reference architectures. Scale economics matter for a platform onboarding hundreds of clinics. Adding a new clinic tenant should be a configuration operation, not an engineering project.
The Design
Identity
Cognito user pools provide authentication with tenant isolation, ensuring that each clinic's staff are scoped to their own tenant. The platform also supports federation: clinic customers who run their own Microsoft 365 or Google Workspace environments can authenticate with their existing organizational credentials through SAML or OIDC federation.
Boundary
All AI processing runs through Bedrock within the company's AWS account. Protected health information never leaves the platform's cloud boundary for AI inference. Each clinic's data is partitioned at the storage layer, and tenant scoping is enforced throughout the processing pipeline.
Permission-Aware Retrieval
Every document chunk in the knowledge index is tagged with a tenant identifier. Retrieval is filtered at query time by tenant ID before any content reaches the AI model. Clinic A's patient records never enter Clinic B's AI context. The model cannot surface information it never receives.
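A simplified sketch of the enforcement point, with a plain list standing in for the real vector index; the tenant filter bounds the candidate set itself rather than post-processing the model's output:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    tenant_id: str  # written once at ingestion, immutable thereafter
    text: str

def retrieve_for_tenant(query: str, tenant_id: str,
                        index: list[Chunk], k: int = 5) -> list[Chunk]:
    """Tenant scoping happens before relevance ranking.

    Clinic A's chunks are never in the candidate pool when Clinic B
    queries, so cross-tenant leakage is structurally impossible here.
    """
    candidates = [c for c in index if c.tenant_id == tenant_id]
    terms = set(query.lower().split())
    # Toy relevance by term overlap; a real system ranks via embeddings.
    candidates.sort(key=lambda c: -len(terms & set(c.text.lower().split())))
    return candidates[:k]
```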
Human Approval
Approval workflows surface in the product's own user interface. When a clinical summary requires physician review before it is finalized in the patient record, the notification appears within the platform. Webhook and email notification options are available for clinic staff who need alerts outside the platform.
Observability
CloudWatch and X-Ray handle logging, tracing, and monitoring. Per-tenant usage metrics support both compliance auditing (demonstrating data isolation and access controls to each clinic's satisfaction) and customer billing (metering AI usage per clinic for pricing purposes).
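A sketch of per-tenant metering with boto3, emitting token counts to CloudWatch with a tenant dimension after each inference call; the namespace and dimension names are placeholder schema choices, not a prescribed convention:

```python
import boto3  # pip install boto3

cloudwatch = boto3.client("cloudwatch")

def record_inference(tenant_id: str, model_id: str,
                     input_tokens: int, output_tokens: int) -> None:
    """Emit per-tenant usage metrics after each model invocation.

    The Tenant dimension serves both compliance reporting (demonstrating
    per-clinic isolation and access patterns) and usage-based billing.
    """
    cloudwatch.put_metric_data(
        Namespace="ClinicalPlatform/AI",  # placeholder namespace
        MetricData=[
            {
                "MetricName": name,
                "Dimensions": [
                    {"Name": "Tenant", "Value": tenant_id},
                    {"Name": "Model", "Value": model_id},
                ],
                "Value": float(value),
                "Unit": "Count",
            }
            for name, value in [("InputTokens", input_tokens),
                                ("OutputTokens", output_tokens)]
        ],
    )
```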
Agreement Chain
The AWS BAA covers Bedrock, S3, compute, and all supporting services. The health tech company then extends its own BAA downstream to each clinic customer, creating a clean chain from cloud provider to platform to end customer.
The connecting thread across all three scenarios becomes visible here. This health tech company's clinic customers are organizations like those described in the previous two sections: small and mid-size operations running Microsoft 365 or Google Workspace, whose staff authenticate through an organizational identity provider. When the clinical documentation platform supports federation, it enables those clinic customers to extend their existing identity governance to yet another service. The principles cascade: thoughtful AI architecture at the platform level enables sound AI governance at the customer level.
The Decision Framework
The three scenarios illustrate different paths to the same destination: AI services that inherit existing governance rather than creating parallel governance. The path an organization takes depends on two questions.
First: where does your organization already govern identity and collaboration? For most enterprises, this question has a clear answer. The identity provider and collaboration platform the organization has already invested in defines the natural home for AI services. Following the identity means following the existing governance, security controls, and vendor relationships.
Second: are you operating AI for your own team, or building AI into a product for others? Internal AI capabilities should follow the identity. Product companies building AI for external users face a different calculus. Model flexibility, multi-tenancy infrastructure, and scale economics may become the primary drivers.
Hybrid scenarios deserve acknowledgment. Some organizations run infrastructure on one cloud platform while governing identity through another. The principle-based approach accommodates this. Identity may live in one environment while compute lives in another. Federation bridges the gap, allowing users to authenticate through their primary identity provider while AI services run in a different cloud. The key is being intentional about which environment governs what, and ensuring that the AI layer inherits identity from the authoritative source.
What This Means for Compliance
The six architecture principles were not designed around compliance requirements, but they simplify compliance as a natural consequence. Fewer vendors, fewer agreement chains, fewer audit streams, and fewer data boundary crossings make every compliance regime easier to satisfy.
Health Data Protection
HIPAA requires covered entities and their business associates to safeguard protected health information. A single BAA covering all AI components (model inference, storage, compute, monitoring) eliminates the need to negotiate separate agreements with multiple vendors handling PHI. Boundary containment ensures that protected health information is processed within the covered cloud environment. Native observability provides the access logging that HIPAA's audit requirements demand. Permission-aware retrieval enforces the minimum necessary standard at the data layer: users and AI models access only the information required for the task at hand.
Security Controls
SOC2 examines security controls, availability, processing integrity, confidentiality, and privacy across an organization's service providers. When AI services run within the same cloud boundary as other enterprise services, the security controls are documented within one platform. Access reviews are centralized. Change management is tracked through native tooling. The evidence collection process for SOC2 audits is substantially simpler when the auditor evaluates one environment rather than two or three.
Data Residency and Rights
GDPR imposes requirements around data residency, purpose limitation, data subject rights, and processor agreements. Cloud region selection enforces data residency at the infrastructure level. Deletion capabilities native to the platform support data subject access and erasure requests. A single Data Processing Agreement covers the entire AI stack. Permission-aware retrieval supports purpose limitation by ensuring that data retrieved for AI processing is scoped to what the user is authorized to access and what the query requires.
Financial Regulation
SEC and financial services regulations require audit trails for communications, documentation of decision-making processes, and in many cases human review of client-facing outputs. Native observability provides comprehensive audit trails for all AI-generated content. Human approval workflows, surfaced through the organization's existing collaboration tools, create documented review chains for any AI output that reaches clients or regulators. Retention policies enforced through platform-native tools ensure that AI interaction records are preserved according to regulatory requirements.
Government and Public Sector
Government and public sector requirements, including FedRAMP authorization, are available across all three major cloud platforms. A single-cloud deployment simplifies the authorization boundary by containing all AI components within one FedRAMP-authorized environment. The principle of boundary containment aligns directly with the FedRAMP requirement to define and defend a clear system boundary. For agencies and contractors subject to NIST 800-171 or CMMC requirements, the same logic applies: consolidating AI services within an already-authorized cloud boundary avoids the need to extend the authorization boundary to cover additional providers.
Tradeoffs We Have Considered
A complete accounting of this approach requires acknowledging the tradeoffs involved. We have considered these and believe the balance favors the principled approach described in this paper, but reasonable organizations may weigh them differently.
Cloud cost variation
Azure and GCP compute can cost marginally more than equivalent AWS resources at certain scales. For most mid-market organizations, the governance simplification (fewer vendor agreements, centralized monitoring, unified identity management) outweighs the per-unit cost difference. We help clients evaluate this on a case-by-case basis, and for some workloads the cost differential is negligible. For others, it is measurable and worth discussing.
Model availability
AI models are not uniformly available across all cloud platforms. An organization with a strong preference for a specific model provider should verify that the model is available within their chosen cloud boundary. In practice, the major cloud platforms have expanded model access significantly, and most mid-market use cases are well served by the models available on any of the three platforms. In some cases, the AI model API may be the one component that crosses the cloud boundary, and the compliance implications of that specific boundary crossing should be evaluated explicitly rather than ignored.
Migration complexity
Organizations already running workloads on one cloud platform who would benefit from a different cloud for AI services may face migration costs. The principled approach does not require moving everything. It requires being intentional about where the AI layer lives relative to identity and governance. In some cases, federation and cross-cloud networking allow the AI layer to live in one environment while identity remains in another. The cost of that bridge is real, and it should be evaluated against the cost of maintaining fully parallel governance.
When speed and governance pull in different directions
Development teams often have deep expertise with one cloud platform regardless of where organizational identity and governance live. Moving quickly on a platform the team knows well has real value, particularly for proof-of-concept work and initial prototypes. The question worth asking is whether the operational and compliance cost of maintaining a second environment outweighs the initial speed advantage. In our experience, governance simplification compounds over time. A prototype that runs on a different cloud from the organization's identity provider will eventually need to be reconciled with that identity provider, and the cost of reconciliation grows as the system matures. Starting in the right place, even if the first sprint is slightly slower, tends to produce a lower total cost of ownership.
Vendor concentration risk
Consolidating AI services onto a single cloud provider concentrates operational dependency. If that provider experiences an outage, AI services are affected alongside email, file storage, and collaboration. This is a genuine consideration. In practice, most mid-market organizations have already accepted this concentration for their core productivity infrastructure. Adding AI services to the same environment does not materially change the risk profile; it extends an existing dependency rather than creating a new one. Organizations for whom vendor diversification is a priority can design accordingly, and the principled approach accommodates multi-cloud architectures through federation.
These are real tensions, and we do not dismiss them. The principled approach described in this paper is the result of having navigated these tradeoffs across multiple engagements and having observed, consistently, that aligning AI infrastructure with existing governance produces better outcomes over the life of the deployment.
Want to see this framework applied to your infrastructure?
We evaluate your existing cloud environment, identity architecture, and compliance requirements — then map a path to AI services that fit where you already operate.