
AI Game

Oct 4, 2024

10 min read


Few places on earth could inspire a spontaneous stream of poetry like the toilet seats of the now-defunct H4 hostel at IIT Bombay. The four walls of its 4ft x 10ft x 6ft cubicles were adorned with thoughtful literary pieces that have forever been lost to the annals of time (no pun intended).


They were also, in some way, portals into the twisted psyche of India's finest. My favorite quote, which I'm pretty sure is the guiding light of most engineers following the "path of least resistance," was:

"If a problem has a solution, why solve it? If a problem does not have a solution, why solve it?"

A common criticism I encounter almost every day: with an army of engineers large enough to storm Sparta and trample Leonidas into oblivion, why hasn't India produced the next ChatGPT or DeepSeek? (Sorry Aravind, you've been lost to the US now.)


The answer, I believe, is a perception that the effort is simply not worth it. Or at least that's what Mr. Altman would like us to believe when he asked us "to try anyway." To inspire one or two of my brethren to look beyond simple chatbots and agents, I've structured this note as closely as possible to the script of a game. (I also made a fun GPT here: Revenge of the Dev.)


But before we start, a very basic snapshot of the fabled "AI Stack":


[Image: the AI Stack]

With the exception of 'AI Middleware', which I'll cover separately in a follow-up blog, the objective here is to understand how typical gameplay varies depending on where you spawn on the AI Map, specifically:

  • Reward Mechanics: Pricing, volume, scale-up curve

  • Cost Mechanics: Dev hours and Infra cost to launch vs. Margin profile in steady state

  • Typical Challenges: Procurement, regulatory, adoption, etc.

  • Common Strats: plays adopted by successful players


I. Enterprise / Vertical AI


Player Profile: These organizations are hungry for big, important problems, because the more critical and widespread the issue, the more value and scale they can capture. But they face real obstacles:

  • Access to high-quality data is limited by strict privacy laws, forcing them to rely on synthetic data and federated learning

  • Distribution depends heavily on building trust and securing deep integrations with systems like EHRs, ERPs, and CRMs

  • Regulation across sectors such as healthcare, finance, and enterprise software adds time and cost to every release

  • Long procurement and security reviews often slow down enterprise adoption

  • Dependence on cloud providers for APIs and GPUs creates pricing uncertainty and reliability challenges that can squeeze margins and delay deployments


Reward Mechanics [Avg Score: $40M / year by year 5]

  • Pricing: $30 – $200 / user per month

  • Volume: 10-120K users from 15-60 enterprise clients
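
A quick sanity check on that score, as a minimal back-of-the-envelope sketch in Python (the ranges are the ones above; the midpoint scenario is my assumption):

price = (30, 200)          # $ / user / month
users = (10_000, 120_000)  # across 15-60 enterprise clients

low, high = price[0] * users[0] * 12, price[1] * users[1] * 12
mid = 40_000 * 85 * 12     # hypothetical mid-market player

print(f"ARR: ${low/1e6:.1f}M (floor) to ${high/1e6:.0f}M (ceiling); mid ~${mid/1e6:.0f}M")
# -> $3.6M to $288M; ~40K users at ~$85/month lands right around the
#    $40M / year "average score" above.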

 

Cost Mechanic [15-25K dev hours @ $2-4M for MVP, 20% Y5 EBITDA]

  • MVP build: 15 - 25K dev hours / $2 – 4 M budget

    • Team: 8 – 25 engineers (3 – 7 yrs avg experience)

    • Comp: $150 k – $250 k TC (US); $90 k – $150 k (offshore)

    • Timeline: MVP in 3 – 6 mo; Enterprise GA in 12 – 18 mo

  • Gross Margin: 60 – 80 % | EBITDA: 10 – 30 % at scale

    • Cloud Compute: 15%

    • R&D: 25%

    • SG&A: 40%

    • Middleware, Licenses & Compliance: 5%
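
To see how those cost lines stack into the quoted margins, here's a minimal P&L sketch (all figures as % of revenue; treating cloud compute plus middleware/licenses as COGS is my assumption):

costs = {"cloud_compute": 15, "r_and_d": 25, "sg_and_a": 40, "middleware_licenses": 5}

gross_margin = 100 - costs["cloud_compute"] - costs["middleware_licenses"]
ebitda = 100 - sum(costs.values())

print(f"Gross margin: {gross_margin}% | EBITDA: {ebitda}%")
# -> 80% GM and 15% EBITDA: the top of the 60-80% GM range and inside
#    the 10-30% EBITDA band above.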

 

Strats

  • Consumer AI Platform [Perplexity]

    • Product differentiation: General-purpose conversational search/assistant engine with multimodal UX (chat, voice, vision) and “answer engine” capabilities.

    • GTM: Freemium onboarding → user growth via web + mobile → “Pro” subscription tier and API access to expand reach.

    • Revenue model: Low-ARPU subscriptions ($20/mo) + ads + premium API/enterprise licenses

    • Current scale (2024-25): ~22 M monthly active users; ~US$80 M run-rate revenue end of 2024.

  • Vertical AI Platforms [Abridge, Viz.ai, PathAI]

    • Product differentiation: Deep domain-specific models fine-tuned on proprietary data (radiology, pathology, claims, contact-center transcripts) with embedded workflow automation.

    • GTM: Top-down enterprise sales in regulated verticals; land lighthouse logos → expand by department → scale via renewals.

    • Revenue model: High-ARPU SaaS ($100–$500 / seat / month) or per-deployment enterprise licenses; 2- to 3-year contracts.

    • Current scale: Viz.ai ~$40 M ARR (2023 est.); PathAI ~$45 M ARR; Cresta AI ~$35 M ARR. 

  • Horizontal AI Copilots [Notion AI, Jasper]

    • Product differentiation: Productivity and marketing copilots that integrate across suites (Docs, CRM, email) via plug-ins and API connectors; model-agnostic orchestration layer.

    • GTM: Bottom-up distribution through app stores, Chrome extensions, and workspace embeds; strong SSO and API integrations drive seat expansion.

    • Revenue model: Mid-ARPU ($30–$60 / seat / month) subscription; usage-based upsells and enterprise licensing.

    • Current scale: Jasper ~$45 M ARR; Notion AI ~$30 M ARR; Rewind ~$20 M ARR (2025 est.).


II. AI Models


Player Profile: Aim to build the smartest, fastest, cheapest intelligence (foundation model + inference stack; measured in tokens), while dealing with:

  • Capital Intensity: Each frontier-model generation now exceeds $500M capex + $200M opex. Even inference scaling requires >$50M annual GPU leasing commitments

  • Talent Scarcity & Retention: Shortage of senior ML researchers, distributed-systems engineers, and alignment specialists; elite hires cost $5–10M+ annually. Team continuity directly affects model reproducibility and tuning efficiency.


Reward Mechanics [$200M+ ARR by Year 5 | $1-1.5B+ ARR for frontier models]

  • Pricing: $0.005 – $0.10 per 1K tokens (tiered by model and context)

  • Volume: Driven by token throughput across enterprise and consumer workloads

    • 45% Enterprise SaaS / B2B infra (10 - 100B tokens / month)

    • 35% Consumer apps (100M+ DAUs)

    • 15% Vertical AI (5-50 verticalized deployments)

    • 5% Government / Sovereign ($2-10M contracts)
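
To get a feel for token economics, a minimal sketch (the blended price and workload size are my assumptions within the ranges above):

price_per_1k_tokens = 0.02      # $, mid-range of the $0.005-$0.10 tier
tokens_per_month = 50e9         # one B2B workload, mid of 10-100B tokens/month

monthly_rev = tokens_per_month / 1_000 * price_per_1k_tokens
print(f"${monthly_rev/1e6:.0f}M / month -> ${monthly_rev*12/1e6:.0f}M / year")
# -> ~$1M/month (~$12M/year) per heavy enterprise integration; a few dozen
#    such workloads plus the consumer and vertical mix gets to $200M+ ARR.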

 

Cost Mechanic

[Standard Model: 25-50K Dev hours | $15-25M upfront cost | 10-20% Y5 EBITDA]

[Frontier Model: 150–300K dev hours | $250–500 M upfront cost | 10-20% Y5 EBITDA]


  • MVP Build: 25-50K dev hours / $15-25M upfront cost

    • Dev Cost: ~25 – 50K Dev hours / $5-10M budget (Frontier: ~150 – 300K dev hours / $100–200 M budget)

      • Team: 20 – 60 ML/AI engineers & researchers (frontier: 150 – 400)

      • Compensation: ~$300K – $450K / yr for senior ML engineers (frontier: $400K – $700K); elite researchers can exceed $10M+ / yr

      • Timeline: Core model + inference API: ~9–12 months; Production-grade, multi-cloud deployment with orchestration & observability: 18–24 months. Breakeven Horizon: ~30–36 months post-launch

    • Training clusters & compute: ~$8 – 15 M compute + power + infra (frontier: $150 – 300 M)

      • Cluster Scale: 1–2K GPUs @ $6–10 per GPU-hour (H100-class, InfiniBand interconnected); utilization <50% early in training (see the cost sketch after this list).

    • Other factors:

      • Data Curation & Annotation: ~$2 – 4M, combining human-labeled and synthetic data generation; quality drives downstream fine-tuning efficiency.

      • Model Evaluation, Safety & Fine-tune Cycles: ~$1 – 2M per cycle; includes RLHF runs, safety dataset refresh, and post-training quantization/alignment.

      • Networking & cooling add 15–25% to infra; data licensing (text/code corpora) can add $1–3M annually.

  • Gross Margin: 55 – 75% | EBITDA Margin: 10 – 30% at scale (≈20% steady-state for large players)

    • Inference: Variable cost tied to tokens/sec/GPU; margin expansion depends on utilization and model throughput optimization (e.g., kernel fusion, quantization).

    • Cloud Compute: 30 – 40% of COGS (GPU leases, egress, orchestration).

    • R&D: 25% of revenue (ongoing model optimization, eval, and safety).

    • SG&A: 25% (enterprise sales, developer partnerships, and support).

    • Licensing & Compliance: 5% (open-weight governance, IP audits, model registry management)
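
And the training-run cost sketch promised above (run length and blended rate are my assumptions within the quoted ranges):

gpus, rate, days = 1_500, 8.0, 35   # 1-2K GPUs @ $6-10 per GPU-hour
utilization = 0.45                  # "<50% early in training"

billed = gpus * 24 * days * rate    # you pay wall-clock hours, not useful FLOPs
effective = billed * utilization

print(f"Billed: ${billed/1e6:.1f}M | effectively used: ${effective/1e6:.1f}M")
# -> ~$10.1M billed for a ~5-week run, inside the $8-15M training-cluster
#    line; low early utilization means barely half of it buys real training.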

 

 

Strats: 

  • Frontier Model Platforms [OpenAI / Anthropic / DeepSeek]

    • Product differentiation: Large-scale multimodal foundation models (GPT-4o, Claude 3, DeepSeek-R1) optimized for reasoning, context length, and tool-use integration. Moats built on proprietary training data, inference efficiency, and reinforcement-learning safety pipelines.

    • GTM: Platform-first motion via API integrations and cloud alliances (Azure OpenAI, Amazon Bedrock). Parallel consumer funnel through ChatGPT/Claude apps to drive data feedback loops and enterprise upsells.

    • Revenue model: Tiered API usage ($0.005–$0.10 / 1K tokens) and enterprise subscriptions ($30–$100 / seat / month). Gradual shift toward custom fine-tunes and “private GPT” deployments with recurring usage-based billing.

    • Current scale: OpenAI ≈ $3.5 B ARR (2025 est.); Anthropic ≈ $850 M; DeepSeek ≈ $300 M.

       

  • Local Language Models [Sarvam AI / Krutrim / Baichuan / MiniMax]

    • Product differentiation: Lightweight, cost-efficient LLMs trained on regional corpora (Indic, Chinese, Arabic) for culturally grounded understanding, low-latency inference, and edge deployment. Focus on multilingual ASR, text-to-speech, and domain-specific comprehension.

    • GTM: B2G + enterprise partnerships in domestic markets (banks, telecoms, ed-tech, public sector). Developer ecosystem built via open-weight releases and cloud partnerships (AWS India, Alibaba Cloud, Tencent).

    • Revenue model: Hybrid licensing — API usage for enterprises + on-prem weights / royalties for sovereign clients. Pricing typically $0.0005–$0.005 / 1K tokens or fixed-fee annual contracts ($0.5–2 M / deployment).

    • Current scale: Sarvam AI ≈ $15 M ARR (2025 est.); Krutrim ≈ $25 M; Baichuan ≈ $40 M.


III. AI Cloud

Player Profile: AI cloud platforms aim to deliver the fastest, most reliable, and most cost-efficient compute infrastructure for training and running large-scale AI systems, but they operate under intense technical and financial pressure:

  • Financing & Capex Load: $100B+ annual AI capex across hyperscalers (AWS, Azure, GCP) strains balance sheets; financed through lease liabilities and bonds.

  • Power & Capacity Constraints: GPU availability and grid access (1–2% data-center vacancy in major markets) create multi-quarter delays and premium pricing

  • Utilization Efficiency: Inference margins hinge on scheduler optimization and GPU goodput

  • Regulation & Compliance: EU AI Act (2025) imposes GPAI obligations (documentation, transparency). Sovereign cloud and TEE-backed VM requirements drive infra duplication


Reward Mechanics

[Cloud Specialists: $6–10B ARR in 2025 → $15–25B by Year 5]

[Big 3: $30–35B ARR (AI infra) in 2025 → $90–120B by Year 5]

 

  • Pricing:

    • GPU time (training): $6–$12 per GPU-hour (H100 class) depending on region and commitment tier.

    • Serverless (inference): ~$0.20 per 1M requests + $0.0000167 per GB-s (AWS Lambda baseline).

    • Managed LLM APIs: $0.005–$0.10 per 1K tokens (Vertex, Azure OpenAI, Bedrock).

  • Volume:

    • 45% Training (reserved GPU capacity)

    • 40% Inference (token/request billing)

    • 10% Storage/networking

    • 5% Support & managed services
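
What one reserved training contract looks like against that pricing, as a rough sketch (cluster size, tenor, and blended rate are my assumptions):

gpus = 10_000
rate = 8.0                 # $/GPU-hour, mid of the $6-12 H100-class range
hours_per_year = 8_760
commit_years = 3           # typical take-or-pay tenor

contract = gpus * rate * hours_per_year * commit_years
print(f"Take-or-pay contract: ${contract/1e9:.1f}B over {commit_years} years")
# -> ~$2.1B; a handful of reservations like this is why training is ~45% of
#    volume and how specialists get to $6-10B ARR.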

 

Cost Mechanic

[Cloud Specialists: 200–400K dev hours @ $0.3–0.7B | $5–12B CAPEX over 3Y | 15–20% Y5 EBITDA]

[Big 3: 0.8–1.2M dev hours @ $4–7B | $90–135B CAPEX over 3Y | 20–30% Y5 EBITDA]

 

  • Dev/Engg: 200–400K dev hours @ $0.3–0.7B [Big 3: 0.8–1.2M dev hours @ $4–7B]

    • Headcount: ~300–700 FTEs (Big 3: 2500-5000) across infra/orchestration/billing; heavier vendor reliance

    • Comp & tools: ~$100–$250M/yr (Big 3: $1.5-2.5B)

    • Timeline: 6–12 mo MVP; 12–18 mo to orchestrate multi-site clusters

  • Infra (capex/leases/debt): $5–12B cumulative over 3 yrs [Big 3: $90-135B]

    • Financing via equipment loans, leases, and take-or-pay offtakes; shells often colocation/leaseback

    • Sensitivity: financing cost and residual values drive volatility; utilization gaps compress GM

  • GM: 45-55% (Big 3: 65%) | EBITDA: 15-20% (Big 3: 25%)

    • COGS: 45–55% (Big 3: 30-35%): Leases 30–40%; Power & cooling 10–12%; Network 5–8%.

    • R&D/platform: 8–12%

    • SG&A: 10-15%.

    • Breakeven & inflection: EBITDA breakeven once contracted utilization ≥60–65%; mix-shift to longer-tenor leases reduces financing drag.

  • Unit levers: pre-sold capacity, cluster density (NVLink/IB fabric), power pricing, and debt terms
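
A toy breakeven model for that ≥60–65% utilization threshold (the fixed/variable split and opex load are illustrative assumptions, indexed to revenue at 100% contracted utilization):

FULL_UTIL_REVENUE = 100.0
FIXED = 42.0       # leases, debt service, baseline power/network
VARIABLE = 0.15    # usage-driven power/support, as a share of revenue
OPEX = 10.0        # R&D/platform + SG&A

def ebitda(utilization: float) -> float:
    revenue = FULL_UTIL_REVENUE * utilization
    return revenue - FIXED - VARIABLE * revenue - OPEX

for u in (0.55, 0.60, 0.65, 0.75):
    print(f"utilization {u:.0%}: EBITDA {ebitda(u):+.2f}")
# Breakeven lands between 60% and 65% on these assumptions, matching the
# contracted-utilization threshold above.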

 

Strats

  • IaaS Cloud Platforms [AWS / Google Cloud / CoreWeave]

    • Product differentiation: Full-stack infrastructure providers selling compute, networking, and storage for model training and inference. Strength lies in GPU density, scheduling efficiency, and global region availability. AWS dominates via Trainium/Inferentia and Bedrock; Google via TPU v5p/v6e and A3 clusters; CoreWeave through faster GPU delivery and flexible contracts.

    • GTM: Capacity-first motion — customers reserve GPU clusters (Capacity Blocks, Committed Use, take-or-pay contracts) or rent on-demand for short bursts. Partnerships with model labs and sovereign clients drive multi-year utilization locks.

    • Revenue model: GPU-hour and storage pricing (training), plus token-based billing for managed inference. Upside from reserved throughput and premium data-residency features.

    • Scale: AWS ≈ $15 B AI-infra ARR (2025 est.); GCP ≈ $7 B; CoreWeave ≈ $2 B. Aggregate AI-infra revenue ≈ $30–35 B, growing 45–55 % YoY.

  • PaaS AI Clouds [Azure / Oracle / Vertex AI]

    • Product differentiation: Platform-as-a-service layers that abstract raw compute behind managed APIs, model endpoints, and orchestration tools. Azure leads with integrated OpenAI Service and PTUs; Google’s Vertex AI offers auto-deployment and token metering; Oracle extends to enterprise GPU hosting with built-in governance.

    • GTM: Enterprise-first via cloud bundles and developer APIs. Focus on reserved throughput, fine-tuning pipelines, and sovereign-cloud compliance to lock in high-value accounts.

    • Revenue model: Token-metered inference, per-seat or per-deployment enterprise licenses, and managed workflow subscriptions. Lower churn and higher attach to productivity and data services.

    • Scale: Azure ≈ $10 B AI-infra ARR (2025 est.); Vertex ≈ $6 B; Oracle ≈ $1 B. AI workloads are fastest-growing segment within cloud portfolios (30 %+ YoY).


IV. AI Chips


Player Profile: AI chip companies focus on building the fastest, most efficient processors for training and running large-scale AI systems, but they operate in one of the most resource-constrained segments of the entire stack:

  • TSMC / Foundry Capacity Allocation: CoWoS-L and HBM packaging are gating factors; NVIDIA controls ~60–70% of 2025 TSMC CoWoS output. Startups and non-hyperscaler customers face multi-quarter delays or must prepay for wafer slots.

  • HBM & Memory Bottlenecks: HBM3E/4 supply dominated by Samsung, SK Hynix, and Micron; long qualification cycles limit ramp flexibility. Memory capacity is now the defining performance bottleneck for large training chips.

  • Verification & Yield Risks: First-silicon yield <60% common for new architectures; respins double NRE exposure

  • Regulatory Exposure: BIS export controls (2022–2025) and CHIPS Act guardrails limit both customer base and expansion geography

  • Talent & EDA Bottlenecks: Senior ASIC designers and DV engineers are scarce; EDA license cost escalation adds capex strain

  • Capital Intensity & Time-to-Market: $100–300M+ NRE per chip; 24+ month cycles; survival requires deep-pocketed partners or hyperscaler co-funding


Reward Mechanics


[Nvidia: $100B+ AI hardware sales + $150B+ ecosystem by Y10]

  • Pricing: Datacenter AI accelerators list at $15K–$40K+ per unit

  • Volume & Customer Mix:

    • Hyperscaler orders: 10K–100K+ units / quarter, staged across 2–4 quarters.

    • Increasing share of software + services (NVIDIA DGX Cloud, AI Enterprise) → blended margin uplift


[Custom ASIC: $250M–$1B potential ARR by Year 5 depending on design wins]

  • Pricing: Custom AI inference/training ASICs: $5K–$25K per chip

  • Volume:

    • Launch volumes: 5K–15K units per year (post-tapeout Year 1) → 50K+ at maturity.

    • Customers: Hyperscalers, AI Labs, OEM / Edge integrators

    • Long-term contracts secure allocation (often prepaid)
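
A quick check on that ARR range (the ASP midpoint and volume picks are my assumptions):

asp = 15_000                      # $ per chip, mid of the $5K-25K range
launch, mature = 10_000, 50_000   # units per year at launch vs. maturity

print(f"Launch: ${launch*asp/1e6:.0f}M/yr | Maturity: ${mature*asp/1e6:.0f}M/yr")
# -> $150M/yr at launch scaling to $750M/yr, consistent with the
#    $250M-$1B Year-5 ARR range depending on design wins and ASP.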


Cost Mechanic 


[Nvidia: 100K–200K dev hours | $200–500M architecture NRE | 25–30% Y5 EBITDA]

[Custom ASIC: 50–100K dev hours | $100–300M total NRE | 20% Y5 EBITDA]

  • MVP Build:

    • Architecture + verification cycle: 24–30 months (custom ASIC: 12–18 months) per major GPU generation.

    • Headcount: 1,000–3,000 engineers (custom ASIC: 30-120 engineers) globally (architecture, DV, firmware, packaging, SW).

    • Comp Range: $250K–$500K median TC; top architects $1M+.

    • Verification & FW: 50–60% of total dev hours.

    • Tools & IP Costs: EDA toolchain: $200–400K / seat / year (enterprise licenses); Proprietary IP: $50–100M+ cumulative license amortization

    • Total architecture NRE (EDA + IP + verification + fab bring-up): $200–500M+

  • Gross Margin: 65–75% | EBITDA Margin: 25–35% at scale

    • Foundry & packaging: 40% (HBM + CoWoS packaging contributes 20–25% of total COGS)

    • R&D: 25%

    • SG&A: 20%

    • IP/licensing: 10%

    • EBITDA Margin: 25–30% steady-state.

  • Operating Leverage: GPU design amortized across 2–3-year lifecycles (Hopper → Blackwell → Rubin)
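
To see why that lifecycle amortization matters, a minimal sketch of NRE per chip (lifetime volume is my assumption):

nre = 200e6                 # $ total NRE, mid of the $100-300M custom-ASIC range
lifetime_units = 120_000    # e.g., ~3 years at ~40K units/yr (assumed)

per_chip = nre / lifetime_units
print(f"NRE per chip: ${per_chip:,.0f}")
# -> ~$1,667 per chip; at a $15K ASP that's ~11% of price, which is why
#    long-term volume commitments and 2-3 year amortization cycles matter.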

 

Strats:

  •  “NVIDIA Circular Financing” Loop

    • Mechanic: A closed-loop system that sustains GPU demand and capacity utilization across NVIDIA ↔ cloud providers ↔ AI model companies.

      • NVIDIA commits cloud spend or service credits with select partners (e.g., CoreWeave).

      • CoreWeave and similar GPU lessors raise multi-billion-dollar debt and structured financing to purchase NVIDIA GPUs.

      • Hyperscalers and AI labs (e.g., OpenAI, Anthropic, Stability) sign multi-year take-or-pay capacity contracts with CoreWeave.

      • NVIDIA benefits twice — once on hardware sales and again via cloud ecosystem growth, stabilizing both supply allocation and demand visibility.

    • Rationale: Smooths capacity signals and mitigates the “bullwhip” in GPU demand cycles. Creates synthetic elasticity in supply while ensuring long-term allocation priority.

    • Risks: High concentration and leverage; exposure to a few over-leveraged lessors (CoreWeave, Lambda, Crusoe) could amplify systemic shocks if GPU resale values or utilization drop.

  • Google ↔ Broadcom TPU Partnership

    • Structure:

      • Google designs and architects Tensor Processing Units (TPUs) for internal workloads (training and inference).

      • Broadcom co-develops and produces the silicon and packaging (CoWoS/TSMC).

      • Multi-year production and supply-chain collaboration ensures predictable yield and capacity reservation.

    • Rationale:

      • Reduces dependence on NVIDIA’s allocation cycles.

      • Enables Google Cloud to offer TPU-as-a-Service under controlled pricing and supply terms.

      • Broadcom benefits from high-margin co-development and stable AI silicon pipeline.

    • Risks:

      • Broadcom continues expanding AI-hardware partnerships beyond Google (e.g., Meta, hyperscaler inference accelerators).

      • Google has explored MediaTek for diversification but remains tethered to Broadcom for next-gen TPU nodes (4 nm → 3 nm)


Found it easy? Give it a shot: Revenge of the Dev


