Spotify-Style Music Streaming Platform: System Design (Comprehensive .md)
Version: 1.1
Author: Efe Ozkavci
Scope: End-to-end design covering ingestion → storage → delivery → personalization → billing → operations, optimized for global scale, high availability, and low-latency streaming.
1) Goals & Requirements
1.1 Functional Requirements
- Play music on mobile/desktop/web with <150 ms median start latency after the play click on the warm path (auth and metadata already cached).
- Search & browse: artists, albums, tracks, playlists, podcasts; autocomplete and typo tolerance.
- Library: like/save tracks, create and follow playlists, follow artists.
- Download for offline with license/DRM.
- Recommendations & personalization: home feed, Daily Mix, radio, “because you listened…”, new releases.
- Social: collaborative playlists, friend activity (opt-in), share links.
- Creator pipeline: ingest, validate, transcode multi-bitrate, loudness normalization, fingerprinting, rights.
- Monetization: subscriptions, family/student plans, ads for free tier (audio + display + video).
- Analytics: real-time playback events, engagement funnels, reporting for labels/artists.
- Multi-region: serve globally, seamless failover.
- Compliance: GDPR/CCPA, COPPA exclusion, data residency where required.
1.2 Non-Functional Requirements (NFRs)
- Availability: 99.95% overall; critical path (playback) 99.99% regionally.
- Latency: P50 start <150 ms (after cached auth and metadata), P95 <350 ms.
- Throughput: 20M–100M DAU; peak ~10M concurrent streams; 500k–2M RPS read on metadata/search.
- Durability: Audio objects 11+ nines; metadata RPO ≤ 5 min; RTO ≤ 10 min per region.
- Cost: Optimize egress via CDN + cache; cold tier for long-tail content.
- Security: Zero-trust service mesh, KMS, HSM-backed keys for DRM, regular pen-tests.
- Observability: SLOs, RED/USE metrics, distributed tracing, structured logs with PII minimization.
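To make the availability targets above concrete, the sketch below converts a percentage target into an allowed-downtime budget. The 30-day window and the function name are assumptions chosen for illustration; the document does not state a measurement window.

```python
def downtime_budget_minutes(availability_pct: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability target over a window.

    The 30-day default window is an assumption, not something the design states.
    """
    window_minutes = window_days * 24 * 60
    return (1 - availability_pct / 100) * window_minutes


# Targets taken from the NFR list above.
overall_budget = downtime_budget_minutes(99.95)   # roughly 21.6 minutes per 30 days
playback_budget = downtime_budget_minutes(99.99)  # roughly 4.3 minutes per 30 days
```

Under these assumptions, the playback path has about a fifth of the downtime budget of the platform as a whole, which is why it is designed to survive partial outages.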
2) High-Level Architecture
Key idea: immutable audio blobs in object storage; small, hot metadata in distributed DBs; search index for discovery; CDNs for low-latency delivery; Kafka/Flink + feature store for personalization; multi-region active-active for reads; controlled write topology.
3) Data Model (Core)
3.1 Entities (simplified)
- Artist(artist_id, name, aliases[], verified, labels[], created_at, …)
- Album(album_id, title, artist_ids[], release_date, upc, territories[], …)
- Track(track_id, album_id, artist_ids[], isrc, duration_ms, explicit, loudness, preview_url, encodings[], territories[], …)
- Encoding(encoding_id, track_id, codec, bitrate_kbps, container, segment_manifest_uri, drm_scheme, checksum, …)
- Playlist(playlist_id, owner_user_id, title, description, is_collaborative, is_public, track_ids[], followers_count, …)
- User(user_id, handle, email_hash, country, tier, devices[], consent_flags, …)
- Follow(follower_id, followee_id, type{artist|user|playlist}, …)
- PlaybackEvent(event_id, user_id, track_id, position_ms, device, ts, session_id, …)
- AdImpression/Click, Subscription, Entitlement, License
3.2 Storage Choices
- Metadata: Cassandra (wide-column, multi-DC), or Postgres (Citus) where strict relational integrity is needed (billing).
- Search: Elasticsearch/OpenSearch for full-text, facets, synonyms.
- Graph: Follow/relationship edges in Cassandra tables or Neo4j if heavy graph traversal is required.
- KV Cache: Redis for session, feature flags, hot playlist heads.
- Object Storage: S3/GCS/Azure-Blob or in-house; versioned; lifecycle policies.
- Data Lake: Parquet in object store; Lakehouse (Delta/Iceberg) for batch + streaming.
4) Playback Path (Happy Path)
Adaptive Bitrate (ABR): HLS/DASH with multiple bitrates (e.g., 64/96/160/320 kbps AAC/Opus). Player chooses based on bandwidth/CPU/battery.
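A minimal sketch of throughput-based rendition selection, using the bitrate ladder from the paragraph above. The safety factor of 0.8 and the function name are assumptions; a production player would also weigh buffer level, CPU, and battery, which this sketch ignores.

```python
RENDITIONS_KBPS = [64, 96, 160, 320]  # example ladder from the text


def pick_rendition(throughput_kbps: float,
                   safety_factor: float = 0.8,
                   ladder=RENDITIONS_KBPS) -> int:
    """Pick the highest bitrate that fits within a fraction of measured throughput.

    safety_factor is a hypothetical headroom parameter that leaves room for
    throughput variance. Falls back to the lowest rendition when nothing fits.
    """
    budget = throughput_kbps * safety_factor
    eligible = [b for b in sorted(ladder) if b <= budget]
    return eligible[-1] if eligible else min(ladder)
```

For example, a measured throughput of 210 kbps leaves a 168 kbps budget and selects the 160 kbps rendition, while 50 kbps falls back to 64 kbps rather than refusing to play.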
DRM: Widevine/FairPlay/PlayReady; license minted per session/device; short-lived playback tokens; keys in HSM/KMS.
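The short-lived playback tokens mentioned above could look roughly like the following HMAC-signed token. This is a sketch only: it is not a DRM license, the claim names and the 60-second TTL are assumptions, and the signing secret is held in memory here, whereas the design keeps keys in HSM/KMS.

```python
import base64
import hashlib
import hmac
import json
import time


def mint_token(secret: bytes, user_id: str, device_id: str, track_id: str,
               ttl_s: int = 60, now=None) -> str:
    """Return 'payload.signature', both URL-safe base64 (claim names are illustrative)."""
    now = time.time() if now is None else now
    claims = {"user_id": user_id, "device_id": device_id,
              "track_id": track_id, "exp": int(now) + ttl_s}
    payload = json.dumps(claims, sort_keys=True, separators=(",", ":")).encode()
    sig = hmac.new(secret, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(sig).decode())


def verify_token(secret: bytes, token: str, now=None):
    """Return the claims dict if the signature is valid and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    claims = json.loads(payload)
    if claims["exp"] <= now:
        return None
    return claims
```

Binding the token to a device ID and a single track keeps a leaked token narrowly scoped, and the short expiry limits how long it remains useful.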
5) Ingestion Pipeline (Creators & Labels)
- Upload masters (FLAC/WAV) + metadata (DDEX/label feeds or portal).
- Validation: schema, ISRC/UPC checks, profanity, artwork rules.
- Transcode: multi-bitrate encoding (AAC/Opus), loudness normalization (EBU R128).
- Packaging: HLS/DASH manifests; segmenting; DRM key assignment.
- Fingerprinting: acoustic fingerprint for duplicate detection & UGC matching.
- Rights/Territories: availability windows, geo-block rules.
- Publish:
  - Write metadata (Cassandra/Postgres).
  - Index search (Elasticsearch).
  - Store segments/manifests (Object Storage) with lifecycle rules.
  - Emit “track_published” event (Kafka) → warm caches, prime the CDN, trigger recommender refresh jobs.
Consistency: Eventual; target global visibility < 60 s. Use idempotent upserts with content hashes.
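One way to read "idempotent upserts with content hashes" is sketched below: the store hashes a canonical form of each record and skips the write when the hash is unchanged, so replayed publish events are harmless. The in-memory `MetadataStore` class is hypothetical and stands in for Cassandra/Postgres.

```python
import hashlib
import json


def content_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


class MetadataStore:
    """Hypothetical in-memory stand-in for the metadata database."""

    def __init__(self):
        self._rows = {}   # track_id -> (content hash, record)
        self.writes = 0   # counts writes that actually changed state

    def upsert(self, track_id: str, record: dict) -> bool:
        """Write the record unless an identical one is already stored.

        Returns True if state changed, False if the call was a no-op replay.
        """
        h = content_hash(record)
        existing = self._rows.get(track_id)
        if existing is not None and existing[0] == h:
            return False
        self._rows[track_id] = (h, dict(record))
        self.writes += 1
        return True
```

Because key order does not affect the canonical encoding, two deliveries of the same logical record produce the same hash even if the producer serialized them differently.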
6) Search & Discovery
- Indexing: denormalized docs (artist/album/track), analyzers for multiple locales, synonyms, phonetics, typo tolerance.
- Query: prefix, fuzzy, boosters (popularity, recency).
- Reranking: lightweight ML (LambdaMART/transformer) fed by feature store (user×item embeddings, co-listen, skips).
- Autocomplete: separate prefix index; sub-100 ms P99.
- Safety: explicit/clean versions; territory filtering; policy blocks applied at query & result rendering.
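The separate prefix index for autocomplete can be illustrated with a sorted list and binary search, ranked by a popularity booster. This is a toy sketch: the class name and the popularity scores are made up, and it does not cover typo tolerance, locale analyzers, or territory filtering, which the real index would handle.

```python
import bisect


class PrefixIndex:
    """Toy prefix index: sorted lowercase keys plus a popularity score."""

    def __init__(self, entries):
        # entries: iterable of (title, popularity)
        self._items = sorted((title.lower(), pop, title) for title, pop in entries)
        self._keys = [key for key, _, _ in self._items]

    def suggest(self, prefix: str, limit: int = 5):
        """Return up to `limit` titles starting with `prefix`, most popular first."""
        p = prefix.lower()
        start = bisect.bisect_left(self._keys, p)  # first key >= prefix
        matches = []
        for key, pop, title in self._items[start:]:
            if not key.startswith(p):
                break  # keys are sorted, so no later key can match
            matches.append((pop, title))
        matches.sort(key=lambda m: (-m[0], m[1]))
        return [title for _, title in matches[:limit]]
```

Lookup cost is a binary search plus a scan over the matching range, which is the property that makes a dedicated prefix structure attractive for a tight latency target.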
7) Personalization & ML
- Signals: play/skip/complete, dwell, search-to-play, add-to-playlist, follows, shares, device/time features.
- Stream Processing: Kafka → Flink/Spark Structured Streaming → online features (Redis/Cassandra) + offline features (Parquet).
- Embeddings: two-tower (user/item) + sequence models (transformers) for next-track prediction.
- Ranking: candidate generation (ANN: FAISS/ScaNN) → multi-stage rankers.
- Exploration: bandits / epsilon-greedy; long-tail promotion under constraints.
- Experimentation: feature flags, bucketing service, CUPED, sequential tests; guardrails (skip rate, complaint rate).
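A bucketing service typically needs assignments that are stable per user and independent across experiments. A minimal sketch, assuming hash-based assignment (the function name and the example experiment key are hypothetical):

```python
import hashlib


def assign_bucket(user_id: str, experiment: str, variants) -> str:
    """Deterministically map a user to a variant.

    variants: list of (name, weight) pairs whose weights sum to 1.0.
    Hashing the experiment name together with the user ID keeps assignments
    uncorrelated across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    cumulative = 0.0
    for name, weight in variants:
        cumulative += weight
        if point < cumulative:
            return name
    return variants[-1][0]  # guards against floating-point rounding in the weights
```

For example, `assign_bucket("user-42", "home_feed_v2", [("control", 0.5), ("treatment", 0.5)])` returns the same variant on every call and on every server, with no shared state required.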
8) Ads System (Free Tier)
- Eligibility: geo/tier/policy.
- Auction: RTB-like or direct campaigns; pacing & frequency capping per user/device.
- Creative: audio (15/30s), companion display, video (muted/inline).
- Targeting: contexts (mood, activity), coarse segments (privacy-preserving).
- Measurement: impression, quartiles, hovers, clicks; attribution window; brand safety filters.
- Latency: decision in <80 ms P95 on the critical path, otherwise pre-fetch between tracks.
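Frequency capping per user can be sketched as a sliding-window counter. The class below is an in-memory illustration with assumed names and limits; in the design above this state would more plausibly live in a shared store such as Redis.

```python
from collections import defaultdict, deque


class FrequencyCap:
    """Allow at most `max_impressions` per (user, campaign) in a sliding window."""

    def __init__(self, max_impressions: int, window_s: float):
        self.max_impressions = max_impressions
        self.window_s = window_s
        self._seen = defaultdict(deque)  # (user_id, campaign_id) -> impression timestamps

    def allow(self, user_id: str, campaign_id: str, now: float) -> bool:
        """Record and allow an impression if the cap is not yet reached."""
        q = self._seen[(user_id, campaign_id)]
        # Drop impressions that have aged out of the window.
        while q and q[0] <= now - self.window_s:
            q.popleft()
        if len(q) >= self.max_impressions:
            return False
        q.append(now)
        return True
```

With a cap of 2 per hour, a third request inside the hour is rejected, and the user becomes eligible again once the oldest impression ages out.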
9) Subscriptions, Entitlements, & Billing
- Plans: individual, family, student; country-based pricing; tax/VAT handling.
- Payment: PSP integrations, app store receipts, retries & dunning.
- Entitlement cache: Redis (TTL of a few minutes) → fallback to DB on miss.
- Device limits: per-tier streaming concurrency; enforce via lease records & heartbeats.
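The lease-and-heartbeat approach to device limits can be sketched as follows. The class name, the 30-second lease TTL, and the single-process dictionary are assumptions for illustration; a real deployment would keep lease records in a shared store.

```python
class StreamLeases:
    """Enforce per-user streaming concurrency with expiring leases."""

    def __init__(self, max_streams: int, lease_ttl_s: float = 30):
        self.max_streams = max_streams
        self.lease_ttl_s = lease_ttl_s
        self._leases = {}  # user_id -> {device_id: expires_at}

    def heartbeat(self, user_id: str, device_id: str, now: float) -> bool:
        """Acquire or renew a lease; return False if the user is at the limit."""
        devices = self._leases.setdefault(user_id, {})
        # Expire leases whose devices stopped sending heartbeats.
        for stale in [d for d, expires_at in devices.items() if expires_at <= now]:
            del devices[stale]
        if device_id in devices or len(devices) < self.max_streams:
            devices[device_id] = now + self.lease_ttl_s
            return True
        return False
```

A device that stops heartbeating (app closed, network lost) frees its slot automatically once the lease expires, so no explicit "stop" call is required for correctness.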
10) Multi-Region Strategy & Outages
- Reads: active-active for APIs (metadata/search/library) with client-side or gateway geo-routing (Anycast + GeoDNS).
- Writes: active-active where conflict-free (Cassandra, with lightweight transactions (LWT) where compare-and-set is needed). For strongly consistent domains (billing, receipts), use regional primaries with asynchronous cross-region replication and write-forwarding on failover.
- Audio Delivery: CDN first; if regional object store is degraded, signed URLs target alternate region; CDN already has many popular segments cached.
- Control Plane Outage: degrade gracefully:
  - Cached manifests & playlists.
  - Local device caches and offline downloads.
  - Feature flags to remove non-essential calls (social, presence, heavy recommendations) while preserving play.
- Disaster Recovery: RPO ≤ 5 min for metadata via CDC → Kafka → multi-region replicas; regular region failover drills.
11) Caching Strategy
- Edge (CDN): manifests + segments (immutable keys with content hashing); long TTL + cache-busting on republish.
- Client: small lookaside cache (manifests, album artwork), on-disk ring buffer for near-future segments.
- Service: Redis caches for playlist heads, artist top-tracks, entitlement checks; negative caching for 404s (short TTL).
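The service-level pattern above (lookaside cache with short-TTL negative caching) can be sketched in a few lines. The class name, the TTL values, and the injected clock are assumptions; Redis would replace the dictionary in practice.

```python
import time

_MISSING = object()  # sentinel marking a cached "not found"


class LookasideCache:
    """Lookaside cache that also remembers misses for a short time."""

    def __init__(self, loader, ttl_s: float = 300, negative_ttl_s: float = 5,
                 clock=time.monotonic):
        self._loader = loader              # loader(key) returns a value, or None if absent
        self._ttl_s = ttl_s
        self._negative_ttl_s = negative_ttl_s
        self._clock = clock
        self._entries = {}                 # key -> (value or _MISSING, expires_at)
        self.loader_calls = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return None if entry[0] is _MISSING else entry[0]
        # Miss or expired entry: go to the backing store.
        self.loader_calls += 1
        value = self._loader(key)
        if value is None:
            self._entries[key] = (_MISSING, now + self._negative_ttl_s)
        else:
            self._entries[key] = (value, now + self._ttl_s)
        return value
```

The short negative TTL shields the database from repeated lookups of nonexistent IDs while ensuring a newly published item becomes visible within seconds.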
12) API Design (Representative)
12.1 Playback / Manifests
GET /v1/playback/manifest?track_id={id}&codec=aac&br=160
Headers: Authorization: Bearer <access>, X-Device-ID, X-Client-Build
→ 200 { manifest_url, license_url, expires_at, drm_scheme }
12.2 Library
GET /v1/me/tracks?limit=100&cursor=...
PUT /v1/me/tracks/{track_id}
DELETE /v1/me/tracks/{track_id}
12.3 Search
GET /v1/search?q={query}&types=track,artist,album,playlist&limit=20
12.4 Playlists
GET /v1/playlists/{playlist_id}
POST /v1/playlists { title, description, public, collaborative }
POST /v1/playlists/{id}/tracks { track_ids[], position }
12.5 Auth
- OAuth 2.1 / PKCE for user login; short-lived access tokens, DPoP or TLS-bound tokens optional.
- Device authorization flow for TVs/IoT.
13) Offline Mode & DRM
- Encrypted downloads bound to device keys; chunk-level encryption.
- License leases: time-bounded; renewal requires entitlement.
- Revocation: remote wipe on account breach or plan changes.
- Storage: LRU across downloaded items; background revalidation on Wi‑Fi.
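LRU eviction across downloaded items might look like the sketch below, which tracks sizes in bytes and evicts the least recently played downloads when a new one does not fit. The class and method names are hypothetical, and license handling is out of scope here.

```python
from collections import OrderedDict


class OfflineStore:
    """Byte-budgeted LRU over downloaded tracks (bookkeeping only, no file I/O)."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self._items = OrderedDict()  # track_id -> size_bytes, least recently used first

    def touch(self, track_id: str) -> None:
        """Mark a track as recently played."""
        if track_id in self._items:
            self._items.move_to_end(track_id)

    def add(self, track_id: str, size_bytes: int):
        """Store a download, evicting LRU items as needed; return evicted IDs."""
        if size_bytes > self.capacity:
            raise ValueError("item larger than total offline capacity")
        if track_id in self._items:          # re-download replaces the old copy
            self.used -= self._items.pop(track_id)
        evicted = []
        while self.used + size_bytes > self.capacity:
            old_id, old_size = self._items.popitem(last=False)
            self.used -= old_size
            evicted.append(old_id)
        self._items[track_id] = size_bytes
        self.used += size_bytes
        return evicted
```

A client would call `touch` on each offline play so that frequently played downloads survive eviction.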
14) Security & Privacy
- PII minimization: segregate PII in dedicated store with strict ACLs.
- Encryption: TLS 1.3 everywhere; at-rest with KMS; HSM for DRM keys.
- Service mesh: mTLS, RBAC, OPA for policy enforcement.
- Abuse: bot detection (behavioral), rate limits, token binding to device, watermarking audio for forensic tracing.
- Privacy: consent flags, data subject rights APIs (export/delete), purpose-limited processing, data residency routing.
15) Observability & Operations
- Metrics: RED (Rate, Errors, Duration) per endpoint; playback QoE (startup time, rebuffer ratio, bitrate, failures).
- Tracing: W3C trace-context, 1–10% sampling on happy path, 100% on errors.
- Logging: structured, PII-redacted; drop high-cardinality fields at the edge.
- SLOs with error budgets; auto rollbacks on budget burn.
- Runbooks: per service; game days for chaos & regional failover.
- Capacity: autoscale with HPA/KEDA on QPS/lag; pre-warm caches for known peaks (new album drops).
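The tracing policy above (a small sample of the happy path, everything on errors) can be approximated with a deterministic hash of the trace ID. This is a sketch with assumed names and a 5% default rate. Note that keeping 100% of errors generally requires the decision to be made after the request outcome is known (tail-based sampling), which is what the `is_error` argument assumes.

```python
import hashlib


def should_sample(trace_id: str, is_error: bool, happy_rate: float = 0.05) -> bool:
    """Keep every errored trace and a fixed fraction of successful ones.

    Hashing the trace ID makes the decision consistent across services,
    so a sampled trace is kept end to end rather than in fragments.
    """
    if is_error:
        return True
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return point < happy_rate
```

Because every service computes the same hash for the same trace ID, no coordination is needed to agree on which happy-path traces to keep.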
16) Capacity & Cost Sketch
- Object storage: 100M tracks × avg 12 MB per encoded set ≈ 1.2 PB (hot) + replicas; cold tier for long tail.
- CDN egress: dominant cost → optimize via cache-hit ratios, segment reuse across bitrates where feasible.
- Metadata: per-track doc ~2–5 KB; total <1 TB; easy to keep hot in multi-node clusters.
- Search: shard by locale/type; keep top-N popular shards in RAM-heavy nodes.
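The arithmetic behind the storage and metadata estimates above can be checked directly. The peak egress figure is an added illustration that assumes all 10M concurrent streams (from the NFRs) play the 160 kbps rendition and that nothing is served from client caches; decimal units are used throughout.

```python
TRACKS = 100_000_000           # catalog size from the text
ENCODED_SET_MB = 12            # average size of all encodings per track
DOC_KB_HIGH = 5                # upper end of the 2-5 KB per-track document
PEAK_STREAMS = 10_000_000      # peak concurrency from the NFRs
ASSUMED_BITRATE_KBPS = 160     # assumption: every stream uses this rendition

# 1 PB = 1e9 MB
storage_pb = TRACKS * ENCODED_SET_MB / 1_000_000_000

# 1 GB = 1e6 KB
metadata_gb = TRACKS * DOC_KB_HIGH / 1_000_000

# 1 Tbps = 1e9 kbps
peak_egress_tbps = PEAK_STREAMS * ASSUMED_BITRATE_KBPS / 1_000_000_000

print(f"audio storage: {storage_pb} PB")
print(f"metadata (upper bound): {metadata_gb} GB")
print(f"peak egress at {ASSUMED_BITRATE_KBPS} kbps: {peak_egress_tbps} Tbps")
```

The results (1.2 PB of audio, at most 500 GB of track metadata, and on the order of 1.6 Tbps at peak under the stated assumption) are consistent with the claim that CDN egress, not storage, dominates cost.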
17) Failure Scenarios & Mitigations
- Region-wide outage → Anycast/GeoDNS reroute; CDN serves from alternate origin; metadata read from healthy region.
- Index lag → fall back to secondary index; show cached “last good” home.
- DRM KMS failure → rely on cached offline licenses plus a short grace-period extension; degrade to ad-supported radio only if policy allows.
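- Kafka partition outage → partitions are replicated via in-sync replicas (ISR), with KRaft managing cluster metadata in recent Kafka versions; producers use idempotent writes; consumers commit offsets transactionally for exactly-once processing where needed.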
- Hot key (superstar track) → pre-warm multi-CDN, forced replication, special short-manifest with fewer renditions to reduce edge misses.
18) Deployment & Infra
- Kubernetes across regions; multi-cluster per region; namespaces per domain.
- GitOps (ArgoCD), progressive delivery (canary/blue‑green), feature flags.
- DB: multi-DC Cassandra; Postgres with logical replication & read replicas; OpenSearch with cross-cluster replication.
- Secrets: KMS + sealed secrets; short-lived credentials (SPIFFE/SPIRE).
- Edge: WAF, rate limiting, bot management.
19) Data Governance & Rights
- Contracts: per-label/territory windows; embargo handling.
- Content ID: fingerprint match on UGC ingestion; takedown workflow.
- Audit: append-only ledger for license decisions and entitlement checks.
20) Interview Appendix (Cheat-Sheet)
- Lead with immutable objects + CDN; metadata multi-region; eventual consistency acceptable for catalog.
- Playback critical path must survive partial outages using caches and alternate origins.
- Ingestion emits events to warm caches and drive recsys freshness.
- DRM + offline: device-bound keys, short-lived licenses.
- Observability: QoE metrics are first-class.
21) Stretch Ideas
- Client-to-client local assist (opt-in) for multi-device households on same LAN to reduce egress.
- Edge compute: per-edge re-packaging, personalized manifests with privacy-preserving tokenization.
- Green streaming: adaptive bitrate by grid carbon intensity, when user permits.
22) Glossary (select)
- HLS/DASH: HTTP streaming protocols using segment playlists/manifests.
- DRM: Digital Rights Management; content decryption is licensed per device.
- ABR: Adaptive Bitrate; the client switches renditions to match network conditions.
- RPO/RTO: Recovery Point/Time Objective (data loss and restore time targets).
23) References (non-exhaustive, conceptual)
- RFCs for HTTP/2, QUIC/HTTP/3; EBU R128 loudness; OAuth 2.1 / PKCE docs.
- Public talks from large streamers (Netflix, Spotify, YouTube) on CDNs, ABR, and ML personalization (conceptually aligned).