engineering · websocket · collaboration · architecture

How We Handle Real-Time Collaboration

A deep dive into the WebSocket architecture powering collaborative editing in Caustic Studio.

Shan · March 13, 2026 · 1 min read

The Architecture

Real-time collaboration in a game editor is fundamentally different from text editing. You're not just syncing characters — you're syncing pixel operations, audio waveforms, tile placements, and node graph connections.

Operational Transform vs CRDTs

We evaluated both approaches:

| Approach | Pros | Cons |
|---|---|---|
| Operational Transform (OT) | Well-understood, good for text | Complex for structured data |
| CRDTs | Eventually consistent, offline-friendly | Memory overhead, complex merge |

We chose a hybrid approach: CRDTs for the data model, with a central server for ordering and conflict resolution.
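For pixel data, one way this hybrid could work is a last-writer-wins (LWW) register per pixel, where each write carries a clock assigned by the central server. The sketch below assumes exactly that. The names `PixelWrite`, `LwwCanvas`, and `wins` are hypothetical, not Caustic Studio's actual code.

```typescript
// Hypothetical sketch: one last-writer-wins (LWW) register per pixel.
// Assumption: the central server stamps every write with a clock value,
// and userId breaks ties if two writes ever carry the same clock.
interface PixelWrite {
  x: number;
  y: number;
  color: number; // packed RGBA
  clock: number; // assigned by the server
  userId: string;
}

// Returns true if write `a` should replace write `b`.
// (clock, userId) is compared lexicographically, giving a total order.
function wins(a: PixelWrite, b: PixelWrite): boolean {
  if (a.clock !== b.clock) return a.clock > b.clock;
  return a.userId > b.userId;
}

class LwwCanvas {
  private cells = new Map<string, PixelWrite>();

  // Keeps whichever write wins. Re-applying a write that is already
  // stored changes nothing, because wins(w, w) is false.
  apply(w: PixelWrite): void {
    const key = `${w.x},${w.y}`;
    const current = this.cells.get(key);
    if (current === undefined || wins(w, current)) {
      this.cells.set(key, w);
    }
  }

  colorAt(x: number, y: number): number | undefined {
    return this.cells.get(`${x},${y}`)?.color;
  }
}
```

As long as the server never issues the same `(clock, userId)` pair for two different writes, replicas that receive the same writes in different orders, or receive a write more than once, end up with the same color at every pixel.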

Message Protocol

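```typescript
interface SyncMessage {
  type: 'operation' | 'snapshot' | 'presence'; // kind of message
  clock: number;                               // clock value used for ordering
  userId: string;                              // user who sent the message
  payload: Operation[];                        // Operation type is defined elsewhere
}
```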
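Messages arrive over the WebSocket as untrusted text, so a receiver would typically check them against `SyncMessage` before applying anything. This sketch assumes JSON framing and a placeholder `Operation` type with only a `kind` field. Both are assumptions, since the post shows neither the wire encoding nor the operation schema, and `parseSyncMessage` is a hypothetical name.

```typescript
// Placeholder: the real Operation schema is not shown in the post.
interface Operation {
  kind: string;
}

interface SyncMessage {
  type: 'operation' | 'snapshot' | 'presence';
  clock: number;
  userId: string;
  payload: Operation[];
}

const MESSAGE_TYPES = new Set<string>(['operation', 'snapshot', 'presence']);

function isOperation(value: unknown): value is Operation {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { kind?: unknown }).kind === 'string'
  );
}

// Parses one raw text frame (assumed to be JSON).
// Returns null instead of throwing when the frame is malformed.
function parseSyncMessage(raw: string): SyncMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== 'object' || data === null) return null;
  const { type, clock, userId, payload } = data as Record<string, unknown>;
  if (typeof type !== 'string' || !MESSAGE_TYPES.has(type)) return null;
  if (typeof clock !== 'number') return null;
  if (typeof userId !== 'string') return null;
  if (!Array.isArray(payload) || !payload.every(isOperation)) return null;
  return { type: type as SyncMessage['type'], clock, userId, payload };
}
```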

Every operation is:

  • Idempotent: applying it twice has the same effect as applying it once
  • Commutative: concurrent operations produce the same final state regardless of the order they are applied in
  • Compact: we delta-encode pixel changes as run-length encoded diffs
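The run-length encoded diff idea can be sketched under an assumed format: compare a pixel buffer before and after an edit, skip unchanged pixels, and emit a `{ start, length, color }` run for each stretch of consecutive pixels that changed to the same color. The `Run` shape and the `encodeDiff` / `applyDiff` names are hypothetical; the post does not specify the actual encoding.

```typescript
// Hypothetical run format: set `length` pixels starting at `start` to `color`.
interface Run {
  start: number;
  length: number;
  color: number;
}

// Encodes only the pixels that differ between `before` and `after`.
function encodeDiff(before: Uint32Array, after: Uint32Array): Run[] {
  if (before.length !== after.length) throw new Error('size mismatch');
  const runs: Run[] = [];
  let i = 0;
  while (i < after.length) {
    if (before[i] === after[i]) {
      i++; // unchanged pixel: nothing to send
      continue;
    }
    // Extend the run while pixels keep changing to the same new color.
    const color = after[i];
    const start = i;
    while (i < after.length && before[i] !== after[i] && after[i] === color) {
      i++;
    }
    runs.push({ start, length: i - start, color });
  }
  return runs;
}

// Writes each run into the buffer in place.
function applyDiff(pixels: Uint32Array, runs: Run[]): void {
  for (const { start, length, color } of runs) {
    pixels.fill(color, start, start + length);
  }
}
```

Because each run writes absolute colors rather than increments, applying the same runs a second time leaves the buffer unchanged, which matches the idempotence property in the list.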

Performance at Scale

With 8 concurrent editors on a 512×512 canvas:

  • Average latency: 12ms (same region)
  • Bandwidth: ~2KB/s per user (after compression)
  • Conflict rate: <0.1% with our cursor-aware partitioning

The key insight: most game editing is spatially partitioned. Two artists rarely paint the exact same pixel at the exact same time.
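The spatial-partitioning argument can be illustrated by bucketing pixel writes into fixed-size tiles: two concurrent batches can only touch the same pixel if they share a tile, so the common case (artists working in different regions) is ruled out before any per-pixel comparison. The 32×32 tile size and the function names are assumptions, and this sketch does not model the cursor-aware part of the scheme, which the post does not describe.

```typescript
// Assumed tile size; the post does not say how the canvas is partitioned.
const TILE_SIZE = 32;

interface PixelEdit {
  x: number;
  y: number;
}

function tileKey(x: number, y: number): string {
  return `${Math.floor(x / TILE_SIZE)},${Math.floor(y / TILE_SIZE)}`;
}

// Returns the distinct pixel keys ("x,y") written by both batches.
// The tile check filters out edits in other regions cheaply first.
function conflictingPixels(a: PixelEdit[], b: PixelEdit[]): string[] {
  const tilesA = new Set(a.map((p) => tileKey(p.x, p.y)));
  const bInSharedTiles = b.filter((p) => tilesA.has(tileKey(p.x, p.y)));
  if (bInSharedTiles.length === 0) return []; // disjoint regions: no conflict
  const pixelsA = new Set(a.map((p) => `${p.x},${p.y}`));
  const shared = bInSharedTiles
    .map((p) => `${p.x},${p.y}`)
    .filter((key) => pixelsA.has(key));
  return Array.from(new Set(shared));
}
```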