How We Handle Real-Time Collaboration
A deep dive into the WebSocket architecture powering collaborative editing in Caustic Studio.
Shan·March 13, 2026·1 min read
The Architecture
Real-time collaboration in a game editor is fundamentally different from text editing. You're not just syncing characters — you're syncing pixel operations, audio waveforms, tile placements, and node graph connections.
Operational Transform vs CRDTs
We evaluated both approaches:
| Approach | Pros | Cons |
|---|---|---|
| OT | Well-understood, good for text | Complex for structured data |
| CRDTs | Eventually consistent, offline-friendly | Memory overhead, complex merge |
We chose a hybrid approach: CRDTs for the data model, with a central server for ordering and conflict resolution.
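One way to picture this hybrid is a last-writer-wins (LWW) map whose timestamps come from the server. The sketch below is illustrative only and is not Caustic Studio's actual data model: `LwwMap`, `CellWrite`, and `wins` are hypothetical names, and it assumes the central server stamps each write with an increasing `clock` value.

```typescript
// Hypothetical sketch: one last-writer-wins register per canvas cell.
// Assumption: `clock` is a sequence number assigned by the central server.
type CellWrite = { key: string; value: number; clock: number; userId: string };

// Higher clock wins; ties fall back to userId so every replica picks the same winner.
function wins(a: CellWrite, b: CellWrite): boolean {
  if (a.clock !== b.clock) return a.clock > b.clock;
  return a.userId > b.userId;
}

class LwwMap {
  private cells = new Map<string, CellWrite>();

  // Merge one write into local state. Returns true if the state changed.
  apply(w: CellWrite): boolean {
    const current = this.cells.get(w.key);
    if (current !== undefined && !wins(w, current)) return false;
    this.cells.set(w.key, w);
    return true;
  }

  get(key: string): number | undefined {
    return this.cells.get(key)?.value;
  }
}
```

Because the winner depends only on the `(clock, userId)` pair, two replicas that receive the same writes in different orders end up with the same value, and re-applying a write leaves the state unchanged.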
Message Protocol
```typescript
interface SyncMessage {
  type: 'operation' | 'snapshot' | 'presence';
  clock: number;
  userId: string;
  payload: Operation[];
}
```
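Since these messages arrive as raw WebSocket frames, a receiver would typically validate them before use. The following is a rough sketch under stated assumptions: it supposes messages are JSON text, `parseSyncMessage` is a hypothetical helper, and the `Operation` shape is a placeholder because the post does not define it.

```typescript
// Placeholder: the real Operation type is not shown in the post.
type Operation = { target: string; data: unknown };

interface SyncMessage {
  type: 'operation' | 'snapshot' | 'presence';
  clock: number;
  userId: string;
  payload: Operation[];
}

const MESSAGE_TYPES: string[] = ['operation', 'snapshot', 'presence'];

// Parse a text frame (assumed to be JSON). Returns null for anything malformed.
// Individual payload entries are not validated in this sketch.
function parseSyncMessage(raw: string): SyncMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== 'object' || value === null) return null;

  const { type, clock, userId, payload } = value as Record<string, unknown>;
  if (typeof type !== 'string' || MESSAGE_TYPES.indexOf(type) === -1) return null;
  if (typeof clock !== 'number' || !isFinite(clock)) return null;
  if (typeof userId !== 'string') return null;
  if (!Array.isArray(payload)) return null;

  return {
    type: type as SyncMessage['type'],
    clock,
    userId,
    payload: payload as Operation[],
  };
}
```

Returning `null` instead of throwing keeps a single bad frame from tearing down the connection handler; the caller decides whether to log, drop, or disconnect.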
Every operation is:
- Idempotent: applying it twice has the same effect as applying it once
- Commutative: concurrent operations can be applied in any order and still converge to the same final state
- Compact: pixel changes are delta-encoded as run-length-encoded diffs
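To make the "compact" property concrete, here is a small sketch of a run-length-encoded pixel diff. The actual wire format is not described in this post, so the `Run` shape and the `encodeDiff` and `applyDiff` helpers are assumptions, as is the choice of one 32-bit color value per pixel.

```typescript
// Hypothetical run: `length` consecutive pixels starting at `start` become `color`.
type Run = { start: number; length: number; color: number };

// Compare two same-sized pixel buffers and emit runs only for changed pixels.
function encodeDiff(before: Uint32Array, after: Uint32Array): Run[] {
  if (before.length !== after.length) throw new Error('buffer size mismatch');
  const runs: Run[] = [];
  let i = 0;
  while (i < after.length) {
    if (before[i] === after[i]) {
      i++; // unchanged pixel: nothing to send
      continue;
    }
    const start = i;
    const color = after[i];
    // Extend the run while pixels are changed and share the same new color.
    while (i < after.length && before[i] !== after[i] && after[i] === color) i++;
    runs.push({ start, length: i - start, color });
  }
  return runs;
}

// Write each run into the target buffer in place.
function applyDiff(pixels: Uint32Array, runs: Run[]): void {
  for (const run of runs) {
    pixels.fill(run.color, run.start, run.start + run.length);
  }
}
```

Because each run stores absolute colors rather than relative changes, applying the same diff a second time leaves the buffer unchanged, which matches the idempotence property listed above.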
Performance at Scale
With 8 concurrent editors on a 512x512 canvas:
- Average latency: 12 ms (same region)
- Bandwidth: ~2 KB/s per user (after compression)
- Conflict rate: <0.1% with our cursor-aware partitioning
The key insight: most game editing is spatially partitioned. Two artists rarely paint the exact same pixel at the exact same time.
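The partitioning scheme itself is not detailed here, but the idea can be sketched by bucketing cursors into fixed-size regions and flagging only regions with more than one editor. Everything in this example is an assumption for illustration: the 64-pixel `REGION_SIZE`, the `regionOf` and `contestedRegions` helpers, and the use of cursor position as a proxy for where a user is editing.

```typescript
// Assumed region size: a 512x512 canvas splits into an 8x8 grid of regions.
const REGION_SIZE = 64;

type Cursor = { x: number; y: number };

// Map a canvas coordinate to the key of the region that contains it.
function regionOf(x: number, y: number): string {
  return `${Math.floor(x / REGION_SIZE)},${Math.floor(y / REGION_SIZE)}`;
}

// Group users by the region under their cursor and keep only regions
// shared by two or more users, where conflicting edits are possible.
function contestedRegions(cursors: Map<string, Cursor>): Map<string, string[]> {
  const byRegion = new Map<string, string[]>();
  cursors.forEach((pos, userId) => {
    const key = regionOf(pos.x, pos.y);
    const users = byRegion.get(key) ?? [];
    users.push(userId);
    byRegion.set(key, users);
  });

  const contested = new Map<string, string[]>();
  byRegion.forEach((users, key) => {
    if (users.length > 1) contested.set(key, users);
  });
  return contested;
}
```

With a scheme like this, edits in uncontested regions could skip conflict handling entirely, and only the few shared regions would need stricter ordering.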