Bank-grade plumbing pass (Wave 1). Six closely-coupled hardenings that bring the SDK's reliability surface up to Stripe / Segment / Mixpanel standards. Backwards-compatible: no public API removed, every new option has a sensible default, every behaviour change is additive. Source-compatible with 0.7.x — Crossdeck.init({...}) callsites do not need to change.
Added
- Durable event queue. Queued events are now written through to the SDK's identity store (typically localStorage), so a hard browser crash, power loss, or a terminal flush that exceeds the 64 KB keepalive: true cap no longer loses data. On the next SDK boot the persisted queue is rehydrated and replayed. The backend dedupes by eventId, so replaying an event that was already on the wire when the tab crashed is safe; ReplacingMergeTree handles it. New module event-storage.ts (PersistentEventStore). Skipped when persistIdentity: false (strict-consent flows).
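
The write-through and rehydrate cycle of the durable queue can be sketched roughly as follows. This is an illustration only, not the SDK's PersistentEventStore: the names `StorageLike`, `DurableQueue`, `MemoryStorage` and the storage key are hypothetical, and debouncing and the version sentinel are left out.

```typescript
// Hypothetical shapes for illustration; not the SDK's PersistentEventStore.
interface QueuedEvent { eventId: string; name: string; }
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const STORAGE_KEY = "example.queue.v1"; // assumed key name

class DurableQueue {
  private events: QueuedEvent[];

  constructor(private readonly storage: StorageLike) {
    this.events = this.load(); // rehydrate on boot
  }

  // A missing, malformed, or unreadable blob degrades to an empty queue.
  private load(): QueuedEvent[] {
    try {
      const raw = this.storage.getItem(STORAGE_KEY);
      const parsed: unknown = raw === null ? [] : JSON.parse(raw);
      return Array.isArray(parsed) ? (parsed as QueuedEvent[]) : [];
    } catch {
      return [];
    }
  }

  // Write through on every enqueue so a crash cannot lose the event.
  enqueue(event: QueuedEvent): void {
    this.events.push(event);
    try {
      this.storage.setItem(STORAGE_KEY, JSON.stringify(this.events));
    } catch {
      // Quota exceeded or storage disabled: keep the in-memory queue only.
    }
  }

  // Called after a successful flush: forget delivered events everywhere.
  clear(): void {
    this.events = [];
    try { this.storage.removeItem(STORAGE_KEY); } catch { /* storage unavailable */ }
  }

  pending(): readonly QueuedEvent[] { return this.events; }
}

// In-memory stand-in for localStorage so the sketch runs outside a browser.
class MemoryStorage implements StorageLike {
  private map = new Map<string, string>();
  getItem(key: string): string | null { return this.map.get(key) ?? null; }
  setItem(key: string, value: string): void { this.map.set(key, value); }
  removeItem(key: string): void { this.map.delete(key); }
}
```

Constructing a second `DurableQueue` over the same storage stands in for a reboot after a crash: whatever was enqueued before is pending again and can be replayed.
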
- Exponential backoff with full jitter on flush failures. Replaces the prior "retry on the next idle window" policy, which hot-looped against a flapping endpoint. Defaults: baseMs=1000, factor=2, maxMs=60000. Each failure schedules the next flush min(maxMs, baseMs * 2^attempts) * Math.random() ms out. The attempt counter resets on success. Surfaced via diagnostics().events.consecutiveFailures + nextRetryAt. New module retry-policy.ts (RetryPolicy, computeNextDelay).
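
A minimal sketch of the full-jitter rule, using the defaults quoted in the backoff item. The signature is hypothetical (the real computeNextDelay may differ), the injectable `random` parameter exists only to make the example deterministic, and counting `attempts` from 0 is an assumption.

```typescript
// Hypothetical signature; the SDK's computeNextDelay may differ.
interface BackoffOptions { baseMs: number; factor: number; maxMs: number; }

const DEFAULTS: BackoffOptions = { baseMs: 1000, factor: 2, maxMs: 60000 };

function computeNextDelay(
  attempts: number, // assumed to start at 0 for the first retry
  opts: BackoffOptions = DEFAULTS,
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  // Exponential ceiling, capped at maxMs. Math.min also absorbs Infinity
  // if attempts grows large enough to overflow the power.
  const ceiling = Math.min(opts.maxMs, opts.baseMs * Math.pow(opts.factor, attempts));
  // Full jitter: pick uniformly in [0, ceiling).
  return ceiling * random();
}
```

With `random` pinned to 0.5, three prior attempts give a ceiling of 8000 ms and a delay of 4000 ms; from attempt 6 onward the ceiling stays at the 60000 ms cap.
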
- Retry-After header support on 429 / 503. The HTTP layer now parses the header (delta-seconds or HTTP-date per RFC 7231 §7.1.3) onto CrossdeckError.retryAfterMs, and the retry policy honours it when it's longer than the computed backoff. This follows the Stripe pattern: the server is the authority on its own pressure.
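
Parsing the two Retry-After forms and applying the "honour it only when longer" rule might look like this. Both function names are hypothetical stand-ins (the SDK's own helper is parseRetryAfterHeader), and clamping a past HTTP-date to 0 ms is an assumption about how past dates are treated.

```typescript
// Hypothetical helper; the SDK's parseRetryAfterHeader may differ in detail.
function parseRetryAfter(value: string | null, nowMs: number = Date.now()): number | undefined {
  if (value === null) return undefined;
  const trimmed = value.trim();
  // delta-seconds: a non-negative decimal integer.
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // HTTP-date: Date.parse returns NaN when the string is not a date.
  const at = Date.parse(trimmed);
  if (Number.isNaN(at)) return undefined;
  // Assumption: a date in the past means "retry now", so clamp at zero.
  return Math.max(0, at - nowMs);
}

// Honour the server's hint only when it exceeds the computed backoff.
function effectiveDelay(backoffMs: number, retryAfterMs?: number): number {
  return retryAfterMs !== undefined && retryAfterMs > backoffMs ? retryAfterMs : backoffMs;
}
```
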
- Idempotency-Key header per batch. Every /v1/events POST now carries Idempotency-Key: batch_<rand>. Retries of the SAME logical batch reuse the SAME key so a future server-side idempotency layer can short-circuit duplicate work without inspecting bodies. Per-event eventId dedup remains in place; this is belt-and-suspenders.
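
One way to get "same batch, same key" is to mint the key when the batch is created rather than when it is sent. The sketch below assumes that approach; the `Batch` shape, the helper names, and the length and alphabet of the random suffix are all hypothetical.

```typescript
// Hypothetical batch shape; only the key-reuse rule comes from the notes above.
interface Batch { idempotencyKey: string; eventIds: string[]; }

// Assumed suffix format: 16 lowercase alphanumerics.
function randomSuffix(length: number = 16): string {
  const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
  let out = "";
  for (let i = 0; i < length; i++) {
    out += alphabet.charAt(Math.floor(Math.random() * alphabet.length));
  }
  return out;
}

// The key is minted once, together with the batch.
function newBatch(eventIds: string[]): Batch {
  return { idempotencyKey: `batch_${randomSuffix()}`, eventIds };
}

// Every send attempt for this batch, first try or retry, builds the same header.
function headersFor(batch: Batch): Record<string, string> {
  return {
    "Content-Type": "application/json",
    "Idempotency-Key": batch.idempotencyKey,
  };
}
```
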
- Request timeout via AbortController. New timeoutMs option on CrossdeckOptions and per-request options.timeoutMs on HttpClient.request(). Default 15 000 ms. Without this, a captive portal / DNS hang / satellite link could leave a request open for the browser's default (5+ minutes on Chrome) and lock the queue forever. Pass timeoutMs: 0 to disable (useful for tests). New error: CrossdeckError({ type: "network_error", code: "request_timeout" }).
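
The AbortController timeout pattern, reduced to a generic wrapper. This is a sketch under assumptions, not HttpClient.request(): `withTimeout` and `timeoutError` are hypothetical names, and a plain Error with a `code` field stands in for CrossdeckError.

```typescript
// Stand-in for CrossdeckError({ type: "network_error", code: "request_timeout" }).
function timeoutError(timeoutMs: number): Error & { code: string } {
  return Object.assign(new Error(`request timed out after ${timeoutMs} ms`), {
    code: "request_timeout",
  });
}

// Runs `run` with an AbortSignal that fires after timeoutMs.
// timeoutMs === 0 disables the timer, so no signal is passed.
async function withTimeout<T>(
  run: (signal: AbortSignal | undefined) => Promise<T>,
  timeoutMs: number = 15000,
): Promise<T> {
  if (timeoutMs === 0) return run(undefined);
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await run(controller.signal);
  } catch (err) {
    // Translate our own abort into a typed timeout; rethrow everything else.
    if (controller.signal.aborted) throw timeoutError(timeoutMs);
    throw err;
  } finally {
    clearTimeout(timer); // never leave the timer pending after completion
  }
}

// Usage with fetch (the URL is a placeholder):
// await withTimeout((signal) => fetch("https://example.invalid/v1/events", { method: "POST", signal }));
```
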
- Property validation at enqueue. track(name, properties) now sanitises properties BEFORE the event lands in the queue. New module event-validation.ts. Behaviour:
  - Drops functions, symbols, undefined values (with a debug warning).
  - Coerces Date → ISO string, BigInt → string, Error → { name, message, stack }, Map → plain object, Set → array.
  - Truncates string values longer than maxStringLength (default 1024) with an ellipsis.
  - Replaces circular refs with "[circular]" and depth > 5 nesting with "[depth-exceeded]".
  - Caps total per-event property byte size at maxBatchPropertyBytes (default 8 KB); past the cap, largest properties drop first and a __truncated: true marker is added.
  - Caller's input is never mutated; sanitisation always produces a defensive copy.
  - Output is guaranteed JSON.stringify-safe. One bad property can no longer poison the entire batch indefinitely.
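
The drop / coerce / truncate / circular / depth rules above can be approximated in one recursive pass. This is a sketch, not event-validation.ts: all names are hypothetical, the byte-size cap and debug warnings are omitted, and three details are assumptions (dropped array elements become null, invalid Dates become null, and the top-level property bag counts as depth 0).

```typescript
// Hypothetical sketch; limits mirror the defaults quoted above.
const MAX_STRING_LENGTH = 1024;
const MAX_DEPTH = 5;

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function sanitise(value: unknown, depth: number = 0, seen: Set<object> = new Set()): Json | undefined {
  if (value === null) return null;
  switch (typeof value) {
    case "function":
    case "symbol":
    case "undefined":
      return undefined; // signals "drop this property" to the caller
    case "bigint":
      return value.toString();
    case "string":
      return value.length > MAX_STRING_LENGTH ? value.slice(0, MAX_STRING_LENGTH) + "…" : value;
    case "number":
    case "boolean":
      return value;
  }
  const obj = value as object; // only objects reach this point
  // Assumption: an invalid Date becomes null (toISOString would throw).
  if (obj instanceof Date) return Number.isNaN(obj.getTime()) ? null : obj.toISOString();
  if (obj instanceof Error) return { name: obj.name, message: obj.message, stack: obj.stack ?? null };
  if (seen.has(obj)) return "[circular]";
  if (depth > MAX_DEPTH) return "[depth-exceeded]";
  seen.add(obj);
  let out: Json;
  if (obj instanceof Set || Array.isArray(obj)) {
    // Assumption: dropped elements become null so array positions are kept.
    out = Array.from(obj as Iterable<unknown>, (v) => sanitise(v, depth + 1, seen) ?? null);
  } else {
    const entries: Iterable<[unknown, unknown]> =
      obj instanceof Map ? obj.entries() : Object.entries(obj);
    const record: { [key: string]: Json } = {};
    for (const [k, v] of entries) {
      const clean = sanitise(v, depth + 1, seen);
      if (clean !== undefined) record[String(k)] = clean;
    }
    out = record;
  }
  seen.delete(obj); // only ancestors count as circular, not repeated siblings
  return out;
}

// Always returns a fresh object; the caller's input is left untouched.
function sanitiseProperties(props: Record<string, unknown>): { [key: string]: Json } {
  return sanitise(props) as { [key: string]: Json };
}
```
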
- Listener-error counter on EntitlementCache. Listener exceptions are still swallowed (a buggy consumer must not crash the SDK) but the cumulative count is now surfaced as diagnostics().entitlements.listenerErrors so a broken subscriber can be spotted without a debug session.
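
The swallow-but-count behaviour can be illustrated with a small emitter. `CountingEmitter` is a hypothetical stand-in, not the EntitlementCache implementation.

```typescript
type Listener<T> = (value: T) => void;

// Hypothetical stand-in for the listener handling inside EntitlementCache.
class CountingEmitter<T> {
  private listeners = new Set<Listener<T>>();
  private errors = 0;

  // Returns an unsubscribe function.
  subscribe(listener: Listener<T>): () => void {
    this.listeners.add(listener);
    return () => { this.listeners.delete(listener); };
  }

  emit(value: T): void {
    this.listeners.forEach((listener) => {
      try {
        listener(value);
      } catch {
        // A buggy consumer must not break the others, but the failure is counted.
        this.errors += 1;
      }
    });
  }

  get listenerErrors(): number { return this.errors; }
}
```
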
- Clock-skew diagnostics. Crossdeck.heartbeat() now captures the server's serverTime and the local Date.now() at the same moment. Surfaced via diagnostics().clock.{lastServerTime, lastClientTime, skewMs} so a wrong system clock (a kid changed the date, a dev machine with bad NTP) shows up in dashboards before it corrupts a day of analytics.
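
As a rough illustration of the clock fields, the skew is just the difference between the two timestamps captured together. The sign convention (client minus server), the millisecond-epoch representation of serverTime, and the `recordHeartbeat` name are assumptions; the notes above do not specify them.

```typescript
// Field names follow diagnostics().clock; the number types are assumed.
interface ClockDiagnostics {
  lastServerTime: number;
  lastClientTime: number;
  skewMs: number;
}

// Hypothetical helper: positive skewMs means the local clock runs ahead.
function recordHeartbeat(serverTimeMs: number, clientTimeMs: number = Date.now()): ClockDiagnostics {
  return {
    lastServerTime: serverTimeMs,
    lastClientTime: clientTimeMs,
    skewMs: clientTimeMs - serverTimeMs,
  };
}
```
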
- New debug signals: sdk.property_coerced, sdk.queue_persisted, sdk.queue_restored, sdk.flush_retry_scheduled. Fire in debug mode only; quiet by default.
- 65 new tests (203 total, up from 138):
  - tests/event-validation.test.ts: 19 cases covering every coercion / drop / truncation / depth / size-cap path + JSON-roundtrip + no-mutation guarantee.
  - tests/event-storage.test.ts: 8 cases covering load / save round-trip, debouncing, malformed-blob recovery, version sentinel, throwing-storage degradation.
  - tests/retry-policy.test.ts: 12 cases covering backoff math, jitter, Retry-After precedence, attempt overflow safety, counter reset.
  - tests/event-queue.test.ts: 9 new cases covering Idempotency-Key uniqueness, retry scheduling, server Retry-After honouring, durable rehydration, write-through, persistent clear on success, reset() wipe.
  - tests/http.test.ts: 5 new cases covering Idempotency-Key passthrough, abort-timeout behaviour, per-call timeout override, 0-disables-timeout, Retry-After parse onto retryAfterMs.
  - tests/errors.test.ts: 9 new cases covering parseRetryAfterHeader for delta-seconds, HTTP-date, past dates, malformed input.
  - tests/entitlement-cache.test.ts: 1 new case covering the listener-error counter.
  - tests/crossdeck.test.ts: 1 new case asserting the full Wave-1 diagnostic surface.
Changed
- CrossdeckError now carries an optional retryAfterMs field, populated from the response's Retry-After header on 4xx/5xx.
- Diagnostics shape extended with:
  - clock: { lastServerTime, lastClientTime, skewMs }
  - entitlements.listenerErrors: number
  - events.consecutiveFailures: number, events.nextRetryAt: number | null
- Existing Diagnostics fields and their semantics are unchanged.
Migration
No callsite changes required. New options (timeoutMs, retry tuning) default to sensible bank-grade values. Property validation has no opt-out: callers who want their properties left as they are should pass already-clean property objects. There's no escape hatch, and there shouldn't be: an SDK that lets one bad event poison the whole batch isn't bank-grade.