
Sleeper
about 1 month ago
Remote-first | Core Platform & Reliability
Why this role exists
Our Core Backend group provides the low-latency, high-scale services that every product team builds on. The Platform Team is the bridge: we keep those core systems healthy and extend their capabilities so that internal business units can ship features rapidly and safely. Your mission is to make uptime, reliability, and team velocity the natural by-products of great engineering—not late-night heroics.
Stack-Ranked Responsibilities (1 = Most Important)
1. Design & own platform services/APIs that expose core backend functionality to product teams and unlock new business flows through Retool.
2. Own platform reliability / accessibility and support core engineering on existing Elixir / Cassandra systems: join the on-call rotation and be first to respond to production issues. Partner with the core engineering team to build resilient systems for the longer term.
3. Support and enhance CI/CD pipelines (Buildkite, IaC) so the Core team, and every business unit, can ship safely and quickly.
4. Ensure end-to-end observability by partnering with core backend and application teams on metrics, traces, and alerts, and by adding instrumentation where gaps exist.
5. Automate site-reliability workflows (issue triage, cluster upgrades, schema migrations) while collaborating with each team on their specific operational processes.
Stack-Ranked Required Skills (3 technical | 2 cross-functional)
1. Distributed-systems engineering in Elixir/Erlang (or Go/Python) with a focus on reliability patterns (idempotency, graceful degradation).
2. Cloud-native infrastructure & automation: Kubernetes on GCP, Buildkite CI/CD, Terraform or similar IaC, and scripting to eliminate manual toil.
3. Observability & SRE tooling: designing metrics, logs, and traces that drive proactive detection and rapid remediation.
4. Cross-team collaboration & communication: able to partner with Core Backend and multiple business units, translating reliability needs into actionable engineering work.
5. Developer-experience mindset: empathetic API/SDK design and clear documentation that accelerates other engineers’ adoption of platform capabilities.
Benefits
Competitive salary and stock options
Comprehensive health, dental, and vision insurance
401(k)
Flexible working hours and remote-first culture
Clear paths for career growth and leadership
How we work
Remote-first, async-heavy. Deep work valued; meetings kept minimal.
Light follow-the-sun escalation — because automation, testing, and observability catch issues early.
Blameless culture. We learn fast and systematize fixes.
Ready to build the platform that powers everything — and make reliability boring? Apply now and help us keep the core humming while unlocking new possibilities for every product team.