CloudLinux is looking for a brilliant Senior Site Reliability Engineer (SRE) to join the Release Engineering Department, a team that plays a critical role in maintaining both external and internal infrastructure related to package repositories, with a strong focus on delivering and managing repository distribution to users.

This role offers a unique opportunity to collaborate with multiple development teams, accelerate progress, and provide enterprise-level solutions globally. Responsibilities include Linux OS administration, designing system solutions at an architectural level, advancing cloud technologies, system programming, Python/Linux scripting, and working with virtualization. This is a remote position best suited for professionals located in Europe and CIS, as the team primarily operates within European time zones.

As our Senior Site Reliability Engineer, you will:

As a first assignment, design, implement, and manage scalable, resilient, and secure company-wide repository infrastructure for CloudLinux products.
Automate software operations for reusability and consistency across private and public clouds, accounting for the complexities of distributed systems.
Monitor system performance and troubleshoot issues proactively to ensure optimal uptime and reliability.
Automate deployment processes using Infrastructure as Code (IaC) principles.
Share your experience, know-how, and best practices with other team members through design sessions, system architecture discussions, mentorship, and hands-on collaboration.

To be successful, you should have:

Strong background in development: the ideal candidate started their career as a developer and then moved to large-scale infrastructure projects.
Proven experience as a lead SRE or in a similar role, with a strong focus on Linux environments.
Proficiency in modern agile SDLC practices and principles, orchestration, and CI/CD tooling, e.g. Python, Java, Terraform, Ansible, CloudFormation, Puppet, Chef, or similar.
Knowledge of the Grafana ecosystem or similar, including building dashboards, writing alert rules and PromQL queries, and frontend observability.
Excellent technical knowledge of IT Infrastructure, including network and application load balancers, switches, routers, and IP addressing.
Strong analytical and problem-solving skills with a focus on root cause analysis and mitigation.
Excellent communication and teamwork skills with the ability to collaborate effectively across engineering teams.
English: at least Intermediate level required.

Cloud Linux Inc.

Senior Site Reliability Engineer (worldwide remote, work anywhere)
