13 days ago

Senior Site Reliability Engineer (worldwide remote, work anywhere)

Cloud Linux Inc.

Cyprus, Remote

CloudLinux is looking for a brilliant Senior Site Reliability Engineer (SRE) to join the Release Engineering Department, a team that plays a critical role in maintaining both external and internal infrastructure related to package repositories, with a strong focus on delivering and managing repository distribution to users.

This role offers a unique opportunity to collaborate with multiple development teams, accelerate progress, and provide enterprise-level solutions globally. Responsibilities include Linux OS administration, designing system solutions at an architectural level, advancing cloud technologies, system programming, Python/Linux scripting, and working with virtualization. This is a remote position best suited for professionals located in Europe and the CIS, as the team primarily operates within European time zones.

As our Senior Site Reliability Engineer, you will:

  • As a first assignment, design, implement, and manage scalable, resilient, and secure company-wide repository infrastructure for CloudLinux products.
  • Automate software operations for re-usability and consistency across private and public clouds, taking into consideration the complexities of distributed systems.
  • Monitor system performance and troubleshoot issues proactively to ensure optimal uptime and reliability.
  • Automate deployment processes using Infrastructure as Code (IaC) principles.
  • Share your experience, know-how, and best practices with other team members through design sessions, system architecture discussions, mentorship, and hands-on collaboration.

To be successful, you should have:

  • Strong background in development: the ideal candidate started their career as a developer and then moved on to large-scale infrastructure projects.
  • Proven experience as a leading SRE or in a similar role, with a strong focus on Linux environments.
  • Proficiency in modern agile SDLC practices and principles, orchestration, and CI/CD tooling, e.g. Python, Java, Terraform, Ansible, CloudFormation, Puppet, Chef, or similar.
  • Knowledge of the Grafana ecosystem or similar, including building dashboards, writing alert rules and PromQL queries, and frontend observability.
  • Excellent technical knowledge of IT Infrastructure, including network and application load balancers, switches, routers, and IP addressing.
  • Strong analytical and problem-solving skills with a focus on root cause analysis and mitigation.
  • Excellent communication and teamwork skills with the ability to collaborate effectively across engineering teams.
  • English: at least Intermediate level required.