
Cloud Linux Inc.
13 days ago

CloudLinux is looking for a brilliant Senior Site Reliability Engineer (SRE) to join the Release Engineering Department, a team that plays a critical role in maintaining both external and internal infrastructure related to package repositories, with a strong focus on delivering and managing repository distribution to users.
This role offers a unique opportunity to collaborate with multiple development teams, accelerate progress, and provide enterprise-level solutions globally. Responsibilities include Linux OS administration, designing system solutions at an architectural level, advancing cloud technologies, system programming, Python/Linux scripting, and working with virtualization. This is a remote position best suited for professionals located in Europe and the CIS, as the team primarily operates within European time zones.
As our Senior Site Reliability Engineer, you will:
- As a first assignment, design, implement, and manage a scalable, resilient, and secure company-wide repository infrastructure for CloudLinux products.
- Automate software operations for re-usability and consistency across private and public clouds, taking into consideration the complexities of distributed systems.
- Monitor system performance and troubleshoot issues proactively to ensure optimal uptime and reliability.
- Automate deployment processes using Infrastructure as Code (IaC) principles.
- Share your experience, know-how, and best practices with other team members through design sessions, system architecture discussions, mentorship, and hands-on collaboration.
To be successful, you should have:
- Strong background in development: the ideal candidate started their career as a developer and then moved to large-scale infrastructure projects.
- Proven experience as a leading SRE or in a similar role, with a strong focus on Linux environments.
- Proficiency in modern agile SDLC practices and principles, orchestration, and CI/CD tooling, e.g. Python, Java, Terraform, Ansible, CloudFormation, Puppet, Chef, or similar.
- Knowledge of the Grafana ecosystem or similar tools, including building dashboards, alert rules, and PromQL queries, as well as frontend observability.
- Excellent technical knowledge of IT Infrastructure, including network and application load balancers, switches, routers, and IP addressing.
- Strong analytical and problem-solving skills with a focus on root cause analysis and mitigation.
- Excellent communication and teamwork skills with the ability to collaborate effectively across engineering teams.
- English: at least Intermediate level required.