Senior DevOps Engineer

TRACTIAN

São Paulo, Brazil (Remote)

Engineering at TRACTIAN

The Engineering team at TRACTIAN builds and operates the cloud-native backbone that powers our industrial IoT platform. We design for massive scale, high reliability, and security across AWS, Azure AKS, and Oracle Cloud (OCI) Kubernetes clusters.

What you'll do

- Own end-to-end delivery pipelines, from GitHub commit to production, running on GitHub Actions, ECS Fargate, AKS, and OCI Kubernetes.
- Evolve our multi-cloud, multi-cluster architecture (AWS + OCI) with zero-trust networking.
- Write and maintain IaC (Terraform + Terragrunt), Helm charts, and Kubernetes operators to automate everything.
- Optimize observability: build dashboards and alerts using the Grafana OSS stack, Prometheus, Loki, Tempo, and Datadog.
- Troubleshoot complex incidents involving microservices, monoliths in containers, and AI workloads on GPU nodes.
- Improve security posture: harden images, manage secrets, enforce policies, and audit compliance.
- Mentor other engineers in DevOps best practices and drive continuous improvement.

Responsibilities

  • Apply DevOps practices to increase deployment speed, security, and quality.
  • Architect and run CI/CD workflows in GitHub Actions (matrix builds, reusable workflows, OIDC federation).
  • Design, build, and maintain Terraform/Terragrunt modules for VPCs, subnets, security groups, site-to-site VPNs, and private links.
  • Manage container orchestration on ECS Fargate and Kubernetes (AWS & OCI) with Helm and KEDA.
  • Implement autoscaling, blue-green and canary releases, and cost optimization for GPU and CPU workloads.
  • Diagnose performance bottlenecks across network, compute, storage, and application layers.
  • Maintain high-quality documentation.

Requirements

  • B.S. in Computer Engineering, Information Systems, or equivalent experience.
  • Strong scripting skills (Python, Bash); Go or Rust a plus.
  • Hands-on CI/CD experience with GitHub Actions, and experience running production workloads on:
      • AWS: ECS Fargate, S3, RDS, CloudWatch, VPC networking.
      • Kubernetes: OCI OKE, Helm, Istio, KEDA.
  • IaC expertise with Terraform and Terragrunt in multi-account/multi-cloud setups.
  • Solid networking foundations: VPC design, subnets, routing, VPN/IPSec tunnels, security groups, load balancers.
  • Observability stack experience (Grafana, Prometheus, Loki, Tempo, Datadog).
  • Familiarity with container security, SBOMs, image scanning, secret management, and least-privilege IAM.
  • Excellent problem-solving skills, ownership mindset, and ability to work autonomously within a distributed team.