Site Reliability Engineer

Are you passionate about driving innovation in software operations? Flanksource is on the hunt for a skilled Kubernetes Site Reliability Engineer who thrives on solving complex problems, optimizing systems for reliability and efficiency, and is eager to dive into the world of open-source, developer-first technologies.

We're looking for someone with a deep understanding of the Kubernetes ecosystem, a knack for automating and improving workflows, and a commitment to service excellence.

Responsibilities:

Design, deploy, and maintain Kubernetes clusters across multiple environments
Develop and maintain automation tools for deploying and managing Kubernetes clusters
Monitor and troubleshoot Kubernetes clusters to ensure high availability and performance
Implement security best practices for Kubernetes infrastructure and services
Participate in incident response and work to reduce mean time to recovery (MTTR) over time
Continuously improve the reliability, scalability, and performance of our Kubernetes infrastructure
Manage CI/CD pipelines and other DevOps tools and processes
Work closely with developers to ensure that our applications are deployed using best practices

Requirements:

Deep understanding of Kubernetes and containers (i.e. hold the Certified Kubernetes Administrator (CKA) certification)
Experience with two or more infrastructure as code (IaC) tools such as Terraform, Crossplane, Pulumi, or CloudFormation
Experience with monitoring and observability tools such as Prometheus, Grafana, ELK, Datadog, and Dynatrace
Experience with cloud platforms such as AWS, Azure, or GCP
Experience with CI/CD pipelines and tools such as GitHub Actions, GitLab, and Azure DevOps
Solid understanding of network and security principles, including VPNs, firewalls, and load balancers
Excellent problem-solving skills and ability to work independently or as part of a team
Proficiency in the English language, both written and verbal, sufficient for success in a remote and largely asynchronous work environment
Comfort working in a highly agile, iterative software development process
Self-motivated and self-managing, with strong organizational skills

Preferred:

Experience with GitOps principles and tooling such as Flux and ArgoCD
Experience writing Go, or a desire to learn it

Bonus Points for:

Contributions to, or a passion for, open source
Kubernetes operator and controller development

Remote
