Site Reliability Engineer

Natobotics Ltd

London

Hybrid

GBP 65,000 - 85,000

Full time

30+ days ago

Job summary

A leading tech firm in London is seeking a Site Reliability Engineer (SRE) to improve the reliability of its non-production (test) environments through automation and environment management. The candidate will focus on developing Infrastructure as Code and establishing service level objectives. Proficiency with monitoring tools such as Prometheus and knowledge of cloud infrastructure are essential. This hybrid contract role offers an opportunity to drive continuous improvement and instil a reliability culture across teams.

Qualifications

Mid-Senior level experience in site reliability engineering.
Proficiency in monitoring and logging tools.
Strong problem-solving and communication skills.

Responsibilities

Develop Infrastructure as Code (IaC) for test environments.
Define and measure service level objectives (SLOs).
Analyze environment performance data for improvements.
Lead incident management for test environment issues.

Skills

Proficiency with Prometheus

Proficiency with Splunk

Proficiency with Grafana

Strong scripting skills in Python

Strong scripting skills in Bash

Deep understanding of AWS

Solid Linux knowledge

Knowledge of Docker

Knowledge of Kubernetes

Tools

Terraform

Ansible

Jenkins

GitLab CI

Overview

The Site Reliability Engineer will transform the SDLC (software development lifecycle) environment in an engineering-focused role that emphasizes system reliability, automation, and performance in a non-production setting.

Role: Site Reliability Engineer (SRE)

Location: London

Work Mode: Hybrid

Contract Role

Responsibilities

Automate environment lifecycle: Develop Infrastructure as Code (IaC) to automate provisioning, teardown, and configuration of test environments, integrating them with the CI/CD pipeline.
Establish service level objectives (SLOs): Define and measure service level indicators (SLIs) for test environments, and set SLOs against them that meet the needs of development and testing teams.
Monitor environment health and performance: Use observability tools (e.g., Prometheus, Grafana) to track health, identify bottlenecks, and resolve issues proactively.
Manage incident response: Lead incident management for test environment issues, conduct blameless post-mortems, and implement lasting fixes.
Minimize toil: Automate manual, repetitive tasks related to test environments to free up engineering time.
Drive continuous improvement: Analyze environment performance data, incident reports, and post-mortems to identify opportunities for improvement and innovation.
Balance reliability and speed: Use an error budget approach for test environments to guide trade-offs between reliability work and feature development.
Instil a reliability culture: Promote a blameless culture and shared ownership across development, QA, and SRE teams.
Capacity planning: Anticipate future resource needs and ensure infrastructure can scale to meet demand.
Advance test data management: Ensure test data is readily available, consistent, compliant, and provisioned with environments.

Technical Skills

Tooling: Proficiency with monitoring and logging tools (Prometheus, Splunk, Grafana), CI/CD platforms (e.g., Jenkins, GitLab CI), and infrastructure-as-code and configuration management tools (e.g., Terraform, Ansible).
Cloud infrastructure: Deep understanding of AWS, containerization and orchestration (Docker, Kubernetes), and serverless computing.
Scripting and programming: Strong scripting skills in Python or Bash.
Systems and networking: Solid knowledge of Linux, networking, and database management.

Soft Skills

Leadership and influence: Ability to champion SRE practices and influence stakeholders across teams.
Problem-solving: Strong analytical and debugging skills for complex issues under pressure.
Communication: Excellent communication and collaboration skills across development, QA, and operations teams.
Adaptability: A proactive mindset and the ability to adapt to evolving technologies and methodologies.

Seniority level

Mid-Senior level

Employment type

Contract

Job function

Engineering and Information Technology

Industries

IT Services and IT Consulting