Site Reliability Engineer

Natobotics Ltd

London

Hybrid

GBP 65,000 - 85,000

Full time

30+ days ago

Job summary

A leading tech firm in London is seeking a Site Reliability Engineer (SRE) to enhance system reliability through automation and environment management. The candidate will focus on developing Infrastructure as Code and establishing service level objectives. Proficiency in monitoring tools such as Prometheus and knowledge of cloud infrastructure are essential. This hybrid contract role offers an opportunity to drive continuous improvement and instill a reliability culture across teams.

Qualifications

  • Mid-Senior level experience in site reliability engineering.
  • Proficiency in monitoring and logging tools.
  • Strong problem-solving and communication skills.

Responsibilities

  • Develop Infrastructure as Code (IaC) for test environments.
  • Define and measure service level objectives (SLOs).
  • Analyze environment performance data for improvements.
  • Lead incident management for test environment issues.

Skills

  • Proficiency with Prometheus
  • Proficiency with Splunk
  • Proficiency with Grafana
  • Strong scripting skills in Python
  • Strong scripting skills in Bash
  • Deep understanding of AWS
  • Solid Linux knowledge
  • Knowledge of Docker
  • Knowledge of Kubernetes

Tools

  • Terraform
  • Ansible
  • Jenkins
  • GitLab CI

Job description

Overview

The Site Reliability Engineer will transform the SDLC environment in an engineering-focused role that emphasizes system reliability, automation, and performance in non-production settings.

Role: Site Reliability Engineer (SRE)

Location: London

Work Mode: Hybrid

Contract Role

Responsibilities
  • Automate environment lifecycle: Develop Infrastructure as Code (IaC) to automate provisioning, teardown, and configuration of test environments, integrating them with the CI/CD pipeline.
  • Establish service level objectives (SLOs): Define and measure service level indicators (SLIs) for test environments to meet the needs of development and testing teams.
  • Monitor environment health and performance: Use observability tools (e.g., Prometheus, Grafana) to track health, identify bottlenecks, and resolve issues proactively.
  • Manage incident response: Lead incident management for test environment issues, conduct blameless post-mortems, and implement lasting fixes.
  • Minimize toil: Automate manual, repetitive tasks related to test environments to free up engineering time.
  • Drive continuous improvement: Analyze environment performance data, incident reports, and post-mortems to identify opportunities for improvement and innovation.
  • Balance reliability and speed: Use an error budget approach for test environments to guide the trade-off between reliability work and feature development.
  • Instill a reliability culture: Promote a blameless culture and shared ownership across development, QA, and SRE teams.
  • Capacity planning: Anticipate future resource needs and ensure infrastructure can scale to meet demand.
  • Advance test data management: Ensure test data is readily available, consistent, compliant, and provisioned with environments.
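
For illustration only, the SLO and error budget terms used above can be made concrete with a short Python sketch. The 99% target, the provisioning-request counts, and the function name `error_budget_remaining` are hypothetical and are not part of the role specification.

```python
def error_budget_remaining(total_requests: int, failed_requests: int, slo_target: float) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    # The error budget is the number of failures the SLO tolerates in the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


# Hypothetical month: 2,000 test-environment provisioning requests,
# 12 failures, and a 99% success-rate SLO (all assumed values).
remaining = error_budget_remaining(2000, 12, 0.99)
print(f"Error budget remaining: {remaining:.0%}")
```

With these assumed numbers the SLO tolerates 20 failures, 12 have occurred, and so 40% of the budget remains. A team following the error budget approach might slow environment changes as that figure approaches zero.
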
Technical Skills
  • Tooling: Proficiency with monitoring and logging tools (Prometheus, Splunk, Grafana), CI/CD platforms (e.g., Jenkins, GitLab CI), and configuration management and IaC tools (e.g., Ansible, Terraform).
  • Cloud infrastructure: Deep understanding of AWS, containerization and orchestration (Docker, Kubernetes), and serverless computing.
  • Scripting and programming: Strong scripting skills in Python or Bash.
  • Systems and networking: Solid Linux, networking, and database management knowledge.
Soft Skills
  • Leadership and influence: Ability to champion SRE practices and influence stakeholders across teams.
  • Problem-solving: Strong analytical and debugging skills for complex issues under pressure.
  • Communication: Excellent collaboration skills across development, QA, and operations.
  • Adaptability: A proactive mindset that adapts to evolving technologies and methodologies.
Seniority level
  • Mid-Senior level
Employment type
  • Contract
Job function
  • Engineering and Information Technology
Industries
  • IT Services and IT Consulting