Principal Site Reliability Engineer

Dubizzle Limited

Greater London

Hybrid

GBP 70,000 - GBP 90,000

Full time

Today

Job summary

A leading tech company is seeking a Site Reliability Engineer to enhance operational excellence and reliability across its systems. The role involves defining service levels, implementing cloud infrastructure, and mentoring engineers on best practices. Ideal candidates will possess deep expertise in Kubernetes, AWS, and Infrastructure as Code with a commitment to scaling and automation. A hybrid work model is offered with competitive benefits including a subsidised gym membership and private medical insurance.

Benefits

Subsidised Gym Membership
Private Medical Insurance
25 days holiday
Annual Discretionary Bonus
Cycle to Work Scheme

Qualifications

  • Demonstrable experience leading SRE transformations.
  • Deep hands-on expertise with Kubernetes in production environments.
  • Strong experience with AWS core services.

Responsibilities

  • Define and enforce SLOs, SLIs, and error budgets.
  • Craft and implement a cloud infrastructure and tooling strategy.
  • Mentor engineers on reliability and operational readiness.

Skills

SRE transformations
Kubernetes
AWS core services
Infrastructure as Code
Observability
Automation
Incident management

Tools

Terraform
CloudFormation

Job description

Overview

Orgvue is a leading organizational design and planning software platform that captures the power of data visualization and modelling to build more adaptable, better-performing organizations. HR, finance, and business leaders use Orgvue for actionable insight and analysis that help them make faster workforce decisions in a constantly changing world.

Orgvue is used by the world’s largest and best-known enterprises and management consulting firms to visualize and confidently build the businesses they want tomorrow, today. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney.

Role

In this role you will work across product, platform, and operations teams to ensure our systems are reliable, observable, and resilient, even at scale.

This role combines hands-on technical capability with strategic vision, helping us build a world-class reliability culture and a robust engineering foundation for growth. We're looking for someone who has technical expertise, is a great communicator and enjoys collaborating across multiple teams.

Responsibilities
  • Define and enforce SLOs, SLIs, and error budgets across critical services
  • Craft and implement a cloud infrastructure and tooling strategy
  • Work across the organization to level up SRE practices
  • Help implement robust observability (metrics, logs, and traces) using our observability tool
  • Guide the team in building automated, self-healing systems
  • Own and evolve our incident response processes, including on-call practices and post-mortem culture
  • Mentor engineers across the org on best practices in reliability, operational readiness, and scalable infrastructure
  • Drive Infrastructure as Code (IaC) using Terraform, Kubernetes, CloudFormation and GitOps practices
  • Collaborate closely with security, DevOps, and software teams to ensure compliance, scalability, and operational excellence
  • Evaluate and introduce tools, patterns, and practices that improve the performance and reliability of our SaaS platform

Qualifications
  • Demonstrable experience leading SRE transformations
  • Deep hands-on expertise with Kubernetes (EKS preferred) in production environments
  • Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
  • Expert in Infrastructure as Code using tools such as Terraform, with knowledge of GitOps workflows
  • Strong background in observability: metrics, visualization, logging, and tracing
  • Understanding of automation, SDLC, CI/CD pipelines, deployment automation, and blue/green or canary releases
  • Proven experience with incident management, disaster recovery planning, root cause analysis, and post-incident reviews

Benefits
  • Hybrid working - 1+ days a week in the London office
  • Wellbeing: Sanctus Coaching, Virtual fitness sessions, Wellbeing webinars, Annual Wellbeing day
  • Subsidised Gym Membership
  • Private Medical Insurance (including Dental and Vision) and Life Assurance
  • 25 days holiday (increasing to 30 days at a rate of 1 extra day per year)
  • Summer Fridays (half-day Fridays for the months of July and August)
  • Employer pension contribution of 5% of your gross salary, if you contribute a minimum of 3%
  • Season Ticket Loan
  • Cycle to Work Scheme
  • Annual Discretionary Bonus

Here at Orgvue we promote individualism and a diverse workforce to build on our future success.
