Operations Manager (Service Management & Site Reliability)

Custodia Technology Ltd

Knutsford

On-site

GBP 80,000 - GBP 100,000

Full time

Today

Job summary

A UK technology firm based in Knutsford is seeking an experienced Operations Manager to oversee the service performance and reliability of a critical platform. The role includes managing customer service functions, leading a team, and ensuring ITIL-aligned operations. The ideal candidate will have a strong background in service management, familiarity with Site Reliability Engineering (SRE), and technical expertise in cloud platforms. This on-site position offers opportunities for career advancement in a dynamic work environment.

Benefits

Private Health and Dental Care (BUPA)
Free On-Site Gym
Free access to Udemy
Employee Assistance Programme
Free parking
Opportunities for professional growth and advancement

Qualifications

  • Demonstrable experience owning or operating services with critical uptime and compliance.
  • Proven experience as an Operations Manager or equivalent.
  • Strong hands-on experience with ITIL service management practices.

Responsibilities

  • Act as Service Manager with accountability for service performance and stability.
  • Lead the Customer Service (Level 1) team for high-quality support.
  • Own end-to-end reliability and incident management for the platform.

Skills

ITIL service management practices
Site Reliability Engineering (SRE)
Customer Service management
Technical skills in Azure, Windows Server, Linux
Scripting (PowerShell, Bash, Python)

Job description

Operations Manager (Service Management & Site Reliability)

Custodia is a UK-based company, founded in 2017, with a wider presence in North America, Europe and Asia, both directly and through strategic partnerships.

Our current key offering is the CC1 (Compliance Cloud One) service, which records, stores and normalizes any type of communications data. This includes many common platforms such as email, SMS, phone calls, Microsoft Teams, WhatsApp and WeChat, amongst many others. This allows companies to communicate in a compliant manner, whilst drawing greater insight from the data they already have to store.

We are seeking an experienced Operations Manager to act as the operational owner and Service Manager for a business‑critical platform, while also managing the Customer Service (Level 1) function.

This role combines ITIL‑based service management discipline, Site Reliability Engineering (SRE) principles, and people leadership to ensure high service availability, effective incident response, and continuous improvement across both customer‑facing support and backend service operations.

You will have end‑to‑end accountability for live service operations, leading both the Service Engineering team and the Customer Service (L1) team, and owning service performance, platform reliability, operational risk, and financial stewardship.

Hands‑on experience managing cloud platforms in critical, always‑on environments is a mandatory requirement for this role.

Key Responsibilities

Service & Operational Ownership

  • Act as the named Service Manager for the platform, with full accountability for service performance, stability, and customer impact.
  • Own the service lifecycle, from operational readiness and go‑live through live service management and continual improvement.
  • Define, own, and report against SLAs, SLOs, and operational KPIs across both customer service and service engineering functions.
  • Serve as the primary operational escalation point for internal stakeholders and key customers.

Customer Service (L1) Management

  • Lead and manage the Customer Service (Level 1) team, ensuring consistent, high‑quality first‑line support for customers.
  • Ensure effective triage, prioritisation, and escalation of incidents from L1 to Service Engineering.
  • Drive customer‑focused service metrics, including response times, resolution quality, and customer satisfaction.
  • Establish training, coaching, and quality assurance processes to continually improve L1 service delivery.

Reliability, Availability & Incident Management

  • Own the end‑to‑end reliability and availability of a mission‑critical, compliance‑focused platform.
  • Apply SRE principles to reduce incidents, manage operational risk, and balance reliability with delivery velocity.
  • Lead major incident management, ensuring effective coordination, clear communication, and rapid service restoration.

ITIL‑Aligned Service Operations

  • Lead Incident, Problem, Change, and Release Management in line with ITIL best practices.
  • Plan and execute on‑premises software upgrades and platform changes, ensuring controlled delivery and minimal disruption.
  • Drive thorough root cause analysis (RCA) and ensure corrective actions are implemented and tracked to completion.
  • Maintain audit‑ready service documentation, runbooks, and operational procedures.
  • Own operational oversight of cloud and hybrid platforms supporting critical customer services.
  • Work closely with Engineering, Product, and Security teams to ensure platforms are operationally ready, resilient, observable, and secure.
  • Ensure appropriate monitoring, alerting, capacity planning, and resilience controls are in place across Azure and hybrid environments.
  • Champion automation and Infrastructure‑as‑Code to reduce operational toil and improve reliability.

Budget & Cost Management

  • Own and manage the operations and service engineering budget, ensuring spend is forecast, controlled, and aligned to service outcomes.
  • Manage costs related to cloud infrastructure, on‑premises upgrades, tooling, licensing, and third‑party services.
  • Partner with Finance and Procurement to justify investment and identify cost‑optimisation opportunities without compromising service reliability or compliance.

Leadership & Team Development

  • Lead, coach, and develop both the Customer Service (L1) and Service Engineering teams.
  • Establish structured onboarding, training, and progression paths to build resilient, high‑performing teams.
  • Foster a culture of accountability, service excellence, and continuous improvement across operations.

What You Bring

Essential (Must Have)

  • Demonstrable experience owning or operating services where uptime, data integrity, and regulatory compliance are critical.

Required Experience & Skills

  • Proven experience as an Operations Manager, Service Manager, or equivalent, with ownership of live services.
  • Strong hands‑on experience with ITIL service management practices, particularly Incident, Problem, Change, and Continual Improvement.
  • Experience managing Customer Service / L1 support teams in a production environment.
  • Working knowledge of Site Reliability Engineering (SRE) principles and operational risk management.
  • Strong technical foundation across Azure, Windows Server, Linux (RedHat), Active Directory, networking, and scripting (PowerShell, Bash, or Python).
  • Experience delivering platform upgrades and managing production change in cloud and hybrid environments.
  • Experience owning operational budgets and cost centres.
  • Calm, structured leadership style with a strong focus on uptime, customer impact, deadlines, and service quality.
  • A genuine commitment to training, mentoring, and building high‑performing operational teams.

What We Offer

  • Private Health and Dental Care (BUPA)
  • Free On‑Site Gym
  • Free access to Udemy
  • Employee Assistance Programme
  • Free parking
  • Opportunities for professional growth and advancement
  • Dynamic and innovative work environment
  • Opportunity to make your mark in a high‑growth industry
  • A beautiful office in the historic Cheshire town of Knutsford, with easily accessible public transport links to Manchester and Chester

This role is on‑site, based at our office in Knutsford.

Reports Into: Head of Engineering / Platform Services
