Operations Manager (Service Management & Site Reliability)
Custodia is a UK-based company, founded in 2017, with a wider presence in North America, Europe and Asia, both directly and through strategic partnerships.
Our current key offering is the CC1 (Compliance Cloud One) service, which records, stores and normalises any type of communications data. This includes many common platforms such as email, SMS, phone calls, Microsoft Teams, WhatsApp and WeChat, amongst many others. It allows companies to communicate in a compliant manner while gaining greater insight from the data they already have to store.
We are seeking an experienced Operations Manager to act as the operational owner and Service Manager for a business‑critical platform, while also managing the Customer Service (Level 1) function.
This role combines ITIL‑based service management discipline, Site Reliability Engineering (SRE) principles, and people leadership to ensure high service availability, effective incident response, and continuous improvement across both customer‑facing support and backend service operations.
You will have end‑to‑end accountability for live service operations, leading both the Service Engineering team and the Customer Service (L1) team, and owning service performance, platform reliability, operational risk, and financial stewardship.
Hands‑on experience managing cloud platforms in critical, always‑on environments is a mandatory requirement for this role.
Key Responsibilities
Service & Operational Ownership
- Act as the named Service Manager for the platform, with full accountability for service performance, stability, and customer impact.
- Own the service lifecycle, from operational readiness and go‑live through live service management and continual improvement.
- Define, own, and report against SLAs, SLOs, and operational KPIs across both customer service and service engineering functions.
- Serve as the primary operational escalation point for internal stakeholders and key customers.
Customer Service (L1) Management
- Lead and manage the Customer Service (Level 1) team, ensuring consistent, high‑quality first‑line support for customers.
- Ensure effective triage, prioritisation, and escalation of incidents from L1 to Service Engineering.
- Drive customer‑focused service metrics, including response times, resolution quality, and customer satisfaction.
- Establish training, coaching, and quality assurance processes to continually improve L1 service delivery.
Reliability, Availability & Incident Management
- Own the end‑to‑end reliability and availability of a mission‑critical, compliance‑focused platform.
- Apply SRE principles to reduce incidents, manage operational risk, and balance reliability with delivery velocity.
- Lead major incident management, ensuring effective coordination, clear communication, and rapid service restoration.
ITIL‑Aligned Service Operations
- Lead Incident, Problem, Change, and Release Management in line with ITIL best practices.
- Plan and execute on‑premises software upgrades and platform changes, ensuring controlled delivery and minimal disruption.
- Drive thorough root cause analysis (RCA) and ensure corrective actions are implemented and tracked to completion.
- Maintain audit‑ready service documentation, runbooks, and operational procedures.
- Own operational oversight of cloud and hybrid platforms supporting critical customer services.
- Work closely with Engineering, Product, and Security teams to ensure platforms are operationally ready, resilient, observable, and secure.
- Ensure appropriate monitoring, alerting, capacity planning, and resilience controls are in place across Azure and hybrid environments.
- Champion automation and Infrastructure‑as‑Code to reduce operational toil and improve reliability.
Budget & Cost Management
- Own and manage the operations and service engineering budget, ensuring spend is forecast, controlled, and aligned to service outcomes.
- Manage costs related to cloud infrastructure, on‑premises upgrades, tooling, licensing, and third‑party services.
- Partner with Finance and Procurement to justify investment and identify cost‑optimisation opportunities without compromising service reliability or compliance.
Leadership & Team Development
- Lead, coach, and develop both the Customer Service (L1) and Service Engineering teams.
- Establish structured onboarding, training, and progression paths to build resilient, high‑performing teams.
- Foster a culture of accountability, service excellence, and continuous improvement across operations.
What You Bring
Essential (Must Have)
- Demonstrable experience owning or operating services where uptime, data integrity, and regulatory compliance are critical.
Required Experience & Skills
- Proven experience as an Operations Manager, Service Manager, or equivalent, with ownership of live services.
- Strong hands‑on experience with ITIL service management practices, particularly Incident, Problem, Change, and Continual Improvement.
- Experience managing Customer Service / L1 support teams in a production environment.
- Working knowledge of Site Reliability Engineering (SRE) principles and operational risk management.
- Strong technical foundation across Azure, Windows Server, Linux (Red Hat), Active Directory, networking, and scripting (PowerShell, Bash, or Python).
- Experience delivering platform upgrades and managing production change in cloud and hybrid environments.
- Experience owning operational budgets and cost centres.
- Calm, structured leadership style with a strong focus on uptime, customer impact, deadlines, and service quality.
- A genuine commitment to training, mentoring, and building high‑performing operational teams.
What We Offer
- Private Health and Dental Care (BUPA)
- Free On‑Site Gym
- Free access to Udemy
- Employee Assistance Programme
- Free parking
- Opportunities for professional growth and advancement
- Dynamic and innovative work environment
- Opportunity to make your mark in a high‑growth industry
- A beautiful office in the historic Cheshire town of Knutsford, with easily accessible public transport links to Manchester and Chester
This role is on‑site, based at our office in Knutsford.
Reports Into: Head of Engineering / Platform Services