Site Reliability Engineer

NewsNowGh

Cambridge

On-site

GBP 80,000 - GBP 100,000

Full time

24 days ago

Job summary

A pioneering AI company is seeking a highly experienced Site Reliability Engineer to enhance the reliability and scalability of its AI platform. This role, based in England, offers visa sponsorship for international professionals. You will design and maintain resilient infrastructure, collaborate with software engineers, and drive improvements in monitoring and operations. Ideal candidates have a Master's degree and 7+ years in SRE or DevOps, with strong skills in cloud platforms and infrastructure tools. This is an excellent opportunity to work in a leading AI firm with global impact.

Qualifications

  • 7+ years of experience in SRE, DevOps, or similar roles in distributed systems environments.
  • Hands-on experience with Docker, Kubernetes, CI/CD pipelines, and infrastructure-as-code tools.
  • Solid knowledge of observability stacks, networking, security, and system administration.

Responsibilities

  • Design, build, and maintain scalable, highly available, and fault-tolerant infrastructure.
  • Ensure high availability of inference and training environments across HPC clusters.
  • Implement and improve monitoring, alerting, logging, and incident management systems.
  • Drive infrastructure-as-code, deployment, and orchestration.
  • Work with security teams to ensure compliance with best practices.

Skills

Cloud platforms
Reliability engineering practices
Docker
Kubernetes
CI/CD pipelines
Scripting or programming (Python, Go, Bash)
Observability stacks
Networking
System administration

Education

Master’s degree in Computer Science, Engineering, or a related field

Tools

Terraform

Job description

Site Reliability Engineer Job in UK 2026 with Visa Sponsorship | Mistral AI

Mistral AI is hiring a highly experienced Site Reliability Engineer (SRE) to strengthen the reliability, scalability, and performance of its cutting-edge AI platform and customer-facing systems. This role is based in London, England, with a strong European presence and flexible arrangements for eligible candidates.

The position is open to international professionals, with Skilled Worker visa sponsorship available, making it an exceptional opportunity for senior engineers seeking to build a long-term career in the UK’s fast-growing artificial intelligence sector. You will join a world-class team working at the frontier of open, high-performance AI infrastructure.

About the Role

As a Site Reliability Engineer, you will operate at the intersection of software engineering and production operations, balancing day-to-day reliability with long-term platform improvements. The role combines hands-on operations with infrastructure and platform engineering, supporting both customer-facing services and large-scale AI model training environments.

You will work closely with software engineers, security teams, and AI researchers to ensure systems are highly available, secure, reproducible, and scalable across multiple environments and high-performance computing clusters.

About the Hiring Firm

Mistral AI is a pioneering AI company focused on democratizing artificial intelligence through high-performance, optimized, and open models and platforms. Its products are designed to integrate seamlessly into enterprise and research environments, both on-premises and in the cloud. With teams across Europe, the UK, the USA, and Asia, Mistral AI is known for its collaborative, low-ego, and innovation-driven culture.

The company is building the next generation of AI infrastructure and tools that are already shaping how organisations deploy and use advanced AI systems.

Responsibilities

  • Design, build, and maintain scalable, highly available, and fault-tolerant infrastructure for web services and ML workloads
  • Ensure high availability of inference and training environments and enable replication across HPC clusters
  • Operate and troubleshoot production systems, including incident response and root cause analysis
  • Implement and improve monitoring, alerting, logging, and incident management systems
  • Build and maintain CI/CD, containerisation, orchestration, and automation workflows
  • Drive infrastructure-as-code, deployment, and orchestration using tools such as Kubernetes and Terraform
  • Collaborate with researchers to enable safe, reproducible model training and experimentation
  • Develop new tooling, dashboards, and workflows to improve reliability, performance, and operability
  • Work with security teams to ensure compliance with best practices and standards
  • Document systems and processes, and contribute to knowledge sharing and open-source initiatives

Requirements

  • Master’s degree in Computer Science, Engineering, or a related field
  • 7+ years of experience in SRE, DevOps, or similar roles in distributed systems environments
  • Strong experience with cloud platforms, highly available systems, and reliability engineering practices
  • Hands-on experience with Docker, Kubernetes, CI/CD pipelines, and infrastructure-as-code tools
  • Proficiency in scripting or programming (e.g., Python, Go, Bash)
  • Solid knowledge of observability stacks, networking, security, and system administration
  • Excellent problem-solving skills and ability to work in fast-paced, high-impact environments

This is a rare opportunity to join one of Europe’s most exciting AI companies with visa sponsorship, strong benefits, and global impact. If you are a senior SRE looking to relocate to the UK or grow your international career while working on world-class AI infrastructure, Mistral AI offers a truly exceptional platform for your next career move.

