Site Reliability Engineer

NewsNowGh

Cambridge

On-site

GBP 80,000 - GBP 100,000

Full time

24 days ago

Job summary

A pioneering AI company is seeking a highly experienced Site Reliability Engineer to enhance the reliability and scalability of its AI platform. This role, based in England, offers visa sponsorship for international professionals. You will design and maintain resilient infrastructure, collaborate with software engineers, and drive improvements in monitoring and operations. Ideal candidates have a Master's degree and 7+ years in SRE or DevOps, with strong skills in cloud platforms and infrastructure tools. This is an excellent opportunity to work in a leading AI firm with global impact.

Qualifications

7+ years of experience in SRE, DevOps, or similar roles in distributed systems environments.
Hands-on experience with Docker, Kubernetes, CI/CD pipelines, and infrastructure-as-code tools.
Solid knowledge of observability stacks, networking, security, and system administration.

Responsibilities

Design, build, and maintain scalable, highly available, and fault-tolerant infrastructure.
Ensure high availability of inference and training environments across HPC clusters.
Implement and improve monitoring, alerting, logging, and incident management systems.
Drive infrastructure-as-code, deployment, and orchestration.
Work with security teams to ensure compliance with best practices.

Skills

Cloud platforms
Reliability engineering practices
Docker
Kubernetes
CI/CD pipelines
Scripting or programming (Python, Go, Bash)
Observability stacks
Networking
System administration

Education

Master’s degree in Computer Science, Engineering, or a related field

Tools

Terraform

Site Reliability Engineer Job in UK 2026 with Visa Sponsorship | Mistral AI

Mistral AI is hiring a highly experienced Site Reliability Engineer (SRE) to strengthen the reliability, scalability, and performance of its cutting-edge AI platform and customer-facing systems. This role is based in London, England, with a strong European presence and flexible arrangements for eligible candidates.

The position is open to international professionals, with Skilled Worker visa sponsorship available, making it an exceptional opportunity for senior engineers seeking to build a long-term career in the UK’s fast-growing artificial intelligence sector. You will join a world-class team working at the frontier of open, high-performance AI infrastructure.

About the Role

As a Site Reliability Engineer, you will operate at the intersection of software engineering and production operations, balancing day-to-day reliability with long-term platform improvements. The role combines hands-on operations with infrastructure and platform engineering, supporting both customer-facing services and large-scale AI model training environments.

You will work closely with software engineers, security teams, and AI researchers to ensure systems are highly available, secure, reproducible, and scalable across multiple environments and high-performance computing clusters.

About the Hiring Firm

Mistral AI is a pioneering AI company focused on democratizing artificial intelligence through high-performance, optimized, and open models and platforms. Its products are designed to integrate seamlessly into enterprise and research environments, both on-premises and in the cloud. With teams across Europe, the UK, the USA, and Asia, Mistral AI is known for its collaborative, low-ego, and innovation-driven culture.

The company is building the next generation of AI infrastructure and tools that are already shaping how organisations deploy and use advanced AI systems.

Responsibilities

Design, build, and maintain scalable, highly available, and fault-tolerant infrastructure for web services and ML workloads
Ensure high availability of inference and training environments and enable replication across HPC clusters
Operate and troubleshoot production systems, including incident response and root cause analysis
Implement and improve monitoring, alerting, logging, and incident management systems
Build and maintain CI/CD, containerisation, orchestration, and automation workflows
Drive infrastructure-as-code, deployment, and orchestration using tools such as Kubernetes and Terraform
Collaborate with researchers to enable safe, reproducible model training and experimentation
Develop new tooling, dashboards, and workflows to improve reliability, performance, and operability
Work with security teams to ensure compliance with best practices and standards
Document systems and processes, and contribute to knowledge sharing and open-source initiatives

Requirements

Master’s degree in Computer Science, Engineering, or a related field
7+ years of experience in SRE, DevOps, or similar roles in distributed systems environments
Strong experience with cloud platforms, highly available systems, and reliability engineering practices
Hands-on experience with Docker, Kubernetes, CI/CD pipelines, and infrastructure-as-code tools
Proficiency in scripting or programming (e.g., Python, Go, Bash)
Solid knowledge of observability stacks, networking, security, and system administration
Excellent problem-solving skills and ability to work in fast-paced, high-impact environments

This is a rare opportunity to join one of Europe’s most exciting AI companies with visa sponsorship, strong benefits, and global impact. If you are a senior SRE looking to relocate to the UK or grow your international career while working on world-class AI infrastructure, Mistral AI offers a truly exceptional platform for your next career move.
