Site Reliability Engineer

Methodfi

London

On-site

GBP 80,000–100,000

Full time

Job summary

A leading cloud technology firm in London is seeking an experienced Site Reliability Engineer to lead the design, implementation, and maintenance of its cloud infrastructure. The ideal candidate will have at least 7 years of hands-on site reliability engineering experience, strong programming skills in Python, Java, or Go, and expertise in cloud platforms such as AWS, Azure, or GCP. The role offers a competitive salary, generous PTO, and comprehensive medical benefits.

Benefits

Generous PTO
Comprehensive medical and dental insurance
Paid parental leave (12 weeks)
Fertility and family planning support
Competitive pension scheme
Wellness stipend
Learning and development stipend
Company-wide off-sites
Competitive compensation with stock options

Qualifications

  • 7+ years of experience in Site Reliability Engineering.
  • Deep understanding of system architecture and infrastructure design.
  • Strong proficiency in automation and monitoring.

Responsibilities

  • Lead design and maintenance of cloud infrastructure.
  • Automate infrastructure provisioning using Terraform.
  • Develop monitoring systems to identify reliability issues.

Skills

Site Reliability Engineering expertise
Programming (Python, Java, Go)
Cloud platforms (AWS, Azure, GCP)
Containerization (Docker, Kubernetes)
Monitoring tools (Prometheus, Grafana)
System architecture knowledge

Education

Bachelor’s degree in Computer Science or related field

Tools

Terraform
Python
Kubernetes
AWS
GCP

Job description
About this role

We are looking for a foundational member of WRITER’s Cloud Infrastructure team. The role involves helping develop and implement our site reliability engineering (SRE) program. The ideal candidate will ensure the reliability, scalability, performance, and security of WRITER’s critical systems, taking a proactive approach so that our high-ROI products reach customers without disruption.

Responsibilities
  • Lead the design, implementation, and maintenance of WRITER, Inc.’s cloud infrastructure to ensure high availability and performance

  • Design and implement scalable cloud automation to support seamless deployment for our largest enterprise customers

  • Automate infrastructure provisioning and management using Terraform and Python

  • Collaborate with development teams to optimize cloud resources and enhance system reliability

  • Develop and maintain monitoring and alerting systems to proactively identify and resolve issues affecting the reliability of our writing solutions

  • Conduct post-mortem analyses of system failures to identify root causes and implement preventive measures

  • Optimize and scale our cloud infrastructure to support growing user demand and ensure cost efficiency

  • Ensure the security and compliance of our systems, adhering to industry standards and regulations

  • Provide mentorship and technical guidance to junior engineers, fostering a culture of reliability and continuous improvement

  • Stay current with emerging technologies and industry trends to continuously improve our site reliability practices

Is this you?
  • Proven expertise in Site Reliability Engineering with a minimum of 7 years of hands-on experience

  • Deep understanding of system architecture and infrastructure design to ensure high availability and performance

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field

  • Strong proficiency in programming languages such as Python, Java, or Go for automation and monitoring

  • Experience with cloud platforms such as AWS, Azure, or GCP, and with their services for building scalable, resilient systems

  • Expertise in containerization and orchestration technologies (e.g., Docker, Kubernetes)

  • Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack) to maintain system health and performance

  • Ability to lead and mentor junior engineers in best practices for reliability and system optimization

  • Excellent communication skills to collaborate effectively with cross-functional teams and stakeholders

  • Proactive approach to identifying and mitigating potential system failures and performance bottlenecks

Preferred skills & experience:
  • Software engineering expertise

  • Terraform

  • Python

  • Kubernetes

  • Scala

  • AWS/GCP

Benefits & perks (UK full-time employees):
  • Generous PTO, plus company holidays

  • Comprehensive medical and dental insurance

  • Paid parental leave for all parents (12 weeks)

  • Fertility and family planning support

  • Early-detection cancer testing through Galleri

  • Competitive pension scheme and company contribution

  • Annual work-life stipends for:

    • Home office setup, cell phone, internet

    • Wellness stipend for gym, massage/chiropractor, personal training, etc.

    • Learning and development stipend

  • Company-wide off-sites and team off-sites

  • Competitive compensation and company stock options
