Senior Site Reliability Engineer (SRE)

Paydock Holdings Pty

Remote

GBP 60,000 - GBP 100,000

Full time

Today

Job summary

A leading payments orchestration platform is looking for a Senior Site Reliability Engineer to join its global infrastructure team. The role involves maintaining the health, performance, and scalability of the production environment using AWS, Terraform, Docker, and related tools. Ideal candidates will have 5+ years of hands-on cloud experience, strong coding skills, and experience building and managing CI/CD pipelines. This fully remote position offers a competitive compensation package and opportunities for professional growth in a supportive culture.

Benefits

Work from Anywhere
Continuous learning opportunities
Competitive salary

Qualifications

  • 5+ years of hands-on experience with a major cloud provider, preferably AWS.
  • Deep proficiency with tools like Terraform or CloudFormation.
  • Strong experience with Docker and Kubernetes.
  • Proven ability to build and manage CI/CD pipelines.
  • Hands-on experience with monitoring and logging tools.
  • Proficiency in programming or scripting languages such as Go, Python, or Bash.
  • Excellent communication skills for remote teamwork.

Responsibilities

  • Design and maintain core infrastructure using IaC principles.
  • Identify and address performance bottlenecks proactively.
  • Implement comprehensive monitoring and logging systems.
  • Participate in on-call rotation and lead incident responses.
  • Collaborate with engineering teams on reliability best practices.
  • Implement security best practices across cloud infrastructure.

Skills

Cloud Experience
Infrastructure as Code (IaC)
Containerization
CI/CD Pipeline Development
Observability Skills
Scripting/Coding Ability
Remote Work Communication

Tools

Terraform
CloudFormation
Docker
Kubernetes
GitLab CI
Jenkins
Prometheus
Grafana
ELK Stack

Job description

Paydock is a leading payments orchestration platform, empowering businesses to manage and scale their payment strategies seamlessly. We provide a single, elegant API to connect to a vast ecosystem of payment gateways and methods, simplifying complexity and unlocking new revenue opportunities for our merchants worldwide. As a geodistributed team, we thrive on asynchronous communication and a culture of ownership, trust, and innovation. We're looking for passionate engineers to help us build and maintain the highly available, scalable, and resilient infrastructure that powers global commerce.

The Role

We are seeking an experienced and proactive Senior Site Reliability Engineer (SRE) to join our global infrastructure team. You will be a guardian of our production environment, responsible for its health, performance, and scalability. Your mission is to apply software engineering principles to solve operational problems, automate everything, and ensure our platform exceeds the reliability expectations of our customers.

You'll work with a talented, distributed team of engineers across different time zones, making your mark on a platform that processes millions of transactions. This role requires a deep passion for eliminating toil, a proactive approach to system stability, and excellent communication skills to thrive in a remote-first environment.

What You’ll Do
  • Architect & Automate: Design, build, and maintain our core infrastructure using Infrastructure as Code (IaC) principles. You'll be instrumental in evolving our CI/CD pipelines to ensure safe, rapid, and reliable releases.
  • Enhance Reliability & Scalability: Proactively identify and address performance bottlenecks, single points of failure, and scalability limits. You'll define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to maintain and improve platform health.
  • Champion Observability: Implement and manage comprehensive monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK Stack) to provide deep insights into system behavior and ensure rapid incident detection.
  • Lead Incident Management: Participate in our on-call rotation, acting as a key player in incident response and resolution. You'll lead blameless post-mortems to identify root causes and implement preventative measures.
  • Collaborate & Empower: Work closely with software engineering teams to foster a culture of reliability. You'll provide guidance on building resilient services, implementing best practices for observability, and improving the developer experience.
  • Secure the Foundation: Implement and maintain security best practices across our cloud infrastructure, ensuring our platform is robust and compliant.
What You’ll Bring
Must-Haves
  • Extensive Cloud Experience: 5+ years of hands-on experience with a major cloud provider, preferably AWS (EC2, S3, RDS, VPC, IAM, etc.).
  • Infrastructure as Code (IaC) Mastery: Deep proficiency with tools like Terraform or CloudFormation to manage infrastructure declaratively.
  • Containerization Expertise: Strong experience with Docker and container orchestration systems like Kubernetes (EKS) or ECS.
  • CI/CD Pipeline Development: Proven ability to build, optimize, and manage CI/CD pipelines using tools like GitLab CI, Jenkins, or CircleCI.
  • Observability Skills: Hands-on experience with modern monitoring and logging tools (e.g., Prometheus, Grafana, Loki, Alertmanager, ELK Stack).
  • Strong Scripting/Coding Ability: Proficiency in at least one programming or scripting language, such as Go, Python, or Bash, for automation and tooling.
  • Remote Work Pro: Excellent written and verbal communication skills, with a proven ability to work effectively and asynchronously in a distributed team environment.
Nice-to-Haves
  • Experience in the payments or FinTech industry.
  • Familiarity with service mesh technologies like Istio or Linkerd.
  • Experience with database administration (e.g., PostgreSQL, MySQL).
  • Knowledge of networking, security principles, and compliance standards (e.g., PCI DSS).
Why Join Paydock?
  • Work from Anywhere: Enjoy the flexibility and autonomy of a fully remote, geodistributed team.
  • Make a Global Impact: Build and scale the infrastructure for a platform trusted by businesses worldwide.
  • Culture of Growth: We encourage continuous learning and provide opportunities for professional development in a supportive and collaborative environment.
  • Meaningful Work: Solve complex, interesting problems that have a direct and tangible impact on our product and customers.
  • Competitive Compensation: We offer a competitive salary, benefits package, and the right tools to help you succeed.

Similar jobs

Senior Site Reliability Engineer

Methodfi

United Kingdom
Remote
GBP 70,000 - 90,000
Full time
30+ days ago
Site Reliability Engineer

Wedo Technology Solutions Ltd.

Greater London
On-site
GBP 100,000 - 125,000
Full time
30+ days ago
Senior SRE — Global, Remote-First Reliability Leader

Paydock Holdings Pty

Manchester
Remote
GBP 60,000 - 100,000
Full time
30+ days ago
Deputy of Reliability, Observability, and Control Lead

Paysend Group Ltd.

Greater London
On-site
GBP 80,000 - 110,000
Full time
30+ days ago
AI Engineer

Paydock Holdings Pty

Manchester
Remote
GBP 60,000 - 80,000
Full time
30+ days ago
Platform Engineer

Methodfi

Greater London
Hybrid
GBP 80,000 - 95,000
Full time
30+ days ago
Principal Site Reliability Engineer

Dubizzle Limited

Greater London
Hybrid
GBP 70,000 - 90,000
Full time
30+ days ago
Site Reliability Engineer

Xceptor

Greater London
On-site
GBP 55,000 - 75,000
Full time
30+ days ago
Lead Site Reliability Engineer

Methodfi

Greater London
Hybrid
GBP 70,000 - 90,000
Full time
30+ days ago
Platform Engineer

Attio Ltd

Greater London
On-site
GBP 80,000 - 95,000
Full time
30+ days ago