
AI Inference Engineer (London)

Methodfi

City of London

On-site

GBP 80,000 - 100,000

Full time

30+ days ago


Job summary

A technology company in the City of London is seeking an AI Inference Engineer to develop APIs for real-time machine learning inference. You will optimize the inference stack and improve system reliability, with a focus on working with PyTorch and CUDA. Ideal candidates have experience in ML systems and are passionate about LLM optimizations.

Benefits

Equity options

Qualifications

  • Experience with machine learning systems and deep learning frameworks like PyTorch.
  • Familiarity with LLM architectures and optimization techniques.
  • Understanding of GPU architectures or experience with CUDA.

Responsibilities

  • Develop APIs for AI inference for both internal and external customers.
  • Benchmark and address bottlenecks in the inference stack.
  • Improve reliability and observability of systems, responding to outages.
  • Explore and implement optimizations for LLM inference.

Skills

Experience with ML systems
Deep learning frameworks
Inference optimization techniques
GPU architectures
CUDA programming

Tools

Python
Rust
C++
PyTorch
Triton
CUDA
Kubernetes

Job description

We are looking for an AI Inference Engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities
  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations
Qualifications
  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization, etc.)
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA
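To illustrate the kind of inference optimization the qualifications refer to, here is a minimal sketch of symmetric per-tensor int8 weight quantization, one common technique for reducing memory bandwidth in LLM inference. This is an illustrative example only, not part of the role's actual codebase; all function names are hypothetical.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127].

    Illustrative sketch only; production systems typically use per-channel
    scales and calibration.
    """
    scale = np.abs(w).max() / 127.0          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from int8 values and a scale."""
    return q.astype(np.float32) * scale

# Example: quantize a tiny weight vector and measure round-trip error.
w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

The int8 representation uses a quarter of the memory of float32 weights, at the cost of a small reconstruction error controlled by the scale.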

Final offer amounts are determined by multiple factors, including experience and expertise.

Equity: In addition to the base salary, equity may be part of the total compensation package.
