
AI Inference Engineer (London)

Methodfi

City of London

On-site

GBP 80,000 - 100,000

Full time

30+ days ago


Job summary

A technology company in the City of London is seeking an AI Inference Engineer to develop APIs for real-time machine learning inference. You will optimize the inference stack and improve system reliability, with a focus on working with PyTorch and CUDA. Ideal candidates have experience in ML systems and are passionate about LLM optimizations.

Benefits

Equity options

Qualifications

  • Experience with machine learning systems and deep learning frameworks like PyTorch.
  • Familiarity with LLM architectures and optimization techniques.
  • Understanding of GPU architectures or experience with CUDA.

Responsibilities

  • Develop APIs for AI inference for both internal and external customers.
  • Benchmark and address bottlenecks in the inference stack.
  • Improve reliability and observability of systems, responding to outages.
  • Explore and implement optimizations for LLM inference.

Skills

Experience with ML systems
Deep learning frameworks
Inference optimization techniques
GPU architectures
CUDA programming

Tools

Python
Rust
C++
PyTorch
Triton
CUDA
Kubernetes

Job description

We are looking for an AI Inference Engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities
  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations
Qualifications
  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization, etc.)
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA
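To illustrate the kind of inference optimization the qualifications refer to, here is a minimal sketch of symmetric per-tensor int8 weight quantization, one common technique for reducing memory bandwidth in LLM inference. This is an illustrative example only, not part of the role's actual codebase; all function names are hypothetical.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127].

    Illustrative sketch only; production systems typically use per-channel
    scales and calibration.
    """
    scale = np.abs(w).max() / 127.0          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from int8 values and a scale."""
    return q.astype(np.float32) * scale

# Example: quantize a tiny weight vector and measure round-trip error.
w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

The int8 representation uses a quarter of the memory of float32 weights, at the cost of a small reconstruction error controlled by the scale.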

Final offer amounts are determined by multiple factors, including experience and expertise.

Equity: In addition to the base salary, equity may be part of the total compensation package.
