
A technology company in the City of London is seeking an AI Inference Engineer to develop APIs for real-time machine learning inference. You will optimize the inference stack and improve system reliability, working primarily with PyTorch and CUDA. Ideal candidates have experience with ML systems and are passionate about LLM optimization.
We are looking for an AI Inference Engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.
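To give a flavor of the kind of work involved in a real-time inference stack, here is a minimal sketch of dynamic batching, a common serving optimization: individual requests are queued and grouped so the model runs once per batch instead of once per request. All names here (`BatchedRunner`, `max_batch`, `max_wait_ms`) are illustrative, not taken from this posting.

```python
import threading
import queue
import time

class BatchedRunner:
    """Hypothetical dynamic-batching wrapper around a batched model function."""

    def __init__(self, model_fn, max_batch=8, max_wait_ms=5):
        self.model_fn = model_fn          # runs inference on a list of inputs
        self.max_batch = max_batch
        self.max_wait = max_wait_ms / 1000.0
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._loop, daemon=True)
        self._worker.start()

    def infer(self, x):
        """Blocking call: enqueue one input, wait for its result."""
        done = threading.Event()
        slot = {"input": x, "event": done, "output": None}
        self._queue.put(slot)
        done.wait()
        return slot["output"]

    def _loop(self):
        while True:
            batch = [self._queue.get()]   # block until the first request arrives
            deadline = time.monotonic() + self.max_wait
            # Collect more requests until the batch is full or the wait expires.
            while len(batch) < self.max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            outputs = self.model_fn([s["input"] for s in batch])
            for slot, out in zip(batch, outputs):
                slot["output"] = out
                slot["event"].set()

# Stand-in for a real model: doubles each input in the batch.
runner = BatchedRunner(lambda xs: [x * 2 for x in xs])
print(runner.infer(21))
```

In production this pattern would sit behind an async API server and call a PyTorch model under `torch.inference_mode()`, with batch size and wait time tuned against latency targets.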
Final offer amounts are determined by multiple factors, including experience and expertise.
Equity: In addition to the base salary, equity may be part of the total compensation package.