Senior HPC Storage Systems Engineer

Xceleng

Oak Ridge (TN)

On-site

USD 100,000 - 130,000

Full time

30+ days ago

Job summary

A technology solutions company based in Oak Ridge, Tennessee, is seeking a Senior HPC Storage Systems Engineer to design, operate, and maintain large-scale HPC storage systems that support scientific research. This full-time, mid-senior level position focuses on managing UNIX/Linux systems and ensuring the performance of storage environments. It requires a strong command of storage technologies and close collaboration with researchers.

Qualifications

  • BS degree and 8-12 years of relevant professional experience, or an equivalent combination of education and experience.
  • 5+ years managing UNIX/Linux systems.
  • Experience with HPC storage and large-scale enterprise storage systems.
  • 3+ years working with configuration management and automation tools.
  • Strong troubleshooting skills in Linux environments.

Responsibilities

  • Architect and manage HPC storage systems including parallel file systems.
  • Design and optimize large-scale Ceph storage clusters.
  • Ensure performance, scalability, and security of storage environments.
  • Collaborate with researchers for data workflows and I/O performance.
  • Evaluate new storage technologies for future HPC systems.

Skills

UNIX/Linux management
HPC storage management
Configuration management and automation tools (Git, Jenkins, Ansible, Puppet)
Scripting (Bash, Python, Perl)
Linux administration

Education

BS degree in computer science or a related discipline
Master's degree in relevant field
PhD in relevant field

Tools

Lustre
Ceph
Qumulo
NetApp
Spectra Logic BlackPearl

Job description

Overview

Xcel Engineering, Inc. is seeking a Senior HPC Storage Systems Engineer to design, operate, and maintain storage and supporting services for the clusters, servers, and workstations where science happens at ORNL. This position resides in the Emerging Technologies and Computing team in the Research Computing group in the Information Technology Services Directorate at Oak Ridge National Laboratory (ORNL).

The Emerging Technology Computational Group supports the research community through HPC systems engineering, integration, and support. By providing design, deployment, optimization, monitoring, and tooling support across multiple clustered storage infrastructures, we facilitate Lab-wide R&D projects. Our HPC clusters range in scope from just a handful of nodes to over fifty thousand cores.

We partner with ORNL research organizations to enable research excellence and delivery. We work with other clustered computing and HPC groups to help research programs identify the best solutions for their needs. When we build our customers' environments, our team collaborates to design, implement, and maintain the systems from inception to retirement.

Essential Functions
  • Architect, deploy, and manage large-scale HPC storage systems, including parallel file systems such as Lustre, GPFS/Spectrum Scale, BeeGFS, and WEKA.
  • Design, implement, and operate large-scale Ceph storage clusters for HPC and research workloads, delivering reliable, high-performance object, block, and file storage services.
  • Ensure the availability, performance, scalability, and security of production storage environments.
  • Administer and optimize enterprise storage platforms such as Qumulo and NetApp in support of HPC and research workloads.
  • Design, deploy, and maintain archival storage solutions including Spectra Logic BlackPearl and large-scale tape libraries to ensure long-term data preservation and accessibility.
  • Integrate high-performance, enterprise, and archival storage layers into cohesive tiered storage architectures that balance cost, scalability, and performance for diverse scientific workflows.
  • Leverage automation and monitoring solutions to minimize day-to-day maintenance while identifying opportunities to optimize system performance and management.
  • Collaborate with researchers and technical POCs to support large data workflows and optimize I/O performance for scientific workloads.
  • Automate storage provisioning, monitoring, and maintenance using scripting and configuration management tools.
  • Diagnose and resolve complex storage and I/O-related issues in high-throughput, low-latency HPC environments.
  • Evaluate emerging storage technologies (NVMe, object storage, hierarchical storage management, burst buffers) and contribute to strategic planning for future HPC systems.
  • Work with 24/7 operations staff to streamline monitoring and troubleshooting, significantly reducing the need for off-hours support.
  • Deliver ORNL's mission by aligning behaviors, priorities, and interactions with our core values of Impact, Integrity, Teamwork, Safety, and Service. Promote equal opportunity by fostering a respectful workplace.

Basic Qualifications

A BS degree in computer science, computer engineering, information technology, information systems, science, engineering, or related discipline and 8-12 years of relevant professional experience; or an equivalent combination of education and experience.

  • Master's degree holders: 7-10 years of relevant experience.
  • PhD holders: 4-6 years of relevant experience.

Five (5) or more years managing UNIX/Linux systems.

Demonstrated experience managing HPC storage and large-scale enterprise storage systems.

Three (3) or more years working with configuration management and automation tools such as Git, Jenkins, Ansible, or Puppet.

Proficiency with at least one scripting language (Bash, Python, Perl, etc.).

Strong Linux administration and advanced troubleshooting experience.

Experience supporting large data systems and/or HPC scientific workloads.

Strong desire to innovate and evaluate new technologies for HPC and storage environments.

Seniority level
  • Mid-Senior level
Employment type
  • Full-time
Job function
  • Information Technology
Industries
  • IT Services and IT Consulting
