DDN

Sr. Staff Engineer, Lustre

14h ago

No Phone Required | $215k - $265k | Dev | United States | himalayas

Job Description

Overview

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, Government, academia, research and manufacturing.

"DDN's A3I solutions are transforming the landscape of AI infrastructure." – IDC

“The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments” - Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.

Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management.

Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage.

Job Description

We are seeking a Senior Staff Engineer – LustreFS with 15+ years of experience in distributed storage, Linux kernel and large-scale HPC/AI infrastructure.

This role is intended for a deeply hands-on technical leader who can independently drive architecture, debugging, reliability and performance across LustreFS subsystems including metadata, object storage, recovery, LNet and high-performance transports such as RDMA/InfiniBand/RoCE. You will be expected to mentor senior engineers, shape technical direction, improve operational resilience, and help convert tribal knowledge into scalable engineering systems. Being AI-enabled for faster triage, debugging, design exploration and knowledge capture is a strong advantage.

Key Responsibilities

- Provide deep technical leadership across LustreFS subsystems including llite, MDS/MDT, OSS/OST, LDLM, recovery and LNet.
- Own complex root-cause analysis for difficult customer, scale and production issues across kernel, filesystem, network and transport layers.
- Lead design and implementation of new features, reliability improvements, scale enhancements and performance optimizations in LustreFS.
- Drive architectural reviews for kernel-space and user-space changes with strong attention to correctness, backward compatibility and operability.
- Define debugging and observability strategies for complex distributed failure scenarios including failover, recovery storms, lock contention and transport degradation.
- Partner with principal engineers, support, QE, DevOps and release teams to improve product quality, test depth and release confidence.
- Mentor senior and mid-level engineers; create structured learning paths, review standards and subsystem ownership models to build redundancy.
- Promote use of AI-assisted workflows for issue triage, log analysis, code review assistance, knowledge capture and design acceleration with appropriate engineering guardrails.

Required Qualifications

- 15+ years of experience in distributed systems, filesystems, Linux kernel development or storage infrastructure engineering.
- Strong hands-on expertise in LustreFS internals and production operations, including one or more of: metadata services, object storage services, client/llite, locking, recovery or LNet.
- Strong C systems programming skills and deep Linux debugging experience using tools such as gdb, crash, perf, ftrace, eBPF, systemtap and core analysis.
- Strong understanding of Linux kernel concurrency, memory management, I/O paths, networking and performance tuning.
- Experience with high-performance networking and transports such as InfiniBand, RDMA, RoCE and/or TCP at scale.
- Proven ability to diagnose complex cross-layer issues spanning kernel, storage, networking and distributed coordination.
- Experience leading design discussions, code reviews and subsystem-level technical decisions.
- Excellent written and verbal communication skills with the ability to guide senior technical audiences and influence cross-functional teams.

Preferred Skills

- Experience with large-scale AI/HPC clusters, parallel filesystems and performance-sensitive production environments.
- Familiarity with backend storage filesystems and media such as ZFS, ldiskfs, NVMe and enterprise storage platforms.
- Experience with upstream/open-source contribution models, patch review and long-term maintenance / backporting.
- Experience building run