Sr Member Eng Stf - HPC System Engineer
Job ID: 460269BR
Date posted: Nov. 07
City: Moorestown
State: New Jersey
Description: At Lockheed Martin Rotary and Mission Systems, we are driven by innovation and integrity. We believe that by applying the highest standards of business ethics and visionary thinking, everything is within our reach – and yours as a Lockheed Martin employee. Lockheed Martin values your skills, training and education. Come and experience your future!
Infrastructure Engineering & Lab Operations is engaged in the design and deployment of High Performance Computing (HPC) platforms used for Machine Learning research, Monte Carlo simulation, and data analysis/analytics applications for government customers. We have a passion for excellence that is reflected in both the quality of our products and the services we provide our customers. Infrastructure Engineering & Lab Operations has successfully worked with HPC vendors that include Penguin Computing, Dell, NVIDIA, Intel, Cisco, Univa and Mellanox to deliver:
• Linux-based Beowulf clusters ranging from hundreds to thousands of CPU cores
• Tesla V100 & P100 GP-GPU enabled compute systems for Deep Learning / Machine Learning and scientific applications
• Petabytes of storage
• High-performance networks
Infrastructure Engineering & Lab Operations is currently seeking an HPC professional with Linux systems administration experience and a programming background to join our Development Systems Integration Group (DSI). We’re also interested in hearing from cybersecurity experts who are familiar with HPC and have worked in DevOps settings.
The Infrastructure Engineering & Lab Operations DSI defines new computing platforms, adapts new methodologies, and creates tools and scripts for improving the HPC user experience. This team also designs, integrates and supports HPC cluster operations, and maintains cybersecurity posture by implementing the Risk Management Framework (RMF). Our HPC clusters incorporate Intel Broadwell and Skylake processors, NVIDIA Tesla GP-GPUs, parallel and clustered tiered storage, the Univa distributed resource management system, Fibre Channel, InfiniBand, Gigabit Ethernet, accelerated graphics and Red Hat Enterprise Linux (RHEL).
Basic Qualifications:
- Bachelor’s degree in Computer Science, Data Science, Engineering or related fields with scientific computing experience
- 6+ years’ experience in IT, including experience with Red Hat Enterprise Linux (RHEL) administration, HPC administration in a product engineering environment, and clustered file systems [such as Veritas] or parallel file systems [such as Lustre]
- High level of hands-on experience in managing, architecting and administering large CPU and GP-GPU based HPC platforms
- Experience in a design engineering environment working on tight schedules
- Experience performing security patching across multiple Unix platforms
- Experience with virtualization technology such as VMware
- Experience with identity management technology such as Active Directory, Kerberos, and LDAP
- Experience configuring, installing and troubleshooting Univa Grid Engine (preferred) or other job schedulers/resource managers
- Experience configuring and managing network-attached storage systems, such as RAID arrays or ZFS pools, high speed disk/SSD/NVMe systems, and storage networks
- Experience capturing and deploying Linux images for efficient system buildouts
- Demonstrated ability to manage the full stack (datacenter rack equipment, server hardware, OS, network, and security) of multi-tenant Linux-based systems both individually and within a team environment
- Experience scripting and automating tasks, using tools such as Python and bash
- Must be a US citizen capable of obtaining a DoD Secret security clearance
Desired Skills:
- Master’s degree in Computer Science, Data Science, Engineering or related fields with scientific computing experience
- Developing, optimizing, compiling, implementing, and testing multithreaded, multiprocessor performance-oriented software with Message Passing Interface, OpenMP, CUDA or other parallel processing frameworks
- Electromagnetics, fluid dynamics, multi-physics Finite Element Analysis, Monte Carlo Analysis, generative design, control theory, optimization, directed energy and/or other physics-based modeling and simulation
- Artificial Intelligence technologies, such as general machine learning algorithms and neural networks in a parallel HPC environment
- Interfacing, configuration, and optimization of HPC technologies such as parallel/distributed file systems [Lustre], high-speed interconnect fabrics [InfiniBand], and HPC batch scheduling software [Univa]
- Advanced knowledge of RHEL, including Security-Enhanced Linux [SELinux] as well as Multi-Level Security (MLS) tagging or labeling technologies
- Ability to understand complex engineering and modeling principles in order to communicate effectively with the user community, understand their work, and translate requirements into HPC solutions
- Experience with systems automation tools such as Ansible or Puppet
- Current cybersecurity certification in Security+ and/or CISSP
- Excellent collaboration and team-oriented skills
- Excellent oral and written communication skills
- US citizen able to obtain a DoD Top Secret clearance
Lockheed Martin is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, pregnancy, sexual orientation, gender identity, national origin, age, protected veteran status, or disability status.
Join us at Lockheed Martin, where your mission is ours. Our customers tackle the hardest missions. Those that demand extraordinary amounts of courage, resilience and precision. They’re dangerous. Critical. Sometimes they even provide an opportunity to change the world and save lives. Those are the missions we care about.
As a leading technology innovation company, Lockheed Martin’s vast team works with partners around the world to bring proven performance to our customers’ toughest challenges. Lockheed Martin has employees based in many states throughout the U.S., and internationally, with business locations in many nations and territories.
Experience Level: Experienced Professional
Business Unit: ESS6500 RMS
Relocation Available: Possible
Career Area: Information Technology
Clearance Level: Secret
Type: Full-Time
Virtual Location: no
Work Schedule: TEMPO: 5X8 - 5 days/wk 8 hrs/day (Flex & Rigid)
Shift: First