Who We Are
Our business is supercomputing. We’ve been developing, building and supporting highly advanced computing solutions for the world’s most complex science, engineering and analytics challenges since 1976. We pride ourselves on understanding the problems our customers are trying to solve and developing the technologies that enable them to make the discoveries that better our world.
Who You Are
You are a dynamic, driven professional with a passion for success – yours, your company’s and your customer’s. Cray Global Technical Support (GTS) has an immediate opening for a remote service engineer with broad multi-system environment knowledge (generalist) to join our Global Remote Service (GRS) team. Under minimal supervision, this position provides highly visible end-user remote software and hardware technical support on Cray supercomputer, analytics, cluster compute and storage systems.
This position requires a staggered workweek, such as Sunday through Thursday, Tuesday through Saturday, or Saturday through Wednesday.
The engineer is responsible for understanding and complying with Cray internal controls.
Primary Duties and Responsibilities:
· Provide remote technical product support to Cray end-users who are diagnosing, troubleshooting, repairing, and debugging complex software, compute, and I/O subsystems via Cray diagnostic and remote support tools and/or telephone.
· Identify issues in Cray system hardware and software as well as in third-party hardware and software. Determine solutions and implement repairs or workarounds. This includes effectively managing the break-fix process: applying updates and patches, initiating spare parts orders, arranging for on-site engineering support (as required), and managing open RMAs.
· Resolve incidents within defined time frames using standard processes, and manage a queue of cases, bugs, and projects.
· Document all significant events related to customer problems, and provide timely updates to customers and management.
· Develop, demonstrate and maintain technical skills including troubleshooting, data analysis, code debugging, test scenario creation, and testing.
· Work with various other Cray teams, including but not limited to: GRS peers, Cray Level 2, Cray Level 1 Site-Field, Publications, Training, Support Planning, Testing, and Cray R&D.
· Author and review knowledge articles, field notices and patch requests.
· Participate in occasional customer installations, upgrades and training (remote and on-site).
· Provide on-call services on a rotation basis.
Minimum Education and/or Experience:
· Bachelor’s degree in Computer Science, Engineering or related field/discipline
· 5+ years of experience, ideally in a High-Performance Computing (HPC)-related area
Required Knowledge, Skills and Abilities:
· Sound decision-making, demonstrated through excellent software and hardware troubleshooting skills, an analytical approach to problems, and a record of driving solutions through to their conclusion.
· Knowledge of and experience with Linux/Unix operating systems, file systems, networking, and security.
· Programming and scripting knowledge and experience (e.g., Bash, Perl, Python).
· Familiarity with Lustre or other parallel filesystems.
· Ability to gather data, perform analysis, document findings and escalate to a higher level of support while remaining engaged in the final outcome.
· Knowledge of and experience in maintaining system hardware and software, using diagnostic and debugging tools for problem isolation. Ability to perform software builds, software upgrades, patch installations, and hardware repairs (e.g., swapping boards) as needed.
· Action-oriented: candidates must demonstrate self-motivation and be able to coordinate efforts with other groups, including customers, peers, field personnel, hardware product support, R&D, and third-party vendor personnel.
· Very good communication skills, both verbal and written.
· Customer focus: meets the expectations and requirements of internal and external customers by building effective, respectful, and trusting relationships, and uses first-hand customer information to help improve products and services.
· Composure: remains cool under pressure, does not become defensive, can be counted on in tough times, handles stress and the unexpected, and provides a settling influence while working to strict deadlines.
· Time management: uses time effectively and efficiently, concentrates on important priorities, and can attend to a broad range of activities.
· Some travel may be required periodically.
· Ability to work effectively as part of a global team to investigate and resolve complex problems.
Additional desired skills:
· Familiarity with the specific needs of HPC users desired
· Networking skills (Omni-Path, InfiniBand) a plus
· Familiarity with containers (Docker, Shifter) desired
· Working experience with Kubernetes and RESTful APIs desired
· Participation in a rotation of on-call duties with other staff to provide 24x7 coverage
· Occasional travel for factory or field training or to help support remote systems is required
· Ability to lift up to 50 pounds (22.7 kilograms) overhead
We are proud to be an Equal Opportunity Employer, including for protected veterans and individuals with disabilities. Cray Inc. is an Affirmative Action, Equal Opportunity Employer. As part of our standard hiring process for new employees, employment with Cray will be contingent upon successful completion of a comprehensive background check.