*** This role is supporting NVIDIA ***
*** Role is on-site and will require local travel to data centers within Santa Clara. ***
We're looking for a motivated Engineering Technician for NVIDIA's on-premises, private cloud infrastructure! In this role, you will provide and maintain a compute farm of Builders, Packagers, and Testers that serves as a test bed where our developers worldwide test NVIDIA hardware and software before release.
What We Need to See:
• Associate’s or Bachelor’s Degree in Engineering/Technical Major (or equivalent experience).
• 5+ years of experience in data centers or large engineering labs.
• Familiarity with SCMs like Git/Perforce.
• Proficiency in DCIM (Nautobot, etc.) and scripting/automation (shell, Python, Ansible).
• Working knowledge of protocols/services like TCP/IP, DNS, NFS, SSL, etc.
• Experience with Windows, Linux, and macOS.
• Hands-on experience with PCBs, GPUs, and system deployments.
• Exceptional communication skills, both written and verbal.
• Ability to explain technical concepts to non-technical audiences.
• Strong problem-solving skills and a collaborative spirit.
What Makes You Stand Out:
• Experience managing HPC clusters using tools like BCM and Slurm.
• Hands-on knowledge of OpenStack.
• Relevant certifications such as CCNA or equivalent.
• Strong background in Windows and Linux administration, with an understanding of dense datacenter design, including compute, storage, and networking.
• Experience with hypervisors and VM applications.
• Knowledge of DC infrastructure with an emphasis on liquid cooling.
• A track record of technical curiosity and innovation.
• Mechanically inclined and comfortable with tools and physical tasks.
• Energetic, enthusiastic, and aware of what it takes to get the team to the finish line.
• Willing to go the extra mile to get the job done.
What You'll Do:
• Collaborate closely with engineering teams (system architects, hardware/software engineers, QA, and more) to design, develop, debug, and release next-generation products.
• Manage and maintain a high-performing Compute Farm of builders, packagers, testers, and core infrastructure.
• Ensure availability targets are consistently met and lead system recovery efforts.
• Deploy and qualify systems while supporting exciting new technology bring-ups.
• Oversee inventory and lifecycle management for NVIDIA's assets across data centers and labs.
• Gather critical metrics and create Standard Operating Procedure (SOP) documentation.
• Maintain a world-class, safe, and well-organized environment in our data centers and labs.
• Troubleshoot Linux/Windows, hardware, and infrastructure issues alongside engineers and platform operations teams.
• Plan, deploy, and maintain on-premises private cloud infrastructure, collaborating with datacenter and network engineering teams.
• Implement efficiency improvements to maximize availability, throughput, and test accuracy while meeting SLAs and KPIs.
• Represent the team in meetings with internal stakeholders and contribute to global operations.
- **Only those lawfully authorized to work in the designated country associated with the position will be considered.**
- **Please note that all position start dates and durations are estimates and may be shortened or extended based on a client’s business needs and requirements.**