Senior HPC AI Cluster Engineer
1 month ago
NVIDIA is looking for an experienced HPC Engineer to join the E2E software verification HPC/AI Infrastructure team. We are focused on building supercomputers and HPC clusters based on groundbreaking technologies. We are looking for an outstanding architect for a senior HPC role, to be a key player in the most exciting computing hardware and software and to contribute to the latest breakthroughs in artificial intelligence and GPU computing. You will provide insights on at-scale system design and tuning mechanisms for large-scale compute runs. You will work with the latest accelerated computing and deep learning software and hardware platforms, and with many scientific researchers, developers, and customers to craft improved workflows and develop new, leading differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialists to architect, develop, and bring up large-scale performance platforms.

What you will be doing:
Design, implement, and maintain large-scale HPC/AI clusters with monitoring, logging, and alerting.
Manage Linux job/workload schedulers and orchestration tools.
Develop and maintain continuous integration and delivery pipelines.
Develop tooling to automate deployment and management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources.
Deploy monitoring solutions for the servers, network, and storage.
Perform troubleshooting at the bare metal, operating system, software stack, and application levels.
As a technical resource, develop, redefine, and document standard methodologies to share with internal teams.
Support Research & Development activities and engage in POCs/POVs for future improvements.

What we need to see:
A degree in Computer Science, Engineering, or a related field and 5+ years of experience.
Knowledge of HPC and AI solution technologies, from CPUs and GPUs to high-speed interconnects and supporting software.
Experience with job scheduling and workload orchestration tools such as Slurm and K8s.
Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) networking and internals, ACLs and OS-level security protection, and common protocols, e.g. TCP, DHCP, DNS.
Experience with multiple storage solutions such as Lustre, GPFS, ZFS, and XFS. Familiarity with newer and emerging storage technologies.
Python programming and bash scripting experience.
Comfortable with automation and configuration management tools such as Jenkins, Ansible, Puppet/Chef.
Deep knowledge of networking technologies such as InfiniBand and Ethernet.
Deep understanding of and experience with virtual systems (for example VMware, Hyper-V, KVM, or Citrix).
Familiarity with cloud computing platforms (e.g. AWS, Azure, Google Cloud).

Ways to stand out from the crowd:
Knowledge of CPU and/or GPU architecture.
Knowledge of Kubernetes and container-related microservice technologies.
Experience with GPU-focused hardware/software (DGX, CUDA).
Background with RDMA (InfiniBand or RoCE) fabrics.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.
-
HPC Systems Administrator
1 month ago
Spain, AVANSEL SELECCIÓN, Full-time
AVANSEL SELECCIÓN - Palma de Mallorca, Balearic Islands. We are looking for an HPC (High Performance Computing) Systems Administrator for the Balearic Islands Coastal Observing and Forecasting System (SOCIB), located in Palma de Mallorca (Parc Bit). What is offered: Permanent full-time contract. Job category: III-B-3. Gross salary...
-
AI Institute Director
1 week ago
Spain, Somma, Full-time
Context And Mission: The Barcelona Supercomputing Center (BSC-CNS) is establishing a new Institute of Artificial Intelligence (AI Institute) and is seeking an experienced and dynamic AI Institute Director. Reporting to the BSC Executive Board, the selected candidate will lead the Institute's operations and strategic vision, driving research at the intersection...
-
AI Institute Director
7 days ago
Spain, Barcelona Supercomputing Center - Centro Nacional de Supercomputación, Full-time
Context And Mission: The Barcelona Supercomputing Center (BSC-CNS) is establishing a new Institute of Artificial Intelligence (AI Institute) and is seeking an experienced and dynamic AI Institute Director. Reporting to the BSC Executive Board, the selected candidate will lead the Institute’s operations and strategic vision, driving research at the...
-
HPC Workflows Engineer
1 month ago
Spain, European Geosciences Union, Full-time
Job Title: HPC Workflows Engineer (RE1). Type: Full time. Level: Experienced. Preferred Education: PhD. Posted: 28 October 2024. Deadline: The vacancy will remain open until a suitable candidate has been hired. About BSC: The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) is the leading supercomputing center in Spain. It houses...
-
HPC Workflows Engineer
2 weeks ago
Spain, Barcelona Supercomputing Center (BSC), Full-time
Job Reference: 761_24_ES_EMW_RE1. Position: HPC Workflows Engineer (RE1). Closing Date: Thursday, 28 November, 2024. Reference: 761_24_ES_EMW_RE1. Job title: HPC Workflows Engineer (RE1). About BSC: The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) is the leading supercomputing center in Spain. It houses MareNostrum, one of...
-
HPC Parallel Performance Engineer
1 month ago
Spain, Barcelona Supercomputing Center (BSC), Full-time
Job Reference: 757_24_CS_BPPP_RE1. Position: HPC Parallel Performance Engineer (RE1). Closing Date: Friday, 15 November, 2024. Reference: 757_24_CS_BPPP_RE1. Job title: HPC Parallel Performance Engineer (RE1). About BSC: The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) is the leading supercomputing center in Spain. It houses...
-
Scientific DevOps Engineer
4 weeks ago
Spain, Center for Genomic Regulation, Full-time
The Institute: The Centre for Genomic Regulation (CRG) is an international biomedical research institute of excellence, based in Barcelona, Spain, with more than 400 scientists from 44 countries. The CRG is composed of an interdisciplinary, motivated and creative scientific team supported by a flexible and efficient administration and high-end innovative...
-
Lead AI Engineer / Senior Software Developer
1 month ago
Spain, Iagservices, Full-time
Lead AI Engineer / Senior Software Developer. We are recruiting an ambitious Lead AI Engineer to join our IAG technology team. You'll be instrumental in shaping our tech infrastructure, defining the strategic direction of our platforms, and overseeing their execution. This hands-on role involves leading the design and implementation of AI systems with a keen...
-
Design Verification Engineer
2 weeks ago
Spain, buscojobs España, Full-time
Design Verification Engineer - AI Accelerator Chips. Interested in pushing the boundaries of AI technology as a Design Verification Engineer? Join a leading team in developing next-generation, high-performance AI accelerator chips that drive innovations across industries like automotive, HPC, security, and more. This opportunity offers a unique chance for a...
-
AI Solutions Engineer
1 month ago
Spain, Zurich 56 Company Ltd, Full-time
AI Solutions Engineer. Our opportunity: We are seeking a highly skilled AI Solutions Engineer to join our team. The ideal candidate will have extensive experience in machine learning (ML) and Generative AI, focusing on both the strategic and tactical development of AI solutions. You will architect and deploy advanced AI systems, particularly around Generative...
-
HPC Workflows Engineer
1 month ago
Spain, Somma, Full-time
Reference: 761_24_ES_EMW_RE1. Job title: HPC Workflows Engineer (RE1). About BSC: The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) is the leading supercomputing center in Spain. It houses MareNostrum, one of the most powerful supercomputers in Europe, was a founding and hosting member of the former European HPC infrastructure...
-
AI Solutions Engineer
1 month ago
Spain, Zurich Australian Insurance Ltd., Full-time
We are seeking a highly skilled AI Solutions Engineer to join our team. The ideal candidate will have extensive experience in machine learning (ML) and Generative AI, focusing on both the strategic and tactical development of AI solutions. You will architect and deploy advanced AI systems, particularly around Generative AI, leveraging your deep technical...
-
HPC Engineer for Earth Sciences applications
1 month ago
Spain, Barcelona Supercomputing Center (BSC), Full-time
Job Reference: 763_24_ES_HPC_RE2. Position: HPC Engineer for Earth Sciences applications (RE2). Closing Date: Thursday, 28 November, 2024. Reference: 763_24_ES_HPC_RE2. Job title: HPC Engineer for Earth Sciences applications (RE2). About BSC: The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) is the leading supercomputing...
-
Senior Machine Learning Engineer
1 month ago
Spain, Nielseniq, Full-time
NIQ is seeking a highly skilled and experienced Senior ML Engineer to join our dynamic team. As a Senior ML Engineer at NIQ, you will play a crucial role in developing and implementing advanced AI/GenAI models and algorithms to solve complex business problems. You will collaborate closely with cross-functional teams to design, build, and deploy scalable...
-
Junior Research Engineer
1 month ago
Spain, Barcelona Supercomputing Center (BSC), Full-time
Job Reference: 786_24_CS_AIR_RE1. Position: Junior Research Engineer - Support on AI research (RE1). Closing Date: Saturday, 16 November, 2024. Reference: 786_24_CS_AIR_RE1. Job title: Junior Research Engineer - Support on AI research (RE1). About BSC: The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) is the leading...
-
Research Engineer on AI libraries and tools
1 month ago
Spain, Barcelona Supercomputing Center (BSC), Full-time
Job Reference: 788_24_CS_CAOS_RE3. Position: Research Engineer on AI libraries and tools (RE3). Closing Date: Saturday, 30 November, 2024. About BSC: The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) is the leading supercomputing center in Spain. It houses MareNostrum, one of the most powerful...
-
Research Engineer on AI HW/SW
1 month ago
Spain, Barcelona Supercomputing Center (BSC), Full-time
Job Reference: 721_24_CS_CAOS_RE3. Position: Research Engineer on AI HW/SW (RE3). Closing Date: Thursday, 31 October, 2024. Reference: 721_24_CS_CAOS_RE3. Job title: Research Engineer on AI HW/SW (RE3). About BSC: The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) is the leading supercomputing center in Spain. It houses Mare...
-
AI Institute Coordinator
1 month ago
Spain, Barcelona Supercomputing Center (BSC), Full-time
Job Reference: 783_24_DIR_DIR_AIC. Position: AI Institute Coordinator - AI4S. Closing Date: Wednesday, 20 November, 2024. About BSC: The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) is the leading supercomputing center in Spain. It houses MareNostrum, one of the most powerful...
-
RTL HW for AI
2 weeks ago
Spain, Barcelona Supercomputing Center (BSC), Full-time
Job Reference: 849_24_CS_HPDA_RE1. Position: RTL HW for AI (RE1). Closing Date: Sunday, 15 December, 2024. Reference: 849_24_CS_HPDA_RE1. Job title: RTL HW for AI (RE1). About BSC: The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) is the leading supercomputing center in Spain. It houses MareNostrum, one of the most powerful...