AI Evaluation Engineer

6 days ago

Hybrid | Barcelona, Catalonia, Spain | Openchip And Software Technologies SL | Full-time

The Role:

We are seeking an exceptional AI Evaluation Engineer to design, implement, and scale frameworks for assessing the performance, reliability, and trustworthiness of advanced AI systems. This individual will be responsible for developing methodologies and tools to measure model quality across diverse dimensions, such as accuracy, robustness, reasoning, safety, and efficiency.

Key Responsibilities:

Design and Develop Evaluation Frameworks: Create scalable, reproducible evaluation pipelines for large-scale AI systems, including LLMs and multi-agent architectures, covering both automated and human-in-the-loop testing strategies.
Metric Innovation: Define and implement novel evaluation metrics that capture model capabilities beyond traditional benchmarks.
Benchmarking & Performance Analysis: Conduct benchmarking of AI models across domains, tasks, and modalities, analyzing their skills and behavior under different setups.
Safety, Reliability & Alignment Testing: Develop tools and experiments to probe model safety, robustness, interpretability, and bias.
Cross-functional Collaboration: Work closely with model finetuning and optimization teams to evaluate end-to-end system effectiveness and efficiency, and identify trade-offs between model performance, latency, and energy footprint.
Continuous Improvement & Reporting: Monitor model performance over time, automate regression detection, and contribute to the continuous evaluation infrastructure that supports Openchip's AI research and product roadmap.

Qualifications:

MSc or PhD in Computer Science, Artificial Intelligence, Machine Learning, Statistics, or a related field. A publication record in ML evaluation, benchmarking, or interpretability is a plus.
3+ years of experience developing, evaluating, or optimizing AI systems.
Strong programming skills in Python, with experience using PyTorch, TensorFlow, or JAX.
Experience in designing evaluation protocols for LLMs, multi-agent systems, or reinforcement learning environments.
Deep understanding of ML metrics, evaluation methodologies, and statistical analysis.
Experience with data quality, annotation workflows, and benchmark dataset creation is a plus.
Fluent in English; proficiency in additional European languages (German, Dutch, Spanish, French, or Italian) is a plus.

Soft Skills:

Analytical Rigor: An evidence-driven mindset and an enthusiasm for designing robust experiments that quantify and uncover complex AI behaviors, translating empirical insights into new research directions.
Collaboration & Communication: Excellent communication and collaboration skills in a multidisciplinary environment.
Integrity & Responsibility: Committed to building AI systems that are not only powerful but also safe, reliable, and aligned with human values.

What We Offer:

The opportunity to build a cloud AI deployment platform that will power next-generation AI systems.
A collaborative, innovation-driven environment with significant autonomy and ownership.
Hybrid work model with flexible scheduling.
A chance to join one of Europe's most ambitious companies at the intersection of AI and silicon engineering.
Position based in Barcelona.

We're looking for exceptional engineers ready to shape the future of AI infrastructure. If building scalable, cloud-native AI deployment platforms excites you, we'd love to meet you.

At Openchip & Software Technologies S.L., we believe a diverse and inclusive team is the key to groundbreaking ideas. We foster a work environment where everyone feels valued, respected, and empowered to reach their full potential—regardless of race, gender, ethnicity, sexual orientation, or gender identity.

