AI Evaluation Data Scientist

1 week ago

Barcelona, Spain | Multiverse Computing | Full-time

AI Evaluation Data Scientist (Fixed-term contract)

We are looking to fill this role immediately and are reviewing applications daily. Expect a fast, transparent process with quick feedback.

Why join us?
We are a European deep-tech leader in quantum and AI, backed by major global strategic investors and strong EU support. Our groundbreaking technology is already transforming how AI is deployed worldwide: it compresses large language models by up to 95% without losing accuracy and cuts inference costs by 50–80%. Joining us means working on cutting-edge solutions that make AI faster, greener, and more accessible, as part of a company often described as a "quantum-AI unicorn in the making."

We offer
- Competitive annual salary
- Two bonuses: a signing bonus when you join and a retention bonus at contract completion
- Relocation package (if applicable)
- Fixed-term contract ending in June 2026
- Hybrid role and flexible working hours
- A place in a fast-scaling Series B company at the forefront of deep tech
- Guaranteed equal pay
- International exposure in a multicultural, cutting-edge environment

Job Overview
We are seeking a skilled and experienced AI Evaluation Data Scientist with a strong technical background in Generative AI to join our team.
In this role, you will lead the design and implementation of evaluation frameworks that assess the performance of Generative AI systems before deployment, working closely with cross-functional teams to turn results into actionable insights.

Responsibilities
- Design and lead the evaluation strategy for our Agentic AI and RAG systems, turning customer workflows and business needs into measurable metrics and clear success criteria.
- Contribute to the end-to-end design of Agentic AI and RAG systems, bringing a data-and-evaluation perspective to retrieval strategies, orchestration policies, tool usage, and memory to solve complex, real-world problems across industries.
- Develop task-based, multi-step evaluations that reflect how the different components of our systems perform in real-world scenarios across cloud and edge deployments.
- Develop and refine rigorous evaluation frameworks that reflect real-world performance, going beyond model benchmarks to assess task success, reasoning capabilities, factual consistency, reliability, and user success metrics across diverse problem domains.
- Build and maintain a reproducible evaluation pipeline, including datasets, scenarios, configs, test suites, versioned assets, and automated runs, to track regressions and improvements over time.
- Curate and generate high-quality evaluation datasets, including synthetic and adversarial data, to strengthen coverage and robustness.
- Implement and calibrate LLM-as-a-judge evaluations, aligning automated scoring with human feedback and ensuring fairness, robustness, and representativeness.
- Perform in-depth error analyses and ablations to uncover failure patterns, maintain a taxonomy of failure modes (reasoning, grounding, hallucinations, tool failures), and give engineers actionable insights to improve model and system performance.
- Partner with ML specialists to create a data flywheel in which evaluation continuously informs new dataset creation, improvements to prompts and tool usage, model training, and system refinements, quantifying improvements over time.
- Define and monitor operational metrics (latency, cost, reliability) to ensure evaluations align with production and customer expectations.
- Maintain high engineering standards, including clear documentation, reproducible experiments, robust version control, and well-structured ML pipelines.
- Contribute to team learning and mentorship, guiding junior engineers and sharing expertise in LLM development, evaluation, and deployment best practices.
- Participate in code reviews, offering thoughtful, constructive feedback to maintain code quality, readability, and consistency.

Required Minimum Qualifications
- Master's or Ph.D. in Computer Science, Machine Learning, Data Science, Physics, Engineering, or a related technical field, with relevant industry experience.
- Solid hands-on experience (3+ years for mid-level, 5+ years for senior) as a Data Scientist, ML Engineer, or Research Scientist on applied AI/ML projects deployed in production environments.
- Strong background in evaluating machine learning systems, ideally including LLMs, RAG pipelines, or multi-agent systems.
- Proven ability to design and implement evaluation methodologies that go beyond static benchmarks, capturing real-world task success, reasoning, and robustness.
- Hands-on experience with dataset creation and curation (including synthetic data generation) for training and evaluation.
- Proven experience with agent-based architectures (task decomposition, tool use, reasoning workflows), RAG architectures (retrievers, vector databases, rerankers), and orchestration frameworks (LangGraph, LlamaIndex).
- Strong problem-solving skills, with the ability to navigate ambiguity and design practical solutions to open-ended user or business needs.
- Strong software engineering skills, with proficiency in Python, Docker, and Git, and experience building robust, modular, and scalable ML codebases.
- Familiarity with common ML and data libraries and frameworks (e.g., PyTorch, HuggingFace, LangGraph, LlamaIndex, Pandas).
- Experience with cloud platforms (ideally AWS).
- Excellent communication skills: able to work collaboratively in a team, document and explain design decisions and experimental results, and communicate complex ideas effectively.
- Fluent in English.

Preferred Qualifications
- Ph.D. in Computer Science, Machine Learning, Data Science, Physics, Engineering, or a related technical field, with relevant industry experience.
- Experience designing and running evaluation frameworks for agentic AI systems, RAG pipelines, or multi-agent orchestration.
- Demonstrated experience with synthetic data generation (e.g., using LLMs to bootstrap datasets), data augmentation, and adversarial testing.
- Strong background in error analysis of LLMs (hallucinations, grounding issues, tool failures, reasoning gaps) and in translating insights into concrete engineering improvements.
- Track record of open-source contributions, publications, or public talks on LLM evaluation, benchmarking, or applied AI systems.
- Fluent in Spanish.

About Multiverse Computing
Founded in 2019, we are a well-funded, fast-growing deep-tech company with 180+ employees worldwide. Recognized by CB Insights (2023 and 2025) as one of the Top 100 most promising AI companies globally, we are also the largest quantum software company in the EU.

Our flagship products address critical industry needs:
- CompactifAI: a compression tool for foundation AI models that reduces their size by up to 95% while maintaining accuracy, enabling portability across devices from cloud to mobile and beyond.
- Singularity: a quantum and quantum-inspired optimization platform used by blue-chip companies in finance, energy, and manufacturing to solve complex challenges with immediate performance gains.

You'll be working alongside world-leading experts in quantum computing and AI, developing solutions that deliver real-world impact for global clients.
We are committed to an inclusive, ethics-driven culture that values sustainability, diversity, and collaboration: a place where passionate people can grow and thrive. Come and join us!

Equal Opportunity Employer: Multiverse Computing welcomes people from all backgrounds, across age, citizenship, ethnic and racial origin, gender identity, disability, marital status, religion and ideology, and sexual orientation.

Join our multicultural team: 5 locations, 27+ languages.

Generative AI Evaluation Scientist

2 weeks ago

Barcelona, Spain | Multiverse Computing | Full-time

A fast-growing deep-tech company in Barcelona is searching for an AI Evaluation Data Scientist with expertise in Generative AI. This role involves leading evaluation strategies, partnering with cross-functional teams, and enhancing AI systems' performance. Ideal candidates possess 3+ years of industry experience and a master's or Ph.D. in a relevant field....
Generative AI Evaluation Scientist

4 days ago

Barcelona, Spain | Multiverse Computing | Full-time

A fast-growing deep-tech company in Barcelona is searching for an AI Evaluation Data Scientist with expertise in Generative AI. This role involves leading evaluation strategies,...
Hybrid AI Evaluation Scientist

2 weeks ago

Barcelona, Spain | Hyperproof | Full-time

A leading deep-tech company in Barcelona is looking for an experienced AI Evaluation Data Scientist to design and implement evaluation frameworks for Generative AI systems. The role involves defining metrics for success, developing evaluations based on real-world scenarios, and collaborating with cross-functional teams. Candidates should have a Master's or...
AI Evaluation Data Scientist

2 weeks ago

Barcelona, Spain | Multiverse Computing | Full-time

Come and join our multicultural team! 5 locations, 27 languages. We are looking to fill this role immediately and are reviewing applications daily. Expect a fast, transparent process with quick feedback. Why join us? We are a European deep-tech leader in quantum and AI, backed by major global strategic investors and strong EU support. Our groundbreaking...
AI Evaluation Data Scientist

5 days ago

Barcelona, Spain | Multiverse Computing LLC | Full-time

We are looking to fill this role immediately and are reviewing applications daily. Expect a fast, transparent process with quick feedback. Why join us? We are a European deep-tech leader in quantum and AI, backed by major global strategic investors and strong EU support. Our groundbreaking technology is already transforming how AI is deployed worldwide —...
AI Evaluation Data Scientist

2 weeks ago

Barcelona, Spain | Multiverse Computing | Full-time

Come and join our multicultural team! 5 locations, 27+ languages. We are looking to fill this role immediately and are reviewing applications daily. Expect a fast, transparent process with quick feedback. Why join us? We are a European deep-tech leader in quantum and AI, backed by major global strategic investors and strong EU support. Our groundbreaking...
AI Evaluation Data Scientist

2 weeks ago

Barcelona, Spain | Multiverse Computing | Full-time

Come and join our multicultural team! 5 locations, 27 languages. We are looking to fill this role immediately and are reviewing applications daily. Expect a fast, transparent process with quick feedback. Why join us? We are a European deep-tech leader in quantum and AI, backed by major global strategic investors and strong EU support. Our groundbreaking technology...
AI Evaluation Data Scientist

4 days ago

Barcelona, Spain | Multiverse Computing | Full-time

Come and join our multicultural team! 5 locations, 27 languages. We are looking to fill this role immediately and are reviewing applications daily. Expect a fast, transparent process with quick feedback. Why join us? We are a European...
AI Evaluation Scientist — Hybrid, Impactful GenAI

2 weeks ago

Barcelona, Spain | Multiverse Computing | Full-time

A leading deep-tech company in Barcelona seeks an experienced AI Evaluation Data Scientist to develop frameworks for assessing Generative AI systems' performance. Candidates should have a Master's or Ph.D. in a related field, at least 3 years of relevant experience, and strong skills in evaluating machine learning systems. The position offers a competitive...
AI Evaluation Data Scientist

4 days ago

Barcelona, Spain | Multiverse Computing LLC | Full-time

We are looking to fill this role immediately and are reviewing applications daily. Expect a fast, transparent process with quick feedback. Why join us? We are a European deep-tech leader in quantum and AI, backed by major...


AI Evaluation Data Scientist