AI Agent Evaluation Analyst - Madrid, Madrid - Mindrift
Posted 2 weeks ago
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.
At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.
What we do
The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.
Who we're looking for:
We're looking for curious and intellectually proactive contributors, the kind of people who double-check assumptions and play devil's advocate.
Are you comfortable with ambiguity and complexity? Does an async, remote, flexible opportunity sound exciting? Would you like to learn how modern AI systems are tested and evaluated?
This is a flexible, project-based opportunity well-suited for:
- Analysts, researchers, or consultants with strong critical thinking skills.
- Students (senior undergrads / grad students) looking for an intellectually interesting gig.
- People open to a part-time and non-permanent opportunity.
About the project:
We're looking for QA contributors for a new project on autonomous AI agents, focused on validating and improving complex task structures, policy logic, and agent evaluation frameworks. Throughout the project, you'll balance quality assurance, research, and logical problem-solving. This opportunity is ideal for people who enjoy looking at systems holistically and thinking through scenarios, implications, and edge cases.
You do not need a coding background, but you must be curious, intellectually rigorous, and capable of evaluating the soundness and consistency of complex setups. If you've ever excelled in consulting, CHGK (the quiz game "What? Where? When?"), Olympiads, case solving, or systems thinking, you might be a great fit.
What you'll be doing:
- Reviewing evaluation tasks and scenarios for logic, completeness, and realism.
- Identifying inconsistencies, missing assumptions, or unclear decision points.
- Helping define clear expected behaviors (gold standards) for AI agents.
- Annotating cause-effect relationships, reasoning paths, and plausible alternatives.
- Thinking through complex systems and policies as a human would to ensure agents are tested properly.
- Working closely with QA, writers, or developers to suggest refinements or edge case coverage.
How to get started:
Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.
Requirements
- Excellent analytical thinking: Can reason about complex systems, scenarios, and logical implications.
- Strong attention to detail: Can spot contradictions, ambiguities, and vague requirements.
- Familiarity with structured data formats: Can read, though not necessarily write, JSON/YAML.
- Ability to assess scenarios holistically: What's missing, what's unrealistic, what might break?
- Good communication and clear writing (in English) to document your findings.
We also value applicants who have:
- Experience with policy evaluation, logic puzzles, case studies, or structured scenario design.
- Background in consulting, academia, olympiads (e.g. logic/math/informatics), or research.
- Exposure to LLMs, prompt engineering, or AI-generated content.
- Familiarity with QA or test-case thinking (edge cases, failure modes, "what could go wrong").
- Some understanding of how scoring or evaluation works in agent testing (precision, coverage, etc.).
Benefits
- Get paid for your expertise, with rates that can go up to $32/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.