Agent Evaluation Analyst

4 days ago


Spain · Toloka · Full-time

At Toloka AI we create data that powers leading GenAI models and innovations. We work with frontier labs, big tech, renowned AI startups, enterprises, and non-profit research organizations worldwide. We use a combination of Experts + Crowd + Tech Platform to teach AI models to reason and to evaluate their efficacy and safety. We have experts in more than 50 domains, from doctors and lawyers to physicists and engineers, and one of the most diverse global crowds, representing over 100 countries and speaking 40+ languages.

We are a well-funded startup with a client portfolio that includes Anthropic, Amazon, Microsoft, poolside, Recraft, and Shopify. Recently, we secured strategic investment led by Bezos Expeditions with participation from Mikhail Parakhin, CTO of Shopify and board advisor to leading GenAI companies, who now serves as our Chairman of the Board.

Our remote-first team is globally distributed: USA, UK, the Netherlands, Israel, Czech Republic, Serbia, and more. We are headquartered in Amsterdam.

About the role

We are looking for a Freelance Agent Evaluation Analyst to take ownership of quality, structure, and insight across the project. This role goes far beyond task-checking: it is about critical thinking, systems-level analysis, and ensuring clarity, reliability, and consistency at scale.

You'll work as both a hands-on evaluator and an analyst, collaborating with domain experts, delivery managers, and engineers. Beyond reviewing outputs, you'll be expected to understand the "why" behind the work, identify logical gaps or inconsistencies, and propose meaningful improvements.

This is a flexible, impact-driven role where you'll have space to grow, contribute ideas, and help shape how evaluation and quality are scaled across the project.

This role is especially well-suited for:

  • Analysts, researchers, or consultants with strong structuring and reasoning skills
  • Junior product managers or strategists curious about AI and evaluation work
  • Smart problem-solvers (students or early-career professionals) who enjoy digging into logic, systems, and edge cases

You do not need a coding background. What matters most is curiosity, intellectual rigor, and the ability to evaluate complex setups with precision.

What you'll be doing

  • Fully own the QA pipeline for agent evaluation tasks
  • Review and validate tasks and golden paths created by scenario writers and experts
  • Spot logical inconsistencies, vague requirements, hidden risks, and unrealistic assumptions
  • Provide structured feedback and ensure quality alignment across contributors
  • Train, onboard, and mentor new QA team members
  • Collaborate with domain experts, delivery managers, and engineers to improve test clarity and coverage
  • Maintain and improve QA checklists, SOPs, and review guidelines
  • Contribute to test planning, prioritization, and quality benchmarks
  • Take initiative to suggest new approaches, tools, and processes that help scale validation and analysis

What you should know / be able to do

  • Strong analytical and critical thinking skills
  • Attention to detail and reliability: your work can be trusted without double-checking
  • Experience in manual QA, scenario validation, or similar analytical work
  • Comfortable working with structured formats (JSON/YAML)
  • Clear written communication and documentation skills
  • Ability to give constructive feedback and coach others
  • Capable of working with a wide range of stakeholders, from engineers to directors/VPs

Nice to have

  • Background in scenario-based testing, test design, or annotation workflows
  • Experience with AI/LLM evaluation, prompt validation, or agent behavior testing
  • Some technical independence (e.g., Python skills)
  • Familiarity with MCP / tool-based task execution
  • Experience working in cross-functional teams across product, delivery, and engineering

Who you are

  • Detail-obsessed but also able to see the bigger picture
  • Proactive and independent, taking true ownership of your work
  • A strong communicator who can turn complex findings into actionable insights
  • Flexible and motivated to contribute across a variety of tasks and projects
  • A believer that quality is not just checking work, but making the whole product better

What we can offer

  • Freelance contract (B2B)
  • Flexible



  • Spain · Toloka · Full-time

    A leading AI company is seeking a Freelance Agent Evaluation Analyst to manage quality assurance across projects. This role requires strong analytical and critical thinking skills to evaluate complex setups. The ideal candidate should be detail-oriented and possess excellent communication abilities. Responsibilities include owning the QA pipeline, reviewing...


  • Spain · Angove Partners · Full-time

    About the job: Angove Partners is working with a market-leading underwriter focused on Transactional Risk Insurance products. They seek to provide best-in-class service for clients, lawyers, and brokers on deal time and with the commercial priorities of our partners in mind. They are hiring for an Analyst to join their Barcelona or Madrid team. It is an...

  • AI/ML Engineer

    6 days ago


    Spain · Cracken · Full-time

    About Company Cracken is a fast-growing Silicon Valley-based startup built by elite nation-state and commercial operators who defended critical cyber infrastructure during the war in Ukraine, researched AI and cybersecurity at MIT and Kyiv Polytechnic, and led teams at Apple, Google, Palo Alto Networks, HackerOne, DIU, Comcast, HP, and more. We tame Cracken,...



  • Fullstack Engineer

    2 weeks ago


    Spain · Cracken · Full-time

    About Company Cracken is a fast-growing Silicon Valley-based startup built by elite nation-state and commercial operators who defended critical cyber infrastructure during the war in Ukraine, researched AI and cybersecurity at MIT and Kyiv Polytechnic, and led teams at Apple, Google, Palo Alto Networks, HackerOne, DIU, Comcast, HP, and more. We tame Cracken,...


  • Spain · Clarity · Full-time

    Clarity AI is a global tech company founded in 2017 with a unique mission: bringing societal impact to markets. We leverage AI and machine learning technologies to provide top international investors, governments, companies, and consumers with the right data, methodologies, and tools to make more informed decisions. We are now a team of more than...


  • Spain · Twtspain · Full-time

    Nature of the tasks Architecture and design of systems, under the domains of Cloud architecture, Application architecture, Infrastructure architecture or Data architecture. Review of the architecture of existing systems. Design and development of component architecture and building blocks. Analysis of requirements and modelling of Information Systems....


  • Spain · HY SOLAR · Full-time

    Global Recruiter from HY Solar. Job Responsibilities: Develop tailored PV system solutions (including component selection, energy storage configuration, inverter matching, and smart monitoring) based on the geographical climate, grid standards, policy regulations, and customer requirements of different countries/regions. Provide comprehensive solutions...




  • Spain · Mindrift · Full-time

    This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency. At Mindrift, innovation meets opportunity. We believe in using the power of collective intelligence to ethically shape the future of AI. What...