AgentOps Engineer

4 weeks ago

Madrid, Spain | Kyndryl | Full-time

This job is with Kyndryl, an inclusive employer and a member of myGwork - the largest global platform for the LGBTQ+ business community. Please do not contact the recruiter directly.

Who We Are
At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward - always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities.

The Role
We're looking for exceptional talent to join our AI Agentic Innovation Hub at Kyndryl.

The AI Agentic Innovation Hub stands as Kyndryl's center of excellence for advanced and agentic artificial intelligence. Our mission is to lead the design and deployment of transformative AI solutions that bridge frontier research with real-world impact - scalable, secure, and driven by measurable value. Built upon a team of exceptional talent and cutting-edge technology, the Hub embodies a spirit of bold innovation and disciplined execution - an elite unit within one of the world's leading technology companies. With national reach and global ambition, we partner with major organizations to tackle their most complex challenges, pioneering the next generation of intelligent, autonomous, and trusted systems that redefine what AI can achieve.

Job Description
As a Senior Observability Engineer at Kyndryl's AI Innovation Hub, you'll be at the core of operational excellence for next-generation intelligent and agentic systems. Your mission will be to design, implement, and maintain advanced observability and monitoring capabilities that ensure the reliability, traceability, and performance of AI agents and models in production. You'll help build the observability architecture for agentic intelligence - integrating tracing, logging, monitoring, and governance tools that provide a deep understanding of how agents perceive, reason, and act in complex environments.
Your work will enable early detection of anomalies, data drift, performance degradation, bias, or undesired agent behavior, ensuring compliance with the EU AI Act and Responsible AI principles. If you're passionate about bridging AI systems with operational intelligence, and about creating frameworks that make AI transparent, accountable, and trustworthy, this role offers a unique opportunity to shape the future of intelligent observability.

Your Mission
- Design and implement the observability architecture for AI and Agentic systems, enabling end-to-end visibility across models, agents, and data pipelines.
- Develop instrumentation frameworks to collect and analyze technical, behavioral, and cognitive metrics for deployed AI systems.
- Integrate and configure monitoring, tracing, and logging tools (Prometheus, Grafana, OpenTelemetry, ELK Stack, Datadog, etc.) to ensure full operational insight.
- Build dashboards and alerting mechanisms to detect data drift, performance issues, hallucinations, or reasoning inconsistencies in LLMs and agents.
- Collaborate with MLOps, Data, and Architecture teams to establish model lineage, drift detection, and governance pipelines.
- Design and maintain custom metrics for model and agent reliability - precision, latency, cost, reasoning depth, autonomy, and consistency.
- Contribute to the Responsible AI framework, ensuring transparency, fairness, and auditability in AI decision-making.
- Continuously research and experiment with new observability tools and practices (AgentOps, LLMOps, RAG Observability).

Who You Are

Essential Qualifications
- 4+ years of professional experience, including at least 2 years in AI, MLOps, or distributed systems projects.
- Proven experience designing and implementing monitoring, logging, and performance metrics for production systems.
- Hands-on expertise with observability tools such as Prometheus, Grafana, OpenTelemetry, ELK Stack, Loki, Jaeger, or Datadog.
- Experience instrumenting AI and ML pipelines, tracking inference latency, throughput, and cost metrics.
- Familiarity with MLOps and LLMOps frameworks, including model traceability, drift detection, and prompt or reasoning tracing.
- Knowledge of agentic frameworks (LangGraph, AutoGen, CrewAI, OpenDevin, Google ADK) and their monitoring needs.
- Experience designing custom metrics for precision, reliability, error rate, and cognitive consistency.
- Strong understanding of cloud-native architectures, containers, and IaC tools (Kubernetes, Docker, Helm, Terraform).
- Awareness of AI compliance and governance requirements (EU AI Act, Responsible AI, decision traceability).

Education & Certifications
- Bachelor's degree in Computer Engineering, Software Engineering, Data Science, or related field.
- Postgraduate or specialized training in MLOps, DevOps, Observability, or Artificial Intelligence is highly valued.
- Certifications in Cloud Architecture, Monitoring, or AI Governance are a plus.
- Continuous learning mindset and commitment to staying current with emerging AI observability frameworks.

Preferred Skills
- Experience with model observability and data lineage systems.
- Understanding of cognitive observability, including reasoning-chain or decision-path tracing in agents.
- Familiarity with event-driven architectures and telemetry for real-time AI services.
- Knowledge of FinOps metrics and cost optimization for AI workloads.
- Experience developing custom dashboards or visualization plugins for monitoring complex systems.
- Comfort working in hybrid or multi-cloud environments (Azure, AWS, GCP).
- Strong interest in AI reliability engineering and the convergence of AI and DevOps practices.

Soft Skills
- Analytical and systemic thinker, understanding the interplay between data, systems, and agent behavior.
- Clear communicator, able to convey complex insights and performance findings to both technical and business audiences.
- Quality- and reliability-driven, with a preventive mindset focused on operational resilience.
- Collaborative and cross-functional, working seamlessly with AI, data, and compliance teams.
- Curious and proactive, exploring emerging technologies and methods in AI observability and AgentOps.
- Ethical and responsible, aware of the implications and accountability of automated decisions in production AI.

#AgenticAI

Being You
Diversity is a whole lot more than what we look like or where we come from; it's how we think and who we are. We welcome people of all cultures, backgrounds, and experiences. But we're not doing it single-handedly: our Kyndryl Inclusion Networks are only one of many ways we create a workplace where all Kyndryls can find and provide support and advice. This dedication to welcoming everyone into our company means that Kyndryl gives you - and everyone next to you - the ability to bring your whole self to work, individually and collectively, and support the activation of our equitable culture. That's the Kyndryl Way.

What You Can Expect
With state-of-the-art resources and Fortune 100 clients, every day is an opportunity to innovate, build new capabilities, new relationships, new processes, and new value. Kyndryl cares about your well-being and prides itself on offering benefits that give you choice, reflect the diversity of our employees and support you and your family through the moments that matter - wherever you are in your life journey. Our employee learning programs give you access to the best learning in the industry to receive certifications, including Microsoft, Google, Amazon, Skillsoft, and many more. Through our company-wide volunteering and giving platform, you can donate, start fundraisers, volunteer, and search over 2 million non-profit organizations. At Kyndryl, we invest heavily in you; we want you to succeed so that together, we will all succeed.

Get Referred
If you know someone who works at Kyndryl, when asked 'How Did You Hear About Us' during the application process, select 'Employee Referral' and enter your contact's Kyndryl email address.
