Site Reliability Engineer

7 days ago


Huesca-Pirineos Airport, Spain · Avaya · Full-time

About Avaya

Avaya is an enterprise software leader that helps the world's largest organizations and government agencies forge unbreakable connections. The Avaya Infinity platform unifies fragmented customer experiences, connecting the channels, insights, technologies, and workflows that together create enduring customer and employee relationships. We believe success is built through strong connections: with each other, with our work, and with our mission. At Avaya, you'll find a community that values your contributions and supports your growth every step of the way. Learn more at

Description

Role Overview

We are seeking a Site Reliability Engineer (SRE) to drive stability, reliability, and performance across our Azure-based platforms. This role blends operational excellence, proactive incident management, and strong collaboration with DevOps, Cloud, and Security teams. The ideal candidate will have hands-on experience with Azure, IaC (Terraform/Ansible), CI/CD (Jenkins/GitHub Actions), and monitoring systems, and will also contribute to governance, cost optimization, and automation strategies that reduce toil and prevent issues before they occur.

This position includes rotational 24x7 support coverage and requires strong ownership of major incidents, RCA processes, and continuous service improvement.

Key Responsibilities

Reliability & Incident Management
• Serve as a key member of the 24x7 on-call rotation, responding to and managing incidents across production and pre-production environments.
• Lead incident bridges, coordinate root cause analysis (RCA), and ensure post-incident reviews drive systemic improvements.
• Maintain clear communication with cross-functional teams and leadership during major incidents.

Monitoring, Alerts & Prevention
• Build, tune, and maintain observability dashboards (Azure Monitor, Prometheus, Grafana, Datadog, Log Analytics).
• Define SLOs, SLIs, and error budgets to proactively identify and mitigate risks before customer impact.
• Continuously improve alert quality, reduce false positives, and automate runbooks for faster recovery.
• Analyze trends to prevent recurring issues and support teams in resilience engineering.

Governance & Cost Management
• Support cloud governance frameworks, ensuring resource tagging, naming conventions, policy compliance, and operational guardrails.
• Work with FinOps and DevOps teams to track, optimize, and report cost efficiency across Azure subscriptions.
• Participate in change, release, and CAB reviews to ensure reliability, compliance, and readiness.

Infrastructure Awareness & Automation
• Understand IaC designs (Terraform, Ansible) and deployment workflows, leveraging automation for operational efficiency.
• Collaborate with DevOps and platform teams to reduce manual effort and "toil" in infrastructure operations.
• Implement self-healing mechanisms and automate recurring operational tasks where feasible.
• Ensure consistency and compliance through integration with CI/CD and policy-as-code systems.

Security & Compliance
• Embed DevSecOps practices in daily operations: monitor vulnerabilities, patch non-compliant resources, and validate certificate rotations.
• Support FIPS, FedRAMP, PCI, and CIS control implementations in cloud and containerized environments.

Collaboration & Agile Practices
• Partner with engineering, QA, and product teams to align reliability goals with delivery outcomes.
• Participate in agile ceremonies and advocate for SRE principles: "measure everything, automate where possible, and reduce toil."
• Document runbooks, playbooks, and operational processes to improve team efficiency and knowledge sharing.

Requirements

Required Skills & Experience
• 5+ years in Site Reliability, DevOps, or Cloud Operations roles.
• Proven expertise in Azure cloud operations and distributed system reliability.
• Strong understanding of Terraform, Ansible, and CI/CD pipelines (Jenkins, GitHub Actions).
• Experience with observability tools (Azure Monitor, Grafana, Prometheus, Datadog, or similar).
• Solid grasp of incident management frameworks (P1–P3 handling, RCA, PIRs, on-call rotations).
• Familiarity with governance, cost management, and security best practices in multi-cloud environments.
• Knowledge of containerized deployments (AKS/Kubernetes) and networking fundamentals.
• Excellent analytical, troubleshooting, and communication skills.

Desired Behaviours
• Proactive Prevention: Identifies risks before they escalate into incidents.
• Accountability: Owns service reliability and communicates with clarity.
• Collaboration: Works seamlessly with platform, DevOps, and product teams.
• Efficiency: Focuses on automation to reduce manual effort and improve MTTR.
• Continuous Improvement: Learns from failures, iterates on processes, and improves documentation.
• Security & Governance Mindset: Balances agility with control and compliance.

Experience
3 years of experience at the Engineer Two level, or 5–8 years of total experience.

Education
Bachelor's degree or equivalent experience
Master's degree or equivalent experience

Applicants must be currently authorized to work in the United States without the need for visa sponsorship now or in the future. Avaya is an Equal Opportunity employer and a U.S. Federal Contractor. Our commitment to equality is a core value of Avaya. All qualified applicants and employees receive equal treatment without consideration for race, religion, sex, age, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other protected characteristic. In general, positions at Avaya require the ability to communicate and use office technology effectively. Physical requirements may vary by assigned work location. This job brief/description is subject to change.
Nothing in this job description restricts Avaya's right to alter the duties and responsibilities of this position at any time for any reason.


  • Site Reliability Engineer

    2 weeks ago


    Palma de Mallorca, Spain · WebBeds · Full-time

    This role can be based in either our Palma, Spain or Iasi, Romania offices. Who are WebBeds? WebBeds is the fastest-growing and most significant accommodation supplier to the travel industry. We are a global company offering ground services (hotels, transfers, tours, activities) to travel professionals. Our products help our partners and customers to...


  • Santiago de Compostela, Spain · JR Spain · Full-time

    Site Reliability Engineer, Santiago de Compostela. At TuLotero we are looking for an SRE to join the team. We are looking for a resourceful person who can solve problems autonomously, with experience managing cloud systems (currently on AWS and DigitalOcean), and who uses infrastructure automation and deployment tools...


  • Madrid (Vía de los Poblados), Spain · ING · Full-time

    At ING we are looking for a Site Reliability Engineer (SRE). Your role and work environment: We are looking for a talented Site Reliability Engineer to join our SRE Expert Unit. Our mission is to ensure the reliability and scalability of ING's platforms, delivering the best customer experience. We work closely with product teams to help them achieve their...


  • Palma de Mallorca, Spain · triggle · Full-time

    Are you a passionate individual who is eager to work in an environment where no two days are the same and where individual contributions can make a difference in our success story? Do you want to help drive the product growth of a young and driven tech company? As a Senior Site Reliability Engineer, you empower our customers to sell transfers, tours &...


  • Palma de Mallorca, Spain · triggle · Full-time

    triggle is a young and growing company whose mission is to be the global marketplace for the travel industry, making local experiences and services available to everyone by connecting hotels and activity suppliers with the latest technology and helping them increase their direct guest business. Digitalising offerings for excursions and activities is complex...

  • Field Services Technician

    2 weeks ago


    Huesca, Huesca province, Spain · GE Vernova · Full-time

    Job Description Summary: GE Vernova is looking for a Field Services Technician - On Site Services to be based at our site in Huesca. To further consolidate and strengthen our GE Vernova Energy wind turbine service and maintenance teams in the Zaragoza area, we are looking for a qualified and committed Wind Field Technician. Essential...


  • Santiago de Compostela, Spain · Amazon · Full-time

    A leading logistics company in La Coruña is seeking a Reliability Maintenance Engineering Area Manager to lead a team focused on ensuring equipment reliability and compliance with safety policies. Key responsibilities include supporting site safety, implementing maintenance plans, and analyzing data for process improvement. The role requires relevant...


  • Santiago de Compostela, Spain · Amazon · Full-time

    A leading global retailer in La Coruña, Spain, is seeking a Reliability Maintenance Engineering Technician. This role involves maintaining various equipment, carrying out preventative and reactive maintenance tasks, and ensuring health and safety compliance. The ideal candidate has an NFQ6 or equivalent in Engineering and experience with MHE and fault...