Site Reliability Engineer
6 days ago
About Avaya
Avaya is an enterprise software leader that helps the world's largest organizations and government agencies forge unbreakable connections. The Avaya Infinity platform unifies fragmented customer experiences, connecting the channels, insights, technologies, and workflows that together create enduring customer and employee relationships. We believe success is built through strong connections: with each other, with our work, and with our mission. At Avaya, you'll find a community that values your contributions and supports your growth every step of the way. Learn more at

Description

Role Overview
We are seeking a Site Reliability Engineer (SRE) who will drive stability, reliability, and performance across our Azure-based platforms. This role blends operational excellence, proactive incident management, and strong collaboration with DevOps, Cloud, and Security teams. The ideal candidate will have hands-on experience with Azure, IaC (Terraform/Ansible), CI/CD (Jenkins/GitHub Actions), and monitoring systems, while also contributing to governance, cost optimization, and automation strategies that reduce toil and prevent issues before they occur. This position includes rotational 24x7 support coverage and requires strong ownership in managing major incidents, RCA processes, and continuous service improvements.

Key Responsibilities

Reliability & Incident Management
Serve as a key member of the 24x7 on-call rotation, responding to and managing incidents across production and pre-production environments. Lead incident bridges, coordinate root cause analysis (RCA), and ensure post-incident reviews drive systemic improvements. Maintain clear communication with cross-functional teams and leadership during major incidents.

Monitoring, Alerts & Prevention
Build, tune, and maintain observability dashboards (Azure Monitor, Prometheus, Grafana, Datadog, Log Analytics). Define SLOs, SLIs, and error budgets to proactively identify and mitigate risks before customer impact. Continuously enhance alert quality, reduce false positives, and automate runbooks for faster recovery. Analyze trends to prevent recurring issues and support teams in resilience engineering.

Governance & Cost Management
Support cloud governance frameworks, ensuring resource tagging, naming conventions, policy compliance, and operational guardrails. Work with FinOps and DevOps teams to track, optimize, and report cost efficiency across Azure subscriptions. Participate in change, release, and CAB reviews to ensure reliability, compliance, and readiness.

Infrastructure Awareness & Automation
Understand IaC designs (Terraform, Ansible) and deployment workflows, leveraging automation for operational efficiency. Collaborate with DevOps and platform teams to reduce manual effort and "toil" in infrastructure operations. Implement self-healing mechanisms and automate recurring operational tasks where feasible. Ensure consistency and compliance through integration with CI/CD and policy-as-code systems.

Security & Compliance
Embed DevSecOps practices in daily operations: monitor vulnerabilities, patch non-compliant resources, and validate certificate rotations. Support FIPS, FedRAMP, PCI, and CIS control implementations in cloud and containerized environments.

Collaboration & Agile Practices
Partner with engineering, QA, and product teams to align reliability goals with delivery outcomes. Participate in agile ceremonies and advocate for SRE principles: "measure everything, automate where possible, and reduce toil." Document runbooks, playbooks, and operational processes to improve team efficiency and knowledge sharing.

Requirements

Required Skills & Experience
5+ years in Site Reliability, DevOps, or Cloud Operations roles. Proven expertise in Azure cloud operations and distributed system reliability. Strong understanding of Terraform, Ansible, and CI/CD pipelines (Jenkins, GitHub Actions). Experience with observability tools (Azure Monitor, Grafana, Prometheus, Datadog, or similar). Solid grasp of incident management frameworks (P1–P3 handling, RCA, PIRs, on-call rotations). Familiarity with governance, cost management, and security best practices in multi-cloud environments. Knowledge of containerized deployments (AKS/Kubernetes) and networking fundamentals. Excellent analytical, troubleshooting, and communication skills.

Desired Behaviours
Proactive Prevention: Identifies risks before they escalate into incidents.
Accountability: Owns service reliability and communicates with clarity.
Collaboration: Works seamlessly with platform, DevOps, and product teams.
Efficiency: Focuses on automation to reduce manual effort and improve MTTR.
Continuous Improvement: Learns from failures, iterates processes, and improves documentation.
Security & Governance Mindset: Balances agility with control and compliance.

Experience
3 years' experience at the Engineer Two level, or 5–8 years' total experience.

Education
Bachelor's degree or equivalent experience. Master's degree or equivalent experience.

Applicants must be currently authorized to work in the United States without the need for visa sponsorship now or in the future. Avaya is an Equal Opportunity employer and a U.S. Federal Contractor. Our commitment to equality is a core value of Avaya. All qualified applicants and employees receive equal treatment without consideration for race, religion, sex, age, sexual orientation, gender identity, national origin, disability, status as a protected veteran or any other protected characteristic. In general, positions at Avaya require the ability to communicate and use office technology effectively. Physical requirements may vary by assigned work location. This job brief/description is subject to change. Nothing in this job description restricts Avaya's right to alter the duties and responsibilities of this position at any time for any reason.
-
Site Reliability Engineer 1
6 days ago
Pozuelo de Alarcón, Spain Norconsulting Full-time Site Reliability Engineer 1, Pozuelo de Alarcón, MD, Spain. Job Description: Middleware Systems Administrators. One of the leading companies in the security sector is looking for a Middleware Systems Administrator to join its systems and development team at its offices in Madrid. Experience in Middleware. MIDDLEWARE ADMINISTRATOR Main skills...
-
Site Reliability Engineer
2 weeks ago
Palas de Rei, Spain K2 Partnering Solutions Full-time We are looking for a Senior Site Reliability Engineer to join a leading company's Platform Engineering team. You will focus on building scalable, reliable systems and improving platform performance through automation and solid engineering practices. Find out more about the general tasks related to this opportunity below, as well as...
-
Site Reliability Engineer
4 days ago
Palma de Mallorca, Spain WebBeds Full-time **Job Title**: Site Reliability Engineer **Department**: IT **Location (primary)**: Palma **What will you do on your journey with WebBeds?** WebBeds is the world’s fastest-growing provider of accommodation distribution services to the travel industry. Our products incorporate distribution APIs, payment integrations, ERP integration, Data Lakes, User...
-
Site Reliability Engineer
1 week ago
Santiago de Compostela, Spain JR Spain Full-time Site Reliability Engineer, Santiago de Compostela. At TuLotero we are looking for an SRE to join the team. We are looking for a resourceful person who can solve problems autonomously, with experience managing cloud systems (currently on AWS and DigitalOcean), who uses automation and infrastructure deployment tools...
-
Site Reliability Engineer
4 days ago
Palma de Mallorca, Spain WebBeds Full-time **Who are WebBeds?** WebBeds is the fastest-growing and most significant accommodation supplier to the travel industry. We are a global company offering ground services (hotels, transfers, tours, activities) to travel professionals. Our products help our partners and customers to create amazing travel experiences. Our products range from a Retail Online...
-
Site Reliability Engineer
2 days ago
Palma de Mallorca, Spain WebBeds Full-time **Who are WebBeds?** WebBeds is the fastest-growing and most significant accommodation supplier to the travel industry. We are a global company offering ground services (hotels, transfers, tours, activities) to travel professionals. Our products help our partners and customers to create amazing travel experiences. Our products range from a Retail Online...
-
Site Reliability Engineer 1
6 days ago
Pozuelo de Alarcón, Spain Norconsulting Full-time Site Reliability Engineer 1, Pozuelo de Alarcón, MD, Spain. Job Description: Middleware Systems Administrators. One of the leading companies in the security sector is looking for a Middleware Systems Administrator to join its systems and development team at its offices in Madrid. Experience in Middleware. MIDDLEWARE ADMINISTRATOR Main skills:...
-
Site Reliability Engineer
2 weeks ago
Palas de Rei, Spain K2 Partnering Solutions Full-time We are looking for a Senior Site Reliability Engineer to join a leading company's Platform Engineering team. You will focus on building scalable, reliable systems and improving platform performance through automation and solid engineering practices. Carefully review all the application documentation before clicking the apply button at the...
-
Site Reliability Engineer
4 days ago
Santiago de Compostela, Spain JR Spain Full-time Site Reliability Engineer, Santiago de Compostela. At TuLotero we are looking for an SRE to join the team. We are looking for a resourceful person who can solve problems autonomously, with experience managing cloud systems (currently on AWS and DigitalOcean), who uses automation and infrastructure deployment tools (...