Site Reliability Engineer – IDM Team

7 days ago


Málaga, Spain Lunik - Explorers at Work Full time

Work model: Hybrid (2 days in the office per week)
Job Type: Full Time
Job Location: Málaga, Madrid or Sevilla

As a Site Reliability Engineer (SRE) in the IDM team, you will be responsible for contributing to the reliability, availability, and performance of mission-critical applications and systems. You will be part of a team that bridges the gap between development and operations, applying your technical expertise and problem-solving skills to implement best practices in infrastructure automation, monitoring, scaling, and incident response.

The role requires prior experience as an SRE or in similar functions, as well as solid knowledge of the technologies and methodologies described below. A collaborative mindset, focus on continuous improvement, and strong teamwork skills will be key to success in this role.

Candidates should ideally have a background in open-source systems and Linux, although knowledge and experience with Microsoft systems will also be considered positively.

Responsibilities

Reliability & Availability
• Contribute to maintaining and improving system reliability, uptime, and performance across production environments.
• Support tracking of service-level objectives (SLOs), service-level indicators (SLIs), and service-level agreements (SLAs).
• Assist in improving incident response processes and implementing fault-tolerant systems.

Automation & Infrastructure
• Develop and maintain automation tools for infrastructure management.
• Collaborate with development teams to integrate reliability practices into CI/CD pipelines.
• Contribute to improving scalability and resilience of cloud infrastructure.

Monitoring & Observability
• Implement and maintain monitoring systems and alerts to proactively identify issues.
• Help define key performance metrics and support the implementation of logging and observability solutions.

Incident Management & Root Cause Analysis
• Participate in incident response, assisting with root cause analysis and post-mortems.
• Document findings and collaborate on improving procedures and playbooks.

• Work closely with other SREs, software engineers, and cross-functional teams to ensure service reliability.
• Contribute to continuous improvement initiatives to reduce toil and optimize resource utilization.

Requirements

Required Soft Skills

Problem-Solving & Critical Thinking
• Ability to analyze and troubleshoot complex technical issues.
• Continuous improvement mindset with innovative problem-solving skills.

• Strong verbal and written communication skills to explain technical issues.
• Ability to collaborate with multidisciplinary teams.

Adaptability & Flexibility
• Comfortable working in dynamic environments with shifting priorities.
• Open to new technologies and adaptable in improving processes.

Ownership & Accountability
• Strong commitment to production system reliability.
• Proactive in identifying and resolving issues.

Resilience under Pressure
• Ability to remain calm and focused during critical incidents.

Required Technical Skills

Infrastructure Automation & Configuration Management
• Experience with IaC tools such as Terraform, Ansible, AWX, or Puppet.
• Knowledge of Docker and Kubernetes.
• Familiarity with cloud platforms (AWS, GCP, or Azure). This is not mandatory, but it will be considered positively.
• Administration of hypervisors (VMware or OpenStack is a plus).
• DNS management in Microsoft and open-source environments (BIND, CoreDNS, etc.).

Monitoring & Observability
• Hands-on experience with tools like Prometheus, Grafana, and Icinga.
• Knowledge of logging and tracing (ELK stack, Fluentd, OpenTelemetry).

Authentication & Identity Management
• Familiarity with authentication protocols: LDAP, SAML, OAuth, OpenID Connect.
• Experience with tools such as Active Directory, FreeIPA, Keycloak, and ADFS is a plus.
• Knowledge of MFA solutions (PrivacyIDEA, Azure MFA, Duo, Okta, etc.).

• Experience supporting incident management and documenting post-mortems.

Operating Systems
• Administration of Ubuntu and CentOS. Experience with Microsoft operating systems will be considered favorably, but it is not a requirement.
• Knowledge of security, performance tuning, and patch management.

Microsoft Systems Management
• Knowledge of Active Directory, GPOs, DNS, and replication.

Scripting & Programming
• Proficiency in PowerShell, Bash, Python, and Ansible.
• Ability to automate tasks and manage infrastructure as code.

Containerization & Orchestration
• Experience with Docker, Podman, and Kubernetes.
• Deployment and management of containerized applications.

Performance Tuning & Optimization
• Ability to identify and resolve bottlenecks in distributed systems.





  • Málaga, Spain Epam Full time

    An established industry player is seeking a Site Reliability Engineer with a strong software engineering background. This role involves ensuring the stability of production systems, managing environments, and driving innovative solutions to enhance efficiency. You will collaborate with a diverse team, tackling complex issues and implementing robust...


  • Málaga, Spain Epam Full time

    An established industry player is seeking a Site Reliability Engineer with a strong software engineering background. Please read the following job description carefully to make sure you fit the profile before applying. This role involves ensuring the stability of production systems, managing environments, and driving...

  • Site Reliability Engineer

    2 weeks ago


    Málaga, Spain JR Spain Full time

    Truly career-defining roles here for Site Reliability Engineers with one of Europe’s fastest-growing tech businesses. High-impact role where you will monitor systems that see incredibly high traffic daily. Oversee and troubleshoot production systems with a focus on uptime, reliability, and performance. Respond to...


  • Málaga, Málaga, Spain NEP Spain Full time

    Our Company: NEP is Europe's leading provider of outsourced television production services. We are always looking for great people to join our team; people with a passion for people and teamwork helping us deliver exceptional results for our clients. NEP Europe is currently looking for a Site Reliability Engineer to join our team on location in Malaga, Spain. The...

  • Site Reliability

    1 week ago


    Málaga, Málaga, Spain Canonical - Jobs Full time

    Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers,...


  • Málaga, Spain EPAM Systems, Inc. Full time

    Do you have a software engineering background and strong knowledge in MS SQL and PowerShell? Are you an open-minded professional with good English skills? If it sounds like you, this could be the perfect opportunity to join EPAM as a Site Reliability Engineer. Responsibilities: Take responsibility for production stability and problem resolution of the...


  • Málaga, Spain Canonical Full time

    Join to apply for the Senior Site Reliability Engineer role at Canonical. Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. Our...

  • Site Reliability Engineer

    2 weeks ago


    Málaga, Spain Epam Full time

    Do you have a software engineering background and strong knowledge in MS SQL and PowerShell? Are you an open-minded professional with good English skills? If it sounds like you, this could be the perfect opportunity to join EPAM as a Site Reliability Engineer. What You'll Do: Take responsibility for production stability and problem...


  • Málaga, Spain Epam Full time

    Do you have a software engineering background and strong knowledge in MS SQL and PowerShell? Are you an open-minded professional with good English skills? If it sounds like you, this could be the perfect opportunity to join EPAM as a Site Reliability Engineer. What You'll Do: Take responsibility for production stability and problem resolution of the Quant /...