Platform Reliability Engineer

1 month ago


Barcelona, Spain · AstraZeneca GmbH · Full-time

Job Title: Platform Reliability Engineer

Career Level - E

Introduction to role:

Join us as a Platform Reliability Engineer in our Commercial IT – SSD, Data, Analytics and AI Platform Success Team. Your primary focus will be to ensure the stability, performance, and reliability of our Data, Analytics, and AI systems. You will bridge the gap between development and operations by generating insights into sub-optimal processes and optimization opportunities. This role offers an exciting opportunity to integrate Agile, Lean, and SAFe practices within monitoring and observability initiatives and to continuously improve delivery cycle times.

Accountabilities:

As a Platform Reliability Engineer, you will be responsible for the evaluation, selection, and deployment of monitoring & observability technologies. You will manage and maintain monitoring infrastructure, ensuring it aligns with industry best practices. You will collaborate with DevOps, CriticalOps, and IT leadership teams to understand system requirements and design effective monitoring strategies. You will also develop and implement monitoring solutions for infrastructure, applications, and services.

Responsibilities:
  • Ensuring the stability, performance, and reliability of Data, Analytics, and AI systems by implementing and maintaining robust monitoring and observability solutions.
  • Designing, deploying, and managing monitoring tools and practices that provide insights into the health and performance of our data infrastructure and analytics processes.
  • Bridging the gap between development and operations by generating insights into sub-optimal processes and optimization opportunities.
  • Maintaining working knowledge of platform architecture and business acumen.
  • Integrating Agile, Lean, and SAFe practices within monitoring and observability initiatives to continuously improve delivery cycle times.
  • Exploring and implementing new ways to automate systems, designing and testing automation processes, identifying quality issues, and supporting IT platform teams to eliminate defects and errors in product and platform development.
  • Experience leveraging AIOps capabilities to uplift existing production operations.
Technology/Tool Management
  • Responsible for the evaluation, selection, and deployment of monitoring & observability technologies suitable for the organization’s needs.
  • Manage and maintain monitoring infrastructure, ensuring it aligns with industry best practices.
Monitoring & Observability Practice Management
  • Collaborate with DevOps, CriticalOps, and IT leadership teams to understand system requirements and design effective monitoring strategies.
  • Establish key metrics and KPIs that enable insights and analytics to achieve data-driven continuous improvement.
  • Provide training and support to other teams on using monitoring tools effectively.
  • Create and maintain documentation for monitoring and observability practices, including standard operating procedures and best practices.
  • Stay abreast of industry trends, emerging technologies, and best practices related to monitoring and observability platforms.
Monitoring & Observability Implementation & Operations
  • Develop and implement monitoring solutions for infrastructure, applications, and services.
  • Design and configure alerting mechanisms to proactively respond to potential issues.
  • Use monitoring tools to identify and troubleshoot issues in real time.
  • Collaborate with other teams to resolve incidents promptly and prevent recurrence.
  • Analyze monitoring data to identify performance bottlenecks and areas for improvement.
  • Work with development and operations teams to optimize system performance based on monitoring insights.
  • Implement automation scripts and workflows to streamline monitoring processes.
  • Integrate monitoring solutions with existing frameworks for seamless operation.
  • Identify and evaluate “self-healing” opportunities based on production issue trend analysis to inform AIOps roadmap.
Essential Qualifications:
  • Degree-level education in computer science, information technology, or a related field.
  • Proven experience as a monitoring and observability engineer or a similar role.
  • Proficient in developing monitoring capabilities and configuring integration with tools such as Prometheus, Grafana, Splunk, Sumo Logic, Datadog, and Dynatrace.
  • Strong scripting skills (e.g., Python) for automation in data environments.
  • Familiarity with logging, tracing, and APM (Application Performance Monitoring) solutions.
Desirable Qualifications:
  • Customer engagement experience.
  • Knowledge of data processing frameworks (e.g., Apache Spark) and data storage solutions (e.g., data lakes, warehouses).
  • Experience with data orchestration tools (e.g., Apache Airflow).
  • Understanding of data lineage and metadata management.

Ready to make a difference? Apply today and be part of a team that has the backing to innovate, disrupt an industry and change lives.
