Site Reliability Engineer

2 weeks ago

Barcelona, Barcelona, Spain | CrowdStrike | Full-time

As a global leader in cybersecurity, CrowdStrike protects the people, processes and technologies that drive modern organizations. Since 2011, our mission hasn't changed: we're here to stop breaches, and we've redefined modern security with the world's most advanced AI-native platform. We work on large-scale distributed systems, processing almost 3 trillion events per day, and this traffic is growing daily. Our customers span all industries, and they count on CrowdStrike to keep their businesses running, their communities safe and their lives moving forward. We're also a mission-driven company. We cultivate a culture that gives every CrowdStriker both the flexibility and autonomy to own their careers. We're always looking to add talented CrowdStrikers to the team who have limitless passion, a relentless focus on innovation and a fanatical commitment to our customers, our community and each other. Ready to join a mission that matters? The future of cybersecurity starts with you.

About the Role:

Our mission is to make all of our customers' security-relevant data continuously available for automated detection and response, threat hunting, and other Falcon use cases. To enable this, the systems behind NG-SIEM are growing to accommodate >100 PB of event and action data ingested every day, up to 10 years of retention, and tens of millions of queries per hour across large sections of the data stored.

As our new NG-SIEM Site Reliability Engineer, you'll be responsible for ensuring the reliability, performance, and scalability of our serverless platform that delivers this massive scale to customers and other Falcon modules. You'll work on improving system observability, automating operational tasks, optimizing resource utilization, and maintaining our stringent SLOs while balancing cost efficiency. This role requires deep technical expertise in distributed systems, cloud infrastructure, and a passion for operational excellence.

What You'll Do:

- Ensure Platform Reliability: Own the availability, latency, performance, and efficiency of NG-SIEM platform services handling >100 PB/day of data ingestion and millions of queries per hour
- Build Automation & Tooling: Design and implement automation solutions for deployment, monitoring, incident response, and capacity planning to reduce toil and improve operational efficiency
- Monitor & Optimize: Develop comprehensive observability solutions using metrics, logs, and traces; proactively identify and resolve performance bottlenecks and reliability issues
- Incident Management: Lead incident response efforts, conduct blameless post-mortems, and drive continuous improvement initiatives to prevent recurrence
- Capacity Planning: Analyze system performance data and growth trends to forecast infrastructure needs and ensure the platform scales efficiently with customer demand
- SLO/SLA Management: Define, measure, and maintain Service Level Objectives and error budgets; balance feature velocity with reliability requirements
- Cost Optimization: Implement strategies to optimize cloud resource utilization and reduce operational costs while maintaining performance and reliability standards
- Collaborate Cross-Functionally: Partner with engineering teams to improve system design for reliability, influence architectural decisions, and embed SRE best practices
- On-Call Participation: Participate in on-call rotation to provide 24/7 support for critical production systems
- Documentation: Create and maintain runbooks, operational procedures, and technical documentation to enable team scalability

What You'll Need:

- Experience in Site Reliability Engineering, DevOps, or similar roles supporting large-scale distributed systems in production environments
- Strong programming skills in at least one language (Go) for automation and tooling development
- Deep cloud expertise with hands-on experience in at least one major cloud platform (AWS or GCP), including compute, storage, networking, and managed services
- Distributed systems knowledge: Understanding of distributed system design patterns, consistency models, fault tolerance, and scalability principles
- Infrastructure as Code: Proficiency with IaC tools (Terraform) and configuration management (Ansible, Chef, Puppet)
- Container orchestration: Experience with Kubernetes, Docker, Podman and container-based deployment patterns
- Observability expertise: Hands-on experience with monitoring and observability tools (Prometheus, Grafana)
- CI/CD pipelines: Experience building and maintaining continuous integration and deployment pipelines
- Incident management: Proven track record of managing high-severity incidents and implementing preventive measures
- Data-driven approach: Ability to analyze system metrics and logs to identify trends, anomalies, and optimization opportunities
- Communication skills: Excellent verbal and written communication abilities for remote collaboration across global teams

Bonus Points:

- Massive scale experience: 3+ years owning systems handling over 1 trillion requests per day or more than 10 PB of data per day
- Multi-cloud experience: Hands-on work with hybrid or multi-cloud environments
- Database expertise: Deep knowledge of distributed databases, data lakes, or SIEM platforms (ClickHouse, Redis, MySQL)
- Security background: Exposure to cybersecurity, threat intelligence, or security operations
- Networking expertise: Advanced understanding of network protocols, load balancing, and CDN technologies

Benefits of Working at CrowdStrike:

- Remote-friendly and flexible work culture
- Market leader in compensation and equity awards
- Comprehensive physical and mental wellness programs
- Competitive vacation and holidays for recharge
- Paid parental and adoption leaves
- Professional development opportunities for all employees regardless of level or role
- Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
- Vibrant office culture with world class amenities
- Great Place to Work Certified across the globe

CrowdStrike is proud to be an equal opportunity employer. We are committed to fostering a culture of belonging where everyone is valued for who they are and empowered to succeed. We support veterans and individuals with disabilities through our affirmative action program.

CrowdStrike is committed to providing equal employment opportunity for all employees and applicants for employment. The Company does not discriminate in employment opportunities or practices on the basis of race, color, creed, ethnicity, religion, sex (including pregnancy or pregnancy-related medical conditions), sexual orientation, gender identity, marital or family status, veteran status, age, national origin, ancestry, physical disability (including HIV and AIDS), mental disability, medical condition, genetic information, membership or activity in a local human rights commission, status with regard to public assistance, or any other characteristic protected by law. We base all employment decisions (including recruitment, selection, training, compensation, benefits, discipline, promotions, transfers, lay-offs, return from lay-off, terminations and social/recreational programs) on valid job requirements.

If you need assistance accessing or reviewing the information on this website or need help submitting an application for employment or requesting an accommodation, please contact us for further assistance.

Site Reliability Engineer

1 week ago

Barcelona, Barcelona, Spain | F. Hoffmann-La Roche Ltd | Full-time

At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop and cure diseases and ensure...
Site Reliability Engineer

1 week ago

Barcelona, Barcelona, Spain | CrowdStrike | Full-time

As a global leader in cybersecurity, CrowdStrike protects the people, processes and technologies that drive modern organizations. Since 2011, our mission hasn't changed — we're here to stop breaches, and we've redefined modern security with the world's most advanced AI-native platform. We work on large scale distributed systems, processing almost 3...
Senior Site Reliability Engineer

6 hours ago

Barcelona, Barcelona, Spain | Okta | Full-time

Get to know Okta: Okta is The World's Identity Company. We free everyone to safely use any technology, anywhere, on any device or app. Our flexible and neutral products, Okta Platform and Auth0 Platform, provide secure access, authentication, and automation, placing identity at the core of business security and growth. At Okta, we celebrate a variety of...
Senior Site Reliability Engineer

2 days ago

Barcelona, Barcelona, Spain | Spendesk | Full-time

About the Team: The Infrastructure team at Spendesk builds the tools, systems, and internal products that empower every engineering team to move faster and more safely. We are transforming traditional infrastructure into a developer-facing platform focused on enablement, automation, and scalability. We own the CI/CD platform (ArgoCD and GitHub Actions), secrets...
Site Reliability Engineer

4 hours ago

Barcelona, Barcelona, Spain | N26 | Full-time

About the opportunity: We are seeking a Site Reliability Engineer to join the Observability group inside our Platform Engineering domain. Platform Engineering's goal is to provide easy-to-use, self-service platforms that enable other segments to easily build, deploy and monitor their business applications. And Observability's role in that part of the company is...
Site Reliability Engineer

4 days ago

Barcelona, Barcelona, Spain | Perk | Full-time

About Us: Perk (formerly TravelPerk) is the intelligent platform for travel and spend management. Built to tackle the time-consuming, manual work that gets in the way of real work, our tools automate everything from travel bookings to expenses, invoice processing, and more. By eliminating this shadow work that wastes hours, erodes morale, and saps innovation,...
Senior Site Reliability Engineer

6 hours ago

Barcelona, Barcelona, Spain | N26 | Full-time

About the opportunity: We are seeking a Senior Site Reliability Engineer to join the Platform Engineering Domain in the AI Platform Team. The mission of Platform Engineering is to provide trusted, performant, self-service platforms that empower product teams to build "the bank the world loves to use." The AI Platform team contributes to this mission by creating...
Application Reliability Engineer

1 week ago

Barcelona, Barcelona, Spain | GOLD AVENUE | Full-time

Join us as our new Application Reliability Engineer. We're looking for a pragmatic, detail-oriented Application Reliability Engineer with strong experience in Ruby on Rails and React , Expo) to strengthen our technology team. In this role, you'll act as the bridge between our hosting platforms and our development teams, ensuring that our applications remain...
Senior Site Reliability Engineer, Platforms Team

1 week ago

Barcelona, Barcelona, Spain | Jobs at NETQUEST | Full-time

About Your New Role: Our Platforms Team is a diverse and dynamic group focused on crafting and maintaining high-performance platforms, primarily leveraging the power of Amazon Web Services and Kubernetes. We're all about growth here – from dedicated Friday training sessions and daily collaborative pair-programming to shadowing opportunities and access to...
SRE - Site Reliability Engineering

6 days ago

Barcelona, Barcelona, Spain | INGENIEROJOB | Full-time

SRE - Site Reliability Engineering. What your day-to-day will look like: You will be part of the team responsible for the reliability, availability and continuous improvement of the information system's critical applications. You will work on high-impact initiatives related to: observability, resilience and automation; crisis and incident management...