Site Reliability Engineer
4 days ago
As a global leader in cybersecurity, CrowdStrike protects the people, processes and technologies that drive modern organizations. Since 2011, our mission hasn't changed: we're here to stop breaches, and we've redefined modern security with the world's most advanced AI-native platform. We work on large-scale distributed systems, processing almost 3 trillion events per day, and this traffic is growing daily. Our customers span all industries, and they count on CrowdStrike to keep their businesses running, their communities safe and their lives moving forward. We're also a mission-driven company. We cultivate a culture that gives every CrowdStriker both the flexibility and autonomy to own their careers. We're always looking to add talented CrowdStrikers to the team who have limitless passion, a relentless focus on innovation and a fanatical commitment to our customers, our community and each other. Ready to join a mission that matters? The future of cybersecurity starts with you.

About the Role:
Our mission is to make all of our customers' security-relevant data continuously available for automated detection and response, threat hunting, and other Falcon use cases. To enable this, the systems behind NG-SIEM are growing to accommodate more than 100 PB of event and action data ingested every day, up to 10 years of retention, and tens of millions of queries per hour across large sections of the stored data.

As our new NG-SIEM Site Reliability Engineer, you'll be responsible for ensuring the reliability, performance, and scalability of the serverless platform that delivers this massive scale to customers and other Falcon modules. You'll work on improving system observability, automating operational tasks, optimizing resource utilization, and maintaining our stringent SLOs while balancing cost efficiency.
This role requires deep technical expertise in distributed systems and cloud infrastructure, and a passion for operational excellence.

What You'll Do:
• Ensure Platform Reliability: Own the availability, latency, performance, and efficiency of NG-SIEM platform services handling more than 100 PB/day of data ingestion and millions of queries per hour.
• Build Automation & Tooling: Design and implement automation for deployment, monitoring, incident response, and capacity planning to reduce toil and improve operational efficiency.
• Monitor & Optimize: Develop comprehensive observability solutions using metrics, logs, and traces; proactively identify and resolve performance bottlenecks and reliability issues.
• Incident Management: Lead incident response efforts, conduct blameless post-mortems, and drive continuous improvement initiatives to prevent recurrence.
• Capacity Planning: Analyze system performance data and growth trends to forecast infrastructure needs and ensure the platform scales efficiently with customer demand.
• SLO/SLA Management: Define, measure, and maintain Service Level Objectives and error budgets; balance feature velocity with reliability requirements.
• Cost Optimization: Implement strategies to optimize cloud resource utilization and reduce operational costs while maintaining performance and reliability standards.
• Collaborate Cross-Functionally: Partner with engineering teams to improve system design for reliability, influence architectural decisions, and embed SRE best practices.
• On-Call Participation: Participate in an on-call rotation to provide 24/7 support for critical production systems.
• Documentation: Create and maintain runbooks, operational procedures, and technical documentation to enable team scalability.

What You'll Need:
• Experience in Site Reliability Engineering, DevOps, or similar roles supporting large-scale distributed systems in production environments.
• Strong programming skills in at least one language (Go) for automation and tooling development.
• Deep cloud expertise with hands-on experience in at least one major cloud platform (AWS or GCP), including compute, storage, networking, and managed services.
• Distributed systems knowledge: Understanding of distributed system design patterns, consistency models, fault tolerance, and scalability principles.
• Infrastructure as Code: Proficiency with IaC tools (Terraform) and configuration management (Ansible, Chef, Puppet).
• Container orchestration: Experience with Kubernetes, Docker, Podman and container-based deployment patterns.
• Observability expertise: Hands-on experience with monitoring and observability tools (Prometheus, Grafana).
• CI/CD pipelines: Experience building and maintaining continuous integration and deployment pipelines.
• Incident management: Proven track record of managing high-severity incidents and implementing preventive measures.
• Data-driven approach: Ability to analyze system metrics and logs to identify trends, anomalies, and optimization opportunities.
• Communication skills: Excellent verbal and written communication for remote collaboration across global teams.

Bonus Points:
• Massive scale experience: 3+ years owning systems handling over 1 trillion requests per day or more than 10 PB of data per day.
• Multi-cloud experience: Hands-on work with hybrid or multi-cloud environments.
• Database expertise: Deep knowledge of distributed databases, data lakes, or SIEM platforms (ClickHouse, Redis, MySQL).
• Security background: Exposure to cybersecurity, threat intelligence, or security operations.
• Networking expertise: Advanced understanding of network protocols, load balancing, and CDN technologies.

Benefits of Working at CrowdStrike:
• Remote-friendly and flexible work culture
• Market leader in compensation and equity awards
• Comprehensive physical and mental wellness programs
• Competitive vacation and holidays for recharge
• Paid parental and adoption leave
• Professional development opportunities for all employees regardless of level or role
• Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
• Vibrant office culture with world-class amenities
• Great Place to Work Certified across the globe

CrowdStrike is proud to be an equal opportunity employer. We are committed to fostering a culture of belonging where everyone is valued for who they are and empowered to succeed. We support veterans and individuals with disabilities through our affirmative action program. CrowdStrike is committed to providing equal employment opportunity for all employees and applicants for employment. The Company does not discriminate in employment opportunities or practices on the basis of race, color, creed, ethnicity, religion, sex (including pregnancy or pregnancy-related medical conditions), sexual orientation, gender identity, marital or family status, veteran status, age, national origin, ancestry, physical disability (including HIV and AIDS), mental disability, medical condition, genetic information, membership or activity in a local human rights commission, status with regard to public assistance, or any other characteristic protected by law. We base all employment decisions, including recruitment, selection, training, compensation, benefits, discipline, promotions, transfers, lay-offs, return from lay-off, terminations and social/recreational programs, on valid job requirements.

If you need assistance accessing or reviewing the information on this website or need help submitting an application for employment or requesting an accommodation, please contact us at for further assistance.
-
Site Reliability Engineer
2 days ago
Barcelona, Barcelona, Spain · Switch Tech Talent · Full-time
Role: Site Reliability Engineer
Location: Barcelona/Hybrid (3 days a week in office)
Salary: up to €85,000 per annum
Key Skills: AWS, IaC, Docker, Scripting
As a Site Reliability Engineer, you will be at the forefront of maintaining robust, scalable, and secure cloud solutions that power this cutting-edge e-commerce platform. Your expertise will ensure seamless,...
-
Site Reliability Engineer
1 week ago
Barcelona, Barcelona, Spain · K2 Partnering Solutions · Full-time · €30,000 - €80,000 per year
We're hiring: Site Reliability Engineer, Platform Engineering
Barcelona, Spain (Hybrid, 2 days/week on-site)
4+ years of experience
We're looking for an SRE who's passionate about building scalable, secure and reliable platforms in a modern Kubernetes environment.
What you'll do:
• Design, build and maintain high-quality, scalable systems on Kubernetes
•...
-
Server Site Reliability Engineer
1 week ago
Barcelona, Barcelona, Spain · arsys ES · Full-time · €30,000 - €60,000 per year
We are looking for a Site Reliability Engineer (SRE) in the Server Site Reliability Engineer Team of Strato.
Tasks:
• Administer and optimize Linux environments across production, test, and development.
• Manage key services: web, DNS, DHCP, proxy, backup, and monitoring.
• Implement automation with Ansible, Shell, Perl, and Python.
• Collaborate with development teams to...
-
Site Reliability Engineer
1 week ago
Barcelona, Barcelona, Spain · Bright Purple · Full-time · €60,000 - €85,000 per year
Site Reliability Engineer, Barcelona
Join a leading global travel technology company that's transforming the way businesses manage travel. You will be working with cutting-edge platforms that combine world-class travel inventory and powerful management tools, delivering freedom for travellers and control for companies, saving time, money, and hassle for...
-
Senior Site Reliability Engineer
1 week ago
Barcelona, Barcelona, Spain · Trust In SODA · Full-time · €90,000 - €120,000 per year
Senior Site Reliability Engineer | AWS | Spain
If building and scaling reliable systems in AWS is where you thrive, this role is going to feel like the right kind of challenge. You'll be joining a fast-growing, late-stage tech company (Series E) that's trusted by some of the biggest global names across tech, finance, and media; the kind of clients whose...
-
Senior Site Reliability Engineer
2 weeks ago
Barcelona, Barcelona, Spain · inITium HR · Full-time · €45,000 - €90,000 per year
We are looking for a Senior Site Reliability Engineer for a market research company based in Barcelona. Their Platforms Team is a diverse and dynamic group focused on crafting and maintaining high-performance platforms, primarily on Amazon Web Services and Kubernetes. They're all about growth there, from dedicated Friday...
-
Site Reliability Engineer
2 weeks ago
Barcelona, Barcelona, Spain · Okta · Full-time · €60,000 - €120,000 per year
Get to know Okta
Okta is The World's Identity Company. We free everyone to safely use any technology, anywhere, on any device or app. Our flexible and neutral products, Okta Platform and Auth0 Platform, provide secure access, authentication, and automation, placing identity at the core of business security and growth. At Okta, we celebrate a variety of...
-
Site Reliability Engineer
1 week ago
Barcelona, Barcelona, Spain · Okta · Full-time · €60,000 - €120,000 per year
Get to know Okta
Okta is The World's Identity Company. We free everyone to safely use any technology, anywhere, on any device or app. Our flexible and neutral products, Okta Platform and Auth0 Platform, provide secure access, authentication, and automation, placing identity at the core of business security and growth. At Okta, we celebrate a variety of...
-
Server Site Reliability Engineer
1 week ago
Barcelona, Barcelona, Spain · Arsys · Full-time · €60,000 - €80,000 per year
We are looking for a Site Reliability Engineer (SRE) to strengthen our infrastructure and systems team. You will be responsible for maintaining and optimizing critical services such as web, DNS, proxy, backup, and monitoring, ensuring their stability, availability, and automation. You will work with an international team focused on continuous improvement,...