Lead (Staff) Site Reliability Engineer, Resiliency

1 week ago


Madrid, Spain · Shopify · Full-time

**Company Description**:

Shopify is a leading global commerce company, providing trusted tools to start, grow, market, and manage a retail business of any size. Shopify makes commerce better for everyone with a platform and services that are engineered for reliability, while delivering a better shopping experience for consumers everywhere. Shopify powers millions of businesses in more than 175 countries and is trusted by brands such as Allbirds, Gymshark, PepsiCo, Staples, and many more.

**Job Description**:
The Resiliency team is part of the Production Engineering organization, which builds, operates, and improves the heart of Shopify’s technical platform and unlocks the power of planet-scale infrastructure for all of Shopify’s merchants, buyers, and developers.

Our job is to get to a resolution as quickly as possible, and guide teams to build a more resilient Shopify. We build whatever is necessary to quickly resolve incidents, and seek out ways to automate away the manual toil.

Commerce happens 24/7, and we are building out a globally distributed team that can respond whenever necessary. Our team hires across four regions (APAC, North America West, North America East, and EMEA) in a follow-the-sun support model that also provides 24/7 coverage for incident management.

**What we can offer you**:

- The opportunity to run Shopify’s planet-scale systems by enabling engineering teams to create resilient systems.
- Work focusing on a unique set of interesting and challenging problems that can’t be easily found elsewhere.
- The flexibility to define what Resiliency and Site Reliability Engineering mean for Shopify.
- The means to grow the capacity of our worldwide distributed site reliability engineering teams, and consult with other engineering groups on how to build low-latency, highly resilient systems.
- A direct impact on our millions of merchants’ ability to generate revenue for their livelihood, families, and employees through the business they’ve built from the ground up on our platform.
- Potential relocation assistance to one of the regions the team operates in.

**You’ll work on things like**:

- Collaborating with high-calibre engineering teams across Shopify to help them create resilient systems.
- Acting as a force multiplier across and within engineering departments.
- Managing ongoing incidents, using your understanding of Shopify to involve the right teams and resolve them as quickly as possible.
- Cleaning up the noise in our signals, ensuring we can get an understanding of the system and debug a problem easily.
- Responding to automated alerts and executing playbooks.
- Setting standards with teams for building resilient, debuggable systems.
- Ensuring we never fail for the same reason twice.
- Following up on each meaningful incident to ensure the appropriate learnings are extracted and teams know what to do next.
- Helping teams build tools to automate the toil of on-call duties.

**Qualifications**:
**Qualities you likely have to be well suited to this role**:

- Experience handling multiple on-call shifts for mission-critical systems, and responsibility for the tools and processes used to debug and correct failures.
- You've navigated more than one incident through to the retrospective process.
- You know what good observability looks like, but more importantly, how to get there.
- Strong software engineering skills, primarily in backend software development.
- Comfort with hands-on development, navigating through multiple programming languages, digging deep in the stack, and using cloud infrastructure (AWS, GCE, Azure, Kubernetes, Docker).
- Experience with mentorship and helping teammates level up their craft and technical skills.
- You understand the meaning of continuous improvement and evolving systems.
- You reject the idea that on call has to be a terrible, disruptive experience.
- You understand how to improve difficult situations through short and iterative projects.
- A commitment and drive for quality, technical excellence and results.

**Bonus Points**:

- Experience working with a variety of open-source software, including NGINX, Redis, Memcached, and MySQL.
- Familiarity with network and web protocols, from IP to HTTP.

**Additional Information**:

At Shopify, we understand that experience comes in many forms. We’re dedicated to adding new perspectives to the team, so if your experience is this close to what we’re looking for, please consider applying.



