Lead SRE Engineer

4 days ago


Remote, Spain · Stuart · Full time

Stuart (DPD Group) is a sustainable last-mile logistics company that connects retailers and e-merchants to a fleet of geolocalised couriers across several countries in Europe.

Our Mission
- We are an impact-driven company that aims to build the future of logistics for a more sustainable world: shared, efficient and reliable. We are committed to creating a new standard for urban deliveries that meet today’s environmental and social challenges while offering a premium delivery experience blending speed, flexibility and convenience.

Our motto: “Make every delivery a moment all of us can truly celebrate.” More than 3,000 leading brands already partner with us across Restaurants, Grocery, Retail & Luxury, eCommerce and Professional Services to deliver all types of goods at the tap of a button. Stuart is a highly diverse and inclusive company of 700+ employees of 90+ nationalities, working across France, Italy, Poland, Portugal, Spain and the U.K.

It’s the right moment and the right place for us to make an impact on millions of people, as home delivery services hit a record high. And guess what? You can help us fulfil our vision.

We are looking for a **Lead Site Reliability Engineer** who will be a technical leader for our SRE team. You will guide the team technically and help us make our platform more robust, handle failures gracefully, and detect issues early by means of automation, proper alerting, and chaos engineering.

**The SRE mission** is to make the platform as reliable as possible by reducing the number and severity of incidents that affect it. We need to make sure that all services are monitored effectively, with alarm thresholds set so that alerts are meaningful, and that most remediation work is automated rather than manual. We further improve the platform's reliability by deliberately introducing controlled failures (following chaos engineering principles) and by testing different disaster recovery scenarios. SREs are the stewards of reliability: they provide the tools and documentation that other Engineering teams need to build reliable software.

**The SRE team** is a new team at Stuart, and you will have the opportunity to watch it grow and to have a say in how it does so. You will be part of the Infrastructure department, in the Reliability area, alongside the Engineering Support team. The department's other areas are Cloud Engineering, Security, and IT.

**What will I be doing?**

- Be a technical leader for the team and the go-to person for software reliability matters.
- Take part in additional departmental efforts such as hiring, running community talks, defining team processes, and other ways of contributing to the team's culture and growth.
- Help the other engineering teams build reliable, observable, and performant products.
- Drive other teams to set SLOs and SLAs, and help them track these via SLIs.
- Lead the design of Stuart's observability stack, implement it, and guide other teams in adopting it.
- Contribute to the reliability and performance of Stuart's systems.
- Write playbooks for alarms, and then automate them so that manual intervention is not required.
- Document knowledge and practices clearly, so that other departments can benefit from them.
- Collaborate with the Engineering Support team on incident management.
- Conduct and lead post-mortem meetings, and follow up on the action items.
- Lead the way in adopting chaos engineering.
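To make the SLO/SLI responsibility above more concrete, here is a minimal Python sketch, assuming a request-based availability SLI (good requests divided by total requests) measured over one SLO window, and an SLO expressed as a target fraction such as 0.999. The names `SliWindow`, `availability_sli`, `meets_slo` and `error_budget_remaining` are hypothetical illustrations, not part of Stuart's tooling.

```python
from dataclasses import dataclass


@dataclass
class SliWindow:
    """Request counts observed over one SLO window (hypothetical input shape)."""
    good_requests: int
    total_requests: int


def availability_sli(window: SliWindow) -> float:
    """Fraction of requests that succeeded; treated as 1.0 when there was no traffic."""
    if window.total_requests == 0:
        return 1.0
    return window.good_requests / window.total_requests


def meets_slo(window: SliWindow, slo_target: float) -> bool:
    """True when the measured SLI is at or above the SLO target (e.g. 0.999)."""
    return availability_sli(window) >= slo_target


def error_budget_remaining(window: SliWindow, slo_target: float) -> float:
    """Share of the error budget still unspent; negative once the SLO is breached."""
    allowed_failures = (1.0 - slo_target) * window.total_requests
    actual_failures = window.total_requests - window.good_requests
    if allowed_failures == 0:
        # No budget at all: either untouched or already exhausted.
        return 1.0 if actual_failures == 0 else float("-inf")
    return 1.0 - actual_failures / allowed_failures


if __name__ == "__main__":
    window = SliWindow(good_requests=999_500, total_requests=1_000_000)
    print(availability_sli(window))                        # 0.9995
    print(meets_slo(window, 0.999))                        # True
    print(round(error_budget_remaining(window, 0.999), 3))  # 0.5
```

In this sketch, 500 failed requests against a 99.9% target over one million requests consume half of the 1,000 allowed failures, which is the kind of signal an alert threshold could be tied to.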

**What do we need from you?**

- 5+ years of experience in a similar position (even under a different title) on an always-up, always-available, mission-critical service.
- Whether you come from a Systems or a Software Engineering background, we will like you just the same.
- Love for automation: you don’t want to do the same job twice.
- A proven record of leading complex projects from start to finish.
- You are the go-to person in your team when there are difficult technical problems to solve.
- You have written programs to automate tasks, reducing toil.
- You feel comfortable doing low-level Linux and networking debugging.
- You have worked with complex Terraform codebases. Bonus points if you have written a provider.
- Very good knowledge of cloud environments and Kubernetes (we use AWS and EKS).
- Hands-on experience with chaos engineering practices.
- You like teaching, passing best practices on to others, and writing thorough documentation.
- Proactive mindset: if you see something is not working, you start the process to fix it.
- Fluency in written and spoken English.

Don’t worry, we don’t expect you to tick every single item here, but this list should give you a feel for the kind of experience we are looking for.

**The stuff you wanna know**

- Family-friendly work-life balance: work from home and flexible hours
- Option to work remotely anywhere in Spain
- Ticket Restaurant by Edenred (€11 daily)
- Unlimited access to Udemy for all your learning and development needs
- Stuart Academy with regular workshops, Stu-Classes,