Service Reliability Engineer
6 days ago
Who we are
Join the fintech revolution with Mambu, the leading SaaS cloud banking platform. We're on a mission to make banking better for a billion people. Explore exciting career opportunities and help shape the future of financial services.
About the team
At Mambu, our cloud-native banking platform is trusted by the world's most innovative financial institutions. As a Level 2 Support Engineer / SRE on the Support Engineering & Reliability Team (SERT), you will be the final technical escalation point for our most complex customer issues.
This role sits at the critical intersection of advanced technical support, software engineering, and production operations. You'll provide deep diagnostic expertise, owning the resolution of challenging customer cases and live production incidents end-to-end. We're looking for a customer-obsessed engineer who doesn't just resolve issues, but writes code to automate fixes, enhances observability, and drives permanent solutions back into the platform.
What you'll do
Own Customer Case Resolution End-to-End: Lead and resolve technically deep Level 2 support cases with full ownership—from initial triage and in-depth diagnosis to root cause analysis (RCA) and final resolution.
Diagnose problems across our distributed, cloud-native systems, including application, databases, APIs, and cloud services, using your code-level debugging skills to isolate and fix application-level defects.
Collaborate with Product Engineering teams to drive permanent bug fixes and ensure all customer-facing knowledge is captured, automated, and reused across the support organization.
Own Live Production Incidents: Lead technical investigations during high-severity production issues. You'll implement fixes that help resolve the incident and perform post-incident analysis to prevent recurrence.
Engineer the Support Experience through Automation and Observability: Treat operational pain points and repetitive support tasks as engineering challenges.
Use your programming and scripting skills (Python, Go, or Java/Bash) to automate diagnostics, build self-service tooling, and enhance support workflows, actively reducing operational toil.
Design, maintain, and evolve our observability stack (monitoring, alerting, and logging) to ensure our internal teams catch issues before customers do. Define and implement standards for monitoring API performance and system health.
What you'll bring
Expert-Level Troubleshooting and Debugging: Strong software engineering background with the proven ability to read, understand, and debug production code (e.g., in Python, Go, or Java) to identify application-level root causes.
Customer Case Handling Expertise: Extensive experience managing customer support cases, including technically deep troubleshooting, impact assessment, and a track record of driving permanent fixes with engineering teams.
Distributed Systems Diagnosis: Experience troubleshooting and tracing data flow through complex, distributed, cloud-native systems (application, message queues, APIs). You must be able to trace issues down to the code or configuration level.
Observability Stacks and Actionable Alerting: Hands-on experience defining actionable alerts, metrics, and dashboards using modern observability stacks (e.g., Prometheus, Grafana).
Data Investigation Proficiency: Strong SQL proficiency for investigation, performance analysis, and data validation during incident and case resolution.
A Support Engineering Mindset: A bias for action, a deep passion for automation, and the mindset of an engineer—you write code to solve operational challenges and reduce manual work.
Essential Platform Knowledge: Good understanding of CI/CD pipelines, containerization (Docker, Kubernetes), and infrastructure as code (Terraform) to effectively collaborate with platform teams.
Communication Under Pressure: Excellent communication skills, able to translate complex technical issues for both engineers and non-technical stakeholders during high-pressure incidents.
Nice to Have (or Grow Into)
Experience in software engineering, writing scalable, high-throughput applications.
Knowledge of version control systems (GitLab).
Hands-on experience with Kubernetes internals.
Certification with one of the cloud providers (AWS, Google Cloud or Azure).
Previous experience in fintech or other highly regulated environments.
What you'll get
Join us to shape the future of banking, where your professional growth is valued as much as your personal well-being.
Company equity for all
Learning and development opportunities
Hybrid/Remote working (location dependent)
30 days of working abroad
4-week paid sabbatical after 5 years of service
Additional benefits based on location
Let's connect
Follow Mambu on LinkedIn for the latest Fintech trends and success stories. Connect with us on Facebook, Instagram, and YouTube to experience our vibrant culture. Explore our mission, values, and the world we're building. Check out our Insights Hub for industry insights, Mambu blogs, webinars, and upcoming events.
As part of the recruitment (or HR onboarding) process, you will be required to obtain authorized criminal background and credit screening results, and you will be checked against a screening service covering sanctions, anti-money-laundering, counter-terrorism financing, and politically exposed persons. Your employment is conditional upon approval of these results.
At Mambu, we encourage all interested candidates to apply, even if they don't meet every listed qualification, as we value diversity and recognize that experience doesn't always perfectly align with job descriptions. We are committed to providing equal opportunities for applicants with disabilities; if you need assistance during the application process, please contact