Senior Site Reliability Engineer

7 days ago

Remote, Spain · Booming Games · Full-time

About the role

Join our team at Booming Games as a Site Reliability Engineer and ensure the peak performance and reliability of our systems across multiple geographical locations. As a key player in troubleshooting and resolving complex issues, you will collaborate with engineers to drive automation, standardization, and optimization efforts. Your expertise in operating systems, networking, and distributed systems, combined with your passion for problem-solving, will make you an invaluable asset. If you are ready to revolutionize the reliability and scalability of our services while working with cutting-edge technologies, this role is perfect for you.

**Responsibilities**:

- Perform deep dives into both systemic and latent reliability issues; partner with software and systems engineers across the organization to produce and roll out fixes.
- Drive standardization efforts across multiple disciplines and services in conjunction with SREs throughout the organization.
- Identify and drive opportunities to improve automation for the company; scope and create automation for deployment, management and visibility of our services.
- Represent the SRE organization in design reviews and operational readiness exercises for new and existing services.
- Work with software engineers to improve upon deployment processes.
- Participate in the on-call rotation for production systems.
- Provide third-line support in the networks and infrastructure team, acting as the last line of defense in engineering support escalation.
- Manage the server and network infrastructure, assist in the development of security strategies and their implementation and participate in global network infrastructure upgrades with upstream providers
- Work with both SRE and Development teams on new projects and technologies such as new infrastructure setup, Kubernetes migrations, new geographic locations, monitoring and upgrades, and more.
- Promote openness, diversity of opinions and inclusive discussions at all times to evaluate a wide variety of ideas and perspectives in solving challenging problems
- Demonstrate clear decision making and good trade-offs in complex situations comprising multiple opinions, needs, teams, technologies, cloud providers, and architectural settings
- Communicate effectively with stakeholders ranging from executives to junior engineers across the breadth and depth of the engineering organization
- Enable the engineering organization to innovate and deliver with greater speed and safety
- Any other tasks or responsibilities that may be assigned in the course of the role.

**Requirements**:

- Sound fundamentals in operating systems, networking, and distributed systems.
- Exemplify high accountability, integrity, and resilience to maintain focus on both big-picture goals and milestones to get there
- Strong familiarity with Linux systems administration and management best practices.
- Familiarity with container technologies: Kubernetes, CRI, Docker, namespaces, cgroups.
- Strong understanding of: Ethernet, VLANs, IPv4/IPv6, ARP, DHCP, DNS, and TCP.
- Familiarity with distributed system problems: leader election, Raft consensus, etc.
- Expert-level understanding of at least one public or private cloud technology such as Amazon AWS, Google GKE, or OpenStack.
- Practical knowledge of various aspects of service design, including messaging protocols and behavior, caching strategies and software design practices.
- Practical intermediate knowledge of shell scripting; some Ruby is a plus.
- Excellent knowledge of Linux/UNIX systems administration and performance tuning.
- Comfortable configuring DNS, DHCP, and LAN/WAN technologies.
- Minimum 5 years of managing services in an internet scale \*nix environment.
- Must be able to communicate well with technical as well as non-technical colleagues to achieve business goals.
- Must be adaptable and able to focus on the simplest, most efficient and reliable solutions.
- Track record of successful practical problem solving, excellent written and interpersonal communication in English, and documentation skills.
- Curiosity and an interest in networking, systems software, and distributed systems.
- Experience as a systems administrator or operations engineer.
- Experience with a 24/7 production environment.
- Experience with managed deployments providing software, platforms, or infrastructure as a service.
- Experience with SuperMicro server and storage gear is a plus.

Good to know
- We kindly ask for your understanding that we can only consider applicants within the Central European Timezone +/-2
- To be considered for the role, we kindly ask that you submit your resume/CV in English
- This full-time position can be offered as permanent employment in Malta or on a freelance basis for contractors in other countries

Why Work for Booming Game

Senior Site Reliability Engineer

2 weeks ago

Remote, Spain · Novatec Software Engineering España SL · Full-time

About the job: We are currently looking for a **Senior Site Reliability Engineer** to join our team, based in Andalucia but open to remote applicants all over Spain. The Company: Novatec Software Engineering España is a branch of Novatec Consulting GmbH, headquartered in Stuttgart (Germany). We bring our passion for IT, agile software...

Site Reliability Engineer

2 weeks ago

Remote, Spain · Novatec Software Engineering España SL · Full-time

About the job: We are currently looking for a **Site Reliability Engineer** to join our team, based in Andalucia but open to remote applicants all over Spain. The Company: Novatec Software Engineering España is a branch of Novatec Consulting GmbH, headquartered in Stuttgart (Germany). We bring our passion for IT, agile software...

AWS Site Reliability Engineer

2 days ago

Remote, Spain · Business Insights · Full-time

**Description**: At Business Insights, we are looking for two AWS Site Reliability Engineers to take part in an interesting project. Work mode: hybrid or 100% remote. Location: Aragón, preferably Zaragoza. **Requirements**: **_Skills:_** - _>2 years of experience in SRE Engineering roles in AWS_ - Experience in AWS public cloud...

Senior Site Reliability Engineer

2 weeks ago

Remote, Spain · Knack.com · Full-time

Senior Site Reliability Engineer - Spain Remote. Hi, thanks for reading about our **Senior Site Reliability Engineer** opportunity! We're glad you're here. We're Knack, a code-free platform used by thousands of customers, from non-profits to the world’s biggest companies, to easily build custom apps, workflows, and databases. We’re looking...

Senior Site Reliability Engineer

6 days ago

Remote, Spain · Grafana Labs · Full-time

**Senior SRE - Databases**: **About the role**: We are looking for a Senior SRE to help us support our highest value Grafana Cloud customers by increasing the reliability of our Cloud databases that are based on Mimir, Loki, Tempo, and Pyroscope. We provide these databases as a SaaS product from AWS, GCP, and Azure across all regions. The High SLA SRE team...

Senior Site Reliability Engineer

4 days ago

Remote, Spain · Akamai · Full-time

**Do you enjoy collaborating with teams to solve complex challenges?** **Do you have a passion for cutting edge technologies and tackling system problems?** **Join our highly skilled Storage team** **Partner with the best** You'll collaborate with operations and development teams to build and manage our scalable storage platforms. You'll create tooling...

Site Reliability Engineer

6 days ago

Remote, Spain · Fortexpro · Full-time

We are looking for an SRE to work on a major international project. 100% remote work. Offer addressed to workers from any EEC country. Tasks - Implements Site Reliability Engineering and/or DevOps practices. - Manages technology, infrastructure and software development projects in accordance with SRE and/or DevOps principles. - Empowers development teams...

Site Reliability Engineer

2 days ago

Remote, Spain · White Hat Gaming · Full-time

**About White Hat Gaming** Founded in 2012, White Hat Gaming (WHG) is an online casino technology and services company with offices in Malta, London, Gibraltar, Chicago, and Cape Town. With a global team of over 600 specialists, we provide market-leading content, including Kambi Sportsbook and over 100 leading games providers. We promote and foster a...

Site Reliability Engineer

2 weeks ago

Remote, Spain · Semrush · Full-time

Hi there! We are Semrush, a global IT company developing our own product - a platform for digital marketers. New stars are born here, so don’t miss your chance. This is our role **Site Reliability Engineer** for those who want to turn ideas into reality using code, algorithms, and maybe a bit of magic. **Tasks in the role**: - Read and write code in...

Site Reliability Engineer

1 week ago

Remote, Spain · Red Hat, Inc. · Full-time

The Red Hat Site Reliability Engineering (SRE) team is looking for a Software Engineer to join us. In this role, you will develop, scale, and operate our OpenShift managed cloud services. OpenShift is Red Hat’s enterprise Kubernetes distribution. As an SRE you will contribute to running OpenShift at scale by enabling customer self-service, making our...
