Senior Site Reliability Engineer
2 weeks ago
About the role
Join our team at Booming Games as a Site Reliability Engineer and ensure the peak performance and reliability of our systems across multiple geographical locations. As a key player in troubleshooting and resolving complex issues, you will collaborate with engineers to drive automation, standardization, and optimization efforts. Your expertise in operating systems, networking, and distributed systems, combined with your passion for problem-solving, will make you an invaluable asset. If you are ready to revolutionize the reliability and scalability of our services while working with cutting-edge technologies, this role is perfect for you.
**Responsibilities**:
- Perform deep dives into both systemic and latent reliability issues; partner with software and systems engineers across the organization to produce and roll out fixes.
- Drive standardization efforts across multiple disciplines and services in conjunction with SREs throughout the organization.
- Identify and drive opportunities to improve automation for the company; scope and create automation for deployment, management and visibility of our services.
- Represent the SRE organization in design reviews and operational readiness exercises for new and existing services.
- Work with software engineers to improve upon deployment processes.
- Participate in the on-call rotation for production systems.
- Provide third-line support within the networks and infrastructure team, acting as the last line of defense in Engineering Support Escalation.
- Manage the server and network infrastructure, assist in developing and implementing security strategies, and participate in global network infrastructure upgrades with upstream providers.
- Work with both SRE & Development teams on new projects and technologies such as: New Infrastructure Setup, Kubernetes Migrations, New Geographic Locations, Monitoring & Upgrades and more.
- Promote openness, diversity of opinions and inclusive discussions at all times to evaluate a wide variety of ideas and perspectives in solving challenging problems
- Demonstrate clear decision making and good trade-offs in complex situations comprising multiple opinions, needs, teams, technologies, cloud providers, and architectural settings
- Communicate effectively with stakeholders ranging from executives to junior engineers across the breadth and depth of the engineering organization
- Enable the engineering organization to innovate and deliver with greater speed and safety
- Any other tasks or responsibilities that may be assigned in the course of the role.
**Requirements**:
- Sound fundamentals in operating systems, networking, and distributed systems.
- Exemplify high accountability, integrity, and resilience to maintain focus on both big-picture goals and milestones to get there
- Strong familiarity with Linux systems administration and management best practices.
- Familiarity with container technologies: Kubernetes, CRI, Docker, namespaces, cgroups.
- Strong understanding of: Ethernet, VLANs, IPv4/IPv6, ARP, DHCP, DNS, and TCP.
- Familiarity with distributed system problems: leader election, Raft consensus, etc.
- Expert-level understanding of at least one public or private cloud technology such as Amazon AWS, Google GKE, or OpenStack.
- Practical knowledge of various aspects of service design, including messaging protocols and behavior, caching strategies and software design practices.
- Practical intermediate knowledge of shell scripting; some Ruby is a plus.
- Excellent knowledge of Linux/UNIX systems administration and performance tuning.
- Comfortable configuring DNS, DHCP, and LAN/WAN technologies.
- Minimum 5 years of managing services in an internet-scale \*nix environment.
- Must be able to communicate well with technical as well as non-technical colleagues to achieve business goals.
- Must be adaptable and able to focus on the simplest, most efficient and reliable solutions.
- Track record of successful practical problem solving, excellent written and interpersonal communication in English, and documentation skills.
- Curiosity and an interest in networking, systems software, and distributed systems.
- Experience as a systems administrator or operations engineer.
- Experience with a 24/7 production environment.
- Experience with managed deployments providing software, platforms, or infrastructure as a service.
- Experience with SuperMicro server and storage gear is a plus.
Good to know
- We kindly ask for your understanding that we can only consider applicants within the Central European Timezone +/-2.
- To be considered for the role, we kindly ask that you submit your resume/CV in English.
- This full-time position can be permanent employment in Malta or a freelance engagement for contractors in other countries.