Lead Site Reliability Engineer

7 days ago


Milano, España · PRAGMATIKE · Full-time

Job Description

Location: Fully remote, EU timezone (CET ±2h)
Start date: ASAP
Languages: Fluent English is mandatory
Industry: Cloud Computing

We are hiring at Pragmatike to expand our team and drive the growth of our internal projects. Our focus is on developing cutting‑edge solutions in Cloud Computing, while fostering a culture of collaboration and innovation. Joining us means being part of a passionate team where your ideas and skills directly contribute to shaping tomorrow's technologies. If you're excited about working on ambitious projects in a dynamic and flexible environment, we'd love to hear from you.

Responsibilities

• Operate and maintain Linux-based infrastructure (Debian / Ubuntu).
• Deploy, manage, and scale Kubernetes clusters across bare‑metal, virtualized, and on‑prem environments.
• Oversee the full cluster lifecycle: upgrades, node pools, networking, storage, and security hardening.
• Implement automation for provisioning and operations using Ansible, Bash / Python, and GitOps workflows.
• Design and maintain networking architecture including VLANs, L2 / L3 routing, VPNs, and multi‑site connectivity.
• Build automated deployment workflows (PXE boot, Preseed, cloud‑init).
• Deploy and maintain observability stacks (Prometheus / Grafana, Loki, ELK, Graylog).
• Lead incident response and escalation activities across the platform.
• Improve system availability and reduce latency at all levels.
• Define and implement SLOs / SLIs at multiple infrastructure levels (physical network / hardware, platform virtualization, software services).
• Optimize alerting and monitoring pipelines to provide actionable insights.
• Establish and maintain on‑call schedules to ensure coverage across timezones.
• Develop Standard Operating Procedures (SOPs) for repeatable operations and maintenance tasks.
• Coordinate physical maintenance for Policlouds (periodic maintenance, hardware issues, DC‑Ops).
• Manage virtualization and orchestration layers (OpenStack, Proxmox, VMware).
• Help develop and maintain overall architecture across all products.
• Plan resources for future initiatives, accounting for demand and growth projections.
• Work with development teams to improve overall quality and optimize resource utilization.
• Collaborate with cross‑functional stakeholders (Hivenet, Policloud, Customer Success teams).

Requirements

• Expert-level, hands‑on experience operating Kubernetes in production environments.
• Strong network engineering skills (VLANs, L2 / L3 routing, VPNs, multi‑site connectivity). This is essential for the role.
• Strong proficiency with Linux systems administration (Debian / Ubuntu).
• Solid understanding of networking fundamentals and ability to design complex network architectures.
• Experience building and maintaining automation workflows (Ansible, Bash / Python, Git‑based).
• Experience with observability stacks such as Prometheus, Grafana, ELK, Loki, or Graylog.
• Background with virtualization technologies (OpenStack, Proxmox, VMware).
• Experience with bare‑metal provisioning and MAAS (Metal as a Service).
• Strong understanding of distributed systems and container orchestration.
• Process‑oriented mindset with ability to develop SOPs and operational procedures from scratch.
• Experience with incident response, escalation procedures, and on‑call rotations.
• Ability to work autonomously in a fast‑paced, engineering‑driven environment.
• Strong technical skills combined with alignment to team values.

Nice To Have

• Experience with service mesh (Istio, Linkerd) or advanced CNI implementations.
• Knowledge of Cloudflare APIs, DNS automation, or tunnel configurations.
• Experience with GPU infrastructure, node preparation, or resource scheduling.
• Familiarity with security best practices (RBAC, firewalls, network policies).
• Exposure to IT asset management or license tracking workflows.
• Experience working in multi‑timezone environments and coordinating across distributed teams.
• Background establishing reliability practices and SRE frameworks in growing organizations.

Why Join Us

• 100% remote work with flexible hours
• High‑impact role with autonomy and ownership
• Collaborative and international engineering team
• Cutting‑edge tech stack with strong focus on reliability and automation



  • Milano, España · Moltiply Group · Full-time

    A technology company based in Milan is looking for an experienced Site Reliability Engineer to manage and automate its IT infrastructure. Candidates must have experience with automation tools such as Ansible and with container orchestration such as Kubernetes. The position is hybrid, combining remote work and time in the office...


  • Milano, España · Canonical · Full-time

    A pioneering tech firm is looking for a Site Reliability / GitOps Engineer to enhance automation and cloud operations. This role requires a Linux enthusiast who can develop infrastructure as code and maintain core services across global teams. Ideal candidates will have a strong engineering background, experience in software development and Linux...


  • Milano, España · Moltiply Group · Full-time

    At Moltiply we tackle and transform our clients' most complex processes, from customer care to digitalization, combining advanced technologies with the talent of more than 3,500 professionals in Italy and around the world. Our mission is to help companies multiply their value by redesigning and simplifying operating models with the goal...


  • Milano, España · Jobbit · Full-time

    Responsibilities & Qualifications: Provide day-to-day operational support for production environments, ensuring high availability and reliability of critical services. Develop, maintain, and enhance automation scripts and tools using Bash, Python, and Ansible to streamline operational tasks and incident response. Monitor system performance, proactively identify...

  • Site Reliability

    7 days ago


    Milano, España · Canonical · Full-time

    Canonical is a leading provider of open‑source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. Our customers include the world’s leading public cloud and silicon providers, and...


  • Milano, España · Generali Italia · Full-time

    A major global insurance player is seeking a Service Reliability Engineer & Application Maintenance Specialist to optimize the reliability, scalability, and cost of its cloud platforms. This role involves working closely with the DevOps team to define recovery automations and ensure compliance with service SLAs. Candidates must have a degree in Computer Science,...


  • Milano, España · Generali Italia · Full-time

    Job Description: We are looking for a Service Reliability Engineer & Application Maintenance Specialist to ensure the reliability, scalability, and cost optimization of our cloud platforms. The ideal candidate will have strong experience in automation, proactive monitoring, and performance management, with a mindset focused on continuous improvement. Key...


  • Milano, España · ALDEBARAN Group · Full-time

    A prominent energy and infrastructure company is looking for a Site Field Engineer to coordinate activities for a major decarbonization project in the Netherlands. The ideal candidate will have an engineering degree and 3 to 5 years of relevant experience. Responsibilities include overseeing field engineering, ensuring compliance with local regulations, and...


  • Milano, España · ALDEBARAN Group · Full-time

    Site Field Engineer (Ref. JOB-1505): Do you want to actively contribute to a major industrial project and gain solid on-site experience on a strategic decarbonization initiative? We are looking for a Junior Site Project Engineer / Junior Field Engineer to support engineering, construction, and permitting activities on a large-scale industrial project in the...