Lead Site Reliability Engineer (d/f/m)
Personio's intelligent HR platform helps small and medium-sized organizations unlock the power of people by making complicated, time-consuming tasks simple and efficient. Our team of 1,500 Personios is building user-friendly products that delight our 15,000+ customers and their 1.5 million employees. Ready to make an impact from day one?
The Role
This role requires 2 days a week in our Munich, Berlin or Dublin office.
Join us to shape the future of software in the underserved and high-impact HR technology industry. Your work will have a direct and tangible impact on customers, offering ownership and the chance to make a meaningful difference. As we prepare for significant growth, you'll face exciting challenges and have the opportunity to influence our path toward becoming one of the world's leading tech companies.
Personio is seeking an experienced Engineer to design, build, operate, monitor and scale our infrastructure through automated solutions. You’ll empower engineering teams by sharing cloud platform expertise, developing tools and establishing company-wide mechanisms to ensure reliability, scalability and uptime. Our ideal candidate combines strong technical expertise with a collaborative mindset, working closely with other engineering teams to build, scale and enhance their applications on our platform.
What You’ll Do
Engage in and improve the full service lifecycle from initial design through deployment, operation, and continuous improvement.
Prepare services for production by taking part in system design reviews, developing shared frameworks and platforms, planning capacity and conducting launch assessments.
Operate, monitor, and maintain live services, designing observability stacks and dashboards to track key metrics and improve operational insight.
Ensure sustainable scalability through automation, actively contributing to continuous improvement for reliability and delivery speed.
Collaborate with product and engineering teams to define SLOs and error budgets, and to ensure services are reliable, scalable and observable.
Support incident management processes, including on-call rotations, assisting with outage response, and contributing to post-mortems and root cause analysis.
Identify and reduce toil through process automation, creating playbooks and automated runbooks to reduce MTTR.
Support resilience strategies and help implement chaos testing to proactively uncover weaknesses and validate recovery strategies.
Own and maintain the reliability of our event streaming and Change Data Capture (CDC) stack.
Mentor and train peers on reliability best practices and tooling, contributing to community growth.
What You Need To Succeed
Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
6+ years of experience with SaaS software development in distributed systems using languages such as Kotlin/Java, TypeScript, Python, and technologies like IaC, Docker, and Kubernetes.
2+ years’ experience as an SRE or in a similar role designing, operating, analyzing and troubleshooting distributed systems in agile environments.
Ability to act as a Datadog subject matter expert, assisting with observability stack design, dashboard creation, and training peers in best practices.
Hands-on experience running Kafka at scale including configuration, operational failure modes and reliable recovery/runbooks.
Systematic problem solving and debugging skills with a strong sense of ownership and bias towards establishing mechanisms which can scale across the entire company.
Excellent written, verbal, and documentation skills.
Collaborative team player, able to communicate effectively across disciplines.
Nice to Have/Bonus:
Experience with CI/CD tooling (GitHub Actions/GitOps tools)
Experience tuning JVM-based services and Node.js runtimes
Experience with AWS MSK Connect
Why Personio
Personio is an equal opportunities employer, committed to building an integrative culture where everyone feels welcomed and supported. We embrace uniqueness and understand that our diverse, values-driven culture makes us stronger. We are proud to have an inclusive workplace environment that will foster your development no matter your gender, civil status, family status, sexual orientation, religion, age, disability, education level, or race.
At Personio, we value in-person collaboration while also offering flexibility. This role is office-based, with 2 days per week required in your contracted office location. The remaining days can be worked from home or in the office if you prefer. In addition, you’ll have 20 Flex Days per year to work remotely from other locations.
Aside from our people, culture, and mission, check out some of the other benefits that make Personio a great place to work:
Receive a competitive reward package – reevaluated each year – that includes salary, benefits, and pre-IPO equity.
Enjoy 28 days of paid vacation, plus an additional day after 2 and 4 years.
Make an impact on the environment and society with 1 (fully paid) Impact Day.
Receive generous family leave, child support, mental health support, and sabbatical opportunities.
We enjoy gathering for meals, cultural initiatives, and events like local Summer Sessions and year-end celebrations. There are also healthy snacks, drinks, and a weekly catered lunch.