My name is Cedric Kienzler and I’m a Senior Software Engineer specializing in resiliency engineering at scale

Skills

I AM REALLY GOOD AT

Reliability Engineering

Kubernetes

Incident Response

Working at scale

Recent Talks

The world of container orchestration is often daunting for newcomers! But in this talk we will embark on a journey through the evolution of infrastructure management, from bare metal and virtual machines to the modern world of containers and their orchestration.

In today’s rapidly evolving technology landscape, robust and scalable observability is crucial for maintaining reliable, high-performance systems. This talk delves into an advanced observability stack, known as the LGTM stack—Loki, Grafana, Tempo, and Mimir—that surpasses the limitations of traditional Prometheus-based solutions while maintaining compatibility with existing query languages and alert configurations. We will explore the core principles of OpenTelemetry, its seamless integration into the LGTM stack, and the significant benefits it brings to monitoring and tracing capabilities.

As the software development landscape continues to evolve, the roles of Site Reliability Engineering (SRE), DevOps, and Platform Engineering often leave people puzzled about their distinctions and interrelations. In this engaging 30-minute talk, we’ll clarify these concepts by delving into the world of SRE, examining its unique position at the intersection of DevOps and Platform Engineering.

Recent Posts

Occasionally I write blog posts

Cloud infrastructure is a necessity in our modern digital world. However, understanding and preparing for failures in cloud infrastructure is critical for the reliability of our services. Failures can be viewed as learning opportunities and as a way to improve our system design. They can inform proactive problem-solving, foster effective incident response, and guide us through future design challenges. Chaos Engineering plays a vital role in testing the resilience of our systems.

Save yourself and others a ton of time otherwise spent tediously setting up development environments, and finally start using devcontainers for VS Code. Here is what you need to know!

Bootstrapping a production-ready Kubernetes cluster using Terraform and KubeOne on Hetzner Cloud running Canal CNI

Accomplishments

PREVIOUS ASSOCIATIONS THAT HELPED TO GATHER EXPERIENCE

DevOps-Lead Kubernetes for rc3.world

Offered support to the rc3.world backend team as part of the rC3 infrastructure team, overseeing and maintaining a Kubernetes cluster as well as a complex, multi-stage Elixir deployment hosting the remote congress experience.

• Orchestrated a highly available Elixir Application using Kubernetes, Kustomize, Helm, ArgoCD, ingress-nginx, cert-manager, kube-prometheus-stack, and many more

• Enhanced the Dockerfile layout to use a multi-stage Docker build process, ensuring that a small and optimized image ran in production

• Built and maintained a multi-stage GitLab CI pipeline to compile, verify, and deploy documentation for the Elixir application that powered the backend of the congress 2D world

• Coordinated deployments in close collaboration with the event’s leadership team (PL), ensuring minimal disruption during peak hours

• Held 24-hour on-call responsibility throughout the event to quickly resolve incidents affecting the infrastructure and the deployments

Head of Network - GPN

Oversaw and orchestrated the event organisation as the representative of the network team towards the event’s project management, including sponsor acquisition, budget management, and logistics for a highly reliable, fault-tolerant network for one of the largest hacker events in Germany.

• Led a team of 15 volunteers with various skills and experience levels to build and run the network infrastructure

• Planned and controlled the budget

• Acquired sponsors for network hardware and BGP interconnectivity

• Wrote an SDN controller in Python 3 using NETCONF, SNMPv3, and Jinja2 for a heterogeneous infrastructure to distribute the network across five lecture halls, two large hack centers, and the “OpenCodes” art exhibition

• Created extensive documentation during all stages of the project to ensure a reliable network

• Worked together with third-party service providers to provide fiber connectivity as well as multi-homed BGP connectivity to the venue

• Monitored the whole network infrastructure using Prometheus and Grafana

Experience

PREVIOUS ASSOCIATIONS THAT HELPED TO GATHER EXPERIENCE

Senior Software Engineer - Azure Resiliency Engineering

Microsoft

Feb 2022 – Present Dublin, Ireland (Remote from Hamburg, Germany)

Team Lead Kubernetes SRE

German Edge Cloud

Jul 2020 – Jan 2022 Eschborn, Germany (Remote from Hamburg, Germany)

Offered professional support to the organisation by overseeing day-to-day activities, such as administering the budgets allocated to the managed Kubernetes service, maintaining the Kubernetes infrastructure, reviewing code and design documents, and developing an optimal pricing structure with the Head of Sales and accounting.

• Liaised and worked in close collaboration with C-level management to design effective strategies and deliver high-quality products and services to customers.

• Established a remote team of site reliability engineers and took ownership of the managed Kubernetes platform.

• Developed positive working relationships between business units and improved workplace diversity by driving a company-wide initiative.

• Created new incident, change, and problem management processes with the service management team.

• Ensured smooth operations while serving as Product and Platform Owner, representing the company in the Cloud Native Computing Foundation, and supporting multiple teams with the ultimate goal of becoming a CNCF-certified “Kubernetes Certified Service Provider”.

Site Reliability Engineer II - ODSP LiveSite SRE

Microsoft

Jan 2019 – Mar 2020 Dublin, Ireland

Actively participated in various open-source projects, met 99.99% service level agreements, and increased corporate security by working with the on-site Red and Blue teams in Redmond. Planned and organised meetups for the Microsoft Ireland Open-Source Club.

• Re-designed the project management process through agile methodologies and drove a cultural change to implement modern SRE methods.

• Shifted the development environment towards state-of-the-art tooling, like Git, Azure DevOps, and Jupyter Notebooks.

• Improved the service reliability by coordinating with various teams across multiple business units.

• Directed the development and integration of multiple internal tools, which improved the efficiency of on-call engineers and helped mitigate incidents and outages.

• Onboarded a new SRE team in China to run a “follow the sun” schedule by implementing proven SRE processes and delivering on-site training.

Software Engineer - Network Security

Sophos

Jan 2017 – Dec 2018 Karlsruhe, Germany

Oversaw the design phase of a new authentication, authorization, and accounting service, using a microservice-based approach to improve the scalability and maintainability of the code.

• Played a vital role in the development of the “Synchronised Application Classification Engine” product.

• Wrote unit tests with high branch coverage, improved the integration test suite to speed up release cycles, replaced the IPsec library used in firewall appliances, and made major contributions to open-source community platforms.

• Led the design and development of a new single sign-on service on the firewall for integrating user-based policies on Chromebooks without the need for captive portals.

Software Engineer - EDI Process Integration

MARKANT Handels- und Service GmbH

Aug 2015 – Dec 2016 Offenburg, Germany

Created new real-time file conversions to convert complex message types for B2B communication. Provided training to apprentices on software architecture, clean code, and common design patterns, and encouraged staff to enhance workflow efficiency.

• Migrated from CVS to Git during project development, carried out a major version upgrade of the IDE, and developed a continuous integration pipeline to modernise the department’s development infrastructure.

• Improved response times during incidents in a time-critical trading system by using real-time monitoring.

• Evaluated various unit testing frameworks and refactored legacy code to enable unit testing in the existing product code.

Junior Software Engineer - Apprenticeship

Streit Datentechnik

Aug 2012 – Jul 2015 Haslach im Kinzigtal, Germany

Designed and implemented monitoring services and improved the reliability of the company’s network and build infrastructure.

• Extended existing UI controls to match modern UI standards and implemented new UI controls to improve customer experience.

• Wrote a disassembler for reading dependencies from Windows-PE and C# executables.

• Gained invaluable knowledge regarding MS Visual C++, C# .NET, MS T-SQL, MFC, and Win32-API.

Projects

ALL THINGS ARE DIFFICULT BEFORE THEY ARE EASY

kkpctl

Use the Kubermatic Kubernetes Platform from your favourite CLI

KubeOne

Contributing bug fixes and improvements back to the community

Kubermatic Kubernetes Platform

Contributing bug fixes and improvements back to the community

Open Infrastructure

Infrastructure should be accessible for everyone.

BIO Routing

BIO Routing is a project to create a versatile, fast, and reliable routing daemon in Go.

Chaos Communication Congress Orga

Coordination of logistics, especially for beverages and other goods

GPN NOC

Overseeing and maintaining the Event Network

rC3 World Backend

Overseeing and maintaining a Kubernetes cluster and the multi-stage deployment