Platform Engineering at Scale: Building Self-Service Dev Environments with Observability
Keywords:
Internal Developer Platform, Kubernetes, Observability, Telemetry, Service Catalog, Self-Service
Abstract
Modern software organizations face significant challenges in improving developer productivity and managing development environments. Internal Developer Platforms (IDPs) built on Kubernetes aim to combine convenience and ease of use for developers with reliability and security. Prometheus and OpenTelemetry make it easier to see how well the platform and its applications are performing. This paper examines methods for building scalable IDPs that rely on Kubernetes, telemetry, and service catalogs to improve self-service. The main topics are multi-tenant setup, the collection and visualization of telemetry data, and service discovery through catalogs. The paper then discusses best practices for scaling observability data, the difficulties these practices raise, and approaches to balancing developer autonomy with central control. Diagrams illustrate the main design patterns and the process of adding monitoring to the system. The findings are intended to help platform engineering teams build robust, observable, and scalable developer environments that support faster software releases.
License
Copyright (c) 2025 Chiranjeevulu Reddy Kasaram (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


