Abstract

Contributed Talk - Splinter EScience

Tuesday, 16 September 2025, 14:55

Compute Cloud: Constructing a highly available container orchestration infrastructure

E. Tom L. Strauß
Leibniz Institute for Astrophysics Potsdam (AIP)

To meet the ever-increasing demands of future astrophysics for distributed computing capacity, the AIP is currently developing a highly available container orchestration infrastructure based on Rancher and Kubernetes. A connection between Rancher and a separate GitLab instance enables automatic application deployments through CI/CD pipelines. The choice of the RKE2 Kubernetes distribution provides an enterprise-ready, security-focused foundation. Because all components of this tech stack are open source, it also allows a high degree of customization and a certain freedom with respect to licensing. The facility is already in use and hosts services such as REANA, locally developed websites for research topics, a CoCalc instance, databases, and part of the AI stack at the AIP. The AI stack itself is built on open-source software and open-weight or open-source LLMs. These models are used to integrate chatbots into websites and to provide highly specialized agentic AI workflows built with FlowiseAI. In addition, selected assistants and LLMs are accessible through a self-hosted Open WebUI chat interface. These assistants can generate complete REANA workflows or help users interact with different systems, using modern AI techniques such as retrieval-augmented generation (RAG) and tool use by LLMs. Connecting these two facilities makes it possible to increase automation and lowers the current barrier to interacting with highly specialized systems. In the future, it will also help researchers use HPC resources efficiently. Furthermore, the fully locally hosted approach enables compliance with the GDPR and opens the possibility of collecting data for analysis or AI improvements. It also allows the whole ecosystem to be adapted to the users' needs.