Abstract

Contributed Talk - Splinter EScience

Tuesday, 16 September 2025, 15:52

From data to scientific breakthroughs with tools powered by Representation Learning

Sebastian T. Gomez
Heidelberg Institute for Theoretical Studies (HITS)

Current applications of machine learning in astrophysics focus on teaching machines to perform domain-expert tasks accurately and efficiently across enormous datasets. While essential in the big-data era, this approach is limited by our intuitions and expectations, and at best provides answers to the ‘known unknowns’. We are developing new tools to enable scientific breakthroughs by discovering unbiased, interpretable representations of complex data ranging from observational surveys to simulations. Our tools automatically learn low-dimensional representations of complex objects such as galaxies in multimodal data (e.g. images, spectra, datacubes, and simulated point clouds), and provide exploratory access to arbitrarily large datasets through a simple interactive graphical interface. I will demonstrate potential uses of our tools and discuss how the representations can be used downstream for simulation-based inference and model-driven experiment design. Our framework is designed to be interpretable and to work seamlessly across datasets regardless of their origin, and it provides a path towards discovering the ‘unknown unknowns’.