## Statistical physics of model selection and validation for complex data

**Dates:** from June 1, 2020 to May 31, 2023

**Funder**: MINECO (Spain)

**Project id**: PID2019-106811GB-C31

**Total Funding**: 102,000€

At least since the scientific revolution, interpretable mathematical models have been instrumental for advancing our understanding of the world. The “big data” era held the promise of facilitating the discovery of similarly interpretable mathematical models of natural and socio-economic systems that were previously not amenable to quantitative analysis. Yet, so far we have not seen such an explosion of new interpretable mathematical models. This is in part because machine learning models are de facto taking their place. However, because most machine learning algorithms are not interpretable, an uncontrolled use of such approaches can have unwanted consequences when model outcomes are directly linked to decisions.

Statistical physics approaches rely precisely on interpretable micro-scale models to understand macro-scale behavior. As such, they are uniquely positioned to lay the foundations of alternative algorithms for interpretable model selection and validation, algorithms that will learn from data but will differ significantly from the machine learning we know today.

A particular setting in which the need for better interpretable models is critical is that of socio-economic systems, and especially cities, where understanding the micro-motives of human behavior is necessary to explain the macro-behavior of those systems and to inform policy-making decisions. Unfortunately, despite the contributions of statistical physics to modeling urban phenomena, most of the tools in use do not go beyond the “bottom-up” theoretical metaphor. However, because cities are expected to grow at a global scale in the next decade and more urban data is becoming available, there is a pressing need for interpretable models of urban social contexts that are informed by data and can be validated within an urban setting.

StatPhys4Cities will take on these challenges in a coordinated effort between three institutions (URV, UB and UC3M) that will advance research on urban-related problems from a statistical physics perspective, combining models and methods from network theory, stochastic processes, and critical phenomena, among others, with a data-driven approach. Specifically, StatPhys4Cities has two overarching goals:

- To develop interpretable model selection and validation tools using statistical physics principles. The tools should also inform the process of obtaining further data to answer specific research questions.
- To gain a better understanding of mobility, welfare and inequalities within cities through the analysis, modeling and interpretation of existing data and the acquisition of new data specific to these problems.

### Publications

- Differences in collaboration structures and impact among prominent researchers in Europe and North America - EPJ Data Sci. 12, art. no. 12 (2023).
- Fundamental limits to learning closed-form mathematical models from data - Nat. Comm. 14, 1043 (2023).
- Socially disruptive periods and topics from information-theoretical analysis of judicial decisions - EPJ Data Sci. 12, art. no. 2 (2023).
- Complex systems in the spotlight: next steps after the 2021 Nobel Prize in Physics - J. Phys. Complex. 4, 010201 (2023).
- Bayesian symbolic learning to build analytical correlations from rigorous process simulations: Application to CO2 capture technologies - ACS Omega 7 (45), 41147-41164 (2022).
- Stochastic block models reveal a robust nested pattern in healthy human gut microbiomes - PNAS Nexus 1 (3), pgac055 (2022).
- Gene regulatory network inference in long-lived C. elegans reveals modular properties that are predictive of novel ageing genes - iScience 25 (1), 103663 (2022).
- Node Metadata Can Produce Predictability Crossovers in Network Inference Problems - Phys. Rev. X 12, 011010 (2022).
- Automatic modeling of socio-economic drivers of energy consumption and pollution using Bayesian symbolic regression - Sustain. Prod. Consum. 30, 596-607 (2022).
- Complex decision-making strategies in a stock market experiment explained as the combination of few simple strategies - EPJ Data Sci. 10, 26 (2021).