Starts 6 Apr 2022 15:00
Ends 6 Apr 2022 16:00
Central European Time
Central Area, 2nd floor, ex SISSA building
via Beirut, 2
Real-world datasets characterised by discrete features are ubiquitous: from categorical surveys to clinical questionnaires, from unweighted networks to genomic strands. Nonetheless, the development of methods to treat data with discrete features lags behind, particularly concerning geometric and manifold learning approaches. Due to the lack of such tools, the analysis of these datasets still relies on algorithms developed for continuous spaces, inevitably introducing approximations, errors and biases. In this work, starting from the appropriate definition of volumes on lattices, we develop a very simple, yet effective, routine to estimate the intrinsic dimension (ID) of datasets naturally described by discrete metric spaces. In addition, our ID estimator allows one to explicitly select the scale at which the ID is computed, an important property that is rarely provided even by estimators for continuous spaces. We assess the validity of the new estimator on artificial datasets against a state-of-the-art continuous estimator and then apply it to a controlled-ID spin system as well as to an ensemble of genomic sequences.
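To give a feel for what "volumes on lattices" can mean, the sketch below counts the points of the integer lattice Z^d that lie within a given L1 (Manhattan) distance of a reference point, then picks the integer dimension whose volume ratio between two radii best matches an observed ratio of neighbour counts. This is only an illustration under stated assumptions, not the estimator presented in the talk: the choice of the L1 metric, the restriction to integer dimensions, the brute-force matching, and the names `l1_ball_volume` and `estimate_id` are all hypothetical.

```python
from math import comb


def l1_ball_volume(radius: int, dim: int) -> int:
    """Count the points of Z^dim within L1 distance `radius` of the origin.

    The centre point itself is included in the count.
    """
    return sum(
        2**k * comb(dim, k) * comb(radius, k)
        for k in range(min(dim, radius) + 1)
    )


def estimate_id(n_inner: int, n_outer: int,
                r_inner: int, r_outer: int, max_dim: int = 50) -> int:
    """Illustrative integer-dimension estimate (assumption, not the talk's method).

    n_inner and n_outer are the numbers of data points (centre included)
    found within radii r_inner < r_outer of a reference point. The function
    returns the integer dimension whose lattice volume ratio is closest to
    the observed ratio n_inner / n_outer.
    """
    observed = n_inner / n_outer
    return min(
        range(1, max_dim + 1),
        key=lambda d: abs(
            l1_ball_volume(r_inner, d) / l1_ball_volume(r_outer, d) - observed
        ),
    )


if __name__ == "__main__":
    # Counts taken from a fully populated 3-dimensional lattice.
    inner = l1_ball_volume(2, 3)
    outer = l1_ball_volume(4, 3)
    print(estimate_id(inner, outer, r_inner=2, r_outer=4))
```

In this toy version the pair of radii plays the role of the scale at which the dimension is probed: changing `r_inner` and `r_outer` changes the neighbourhood size over which the counts are compared.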