Infrastructure and Design Solutions of the ELKH Data Repository Platform

Péter Kacsuk, Ádám Pintér, Szabolcs Tenczer, Ákos Hajnal

Making the datasets that underpin scientific results publicly available and searchable for further research not only promotes the development of a given field of science but also opens the door to interdisciplinary and international research, which is increasingly becoming a basic principle, an expectation, and sometimes a requirement. Data related to scientific publications are often published and stored in an ad hoc manner, which has a number of drawbacks: long-term accessibility is hard to ensure, and heterogeneity and fragmentation make the data difficult to find and access.
The ELKH Data Repository Platform (ELKH ARP) project aims to create a central Hungarian research data repository that supports FAIR (findable, accessible, interoperable, reusable) storage and management of research data, long-term storage of the research datasets associated with publications, and their sharing (open or closed) with the scientific community, across disciplines or even internationally. This keeps open the possibility of future integration into an international research infrastructure. The digital infrastructure required for the data repository is being developed within the project by two research centres of the Eötvös Loránd Research Network (ELKH), the Institute for Computer Science and Automation (SZTAKI) and the Wigner Research Centre for Physics (Wigner FK), with the support of the ELKH Secretariat.
Such a system is expected to provide high availability, strong data security, high-bandwidth data connectivity, long-term storage, capacity for very large data volumes, and efficient search across the data.
In this paper, we review the design and implementation of the hardware and software infrastructure behind the data repository's storage, covering technical details, major design decisions, and the solutions chosen. We present the role and benefits of connecting to the ELKH Cloud system, how data security can be guaranteed through different levels of redundancy and data replication, and how primary data collection at research institutes can be made more convenient and efficient while adapting to unique, local workflows.