FAQ – Espresso – Efficient Search over Personal Repositories

What is the ESPRESSO project about?

The ESPRESSO project is about decentralised search across personal online datastores (pods). We are an EPSRC-funded research project focused on developing algorithms, indexes, and meta-information data structures to enable large-scale data search across distributed pods while respecting individuals’ data sovereignty.

How does the ESPRESSO project enable efficient large-scale search across distributed personal online datastores (pods)?

The ESPRESSO project investigates, develops, and evaluates algorithms, indexes, and meta-information data structures for searching large volumes of data across distributed pods. The search takes into account the access rights and data caching requirements set by each pod owner.

How is a centralized web different from a decentralized web?

A centralized and a decentralized web differ in how control over data storage, access control, and network architecture is distributed. In a centralized web, a few entities control the resources; in a decentralized web, control is spread across a large number of entities.

What are the main objectives of the ESPRESSO project?

Design of decentralised algorithms supporting large-scale keyword-based search and distributed querying over datasets and content stored across pods maintained on distributed servers.

Design of ACL-aware meta-information data structures and indexes to enable the discovery of both structured and unstructured data stored in distributed pods under owner-specified access and caching rights.

Performance evaluation of our algorithms, data structures, and indexing techniques in generic and health-related application scenarios.

Active engagement with the research community and industry stakeholders.

Which scenarios does the ESPRESSO project fit?

Healthcare and well-being data-sharing scenarios.

Other scenarios will be added later.

How does ESPRESSO fit into the state of the art of the decentralized Web?

When conducting a data search over pods, it is necessary to go beyond the current state-of-the-art techniques in distributed keyword-based search and distributed querying. This is because the access rights to pod data can differ for different search parties, and caching restrictions can affect the propagation of search results across the network.

Current methods of distributed database querying assume that the querying party has access to query endpoints, indexes, and caching options. However, in decentralized environments, this is not the case. Decentralized SPARQL querying requires creating and maintaining endpoint metadata for each search party, access control for resource selection, and caching control for SPARQL link-following, which can lead to significant increases in storage, network, and computation overheads.
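To make the access-rights point concrete, the following is a minimal sketch of a keyword search in which each pod keeps its own index and returns only the files the searching party is allowed to read. It is an illustration under stated assumptions, not the ESPRESSO design: the names PodIndex, add_file, and search_pods, the agent identifiers, and the file contents are all hypothetical, and caching restrictions are not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PodIndex:
    """Hypothetical per-pod inverted index that also records read permissions."""
    pod_id: str
    # keyword -> set of file names containing that keyword
    postings: dict = field(default_factory=lambda: defaultdict(set))
    # file name -> set of agents allowed to read the file
    readers: dict = field(default_factory=dict)

    def add_file(self, name, text, allowed):
        """Index a file and remember which agents may read it."""
        self.readers[name] = set(allowed)
        for word in text.lower().split():
            self.postings[word].add(name)

    def search(self, keyword, agent):
        """Return matching files, filtered by the agent's read access."""
        hits = self.postings.get(keyword.lower(), set())
        return sorted(f for f in hits if agent in self.readers[f])


def search_pods(pods, keyword, agent):
    """Query every pod and keep only pods with at least one visible hit."""
    results = {}
    for pod in pods:
        hits = pod.search(keyword, agent)
        if hits:
            results[pod.pod_id] = hits
    return results


# Made-up example data: two pods with different access rights per file.
alice = PodIndex("alice")
alice.add_file("run.txt", "Morning run heart rate", {"alice", "dr_bob"})
alice.add_file("diary.txt", "Heart to heart talk", {"alice"})
carol = PodIndex("carol")
carol.add_file("sleep.txt", "Resting heart rate overnight", {"carol", "dr_bob"})

# The same keyword yields different results for different search parties.
print(search_pods([alice, carol], "heart", "dr_bob"))
print(search_pods([alice, carol], "heart", "alice"))
```

In this sketch the same query returns different results for dr_bob and for alice, which is the behaviour that per-party access rights impose on any search over pods. A real system would also have to decide where such indexes are stored and how far results may be cached.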

What are the research areas the ESPRESSO project covers/touches?

(Distributed/Federated) Databases

Information Retrieval (IR)

Distributed Systems

Artificial Intelligence (AI) and Federated Learning

Who are the parties collaborating on the ESPRESSO Project?

The project is collaborating with the NExT++ centre in Singapore and with the SOLID and HAT project teams.

Birkbeck, University of London.

DataSwift

What is the SOLID framework?

SOLID is an emerging framework for developing decentralised applications based on personal online datastores.

What is a pod?

A pod is a personal online datastore in which individuals store their data and control which applications can access it and for what purposes.

How can I get involved in the project?

The project will actively engage with the academic community and industry stakeholders through dedicated events.

Please contact the project team for more information on how to get involved.

Contact(s): (link to the form coming later)

How will the project’s findings inform current research and innovation?

The findings from the ESPRESSO project will provide valuable insights for ongoing studies in various fields, including distributed systems, databases, the digital economy, and cybersecurity.

The outcomes will also be beneficial for developing new decentralized applications and for tackling policy challenges associated with data privacy and sovereignty.