Courses

The Courses

The program includes seminars spanning critical and emerging topics on the problems, the regulations, and the challenges surrounding the ethical and legal aspects of data technologies.

speaker
Prodromos Tsiavos

Onassis Group

AI and the EU acquis: Toward a paradigm shift in regulating technologies in the EU

The EU AI Act introduces a novel approach to techno-regulatory issues, particularly in relation to the explicability, explainability and predictability of AI systems. The AI Act seems to break from the technological-neutrality paradigm and to intervene at the level of technology design. A closer inspection, however, reveals its two-faceted nature: first, it is an incremental rather than radical innovation that builds on previous techno-regulatory initiatives, particularly the Open Data Directive, the Digital Markets and Digital Services Acts, and the General Data Protection Regulation. Second, the concept of technological neutrality, counter-intuitively, includes techno-regulatory approaches and invites technological specificity rather than detachment from the technological realities of a regulated context. The lecture focuses on this double regulatory pivot: first, the "by-design" approach that blurs the boundaries between tech designers and regulators; and, second, the interplay between technological autonomy/sovereignty and the increasing cyberpaternalism appearing in the most recent tech-regulation instruments worldwide. We present and discuss the historical origins and evolution of this regulatory approach, the main trends in the area, and the ethical and institutional consequences of this pivot.

speaker
Toon Calders

University of Antwerp

Introduction to Fairness in Machine Learning

Decisions made through predictive algorithms sometimes reproduce inequalities that are already present in society. Is it possible to create a data mining process that is aware of fairness? Are algorithms biased because humans are? Or is this the way machine learning works at its most fundamental level?

In this lecture, I will give an overview of some of the main results in fairness-aware machine learning, the research field that tries to answer these questions. We will review several measures for bias and discrimination in data and models, such as demographic parity, equality of opportunity, calibration, individual fairness, and direct and indirect discrimination. Even though strong arguments can be made in favor of each of these measures, we will show that they cannot be combined in a meaningful way. In addition to these methods for quantifying discrimination, we also cover several "fairness interventions" proposed over the last decade, aimed at making algorithms fair. These techniques include pre-processing techniques such as biased sampling, in-processing techniques that embed fairness constraints deep in the learning algorithm, and post-processing techniques that make trained models fair.
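As a concrete illustration of the simplest of these measures, the sketch below computes the demographic parity difference of a binary classifier's predictions. This is a minimal sketch, not part of the lecture material: the function name, the 0/1 encoding of the sensitive attribute, and the toy data are our assumptions.

```python
import numpy as np

def demographic_parity_difference(y_pred, sensitive):
    """Difference in positive-prediction rates between two groups.

    A value of 0 means the classifier satisfies demographic parity;
    the sign shows which group receives positive predictions more often.
    """
    y_pred = np.asarray(y_pred)
    sensitive = np.asarray(sensitive)
    rate_a = y_pred[sensitive == 0].mean()  # P(Y_hat = 1 | A = 0)
    rate_b = y_pred[sensitive == 1].mean()  # P(Y_hat = 1 | A = 1)
    return rate_b - rate_a

# Toy example: predictions for 8 individuals, 4 per group.
y_pred    = [1, 1, 0, 1,  1, 0, 0, 0]
sensitive = [0, 0, 0, 0,  1, 1, 1, 1]
print(demographic_parity_difference(y_pred, sensitive))  # 0.25 - 0.75 = -0.5
```

The analogous computation on the true labels instead of the predictions gives a baseline rate difference in the data itself, which is one way the lecture's distinction between bias in data and bias in models can be made operational.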

speaker
Daniele Dell'Aglio

Aalborg University

Share your data and hide your secrets! A brief introduction to data privacy

Sensors, social networks and smartphone applications are producing an unprecedented amount of personal data, with an intrinsic risk of exposing sensitive and private information, such as health conditions and location. Such risks have recently been recognised by our society, which has started investigating privacy and data protection from several perspectives, such as law, education and technology, leading to initiatives such as the GDPR in the European Union. While it is key to raise citizens' awareness so that they can protect themselves and their data, it is essential not to halt the exchange of data, as doing so may negatively impact the economy, innovation, and research.

This talk focuses on data privacy, which studies how to publish and analyse data while protecting the sensitive information they contain. The first part of the talk will introduce data anonymisation techniques, which hide individual identities and sensitive information. We will learn how such techniques work, as well as their limitations and vulnerability to several privacy attacks. In the second part of the talk, we will introduce differential privacy, which emerged to overcome the limitations of anonymisation techniques and to offer solid statistical guarantees bounding the risk of privacy leaks.
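To make the second part more concrete, here is a minimal sketch of the Laplace mechanism, the textbook way of achieving epsilon-differential privacy for numeric queries; the function name and the parameter values in the example are illustrative assumptions, not material from the talk.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release a noisy statistic satisfying epsilon-differential privacy.

    Noise is drawn from Laplace(0, sensitivity / epsilon): the smaller
    epsilon, the stronger the privacy guarantee and the noisier the answer.
    """
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

# Example: a counting query. Adding or removing one individual changes
# the count by at most 1, so its sensitivity is 1; we spend a privacy
# budget of epsilon = 0.5 on this single release.
true_count = 42
print(laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5))
```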

speaker
Minos Garofalakis

Technical University of Crete &
Athena Research Center

Small Synopses for Big Streaming Data: Fast, Accurate, Private(?)

Effective Big Data analytics need to rely on algorithms for querying and analyzing massive, continuous data streams (that is, data that is seen only once and in a fixed order) with limited memory and CPU-time resources. Such streams arise naturally in emerging large-scale event monitoring applications; for instance, network-operations monitoring in large ISPs, where usage information from numerous network devices needs to be continuously collected and analyzed for interesting trends and real-time reaction to different scenarios (e.g., hotspots or DDoS attacks). This talk will discuss basic algorithmic tools for real-time analytics over streaming data, focusing on small-space sketch synopses for approximating continuous data streams. Issues with guaranteeing privacy for sketch-based computations will also be discussed.
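For a flavour of the kind of small-space synopsis the talk covers, below is a minimal Count-Min sketch in Python. The width/depth parameters and the salted-hash scheme are illustrative choices on our part, not the talk's specific construction.

```python
import random

class CountMinSketch:
    """Fixed-size synopsis answering approximate frequency queries over a
    stream. Estimates never underestimate, and overestimate by at most a
    small fraction of the total stream length with high probability."""

    def __init__(self, width=2000, depth=5, seed=0):
        self.width, self.depth = width, depth
        rnd = random.Random(seed)
        # One salt per row simulates a family of independent hash functions.
        self.salts = [rnd.getrandbits(32) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def update(self, item, count=1):
        for row, salt in enumerate(self.salts):
            self.table[row][hash((salt, item)) % self.width] += count

    def estimate(self, item):
        # The minimum over rows is the least-inflated counter.
        return min(self.table[row][hash((salt, item)) % self.width]
                   for row, salt in enumerate(self.salts))

# Example: tracking per-source traffic counts in constant memory,
# e.g. to spot hotspots among the IPs seen by a network monitor.
cms = CountMinSketch()
for ip in ["10.0.0.1"] * 1000 + ["10.0.0.2"] * 3:
    cms.update(ip)
print(cms.estimate("10.0.0.1"))  # close to 1000, never below it
```

The memory footprint is width x depth counters regardless of how long the stream runs, which is exactly the trade-off (bounded space for approximate answers) that the talk's synopses exploit.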

speaker
Georgia Koutrika

Athena RC

Fairness in Algorithmic Systems: a Reality or a Fantasy?

Algorithmic systems, driven by large amounts of data, are increasingly being used in all aspects of society to assist people in forming opinions and taking decisions. For instance, search engines and recommender systems, amongst others, help us make all sorts of decisions, from selecting restaurants and books to choosing friends and careers. Other systems are used in school admissions, housing, pricing of goods and services, job applicant selection, and so forth. Such algorithmic systems offer enormous opportunities, but they also raise concerns regarding how fair they are. How much trust can we put in these systems?

We will analyze fairness risks through well-known use cases. Then, we will present models and methods for fairness in search engines and recommender systems. We will conclude our journey through algorithmic fairness by discussing challenges and critical research paths for future work, and an open question: will the algorithmic systems of the future be fair?

speaker
Natasa Milic-Frayling

Intact Digital &
University of Nottingham

Responsible Innovation and Quality Assurance in Data Science & AI

Governments, industry organizations and individuals are increasingly adopting computerized systems and services that are becoming an integral part of local economies and are shaping the social fabric. Within regulated industry sectors, computing applications are subject to safety regulations and strict quality control. In other sectors, quality is controlled by public policies and market competition. With rapid changes and a lag between practices, policies and regulations, it is of utmost importance for computing professionals to be aware of the impactful and long-lasting implications of their work and to adhere to the principles of professional ethics.

The aim of this seminar is to enable computing professionals to apply critical thinking and logical reasoning to specific ethical issues they may encounter in their work, and to develop practices that adhere to ethical frameworks, professional guidelines, and computing design principles in order to deliver quality systems. We will use the ACM Code of Ethics to discuss accountability and responsibility in the computing profession and review the emerging regulations aimed at quality assurance in data science and AI.


speaker
Jorge-Arnulfo Quiané-Ruiz

TU Berlin

On Democratizing Data Access and Processing: Current Efforts and Open Problems

The world is fast moving towards a data-driven society where data is the most valuable asset. Organizations need to use dispersed and heterogeneous datasets and to perform very diverse analytic tasks that go beyond the limits of a single data storage and processing platform. As a result, they typically perform tedious and costly tasks to juggle their code and data across different platforms. Coping with this problem requires improving the interoperability among data storage and processing platforms, i.e., democratizing data storage and processing. For instance, democratizing data processing requires, among other things, separating applications from data processing platforms (i.e., achieving cross-platform data processing). Yet, achieving cross-platform data processing is quite challenging: finding the most efficient platform for a given task requires good expertise in all the available platforms. In this seminar, we will discuss two projects we are leading to democratize data access and processing: Agora (https://www.agora-ecosystem.com/) and Apache Wayang (https://wayang.apache.org/). While Agora aims at democratizing access to data-related assets (e.g., datasets, algorithms, ML models, and human expertise) as well as the processing of such assets, Apache Wayang aims at providing the first system that achieves cross-platform data processing.

In this seminar, we will dive into both projects and discuss the core technologies we are developing to achieve the aforementioned goals. In particular, we will discuss the current cost-based cross-platform query optimizer behind Apache Wayang as well as the ongoing efforts to replace this cost-based optimizer with an ML-based one. We will also discuss the mediator-less execution architecture behind Agora and how Agora achieves secure and compliant data processing when running under data privacy constraints. We will conclude the seminar with a discussion of the open problems in achieving true democratization of data access and processing to foster AI innovation.
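To give an intuition for the decision a cost-based cross-platform optimizer makes, here is a toy sketch of platform selection driven by per-platform cost models. The platform names, cost figures and function names are hypothetical illustrations of the general idea; this is not Apache Wayang's actual optimizer or API.

```python
# Illustrative only: a toy cost-based platform chooser.

COST_MODELS = {
    # Hypothetical cost estimates: fixed start-up cost plus a
    # per-tuple processing cost, in arbitrary units.
    "single-node": lambda n: 0.0 + 1.0e-6 * n,   # no cluster start-up
    "cluster":     lambda n: 5.0 + 1.0e-8 * n,   # start-up pays off at scale
}

def choose_platform(input_cardinality):
    """Pick the platform with the lowest estimated cost for this task."""
    return min(COST_MODELS, key=lambda p: COST_MODELS[p](input_cardinality))

print(choose_platform(10_000))         # small input  -> "single-node"
print(choose_platform(1_000_000_000))  # large input  -> "cluster"
```

An ML-based optimizer, as mentioned above, would replace the hand-written cost formulas with a model learned from observed execution times.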

speaker
Natalia Manola, Elli Papadopoulou, Manolis Terrovitis

OpenAIRE, Athena RC

FAIR and Open Science practices: creating advantages in the European Research Area

Research & Innovation ecosystems are evolving globally according to Open and FAIR (Findable, Accessible, Interoperable, Reusable) principles, which call for ethical codes of conduct and reproducible scientific results. In an era where the data deluge drives economic growth and machine learning systems are rapidly being adopted in the private and public sectors, the need for Responsible Research and Innovation (RRI) arises.

This lecture will present the European policy framework and the federated approaches to Research Data Management (RDM) promoted through the European Open Science Cloud (EOSC) towards realizing a 'Web of FAIR data'. It will explain the opportunities open to European researchers and best practices for adopting and embedding Open and FAIR principles in their everyday work and in their data and software workflows. The focus is for early-career researchers (ECRs) to learn the stages of RDM lifecycles and useful tools, and to understand the different levels of Openness and FAIRness pertaining both to the datasets they collect or generate in the context of a research project and to the services they develop using those data as input or output.

The lecture will also cover data privacy provisions under the GDPR and discuss how data anonymization techniques can be used to address them. We will present the data anonymization tool Amnesia (https://amnesia.openaire.eu), demonstrate its usage, and explain how it can support Openness and FAIRness.
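For intuition about the property such tools enforce, the snippet below measures the k-anonymity level of a small table over its quasi-identifier columns using pandas. It is a minimal sketch, not Amnesia's own code; the column names and toy records are our assumptions.

```python
import pandas as pd

def k_anonymity(df, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier columns:
    the dataset is k-anonymous for this k (each record is indistinguishable
    from at least k-1 others on those columns)."""
    return df.groupby(quasi_identifiers).size().min()

# Toy records: generalized zip code and age band are quasi-identifiers,
# diagnosis is the sensitive attribute.
df = pd.DataFrame({
    "zip":       ["152**", "152**", "152**", "153**", "153**", "153**"],
    "age":       ["20-30", "20-30", "20-30", "30-40", "30-40", "30-40"],
    "diagnosis": ["flu", "cold", "flu", "asthma", "flu", "cold"],
})
print(k_anonymity(df, ["zip", "age"]))  # 3 -> the table is 3-anonymous
```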

The Invited Talks

The program includes invited talks on emerging topics and applications addressing challenging aspects of data technologies.

speaker
Aggelos Kiayias

University of Edinburgh & IOHK

Decentralizing Information Technology: The Advent of Resource Based Systems

With the introduction of blockchain technology a little more than a decade ago, we witnessed a first instance of an information technology service deployed via open, incentive-driven collaboration that is natively supported by the system itself. Viewed in this light, an IT service can emerge out of the self-interest of computer node operators who enrol themselves to support the system's operation in exchange for rewards provided in the system's digital currency. In this talk we cast this as a general paradigm for deploying information technology services and describe the incentive mechanism used in the Cardano blockchain, an open smart contract platform built following a first-principles design approach. We discuss design challenges, solutions and open questions, and look at what lies ahead for decentralized information technology services.

speaker
Lora Frayling

Health Data Insight (HDI)

The Simulacrum: Sharing privacy-preserving synthetic data to support healthcare research

Clinical patient data is vital for research and improvements in healthcare; however, it is not widely available for use due to concerns about patient privacy. At Health Data Insight (HDI), we have generated the Simulacrum, a synthetic version of real patient data that preserves valuable properties of the real data but contains no patient-identifiable information. The generation method involves sampling from a probabilistic Bayesian network and handling the conditional probability distributions such that the underlying patient groups are sufficiently large (ensuring k-anonymity). This synthetic data can then be released and used by researchers, enabling them to gain real-world insight without ever seeing patient-identifiable data, and thus posing minimal risk to patient privacy.
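For intuition about the sampling step, the following sketch performs ancestral sampling from a toy two-node Bayesian network (draw parent variables first, then children conditioned on them). The variables and probabilities are made up for illustration and are far simpler than the Simulacrum's actual model.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical network AgeBand -> Diagnosis with made-up probabilities.
P_AGE = {"under50": 0.4, "50plus": 0.6}
P_DIAG_GIVEN_AGE = {
    "under50": {"benign": 0.8, "malignant": 0.2},
    "50plus":  {"benign": 0.5, "malignant": 0.5},
}

def sample_record():
    """Ancestral sampling: sample the parent, then the child given it."""
    age = rng.choice(list(P_AGE), p=list(P_AGE.values()))
    diag_dist = P_DIAG_GIVEN_AGE[age]
    diag = rng.choice(list(diag_dist), p=list(diag_dist.values()))
    return {"age_band": age, "diagnosis": diag}

# Each synthetic record follows the network's joint distribution
# without copying any individual real record.
print([sample_record() for _ in range(5)])
```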

This talk will discuss the background of the underlying patient data and the barriers to accessing it, the method used for synthetic data generation and evaluation, and the way in which the Simulacrum can be used to enable research. It will also highlight the benefits, limitations and future directions of the Simulacrum, as well as alternative methods for synthetic data generation.