Having examined attacks that manipulate model inputs (evasion) or the training process (poisoning), we now turn to methods that extract information from or about a trained model. This chapter addresses attacks on the confidentiality of the model itself and of the data used to train it, attacks that often require only standard query access.
You will study several inference techniques:

- Membership inference: determining whether a specific record was part of a model's training set (a minimal sketch follows this list).
- Attribute inference: deducing sensitive attributes of a record from a model's outputs.
- Model inversion and reconstruction: recovering representative inputs, or approximations of training data, from a trained model.
- Model stealing: extracting a functional copy of a model through its query interface.
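As a preview of the hands-on work in Section 4.6, the sketch below shows the simplest membership inference heuristic: flagging inputs the model classifies with unusually high confidence as likely training members. This is a minimal illustration under stated assumptions, not the chapter's full method; the synthetic victim model, the helper name `confidence_threshold_attack`, and the 0.9 threshold are all placeholders, and a real attack would calibrate the threshold, for example with shadow models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def confidence_threshold_attack(model, samples, threshold=0.9):
    """Guess membership: flag samples the model classifies with
    confidence at or above `threshold` as likely training members."""
    top_confidence = model.predict_proba(samples).max(axis=1)
    return top_confidence >= threshold

# Synthetic stand-in for a victim model trained on private data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
victim = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Training points (members) should trip the threshold more often
# than held-out points (non-members) when the model overfits.
member_rate = confidence_threshold_attack(victim, X_train).mean()
nonmember_rate = confidence_threshold_attack(victim, X_test).mean()
print(f"Flagged as members: train={member_rate:.2f}, test={nonmember_rate:.2f}")
```

The gap between the two flag rates is itself a rough measure of leakage: the more the model's confidence separates members from non-members, the more it reveals about its training set.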
These attacks bear directly on data privacy. Understanding them is necessary for evaluating how much information a deployed model can leak. We will also examine how they connect to formal privacy guarantees such as Differential Privacy (defined below). By the end of this chapter, you will understand the principles behind these inference methods and their security implications.
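Since Section 4.5 builds on it, it helps to have the standard definition in view. A randomized mechanism $\mathcal{M}$ is $(\varepsilon, \delta)$-differentially private if, for every pair of datasets $D$ and $D'$ differing in a single record and every set of outputs $S$:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
```

A small $\varepsilon$ bounds how much any single record can shift the mechanism's output distribution, which in turn limits the advantage of the membership inference attacks studied in Section 4.1.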
4.1 Membership Inference Attacks: Theory and Methods
4.2 Attribute Inference Techniques
4.3 Model Inversion and Reconstruction Attacks
4.4 Model Stealing: Functionality Extraction Methods
4.5 Relationship to Differential Privacy
4.6 Implementing Membership Inference: Hands-on Practical