🧑💻 Zhipeng “Zippo” He @ School of Information Systems
Queensland University of Technology
IS Doctoral Consortium 2023
November 23, 2023
Supervisory Team
A/Prof. Chun Ouyang
Prof. Alistair Barros
A/Prof. Catarina Moreira (UTS)
\[ \DeclareMathOperator*{\argmin}{arg\,min} \newcommand{\one}{\unicode{x1d7d9}} \]
Adversarial Robustness
Adversarial examples are specialised inputs created with the purpose of confusing a neural network, resulting in the misclassification of a given input. These notorious inputs are indistinguishable to the human eye but cause the network to fail to identify the contents of the image. (Goodfellow, Shlens, and Szegedy 2015)
Image (unstructured):
Tabular (structured):
Different feature ranges and feature types
Feature correlations lack the regular spatial dependencies found in images, or exhibit complex and irregular ones
Information loss may occur when pre-processing interdependent features
Changing a single feature can entirely flip a prediction on tabular data
Effectiveness
Imperceptibility
Transferability
Given a machine learning classifier \(f: \mathbb{X}\to \mathbb{Y}\) mapping data instance \(\boldsymbol{x} \in \mathbb{X}\) to label \(y \in \mathbb{Y}\), an adversarial example \(\boldsymbol{x}^{adv}\) generated by an attack algorithm is a perturbed input similar to \(\boldsymbol{x}\), where \(\boldsymbol{\delta}\) denotes input perturbation. \[ \boldsymbol{x}^{adv} = \boldsymbol{x} + \boldsymbol{\delta} \hspace{0.8em}\text{subject to } f(\boldsymbol{x}^{adv})\neq y \]
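As a minimal sketch of this definition (assuming a scikit-learn-style classifier exposing `predict` and NumPy feature vectors; the names are illustrative):

```python
def is_adversarial(model, x, delta, y):
    """Check the definition above: x_adv = x + delta is adversarial
    iff the model's prediction on x_adv differs from the label y."""
    x_adv = x + delta
    return model.predict(x_adv.reshape(1, -1))[0] != y
```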
\[ \text{Bounded Attack: } \max_{\boldsymbol{\delta}}\;\mathcal{L}(f(\boldsymbol{x}^{adv}),y) \hspace{0.8em}\text{subject to } \Vert\boldsymbol{\delta}\Vert \leq \eta \]
\[ \text{Unbounded Attack: } \min_{\boldsymbol{\delta}}\;\Vert\boldsymbol{\delta}\Vert \hspace{0.8em}\text{subject to } f(\boldsymbol{x}^{adv})\neq y \]
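For illustration, a one-step bounded attack can be sketched in PyTorch (an FGSM-style update rather than a method from this slide; `f` is assumed to be a differentiable model returning logits and `eta` the \(\ell_\infty\) budget):

```python
import torch
import torch.nn.functional as F

def bounded_attack_linf(f, x, y, eta=0.1):
    """One-step bounded attack: increase the loss while keeping the
    perturbation inside an l_inf ball of radius eta."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(f(x), y)
    loss.backward()
    delta = eta * x.grad.sign()        # ||delta||_inf <= eta by construction
    return (x + delta).detach()
```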
The success rate of an adversarial attack is the percentage of input samples that are successfully manipulated to cause misclassification by the model. \[ \text{Attack Success Rate} = \frac{1}{n}\sum_{i=1}^{n}\one\bigl( f(\boldsymbol{x}^{adv}_i)\neq y_i \bigr) \]
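A minimal NumPy sketch of this metric (assuming a scikit-learn-style classifier and arrays `x_adv`, `y` of matching length):

```python
import numpy as np

def attack_success_rate(model, x_adv, y):
    """Fraction of adversarial examples misclassified by the model,
    i.e. the formula above with the indicator evaluated per sample."""
    preds = model.predict(x_adv)   # shape (n,)
    return float(np.mean(preds != y))
```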
A good adversarial example is expected to perturb as few features as possible while still changing the model’s prediction.
Here, I adapt the \(\ell_0\) norm (Croce and Hein 2019) to tabular data as a sparsity metric, which measures the number of changed features in an adversarial example \(\boldsymbol{x}^{adv}\) compared to the original input vector \(\boldsymbol{x}\).
\[ Spa(\boldsymbol{x}^{adv}, \boldsymbol{x})=\ell_0(\boldsymbol{x}^{adv}, \boldsymbol{x})=\sum_{i=1}^{n}\one( x^{adv}_i \neq x_i) \]
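A small NumPy sketch of the sparsity metric (the tolerance `tol` is an implementation assumption to absorb floating-point noise, not part of the formula):

```python
import numpy as np

def sparsity(x_adv, x, tol=1e-8):
    """l_0 sparsity: count of features whose value changed."""
    return int(np.sum(np.abs(x_adv - x) > tol))
```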
A good adversarial example is expected to introduce minimal perturbation, measured as the distance between the adversarial example and the original feature vector.
\[ \ell_p(\boldsymbol{x}^{adv},\boldsymbol{x})=\Vert\boldsymbol{x}^{adv}-\boldsymbol{x}\Vert_p =\begin{cases} \Bigl(\sum_{i=1}^n \vert x^{adv}_i-x_i \vert^p \Bigr)^{1/p}, & p\in\{1,2\}\\ \max_{i}{\vert x^{adv}_i- x_i\vert}, & p \rightarrow \infty \end{cases} \]
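The corresponding distance computation, as a NumPy sketch:

```python
import numpy as np

def lp_distance(x_adv, x, p=2):
    """Perturbation size as the l_p distance above:
    p = 1 or 2 for the summed form, p = np.inf for the maximum form."""
    diff = np.abs(x_adv - x)
    if p == np.inf:
        return float(diff.max())
    return float((diff ** p).sum() ** (1.0 / p))
```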
Perturbed vectors should stay as close as possible to the distribution of the original input data.
\[ \begin{gathered} \text{std}(x_{i})= \sqrt{ \frac{ \sum_{j=1}^{m} ( x_{j,i}- \bar{x}_{i})^2 }{ m } }\\ \text{sen}(\boldsymbol{x}^{adv})=\frac{\vert x^{adv}_{i} - x_{i} \vert}{\text{std}(x_{i})} \end{gathered} \]
where \(\text{std}(x_i)\) is the standard deviation of feature \(i\) over \(m\) reference samples and \(\bar{x}_i\) its mean.
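A sketch of this deviation check under the reading above (assuming `X_train` holds the \(m\) reference samples used to estimate each feature’s standard deviation; the `eps` guard is an implementation assumption):

```python
import numpy as np

def perturbation_sensitivity(x_adv, x, X_train, eps=1e-12):
    """Per-feature perturbation scaled by that feature's standard
    deviation over the m reference samples (std/sen above)."""
    std = X_train.std(axis=0)
    return np.abs(x_adv - x) / (std + eps)
```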
Introduce domain knowledge into the evaluation of imperceptibility.
Immutability
Feasibility
The three predictive models achieve similar and reasonable accuracy, which makes the impact of adversarial examples comparable across models.
| Datasets      | LR     | SVM    | MLP    |
|---------------|--------|--------|--------|
| Adult         | 0.8524 | 0.8532 | 0.8521 |
| German        | 0.8125 | 0.8125 | 0.7969 |
| COMPAS        | 0.7933 | 0.7976 | 0.8089 |
| Diabetes      | 0.7578 | 0.7578 | 0.7266 |
| Breast Cancer | 0.9844 | 0.9844 | 0.9688 |
Phase 2.3
There is a trade-off between imperceptibility and effectiveness.
Optimisation-based attacks should be the preferred methods for tabular data.
Overall, C&W \(\ell_2\) attack obtains the best balance between imperceptibility and effectiveness.
The C&W attack optimises a loss function that combines the perturbation magnitude, measured by a distance metric, with the prediction confidence, captured by an objective function \(z(\cdot)\):
\[ \argmin_{\boldsymbol{x}^{adv}} \Vert\boldsymbol{x}-\boldsymbol{x}^{adv}\Vert_p + c\cdot z(\boldsymbol{x}^{adv}) \]
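A hedged PyTorch sketch of this objective (assuming an untargeted attack on a model `f` that returns logits; `c` and the confidence margin `kappa` are the usual C&W hyperparameters, and the variable names are illustrative):

```python
import torch
import torch.nn.functional as F

def cw_objective(f, x, x_adv, y, c=1.0, kappa=0.0, p=2):
    """Distance term plus c * z(x_adv), where z is the margin between
    the true-class logit and the best other-class logit."""
    logits = f(x_adv)
    true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
    other_logit = logits.masked_fill(
        F.one_hot(y, logits.size(1)).bool(), float("-inf")
    ).max(dim=1).values
    z = torch.clamp(true_logit - other_logit + kappa, min=0.0)  # z(x_adv)
    dist = torch.norm(x_adv - x, p=p, dim=1)                    # ||x - x_adv||_p
    return (dist + c * z).mean()
```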
Adding sparsity as a term in the optimisation function is important for adversarial attacks on structured data.
[Example adversarial instance: a feature value shifts from Medium-Low to High, and the immutable feature Race is changed.]

Thank you!
QUT Information Systems Doctoral Consortium 2023 (Stream B): Zhipeng “Zippo” HE