WEBVTT
00:00.000 --> 00:13.120
Hello, my name is Pouya Bashivan and I'm going to tell you about our paper titled
00:13.120 --> 00:18.720
Adversarial Feature Desensitization. This is joint work with a number of wonderful collaborators
00:18.720 --> 00:24.400
at Mila, University of Montreal and McGill University, including Reza Bayat, Adam Ibrahim,
00:24.400 --> 00:32.160
Kartik Ahuja, Mojtaba Faramarzi, Touraj Laleh, Blake Richards and Irina Rish. A common assumption in
00:32.160 --> 00:36.560
machine learning is that the train and test samples come from the same distribution.
00:37.200 --> 00:42.960
While this is a reasonable assumption under most circumstances, it is intentionally violated in the
00:42.960 --> 00:49.600
regime of adversarial attacks. Adversarial attacks are algorithms that search for slight input
00:49.600 --> 00:55.600
perturbations that cause the input to be misclassified. In the case of white box attacks,
00:55.600 --> 01:01.600
the model itself is transparent to the attacker and the attacker uses it to identify the possible
01:01.600 --> 01:07.760
inputs that would lead to misclassifications. A famous example of this is the image of a panda
01:07.760 --> 01:13.360
that when perturbed with imperceptible noise, alters the model's prediction from a panda to a
01:13.360 --> 01:19.840
gibbon. As prior literature has shown, this is a common issue in almost all machine learning methods
01:19.840 --> 01:25.280
and unless the classifier is specifically trained to be robust against these attacks,
01:25.280 --> 01:28.720
the attacks could completely break down the classifier's performance.
01:30.240 --> 01:35.600
This issue becomes even more critical when we consider the vast usage of these machine learning
01:35.600 --> 01:41.040
systems in our societies. Consider, for example, the possible security concerns that arise in face
01:41.040 --> 01:46.720
recognition systems prone to adversarial attacks, or the safety of autonomous driving systems.
01:48.080 --> 01:54.000
So what is an adversarial attack? To formally define the adversarial attacks, let's assume a
01:54.000 --> 02:00.080
feature learning function f that projects inputs x to a latent space or feature space z
02:01.600 --> 02:08.720
and a classifier that uses the latent code z to predict the correct class label y hat.
02:08.720 --> 02:14.480
The perturbation function or the attack generates a perturbed sample x prime
02:14.480 --> 02:21.520
within the epsilon neighborhood of the input x, which we're showing here as B(x, epsilon),
02:22.160 --> 02:28.880
by maximizing the classification objective, the opposite of how we normally optimize the classifier's
02:28.880 --> 02:36.720
parameters. Many methods have been proposed to defend the models against adversarial attacks.
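NOTE
A minimal sketch (not from the talk) of the kind of white-box attack just described: take gradient ascent steps on the classification loss and project back into the epsilon-ball B(x, epsilon) around the clean input. All names and hyperparameters here are illustrative assumptions.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step=2/255, n_steps=10):
    """Projected gradient ascent on the classification loss (L-infinity ball)."""
    x_adv = x.clone().detach()
    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)          # classification objective
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()           # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)     # project into B(x, eps)
            x_adv = x_adv.clamp(0.0, 1.0)                # keep a valid image
    return x_adv.detach()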
02:36.720 --> 02:42.640
Two of these methods that have withstood the test of time so far are the adversarial training
02:43.200 --> 02:50.160
by Madry et al., which proposes a defense method by solving a minimax optimization problem
02:50.160 --> 02:56.000
that involves finding an adversarial input by maximizing the classification loss in the inner
02:56.000 --> 03:03.840
loop, followed by training the classifier to minimize the classification loss on these adversarial inputs.
03:03.840 --> 03:09.920
This procedure is graphically shown for two hypothetical classes in the diagram on this slide.
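NOTE
A rough sketch (assumed, not the paper's code) of one step of the minimax procedure described above, reusing the pgd_attack sketch from the earlier note as the inner maximization. The model and optimizer names are placeholders.

import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=8/255):
    x_adv = pgd_attack(model, x, y, eps=eps)       # inner loop: maximize the classification loss
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)        # outer loop: minimize the loss on adversarial inputs
    loss.backward()
    optimizer.step()
    return loss.item()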
03:10.560 --> 03:15.440
The adversarial training method essentially learns to separate the distributions of adversarial
03:15.440 --> 03:22.400
examples belonging to different classes. The second method is the TRADES method by Zhang et al.,
03:22.400 --> 03:27.440
which proposes to push the decision boundary of the classifier away from the data.
03:27.440 --> 03:32.480
TRADES achieves this by introducing a regularization term to the original learning
03:32.480 --> 03:38.320
objective for classification that penalizes the mismatch between the predicted label
03:38.320 --> 03:44.400
for the clean and perturbed inputs. The diagram on the right side again graphically illustrates
03:44.400 --> 03:50.000
this procedure, where now the defense method learns to separate the distributions of clean examples
03:50.000 --> 03:54.400
belonging to different classes while minimizing the loss of the classifier.
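NOTE
A simplified sketch of a TRADES-style objective as described above: the usual classification loss on clean inputs plus a term penalizing the mismatch between predictions on clean and perturbed inputs. In the actual method the perturbation is found by maximizing this mismatch; as a rough stand-in, the PGD sketch from the earlier note is reused, and beta is an illustrative weight.

import torch.nn.functional as F

def trades_style_loss(model, x, y, eps=8/255, beta=6.0):
    x_adv = pgd_attack(model, x, y, eps=eps)                 # stand-in for the TRADES attack step
    logits_clean, logits_adv = model(x), model(x_adv)
    natural_loss = F.cross_entropy(logits_clean, y)          # fit the clean data
    mismatch = F.kl_div(F.log_softmax(logits_adv, dim=1),    # penalize clean/perturbed disagreement
                        F.softmax(logits_clean, dim=1),
                        reduction="batchmean")
    return natural_loss + beta * mismatch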
03:54.400 --> 04:45.600
In contrast to these defenses, our approach draws on ideas from domain adaptation. In domain adaptation, we typically assume that a classifier has been trained
04:45.600 --> 04:52.160
for a source domain, but we want the classifier to also perform the same task on a related target
04:52.160 --> 05:00.960
domain that we might not have enough data for, or for which the procedure for generating or sampling
05:00.960 --> 05:09.440
data might be expensive. The domain adaptation theory proposed by Ben-David et al. answers the
05:09.440 --> 05:15.840
question of under what conditions can we adapt a classifier trained on the source domain for use
05:15.840 --> 05:23.920
in the target domain. Here we consider the original clean distributions as the source domain and the
05:23.920 --> 05:31.280
distribution of adversarial images generated from those images as the target domain. Note, though, that here
05:31.280 --> 05:38.240
the target domain continuously evolves because the adversarial examples are based on the current
05:38.240 --> 05:46.000
state of the model at each time step. And similar to the domain adaptation theory, our goal here
05:46.000 --> 05:52.960
is to learn how to perform well on both source and target domains, meaning the natural and
05:52.960 --> 06:02.240
adversarial domains. Now before I tell you about our proposed method, let's dive a bit deeper into
06:02.240 --> 06:08.960
what the domain adaptation theory from Ben-David et al. states. Similar to before, let's assume a
06:08.960 --> 06:14.880
feature learning function f that projects inputs x to latent space or feature space z and the
06:14.880 --> 06:23.040
classifier that predicts the correct label y, y hat, from those latent codes. Now consider natural
06:23.040 --> 06:31.440
and adversarial examples as input domains D_X and D'_X, and their induced feature distributions
06:23.040 --> 06:31.440
under the function f as D_Z and D'_Z. Also consider epsilon_Z and epsilon'_Z
06:31.440 --> 06:42.560
as the classification errors over the domains D_Z and D'_Z, which we are going to refer to as the
06:42.560 --> 06:50.320
clean error and the adversarial error. The domain adaptation theory now gives a bound
06:58.880 --> 07:04.320
on the adversarial error in terms of the natural error and the distance between the two domains.
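NOTE
For reference, the usual form of the Ben-David et al. bound written in the notation above; the constant lambda (the error of the best joint hypothesis over both domains) is a detail not spelled out in the talk.

% adversarial (target) error bounded by the natural (source) error,
% the H-delta-H distance between the induced feature distributions, and lambda
\epsilon'_Z(h) \;\le\; \epsilon_Z(h) \;+\; \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}\bigl(\mathcal{D}_Z, \mathcal{D}'_Z\bigr) \;+\; \lambda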
07:05.120 --> 07:11.680
Fortunately, from prior work, we know that the H-delta-H distance, which measures the distance
07:11.680 --> 07:17.440
between two domains, can be estimated using a classifier trained to discriminate between the
07:17.440 --> 07:26.080
two domains. Now our defense method, called adversarial feature desensitization (AFD), essentially
07:26.080 --> 07:34.720
minimizes the bound on the adversarial error epsilon'_Z using a three-step procedure, which
07:34.720 --> 07:40.560
has some conceptual similarities with prior work on adversarial domain adaptation from Ganin et al.
07:42.240 --> 07:49.280
For this, we first update the parameters theta and phi in the feature learning function f and
07:49.280 --> 07:56.320
task classifier c to minimize the classification loss on the natural domain. This is shown with
07:56.320 --> 08:01.920
green arrows and green boxes marked 1 on both the equation and on the diagram.
08:04.000 --> 08:10.400
Secondly, we estimate the H-delta-H distance using an additional domain discriminator
08:10.960 --> 08:17.600
network that predicts the domain identity from the latent code z. We update the domain
08:17.600 --> 08:24.720
discriminator parameters psi to minimize the domain classification loss. And finally,
08:24.720 --> 08:31.680
in the third step, we update the feature learning network parameters theta to maximize the domain
08:31.680 --> 08:39.600
classification loss in an adversarial way. These two steps are marked with red arrows in the figure
08:39.600 --> 08:48.960
and red boxes on the equation. Similar to the two previous methods that I showed you, adversarial training
09:01.040 --> 09:07.840
and TRADES, here we can also graphically demonstrate this procedure. In our method, AFD,
08:55.760 --> 09:01.040
we learn to separate the distributions of clean examples belonging to different classes, while at the
09:01.040 --> 09:07.840
same time we optimize a domain classifier that learns the boundary between the clean and adversarial
09:07.840 --> 09:14.560
examples for each class. And finally, we push the adversarial examples to the opposite side of that
09:14.560 --> 09:22.400
boundary. This procedure implicitly desensitizes the learned features to adversarial perturbations
09:22.400 --> 09:30.480
and hence the name adversarial feature desensitization. We tested our method on four
09:30.480 --> 09:35.840
data sets and compared it with a number of other baselines, including adversarial training and
09:35.840 --> 09:43.760
TRADES. We made two versions of our method, called AFD-DCGAN, which uses the adversarial losses from
09:43.760 --> 09:50.880
Goodfellow et al., and AFD-WGAN, which uses the Wasserstein losses from Arjovsky et al.
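NOTE
A rough sketch (assumed, not the paper's implementation) of how the three updates described earlier might be combined in one AFD-style training iteration. It reuses the pgd_attack sketch to produce the adversarial (target-domain) inputs and uses plain binary cross-entropy for the domain terms, whereas AFD-DCGAN and AFD-WGAN use GAN-style and Wasserstein losses. All module and optimizer names are placeholders.

import torch
import torch.nn.functional as F

def afd_training_step(features, classifier, discriminator,
                      opt_task, opt_disc, opt_feat, x, y, eps=8/255):
    # Step 1: minimize the classification loss on the natural domain (theta, phi).
    opt_task.zero_grad()
    F.cross_entropy(classifier(features(x)), y).backward()
    opt_task.step()

    # Generate adversarial examples from the current state of the model.
    model = lambda inp: classifier(features(inp))
    x_adv = pgd_attack(model, x, y, eps=eps)

    # Step 2: train the domain discriminator to tell clean from adversarial features (psi).
    opt_disc.zero_grad()
    d_clean = discriminator(features(x).detach())
    d_adv = discriminator(features(x_adv).detach())
    disc_loss = (F.binary_cross_entropy_with_logits(d_clean, torch.ones_like(d_clean)) +
                 F.binary_cross_entropy_with_logits(d_adv, torch.zeros_like(d_adv)))
    disc_loss.backward()
    opt_disc.step()

    # Step 3: update the feature extractor to maximize the domain loss (theta),
    # i.e. make adversarial features look like clean ones to the discriminator.
    opt_feat.zero_grad()
    d_adv = discriminator(features(x_adv))
    F.binary_cross_entropy_with_logits(d_adv, torch.ones_like(d_adv)).backward()
    opt_feat.step()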
09:52.000 --> 09:57.840
In the table, we evaluated all methods on several white box and black box attacks with
09:57.840 --> 10:07.360
nominal strengths for each data set. Overall, our method AFD, and especially AFD-WGAN, showed superior
10:07.360 --> 10:15.200
performance against most attacks on most data sets. However, AFD was behind TRADES on several attacks,
10:15.200 --> 10:20.720
especially on the CIFAR-100 and Tiny ImageNet data sets, which have more classes.
10:20.720 --> 10:26.080
We also looked at robustness across attack methods and attack strengths, which we controlled with the parameter
10:26.080 --> 10:32.800
epsilon. The diagrams on the right show the robust accuracy for each defense method across
10:32.800 --> 10:41.200
eight attack methods and various epsilon values for each of them. Overall, our results in these
10:41.200 --> 10:48.240
diagrams showed that AFD's robustness generalizes better than the baselines across attacks and
10:48.240 --> 10:55.200
across attack strengths. To quantify these differences, we also computed the area under
10:55.200 --> 11:00.000
the curve for each method for each attack and summarized them in a table on the left.
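NOTE
A tiny illustrative sketch of the area-under-the-curve summary mentioned above: integrate robust accuracy over the attack strengths (epsilon values). The numbers are made up purely to show the computation, and the exact convention used in the paper (e.g. normalization) may differ.

import numpy as np

epsilons = np.array([0.0, 2/255, 4/255, 8/255, 16/255])          # attack strengths
robust_accuracy = np.array([0.92, 0.78, 0.65, 0.47, 0.21])       # hypothetical robust accuracies

# Trapezoidal area under the accuracy-vs-epsilon curve, normalized by the epsilon range.
auc = np.trapz(robust_accuracy, epsilons) / (epsilons[-1] - epsilons[0])
print(f"normalized robust-accuracy AUC: {auc:.3f}")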
11:00.880 --> 11:06.800
As you can see, AFD's robust performance generalizes better to unseen and stronger attacks
11:06.800 --> 11:15.680
compared to other baselines. If you remember from previous slides, the domain adaptation theory
11:15.680 --> 11:22.400
predicted a bound on the adversarial error which can also be turned into a bound on the generalization
11:22.400 --> 11:30.320
gap between the natural and adversarial errors. We empirically tested this prediction in our experiments
11:30.320 --> 11:37.600
under two settings. Under the first setting, we varied the epsilon value for the PGD L-infinity
11:37.600 --> 11:45.600
attack which was used during training. And under the second setting, we used a diverse set of
11:45.600 --> 11:51.120
attacks and various attack strengths for each of them.
11:52.000 --> 11:58.480
And under both scenarios, we found that the domain discriminator, which was originally trained on a
11:58.480 --> 12:05.280
particular attack and attack strength, in our case the PGD L-infinity attack with a fixed epsilon
12:05.280 --> 12:10.960
for each data set, could well predict the generalization gap to unseen attacks and
12:10.960 --> 12:18.000
different attack magnitudes. This suggests that the adversarial training against a domain classifier
12:18.000 --> 12:24.000
like that used in our proposed method could potentially lead to robust models with better
12:24.000 --> 12:33.520
generalization capacity. Finally, while we showed that AFD generalizes well to most other attacks
12:33.520 --> 12:39.200
and attack strengths, it was occasionally worse than other baselines, especially on data
12:39.200 --> 12:45.760
sets with more classes like Tiny ImageNet. This could potentially be due to the difficulty of training
12:46.320 --> 12:51.680
domain classifiers in these data sets and leaves much space for future work on
12:51.680 --> 12:57.120
investigating the effect of domain classifiers on the robustness of feature learning functions.
12:58.080 --> 13:04.400
Also, AFD required more backward computations compared to some of the other baselines
13:04.400 --> 13:11.120
such as adversarial training, and as a result, its training time was on average about 31%
13:11.120 --> 13:17.680
longer than adversarial training. We invite you to read our paper for more details and please
13:17.680 --> 13:34.720
get in touch with us if you have any questions. Thanks for watching this video and we hope you enjoyed it.