ytseg_demo / demo_data /nips-2021 /25973 /transcript_whisper_large-v2.vtt
WEBVTT
00:00.000 --> 00:13.440
Hi everyone, I'm Jingwen, a PhD student at the National University of Singapore.
00:13.440 --> 00:21.040
In this paper, we introduce the dual-aspect collaborative transformer for solving routing problems.
00:21.040 --> 00:25.280
To date, neural solvers for VRPs can be classified into two types.
00:25.280 --> 00:27.680
The first one is the neural construction solver.
00:27.680 --> 00:34.480
It starts from an empty solution and iteratively adds a customer node to the solution,
00:34.480 --> 00:36.880
until all customers have been visited.
00:36.880 --> 00:41.200
And in this paper, we focus more on the neural improvement solvers.
00:41.200 --> 00:45.680
It starts from an initial solution and iteratively improves the solution
00:45.680 --> 00:50.480
based on the node features and solution features, until reaching a step limit T.
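The improvement loop just described can be sketched as follows; `improve_step` and `random_two_opt` are illustrative stand-ins for the learned policy, not the paper's actual model.

```python
import random

def improvement_solver(init_solution, improve_step, cost, T=1000):
    """Generic neural-improvement loop: start from an initial solution and
    repeatedly apply an improvement move for T steps, keeping the best found.
    `improve_step` stands in for the learned policy (hypothetical here)."""
    solution = list(init_solution)
    best, best_cost = list(solution), cost(solution)
    for _ in range(T):
        solution = improve_step(solution)  # e.g. a learned 2-opt/swap move
        c = cost(solution)
        if c < best_cost:
            best, best_cost = list(solution), c
    return best, best_cost

# Toy stand-in for the learned move: reverse a random segment (2-opt).
def random_two_opt(sol):
    i, j = sorted(random.sample(range(len(sol)), 2))
    return sol[:i] + sol[i:j + 1][::-1] + sol[j + 1:]
```

In a learned solver, the random move would be replaced by a policy network conditioned on node and solution features, as the talk describes.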
00:52.160 --> 00:56.800
Although the transformer has shown its efficiency in processing sequence data,
00:56.800 --> 01:01.600
its positional encoding method may not be optimal for encoding the VRP solutions,
01:01.600 --> 01:06.640
because it only learns a unified set of embeddings and combines the node embeddings
01:06.640 --> 01:08.320
and the positional embeddings together.
01:09.440 --> 01:12.720
Also, it can only encode linear sequences,
01:12.720 --> 01:17.200
which cannot capture the circularity and symmetry of VRP solutions.
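The circularity and symmetry mentioned here are easy to verify: a closed tour keeps its cost under rotation and reversal, so several distinct linear sequences describe the same solution. A minimal illustration (the distance matrix below is made up):

```python
def tour_cost(tour, dist):
    """Cost of a closed tour: the edge from the last node back to the
    first is included, so the representation is circular, and with a
    symmetric distance matrix the reversed tour costs the same."""
    return sum(dist[tour[i - 1]][tour[i]] for i in range(len(tour)))
```

A linear positional encoding assigns different embeddings to these equivalent orderings, which is the mismatch the talk points out.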
01:17.200 --> 01:21.040
So in this paper, we introduce the dual-aspect representation,
01:21.040 --> 01:24.320
which can better describe VRP solutions.
01:24.320 --> 01:29.280
We separate the learning of node feature embeddings and positional feature embeddings,
01:29.280 --> 01:31.920
and let them collaborate through cross-aspect referential attention.
01:32.480 --> 01:38.000
And in this table, we compare the performance of dual-aspect and single-aspect.
01:38.000 --> 01:41.360
We can see the dual-aspect outperforms the single-aspect.
01:41.360 --> 01:44.720
And here we introduce the cyclic positional encoding.
01:44.720 --> 01:50.400
In this figure, we describe the embedding vectors and correlations between every two embeddings
01:50.400 --> 01:55.360
of the original PE and our CPE method in subfigures (a) and (b).
01:55.360 --> 02:01.760
In subfigure (c), we show the top two principal components after PCA projection.
02:01.760 --> 02:07.600
And we can see our CPE method can better capture the circularity of VRP solutions.
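The idea behind a cyclic positional encoding can be sketched as below; this is a simplified construction, not the paper's exact formula. Each dimension pair uses a sinusoid completing an integer number of cycles over the tour length n, so the encoding wraps around: position n coincides with position 0.

```python
import math

def cyclic_positional_encoding(n, d):
    """Sketch of a cyclic PE (assumed form, not the paper's): each pair of
    dimensions is a sinusoid with an integer number of cycles per tour,
    so the embedding of position n wraps back to that of position 0."""
    pe = []
    for pos in range(n):
        row = []
        for i in range(d // 2):
            # (i + 1) full cycles over the n positions -> periodic in n
            angle = 2 * math.pi * (i + 1) * pos / n
            row.extend([math.sin(angle), math.cos(angle)])
        pe.append(row)
    return pe
```

Under this construction, positions 1 and n-1 are equally correlated with position 0, reflecting the circular adjacency that an absolute linear PE misses.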
02:09.120 --> 02:13.840
And here we did some ablation studies on the CPE method,
02:13.840 --> 02:16.800
which show that it achieves better generalization performance.
02:16.800 --> 02:22.320
And now we introduce our curriculum learning strategy in the training process.
02:23.280 --> 02:29.440
And in this method, we train with an n-step PPO method and a curriculum learning strategy.
02:30.080 --> 02:36.080
It gradually prescribes higher quality solutions as the initial states for training.
02:36.080 --> 02:39.040
And in this graph, we describe two curves.
02:39.680 --> 02:45.440
The blue one is the PPO method only,
02:45.440 --> 02:50.080
and the green one is the PPO method with our curriculum learning strategy.
02:50.080 --> 02:54.880
And we can see the green one is more stable and achieves lower objective values.
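One way to read the curriculum idea above is the following sketch; the linear schedule and `policy_step` are illustrative assumptions, not the paper's exact scheme.

```python
def curriculum_initial_solution(random_solution, policy_step, epoch,
                                max_epochs, max_warm_steps=50):
    """Warm-start each training episode from a higher-quality solution as
    training progresses: later epochs apply more improvement steps to the
    random start before the PPO rollout begins (illustrative schedule)."""
    warm_steps = int(max_warm_steps * epoch / max_epochs)
    sol = random_solution
    for _ in range(warm_steps):
        sol = policy_step(sol)  # stand-in for the current learned policy
    return sol
```

Starting rollouts from progressively better solutions keeps the training distribution close to the states the improvement policy will actually face, which is consistent with the more stable green curve in the graph.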
02:56.800 --> 03:04.320
And here is the performance comparison of our method and some baselines on both TSP and CVRP.
03:04.320 --> 03:10.480
We can see our DACT outperforms the existing transformer-based improvement models.
03:10.480 --> 03:17.040
So, based on these experiments, we can see our DACT performs very well on routing problems.
03:17.040 --> 03:23.520
And in the future, we hope to use this method to solve more combinatorial optimization problems.
03:23.520 --> 03:41.520
Thank you.