Spaces:

retkowski
/

ytseg_demo

Running

File size: 8,470 Bytes

cb71ef5

WEBVTT

00:00.000 --> 00:14.520
Hi, my name is Maxwell Nye, and today I'll be talking about improving coherence and consistency

00:14.520 --> 00:19.620
in neural sequence models with dual system neurosymbolic reasoning.

00:19.620 --> 00:23.800
So I first want to give a little bit of a demo, which is to ask this question.

00:23.800 --> 00:26.920
A bat and a ball cost $1.10 in total.

00:26.920 --> 00:29.300
The bat costs $1 more than the ball.

00:29.300 --> 00:31.720
How much does the ball cost?

00:31.720 --> 00:34.920
So I'll let you think a little bit for this.

00:34.920 --> 00:39.200
So one answer that sort of might jump out at you is $0.10, but this is actually incorrect

00:39.200 --> 00:43.920
because the sum of the two objects should be $1.10.

00:43.920 --> 00:46.880
So the correct answer is actually $0.05.

00:46.880 --> 00:54.240
And this is an example from a cognitive reflection test, and these are questions designed to

00:54.240 --> 01:00.140
have a particular answer which comes to mind quite quickly, which is in fact wrong.

01:00.140 --> 01:06.640
And something that's interesting is that large-scale language models such as GPT-3 predict the

01:06.640 --> 01:08.320
wrong answers as well.

01:08.320 --> 01:11.300
And this is true not just for the sort of the classic cognitive reflection test, but

01:11.300 --> 01:15.160
also for variants with different numbers.

01:15.160 --> 01:19.680
So this is sort of an interesting thing.

01:19.680 --> 01:27.400
It talks about how neural language models often have issues with consistency and coherence.

01:27.400 --> 01:30.720
So another place that we can see this a little more concretely is the clutter data set.

01:30.720 --> 01:36.680
In the clutter data set, models are trained to...

01:36.680 --> 01:42.080
There are sentences about people and their family relationships and stories about those

01:42.080 --> 01:43.840
people.

01:43.840 --> 01:48.800
And this was originally devised as a question-answering data set where you ask what the relations

01:48.800 --> 01:49.800
are.

01:49.800 --> 01:58.080
One thing you can do is ask models to be trained on this data set and then generate new stories.

01:58.080 --> 02:02.880
And when you do that, you'll see that often the generated stories have inconsistency.

02:02.880 --> 02:06.560
So if we look at the bottom of the screen here, we can see an example of this.

02:06.560 --> 02:10.080
Robert and his brother Antonio played harmonicas together.

02:10.080 --> 02:13.440
Robert's daughter, Elsie, asked him to play with her.

02:13.440 --> 02:17.280
Elsie doesn't like having to babysit her younger brother, Antonio.

02:17.280 --> 02:21.240
And so we can see that this is a common sense error because Elsie is not the younger brother

02:21.240 --> 02:22.240
of Antonio.

02:22.240 --> 02:27.720
Or Elsie's younger brother is not Antonio.

02:27.720 --> 02:35.760
So what we've done is we've built a dual system model using large-scale neural networks and

02:35.760 --> 02:42.800
symbolic deliberative logic in order to try to help with these consistency issues.

02:42.800 --> 02:44.400
So the model is as follows.

02:44.400 --> 02:52.680
You use neural generation to generate sentences in a particular story.

02:52.680 --> 02:59.360
You might generate the next sentence using a model such as GPT-3 or BART.

02:59.360 --> 03:10.320
What you can then do is parse that sentence into the semantic meaning with respect to

03:10.320 --> 03:15.520
the family relationships and check whether or not it matches the current state of the

03:15.520 --> 03:20.960
family relationships that's been described so far, and only accept the candidate sentence

03:20.960 --> 03:25.800
generations that are actually consistent.

03:25.800 --> 03:27.600
So this has a few components.

03:27.600 --> 03:30.380
One of the components here is a symbolic world model.

03:30.380 --> 03:35.160
In the case of this clutter domain, the symbolic world model that we built encodes people and

03:35.160 --> 03:36.160
their family relationships.

03:36.160 --> 03:42.840
So in other words, you could take a sentence and encode what the underlying family relationship

03:42.840 --> 03:43.840
is.

03:43.840 --> 03:50.680
And what you can do is you can use SMT solvers such as the Z3 solver to check consistency.

03:50.680 --> 03:57.240
So given a new sentence, you can check that it doesn't disobey the rules of ancestry that

03:57.240 --> 03:58.240
we've defined here.

03:58.240 --> 04:04.120
And so some of those are, for example, what is the relationship between children and grandchildren?

04:04.120 --> 04:10.000
And then another is what are the rules about whether ancestry, can you be your own ancestor,

04:10.000 --> 04:12.180
et cetera.

04:12.180 --> 04:15.040
So one question is how is this semantic parsing done?

04:15.040 --> 04:19.560
And it turns out we can actually do this quite cheaply using GPT-3.

04:19.560 --> 04:26.920
So what we can see here in the dotted box is an actual example of a few-shot prompt

04:26.920 --> 04:34.440
we can use to parse each new sentence, each new candidate sentence from the system one

04:34.440 --> 04:42.360
generation model and parse it into the semantic form that we can then give to the world model

04:42.360 --> 04:46.280
solver.

04:46.280 --> 04:52.120
So the results here show that models that use this dual system neurosymbolic stories

04:52.120 --> 05:02.160
show improved coherence over just sentences that were constructed by a neural model.

05:02.160 --> 05:10.160
So the example here is that what we've done is we've used human judgments on which of

05:10.160 --> 05:14.800
the following sentences make more sense given the prior context of the story.

05:14.800 --> 05:25.280
And we see that if we use a symbolic world model and the parsing scheme described above,

05:25.280 --> 05:32.520
humans prefer the judgments given by this model.

05:32.520 --> 05:36.360
We can also apply the same sort of reasoning to a completely different task.

05:36.360 --> 05:42.080
Here we can discuss the grounded instruction following task, the grounded instruction following

05:42.080 --> 05:44.020
domain called gscan.

05:44.020 --> 05:49.360
In this domain, the goal is to have an agent, which is shown by this pink triangle, follow

05:49.360 --> 05:53.240
a command to perform some simple action in this grid world.

05:53.240 --> 06:00.520
So you can see here, walk to a small yellow cylinder might be an example of a command.

06:00.520 --> 06:06.800
Prior work has shown that one thing you can do is encode the initial state, encode the

06:06.800 --> 06:14.280
instruction and then train a neural model to predict the action sequences.

06:14.280 --> 06:19.600
Other work has also shown that one thing you can do is train a model to predict a distribution

06:19.600 --> 06:25.200
over the correct target location as part of the neural model.

06:25.200 --> 06:29.600
That will also increase the performance of the model.

06:29.600 --> 06:38.400
What we do here is show that if you do both of these things, you predict both an action

06:38.400 --> 06:43.800
sequence and a target location, like what is the location you should end up in, and

06:43.800 --> 06:48.600
then check whether or not when you execute the set of instructions, you will end up in

06:48.600 --> 06:50.720
the predicted target location.

06:50.720 --> 06:57.800
You can sort of check consistency between these two different predictions and only accept

06:57.800 --> 07:06.560
those instruction sequences which match the target location prediction.

07:06.560 --> 07:14.700
And this leads to also higher accuracy, especially in a low data regime.

07:14.700 --> 07:18.320
We have more details about the results of the paper.

07:18.320 --> 07:21.160
So that's a little bit of an overview of our paper.

07:21.160 --> 07:24.520
Our takeaways are that you can build systems with combined neural methods and explicit

07:24.520 --> 07:25.560
world knowledge.

07:25.560 --> 07:28.880
And if you add just a little bit of world knowledge, you can really help increase coherence

07:28.880 --> 07:34.880
and consistency for these large sequence models.

07:34.880 --> 07:38.520
There are some challenges here about parsing in larger scale domains and also what it would

07:38.520 --> 07:41.360
mean to automatically build a more complete world model.

07:41.360 --> 08:01.360
Thank you very much.