Ahmadzei's picture
update 1
57bdca5
raw
history blame contribute delete
309 Bytes
So for each query q in Q, we can consider only
the keys k in K that are close to q. A hash function is used to determine if q and k are close. The attention mask is
modified to mask the current token (except at the first position), because it will give a query and a key equal (so
very similar to each other).