This notebook is a Stable Diffusion tool that lets you find similar tokens in the SD 1.5 vocab.json for use in text-to-image generation. Try the results in this free online SD 1.5 generator: https://perchance.org/fusion-ai-image-generator

In [None]:
# @title Load/initialize values
# Load the tokens into the colab
!git clone https://huggingface.co/datasets/codeShare/sd_tokens
import torch
from torch import linalg as LA
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
%cd /content/sd_tokens
token = torch.load('sd15_tensors.pt', map_location=device, weights_only=True)
#-----#

#Import the vocab.json
import json
import pandas as pd
with open('vocab.json', 'r') as f:
    data = json.load(f)

_df = pd.DataFrame({'count': data})['count']

vocab = {
    value: key for key, value in _df.items()
}
#-----#

# Define functions/constants
NUM_TOKENS = 49407

def absolute_value(x):
    return max(x, -x)


def token_similarity(A, B):
  #Vector length (L2 norm), i.e. (a^2 + b^2 + ...)^(1/2)
  _A = LA.vector_norm(A, ord=2)
  _B = LA.vector_norm(B, ord=2)
  #----#
  result = torch.dot(A,B)/(_A*_B)
  #similarity_pcnt = absolute_value(result.item()*100)
  similarity_pcnt = result.item()*100
  similarity_pcnt_aprox = round(similarity_pcnt, 3)
  result = f'{similarity_pcnt_aprox} %'
  return result

def similarity(id_A , id_B):
  #Tensors
  A = token[id_A]
  B = token[id_B]
  return token_similarity(A, B)
#----#

#print(vocab[8922]) #the vocab item for ID 8922
#print(token[8922].shape)  #dimension of the token

mix_with = ""
mix_method = "None"

In [None]:
# @title 📝 -> 🆔 Tokenize prompt into IDs
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-large-patch14", clean_up_tokenization_spaces = False)

prompt= "banana" # @param {type:'string'}

tokenizer_output = tokenizer(text = prompt)
input_ids = tokenizer_output['input_ids']
print(input_ids)


#The prompt will be enclosed in the <|startoftext|> and <|endoftext|> tokens, which is why the output will be [49406, ... , 49407].

#You can leave the 'prompt' field empty to get a random-valued tensor. Since the tensor is random, it will not correspond to any tensor in the vocab.json list, and thus it will have no ID.

In [None]:
# @title 🆔->🥢 Take the ID at index 1 from above result and get its corresponding tensor value

id_A = input_ids[1]
A = token[id_A]
_A = LA.vector_norm(A, ord=2)

#if no input exists we just randomize the entire thing
if (prompt == ""):
  id_A = -1
  print("Tokenized prompt tensor A is a random valued tensor with no ID")
  R = torch.rand(768, device=device)  #create R on the same device as the token tensors
  _R =  LA.vector_norm(R, ord=2)
  A = R*(_A/_R)  #rescale R to the length of the original A

#Save a copy of the tensor A
id_P = id_A
P = A
_P = LA.vector_norm(A, ord=2)


In [None]:
# @title 🥢 -> 🥢🔀 Take the ID at index 1 from above result and modify it (optional)
mix_with = "" # @param {type:'string'}
mix_method = "None" # @param ["None" , "Average", "Subtract"] {allow-input: true}
w = 0.5 # @param {type:"slider", min:0, max:1, step:0.01}

#------#
#If set to True, the output of this cell (tensor A) is reused as its input the next time the cell runs. Use this feature to mix many tokens into A
re_iterate_tensor_A = True # @param {"type":"boolean"}
if (re_iterate_tensor_A == False) :
  #prevent re-iterating A by reading from stored copy
  id_A = id_P
  A = P
  _A = _P
#----#

tokenizer_output = tokenizer(text = mix_with)
input_ids = tokenizer_output['input_ids']
id_C = input_ids[1]
C = token[id_C]
_C = LA.vector_norm(C, ord=2)

#if no input exists we just randomize the entire thing
if (mix_with == ""):
  id_C = -1
  print("Tokenized prompt  'mix_with' tensor C is a random valued tensor with no ID")
  R = torch.rand(768, device=device)  #create R on the same device as the token tensors
  _R =  LA.vector_norm(R, ord=2)
  C = R*(_C/_R)  #rescale R to the length of the original C

if (mix_method ==  "None"):
  print("No operation")

if (mix_method ==  "Average"):
  A = w*A + (1-w)*C
  _A = LA.vector_norm(A, ord=2)
  print("Tokenized prompt tensor A has been recalculated as A = w*A + (1-w)*C , where C is the tokenized prompt  'mix_with' tensor C")

if (mix_method ==  "Subtract"):
  tmp = (A/_A) - (C/_C)
  _tmp = LA.vector_norm(tmp, ord=2)
  A = tmp*((w*_A + (1-w)*_C)/_tmp)
  _A = LA.vector_norm(A, ord=2)
  print("Tokenized prompt tensor A has been recalculated as A = (w*_A + (1-w)*_C) * norm(A/_A - C/_C) , where norm() rescales to unit length and C is the tokenized prompt 'mix_with' tensor C")

#OPTIONAL : Add/subtract + normalize above result with another token. Leave field empty to get a random value tensor

In [None]:

# @title 🥢->🧾🥢 Find Similar Tokens to ID at index 1 from above result
dots = torch.zeros(NUM_TOKENS)
for index in range(NUM_TOKENS):
  id_B = index
  B = token[id_B]
  _B = LA.vector_norm(B, ord=2)
  result = torch.dot(A,B)/(_A*_B)
  #result = absolute_value(result.item())
  result = result.item()
  dots[index] = result

name_A = "A of random type"
if (id_A>-1):
  name_A = vocab[id_A]

name_C = "token C of random type"
if (id_C>-1):
  name_C = vocab[id_C]


sorted, indices = torch.sort(dots,dim=0 , descending=True)
#----#
if (mix_method ==  "Average"):
  print(f'Calculated all cosine-similarities between the average of tokens {name_A} and {name_C} with Id_A = {id_A} and mixed Id_C = {id_C} as a 1x{sorted.shape[0]} tensor')
if (mix_method ==  "Subtract"):
  print(f'Calculated all cosine-similarities between the difference of tokens {name_A} and {name_C} with Id_A = {id_A} and mixed Id_C = {id_C} as a 1x{sorted.shape[0]} tensor')
if (mix_method ==  "None"):
  print(f'Calculated all cosine-similarities between the token {name_A} with Id_A = {id_A} and the rest of the {NUM_TOKENS} tokens as a 1x{sorted.shape[0]} tensor')

#Produce a list of IDs that are most similar to the prompt ID at position 1 based on above result

In [None]:
# @title 🥢🧾 -> 🖨️ Print Result from the 'Similar Tokens' list from above result
list_size = 100 # @param {type:'number'}
print_ID = False # @param {type:"boolean"}
print_Similarity = True # @param {type:"boolean"}
print_Name = True # @param {type:"boolean"}
print_Divider = True # @param {type:"boolean"}

for index in range(list_size):
  id = indices[index].item()
  if (print_Name):
    print(f'{vocab[id]}') # vocab item
  if (print_ID):
    print(f'ID = {id}') # IDs
  if (print_Similarity):
    print(f'similarity = {round(sorted[index].item()*100,2)} %') # % value
  if (print_Divider):
    print('--------')

#Print the sorted list from above result

In [None]:

# @title 🆔 Get similarity % of two token IDs
id_for_token_A = 4567 # @param {type:'number'}
id_for_token_B = 4343 # @param {type:'number'}

similarity_str =  'The similarity between tokens A and B is ' + similarity(id_for_token_A , id_for_token_B)

print(similarity_str)

#Valid ID ranges for id_for_token_A / id_for_token_B are between 0 and 49407

In [None]:
# @title 💫 Compare Text encodings

prompt_A = "" # @param {"type":"string","placeholder":"Write a prompt"}
prompt_B = "" # @param {"type":"string","placeholder":"Write a prompt"}
use_token_padding = False # @param {type:"boolean"}

from transformers import  CLIPProcessor, CLIPModel


processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14" , clean_up_tokenization_spaces = True)

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")

ids_A = processor.tokenizer(text=prompt_A, padding=use_token_padding, return_tensors="pt")
text_encoding_A = model.get_text_features(**ids_A)

ids_B = processor.tokenizer(text=prompt_B, padding=use_token_padding, return_tensors="pt")
text_encoding_B = model.get_text_features(**ids_B)

similarity_str =  'The similarity between the text_encoding for A and B is ' +  token_similarity(text_encoding_A[0] , text_encoding_B[0])


print(similarity_str)
#outputs = model(**inputs)
#logits_per_image = outputs.logits_per_image # this is the image-text similarity score
#probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities







This is how the notebook works:

Similar vectors = similar output in the SD 1.5 / SDXL / FLUX model

CLIP converts the prompt text to vectors (“tensors”), with float32 values usually ranging from -1 to 1.

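Dimensions are [ 1x768 ] tensors for SD 1.5. SDXL and FLUX use the same [ 1x768 ] CLIP-L tensors alongside a second, larger text encoder: OpenCLIP bigG [ 1x1280 ] for SDXL and T5-XXL [ 1x4096 ] for FLUX.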

The SD models and FLUX convert these vectors to an image.

This notebook takes an input string, tokenizes it and matches the first token against the 49407 token vectors in the vocab.json: https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main/tokenizer

It finds the “most similar tokens” in the list. Similarity is measured by the angle theta between the token vectors.


<div>
<img src="https://huggingface.co/datasets/codeShare/sd_tokens/resolve/main/cosine.jpeg" width="300"/>
</div>

The angle is calculated using cosine similarity, where 1 = 100% similarity (parallel vectors), and 0 = 0% similarity (perpendicular vectors).

Negative similarity is also possible, when the angle between the vectors exceeds 90 degrees.
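
As a minimal sketch of the formula, cos(theta) = (A·B) / (|A||B|), the snippet below computes it in plain Python for three made-up vector pairs. The function name `cosine_similarity` and the 2-dimensional vectors are assumptions for illustration only; the notebook itself applies the same arithmetic to 768-dimensional torch tensors.

```python
import math

def cosine_similarity(a, b):
    """Return cos(theta) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors chosen to hit the three cases described in the text.
parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])        # same direction
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 3.0])   # right angle
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])      # reversed direction

for label, value in [("parallel", parallel),
                     ("perpendicular", perpendicular),
                     ("opposite", opposite)]:
    print(f"{label}: {value * 100:.1f} %")
```

Note that the length of the vectors drops out of the result: only the direction matters.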

So if you are bored of prompting “girl” and want something similar, you can run this notebook and use the “chick</w>” token at 21.88% similarity, for example.
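
The search itself can be sketched as scoring every vocabulary entry against the query and sorting by score. The example below is a toy version under stated assumptions: the 3-dimensional vectors in `toy_vocab` are invented (real token vectors have 768 dimensions and come from `sd15_tensors.pt`), and the helper `most_similar` is a hypothetical name, not part of the notebook. Unlike the notebook, it skips the query token itself, which would otherwise rank first at 100%.

```python
import math

def cosine_similarity(a, b):
    """Return cos(theta) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# Invented stand-ins for token vectors; the values carry no real meaning.
toy_vocab = {
    "girl</w>": [0.9, 0.1, 0.2],
    "chick</w>": [0.7, 0.3, 0.1],
    "banana</w>": [-0.2, 0.9, 0.1],
    "car</w>": [0.0, -0.3, 0.9],
}

def most_similar(query_name, vocab, top_k=3):
    """Rank all other entries by cosine similarity to the query, best first."""
    query = vocab[query_name]
    scores = [(name, cosine_similarity(query, vec))
              for name, vec in vocab.items() if name != query_name]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]

ranking = most_similar("girl</w>", toy_vocab)
for name, score in ranking:
    print(f"{name}: {score * 100:.2f} %")
```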

You can also run a mixed search, like “cute+girl”/2, where for example “kpop</w>” has a 16.71% similarity.
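
A mixed search first combines two token vectors and then ranks the vocabulary against the result. The sketch below mirrors the notebook's "Average" and "Subtract" modes in plain Python; the vectors for `cute` and `girl` are invented 3-dimensional stand-ins, and the function names `mix_average` and `mix_subtract` are assumptions for illustration. With w = 0.5, "Average" corresponds to “cute+girl”/2.

```python
import math

def norm(v):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def mix_average(a, c, w=0.5):
    """Weighted average w*A + (1-w)*C, as in the "Average" mode."""
    return [w * x + (1 - w) * y for x, y in zip(a, c)]

def mix_subtract(a, c, w=0.5):
    """Difference of the two unit vectors, rescaled to a blend of both lengths."""
    norm_a, norm_c = norm(a), norm(c)
    tmp = [x / norm_a - y / norm_c for x, y in zip(a, c)]
    scale = (w * norm_a + (1 - w) * norm_c) / norm(tmp)
    return [t * scale for t in tmp]

# Invented stand-ins for the "cute" and "girl" token vectors.
cute = [0.2, 0.8, 0.0]
girl = [0.9, 0.1, 0.2]

averaged = mix_average(cute, girl)
subtracted = mix_subtract(cute, girl)

print("average:", averaged)
print("subtract length:", norm(subtracted))
```

In the "Subtract" mode the result always has length w*|A| + (1-w)*|C|, so the mix stays in the same magnitude range as ordinary token vectors.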

Sidenote: Prompt weights like (banana:1.2) will scale the magnitude of the corresponding 1x768 tensor(s) by 1.2 .

Source: https://huggingface.co/docs/diffusers/main/en/using-diffusers/weighted_prompts*
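
To illustrate the sidenote, the sketch below assumes that a prompt weight is applied as a plain scalar multiplication of the token vector, as described above. The vector `banana` is an invented 3-dimensional stand-in. Scaling changes the length by the weight but leaves the direction, and therefore the cosine similarity, unchanged.

```python
import math

def norm(v):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def cosine_similarity(a, b):
    """Return cos(theta) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm(a) * norm(b))

banana = [0.3, -0.4, 1.2]   # invented stand-in for a 1x768 token vector
weight = 1.2                # as in the prompt "(banana:1.2)"

weighted = [weight * x for x in banana]

length_ratio = norm(weighted) / norm(banana)
direction_match = cosine_similarity(banana, weighted)

print(f"length ratio: {length_ratio:.3f}")
print(f"cosine similarity: {direction_match:.3f}")
```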

TL;DR: vector direction = “what to generate”, vector magnitude = “prompt weights”

/---/

Read more about CLIP here: https://huggingface.co/docs/transformers/model_doc/clip