Commit 5373d2a by Angel · 1 parent: 8b78e1d

Update README.md

  1. README.md +24 -0
README.md CHANGED
@@ -13,6 +13,30 @@ The process of fetching new YouTube videos and extracting their transcripts is a
  - **Retrieval-Augmented Generation (RAG)**: The bot uses RAG to query the AI and retrieve information from video transcripts to answer user queries.
  - **Bot Interaction**: A chatbot interface answers questions based on the YouTube video transcripts.
 
+ ## How the Vector DB Works
+ - **Check for an existing vector DB**:
+   - Tries to get the existing collection named `transcript_collection`.
+   - If it is not found, creates a new one.
+   - If it is found, reuses it.
+ - **Compare documents**:
+   - Gets all existing documents from the database.
+   - Takes the new text chunks.
+   - Compares the two to find which chunks are new (not yet in the database).
+ - **Process new content**:
+   - If there are no new chunks, stops (nothing to do).
+   - If there are new chunks, generates embeddings only for those chunks.
+ - **Update the database**:
+   - Adds the new embeddings to the existing vector database.
+   - Keeps all previous data while adding the new content.
+
+ For example, given:
+
+ - Original DB with chunks A, B, C
+ - New text with chunks A, B, C, D, E
+
+ only D and E are processed and added to the database. This keeps updates efficient, because content already in the database is not reprocessed.
+
+
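The update flow described in this section can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's actual code: `InMemoryVectorDB`, `InMemoryCollection`, `add_new_chunks`, `chunk_id`, and `fake_embed` are hypothetical names, the store is a dictionary rather than a real vector database, the embedding function is a placeholder, and chunks are assumed to be compared by exact text.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


class InMemoryCollection:
    """Hypothetical stand-in for a vector DB collection (not a real library API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.documents: Dict[str, str] = {}      # chunk id -> chunk text
        self.embeddings: Dict[str, Vector] = {}  # chunk id -> embedding


class InMemoryVectorDB:
    """Hypothetical stand-in for a vector DB client."""

    def __init__(self) -> None:
        self._collections: Dict[str, InMemoryCollection] = {}

    def get_or_create(self, name: str) -> InMemoryCollection:
        # Step 1: reuse the collection if it exists, otherwise create it.
        if name not in self._collections:
            self._collections[name] = InMemoryCollection(name)
        return self._collections[name]


def chunk_id(text: str) -> str:
    """Derive a stable id from the chunk text (an assumption of this sketch)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def fake_embed(text: str) -> Vector:
    """Placeholder embedding; a real system would call an embedding model here."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:4]]


def add_new_chunks(
    db: InMemoryVectorDB,
    chunks: List[str],
    embed: Callable[[str], Vector] = fake_embed,
    collection_name: str = "transcript_collection",
) -> List[str]:
    """Embed and store only the chunks that are not already in the collection."""
    collection = db.get_or_create(collection_name)

    # Step 2: compare the incoming chunks with the stored documents.
    existing = set(collection.documents.values())
    new_chunks: List[str] = []
    for chunk in chunks:
        if chunk not in existing and chunk not in new_chunks:
            new_chunks.append(chunk)

    # Step 3: nothing new means nothing to do.
    if not new_chunks:
        return []

    # Steps 3 and 4: embed only the new chunks and add them; old data is untouched.
    for chunk in new_chunks:
        cid = chunk_id(chunk)
        collection.documents[cid] = chunk
        collection.embeddings[cid] = embed(chunk)
    return new_chunks


if __name__ == "__main__":
    db = InMemoryVectorDB()
    add_new_chunks(db, ["A", "B", "C"])                  # initial load
    print(add_new_chunks(db, ["A", "B", "C", "D", "E"]))  # ['D', 'E']
```

Comparing by exact text means a chunk that changes by a single character counts as new. A real pipeline might instead compare ids or content hashes already stored in the database.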
  ## Project Structure
 
  ```bash