# Synthetic Data Generation


In [1]:
import json
import sys
import csv
sys.path.append('..')


import tinytroupe
from tinytroupe.agent import TinyPerson
from tinytroupe.environment import TinyWorld, TinySocialNetwork
from tinytroupe.factory import TinyPersonFactory
from tinytroupe.extraction import ResultsReducer
import tinytroupe.control as control


!!!!
DISCLAIMER: TinyTroupe relies on Artificial Intelligence (AI) models to generate content. 
The AI models are not perfect and may produce inappropriate or inaccurate results.
For any serious or consequential use, please review the generated content before using it.
!!!!

Looking for default config on: c:\Users\pdasilva\OneDrive - Microsoft\Git repositories\tinytroupe-opensource\TinyTroupe\examples\..\tinytroupe\utils\..\config.ini
Found custom config on: c:\Users\pdasilva\OneDrive - Microsoft\Git repositories\tinytroupe-opensource\TinyTroupe\examples\config.ini

Current TinyTroupe configuration 
[OpenAI]
api_type = openai
azure_api_version = 2024-08-01-preview
model = gpt-4o-mini
max_tokens = 4000
temperature = 1.5
freq_penalty = 1.5
presence_penalty = 1.0
timeout = 60
max_attempts = 5
waiting_time = 2
exponential_backoff_factor = 5
embedding_model = text-embedding-3-small
cache_api_calls = False
cache_file_name = openai_api_cache.pickle
max_content_display_length = 1024
azure_emb

First, we create the kind of agents whose conversations will provide the data: knowledge workers at a company that provides marketing services.

In [2]:
factory = TinyPersonFactory("A random knowledge worker in a company providing marketing services.")

In [3]:
people = []
for i in range(2):
    person = factory.generate_person(temperature=1.6)
    print(person.minibio())
    people.append(person)

len(people)

Clara Thompson is a 32 year old Marketing Specialist, American, currently living in Austin, Texas, USA. Clara Thompson is a creative and empathetic individual who thrives in collaborative environments, often seeking feedback from colleagues to enhance her work. She has a strong interest in digital marketing trends and enjoys attending workshops that allow her to network with other professionals. Outside of work, Clara finds joy in photography, capturing moments during her travels or hikes with friends and family. To maintain balance amidst the pressures of tight deadlines, she practices yoga and mindfulness techniques that help manage stress while fostering personal growth through continuous learning.
Liam Carter is a 29 year old Digital Marketing Executive, British, currently living in Manchester, England. Liam is a creative individual who thrives on brainstorming sessions and values collaboration with his colleagues. He has a keen interest in digital marketing trends and enjoys explo

2

In [4]:
company = TinyWorld("Some Corp Inc.", people)

In [5]:
company.make_everyone_accessible()

In [6]:
company.broadcast("Get some work done together, help each other.")

In [7]:
company.run(5)

We can now extract the conversations, which form the synthetic corpus we wanted.

In [8]:
people[0].pp_current_interactions()

In [9]:
reducer = ResultsReducer()

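def aux_extract_content(focus_agent: TinyPerson, source_agent: TinyPerson, target_agent: TinyPerson,
                        kind: str, event: str, content: str, timestamp: str):
    """Reduce one interaction of focus_agent to an (author, content) pair."""
    if event == "TALK":
        # The focus agent itself is speaking.
        author = focus_agent.name
    elif event == "CONVERSATION":
        # Something the focus agent heard. With no source agent, the message
        # is attributed to the user (as with the broadcast above).
        if source_agent is None:
            author = "USER"
        else:
            author = source_agent.name
    else:
        raise ValueError(f"Unknown event: {event}")

    entry = (author, content)
    print(entry)  # echo each extracted entry as it is produced
    return entry
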
reducer.add_reduction_rule("TALK", aux_extract_content)
reducer.add_reduction_rule("CONVERSATION", aux_extract_content)
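
To make the attribution logic easier to inspect, here is a minimal sketch that runs the same rule outside TinyTroupe. `StubAgent`, `extract_author_and_content`, and the hand-written `events` list are hypothetical stand-ins invented for this illustration; they are not part of the TinyTroupe API, and the real event records held by an agent may be shaped differently. The message texts are abbreviated.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StubAgent:
    """Hypothetical stand-in for TinyPerson; only `name` is needed here."""
    name: str


def extract_author_and_content(focus_agent: StubAgent,
                               source_agent: Optional[StubAgent],
                               target_agent: Optional[StubAgent],
                               kind: str, event: str, content: str,
                               timestamp: Optional[str]) -> Tuple[str, str]:
    """Same attribution rule as aux_extract_content, without the print."""
    if event == "TALK":
        author = focus_agent.name          # the focus agent is speaking
    elif event == "CONVERSATION":
        # A message the focus agent heard; no source means the user sent it.
        author = "USER" if source_agent is None else source_agent.name
    else:
        raise ValueError(f"Unknown event: {event}")
    return (author, content)


clara = StubAgent("Clara Thompson")
liam = StubAgent("Liam Carter")

# Hand-written events as seen from Clara's point of view (assumed shape).
events = [
    {"source": None, "target": clara, "kind": "stimulus",
     "event": "CONVERSATION", "content": "Get some work done together."},
    {"source": None, "target": liam, "kind": "action",
     "event": "TALK", "content": "That sounds like a great idea!"},
    {"source": liam, "target": clara, "kind": "stimulus",
     "event": "CONVERSATION", "content": "Let's analyze the campaign metrics."},
]

rows = [
    extract_author_and_content(clara, e["source"], e["target"],
                               e["kind"], e["event"], e["content"], None)
    for e in events
]
```

Each element of `rows` is an `(author, content)` tuple, which is the shape that the two column names passed to the reducer in the next cell expect.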

Finally, we reduce the first agent's interactions to a dataframe and save it as a `.csv` file for later use in other applications.

In [10]:
df = reducer.reduce_agent_to_dataframe(people[0], column_names=["author", "content"])
df

('USER', 'Get some work done together, help each other.')
('Clara Thompson', "That sounds like a great idea! I'm always up for collaborating. What specific tasks do you have in mind?")
('Liam Carter', 'I think we could work on analyzing our recent campaign metrics together. It would be great to get your insights! What do you think?')
('Clara Thompson', "I think that's a fantastic idea, Liam! I have some thoughts on the metrics we should focus on. Which specific areas do you want to analyze first?")
('Liam Carter', 'I think we should start by looking at the engagement rates first. They can give us a good idea of how our audience is responding to the content. After that, we could dive into conversion metrics. What do you think?')
('Clara Thompson', "I completely agree, Liam! Let's start with the engagement rates. I can pull up the data we have and analyze it together. After that, we can move on to conversion metrics.")
('Liam Carter', 'Great! Once you have the data, we can look for trend

| | author | content |
|---|---|---|
| 0 | USER | Get some work done together, help each other. |
| 1 | Clara Thompson | That sounds like a great idea! I'm always up f... |
| 2 | Liam Carter | I think we could work on analyzing our recent ... |
| 3 | Clara Thompson | I think that's a fantastic idea, Liam! I have ... |
| 4 | Liam Carter | I think we should start by looking at the enga... |
| 5 | Clara Thompson | I completely agree, Liam! Let's start with the... |
| 6 | Liam Carter | Great! Once you have the data, we can look for... |
| 7 | Clara Thompson | Absolutely, Liam! I’ll analyze the data for tr... |
| 8 | Liam Carter | I'm looking forward to seeing what you find! M... |
| 9 | Clara Thompson | I think we could enhance our content by focusi... |


In [11]:
df.to_csv("../data/extractions/synthetic_data_generation.out.csv", index=False)
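
As one possible example of "later use", the sketch below reads an `author,content` CSV back with the standard library and rewrites it as JSON Lines, one record per utterance. The helper `csv_to_jsonl` and the sample file names are hypothetical and only serve this illustration. The demo writes its own tiny stand-in file to a temporary directory instead of reading the notebook's real output, so it runs on its own.

```python
import csv
import json
import tempfile
from pathlib import Path


def csv_to_jsonl(csv_path: Path, jsonl_path: Path) -> int:
    """Convert an author/content CSV into JSON Lines; return the row count."""
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            record = {"author": row["author"], "content": row["content"]}
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


# Build a small stand-in CSV with the same two columns as the saved file.
workdir = Path(tempfile.mkdtemp())
sample_csv = workdir / "sample.csv"
with open(sample_csv, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["author", "content"])
    writer.writerow(["USER", "Get some work done together, help each other."])
    writer.writerow(["Clara Thompson", "That sounds like a great idea!"])

sample_jsonl = workdir / "sample.jsonl"
n_records = csv_to_jsonl(sample_csv, sample_jsonl)

# Read the result back to confirm the round trip.
records = [
    json.loads(line)
    for line in sample_jsonl.read_text(encoding="utf-8").splitlines()
]
```

The `csv` module takes care of quoting, so utterances that contain commas survive the round trip unchanged. To convert the file saved above, point `csv_to_jsonl` at its path instead of `sample_csv`.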