{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## PROGRAMMATIC ACCESS TO DATAMORGANA\n", "\n", "**DataMorgana** is a powerful tool for generating synthetic question-answering data, useful for both evaluating and training question-answering systems.\n", "\n", "If you're using DataMorgana for the first time, we recommend starting with the [DataMorgana Sandbox](https://platform.ai71.ai/playground), which provides an intuitive UI for generating individual question-answer pairs interactively.\n", "\n", "In this notebook, we'll explore how to use the DataMorgana API to generate large-scale synthetic question-answering data from FineWeb documents.\n", "\n", "For the full API documentation, refer to [this link](https://api.ai71.ai/redoc#tag/Synthetic-Conversations)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import json\n", "import time\n", "from typing import Dict, List\n", "\n", "import requests\n", "\n", "BASE_URL = \"https://api.ai71.ai/v1/\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, make sure you have an API key for the AI71 platform, and set it in the cell below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "API_KEY = ''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How to check the remaining budget\n", "\n", "Data generation relies on LLMs, which is costly, so your organization has a limited number of credits: each credit corresponds to a single generated question.\n", "\n", "You can use the `check_budget` endpoint to see the remaining credits for your organization." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def check_budget():\n", "    resp = requests.get(\n", "        f\"{BASE_URL}check_budget\",\n", "        headers={\"Authorization\": f\"Bearer {API_KEY}\"},\n", "    )\n", "    resp.raise_for_status()\n", "    print(json.dumps(resp.json(), indent=4))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", "    \"remaining_budget\": 9967\n", "}\n" ] } ], "source": [ "check_budget()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bulk generation of QA pairs\n", "\n", "Now, let's see how to generate questions using the `bulk_generation` endpoint.\n", "\n", "This endpoint accepts three arguments: `n_questions`, `question_categorizations`, and `user_categorizations`.\n", "\n", "Since the endpoint is **asynchronous**, it returns only a `request_id`. To retrieve the generated questions once they are ready, we need to use the `fetch_generation_results` endpoint with the corresponding `request_id`." 
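, "\n", "\n", "Because each credit corresponds to one generated question, a request for `n_questions` questions needs at least that many remaining credits. The next cell sketches an optional pre-flight check. `get_remaining_budget` and `ensure_budget` are hypothetical helpers, not part of the API. The sketch assumes the `check_budget` response contains a `remaining_budget` field, as in the output above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"def get_remaining_budget() -> int:\n",
"    # Hypothetical helper: returns the credit count reported by the check_budget endpoint.\n",
"    resp = requests.get(\n",
"        f\"{BASE_URL}check_budget\",\n",
"        headers={\"Authorization\": f\"Bearer {API_KEY}\"},\n",
"    )\n",
"    resp.raise_for_status()\n",
"    return resp.json()[\"remaining_budget\"]\n",
"\n",
"\n",
"def ensure_budget(n_questions: int, remaining_budget: int) -> None:\n",
"    # One credit per generated question, so the budget must cover n_questions.\n",
"    if n_questions > remaining_budget:\n",
"        raise ValueError(\n",
"            f\"Requested {n_questions} questions, but only {remaining_budget} credits remain.\"\n",
"        )\n",
"\n",
"\n",
"# Example: the bulk request made later in this notebook asks for 2 questions.\n",
"ensure_budget(n_questions=2, remaining_budget=get_remaining_budget())"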
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def bulk_generate(n_questions: int, question_categorizations: List[Dict], user_categorizations: List[Dict]):\n", "    resp = requests.post(\n", "        f\"{BASE_URL}bulk_generation\",\n", "        headers={\"Authorization\": f\"Bearer {API_KEY}\"},\n", "        json={\n", "            \"n_questions\": n_questions,\n", "            \"question_categorizations\": question_categorizations,\n", "            \"user_categorizations\": user_categorizations,\n", "        },\n", "    )\n", "    resp.raise_for_status()\n", "    request_id = resp.json()[\"request_id\"]\n", "    print(json.dumps(resp.json(), indent=4))\n", "\n", "    # The endpoint is asynchronous, so poll until the results are ready.\n", "    result = wait_for_generation_to_finish(request_id)\n", "    return result\n", "\n", "\n", "def wait_for_generation_to_finish(request_id: str):\n", "    first_print = True\n", "    while True:\n", "        resp = requests.get(\n", "            f\"{BASE_URL}fetch_generation_results\",\n", "            headers={\"Authorization\": f\"Bearer {API_KEY}\"},\n", "            params={\"request_id\": request_id},\n", "        )\n", "        resp.raise_for_status()\n", "        status = resp.json()[\"status\"]\n", "        if status == \"completed\":\n", "            print(\"completed\")\n", "            print(json.dumps(resp.json(), indent=4))\n", "            return resp.json()\n", "        if status == \"failed\":\n", "            # Stop polling instead of looping forever on a failed request.\n", "            raise RuntimeError(f\"Generation request {request_id} failed: {resp.json()}\")\n", "        if first_print:\n", "            first_print = False\n", "            print(\"Waiting for generation to finish...\", end=\"\")\n", "        else:\n", "            print(\".\", end=\"\")\n", "        time.sleep(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Definition of User and Question Categorizations\n", "\n", "To call the `bulk_generation` endpoint, we first need to specify the user and question categorizations we want to use.\n", "\n", "When defining categorizations, keep in mind:\n", "\n", "- You can create your own categorizations; the ones below are just examples.\n", "- Each categorization can include as many categories as you like, as long as their probabilities sum to 1.\n", "- The **descriptions** of the categories are injected into the LLM prompt during question generation, so write them clearly and thoughtfully to get high-quality outputs.\n", "\n", "We encourage you to try your configurations in the Sandbox before using them to generate a large batch of questions, to make sure you get the expected results.\n", "\n", "For the competition, you’ll want to evaluate and train your system on a diverse set of questions, since you won’t know in advance what types of questions will appear in the test set.\n", "\n", "Keep in mind that the categorizations used in this notebook are just examples and will not correspond to those used to generate the actual test set.\n", "\n", "Let's start by defining a user categorization." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "user_expertise_categorization = {\n", "    \"categorization_name\": \"user-expertise\",\n", "    \"categories\": [\n", "        {\n", "            \"name\": \"expert\",\n", "            \"description\": \"an expert on the subject discussed in the documents, who therefore asks complex questions.\",\n", "            \"probability\": 0.5\n", "        },\n", "        {\n", "            \"name\": \"novice\",\n", "            \"description\": \"a person with very basic knowledge of the topic discussed in the documents, who therefore asks very simple questions.\",\n", "            \"probability\": 0.5\n", "        }\n", "    ]\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, we can define question categorizations." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "question_formulation_categorization = {\n", "    \"categorization_name\": \"question-formulation\",\n", "    \"categories\": [\n", "        {\n", "            \"name\": \"concise and natural\",\n", "            \"description\": \"a concise direct natural question consisting of a few words.\",\n", "            \"probability\": 0.35\n", "        },\n", "        {\n", "            \"name\": \"verbose and natural\",\n", "            \"description\": \"a relatively long question consisting of more than 9 words.\",\n", "            \"probability\": 0.35\n", "        },\n", "        {\n", "            \"name\": \"short search query\",\n", "            \"description\": (\"phrased as a typed web query for search engines \"\n", "                            \"(only keywords, without punctuation and without a natural-sounding structure).\"\n", "                            \" It consists of fewer than 7 words.\"),\n", "            \"probability\": 0.15\n", "        },\n", "        {\n", "            \"name\": \"long search query\",\n", "            \"description\": (\"phrased as a typed web query for search engines \"\n", "                            \"(only keywords, without punctuation and without a natural-sounding structure).\"\n", "                            \" It consists of more than 6 words.\"),\n", "            \"probability\": 0.15\n", "        }\n", "    ]\n", "}\n", "\n", "premise_categorization = {\n", "    \"categorization_name\": \"premise-categorization\",\n", "    \"categories\": [\n", "        {\n", "            \"name\": \"without premise\",\n", "            \"description\": \"a question that does not contain any premise or any information about the user.\",\n", "            \"probability\": 0.7\n", "        },\n", "        {\n", "            \"name\": \"with premise\",\n", "            \"description\": (\"a question starting with a very short premise, where the user reveals \"\n", "                            \"their needs or some information about themselves.\"),\n", "            \"probability\": 0.3\n", "        }\n", "    ]\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generating questions from **document pairs**\n", "\n", "DataMorgana can also generate questions whose answers require information split across two documents.\n", "\n", "This is controlled by the `is_multi_doc` field of a question category. The field defaults to `false`; when explicitly set to `true`, DataMorgana uses two documents instead of one to generate the question-answer pair.\n", "\n", "Note that `is_multi_doc` applies only to question categories, not to user categories.\n", "\n", "When writing the description of a multi-doc question category, clearly specify how each of the two documents is used to create the question.\n", "\n", "Below is an illustrative question categorization with two multi-doc categories and one single-doc category." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "answer_type_categorization = {\n", "    \"categorization_name\": \"answer-type\",\n", "    \"categories\": [\n", "        {\n", "            \"name\": \"factoid\",\n", "            \"description\": \"a question seeking a specific, concise piece of information or a short fact about a particular subject, such as a name, date, or number.\",\n", "            \"probability\": 0.2,\n", "            \"is_multi_doc\": False\n", "        },\n", "        {\n", "            \"name\": \"multi-aspect\",\n", "            \"description\": (\"a question about two different aspects of the same entity/concept. \"\n", "                            \"For example: 'What are the advantages of AI-powered diagnostics, and what are the associated risks of bias in medical decision-making?', \"\n", "                            \"'How do cryptocurrencies enable financial inclusion, and what are the security risks associated with them?'. \"\n", "                            \"The information required to answer the question needs to come from two documents, \"\n", "                            \"specifically, the first document must provide information about the first aspect, while the second must provide information about the second aspect.\"),\n", "            \"probability\": 0.3,\n", "            \"is_multi_doc\": True\n", "        },\n", "        {\n", "            \"name\": \"comparison\",\n", "            \"description\": (\"a comparison question that requires comparing two related concepts or entities. \"\n", "                            \"The comparison must be natural and reasonable, i.e., comparing two entities by a common attribute which is meaningful and relevant to both entities. \"\n", "                            \"For example: 'Who is older, Glenn Hughes or Ross Lynch?', 'Are Pizhou and Jiujiang in the same province?', \"\n", "                            \"'Pyotr Ilyich Tchaikovsky and Giuseppe Verdi have this profession in common'. \"\n", "                            \"The information required to answer the question needs to come from two documents, specifically, \"\n", "                            \"the first document must provide information about the first entity/concept, while the second must provide information about the second entity/concept.\"),\n", "            \"probability\": 0.5,\n", "            \"is_multi_doc\": True\n", "        }\n", "    ]\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calling the `bulk_generation` endpoint and accessing the results\n", "\n", "After defining the user and question categorizations we plan to use, we can call the `bulk_generation` endpoint.\n", "\n", "For example, let's use the previously defined categorizations to generate 2 question-answer pairs." 
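, "\n", "\n", "Before spending credits, it can be worth sanity-checking the categorizations locally. The next cell is a minimal sketch that checks the two rules stated earlier: the probabilities of each categorization must sum to 1, and `is_multi_doc` may appear only in question categories. `validate_categorization` and `validate_all` are hypothetical helpers, not part of the DataMorgana API, which may perform its own validation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import math\n",
"from typing import Dict, List\n",
"\n",
"\n",
"def validate_categorization(categorization: Dict, allow_multi_doc: bool) -> None:\n",
"    # Hypothetical client-side check, not an official DataMorgana validator.\n",
"    name = categorization[\"categorization_name\"]\n",
"    total = sum(category[\"probability\"] for category in categorization[\"categories\"])\n",
"    # Use a tolerance to absorb floating-point rounding in the sum.\n",
"    if not math.isclose(total, 1.0, abs_tol=1e-9):\n",
"        raise ValueError(f\"{name}: probabilities sum to {total}, not 1\")\n",
"    for category in categorization[\"categories\"]:\n",
"        if category.get(\"is_multi_doc\", False) and not allow_multi_doc:\n",
"            raise ValueError(f\"{name}: is_multi_doc applies only to question categories\")\n",
"\n",
"\n",
"def validate_all(question_categorizations: List[Dict], user_categorizations: List[Dict]) -> None:\n",
"    for categorization in question_categorizations:\n",
"        validate_categorization(categorization, allow_multi_doc=True)\n",
"    for categorization in user_categorizations:\n",
"        validate_categorization(categorization, allow_multi_doc=False)\n",
"\n",
"\n",
"validate_all(\n",
"    [question_formulation_categorization, premise_categorization, answer_type_categorization],\n",
"    [user_expertise_categorization],\n",
")\n",
"print(\"All categorizations passed the local checks.\")"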
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"request_id\": \"5d27a4f3-4031-4952-9a86-937e767ad095\",\n", " \"type\": \"async\"\n", "}\n", "Waiting for generation to finish..........completed\n", "{\n", " \"status\": \"completed\",\n", " \"file\": \"https://s3.amazonaws.com/data.aiir/data_morgana/web_api/results_id_a2376f40-3bdd-407e-8c76-0509d36d0629_user_id_430d2246-3067-4662-8ce5-0c29049adf42.jsonl?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIA2UC3AHBF3ZBGDG62%2F20250414%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250414T071348Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEIT%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJIMEYCIQC2vL1mHOpPS2ySz8T7WjQZ8X%2B%2FeqEV71GTmT6KUwto5AIhALOWNPfuk8zNBS8Fxt%2FzdpUnPOGbaIa9v4ZL4tnWRM%2FcKsMFCP3%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEQABoMNzMwMzM1Mjk1NTYzIgyMt92WGbeylBLY%2BuYqlwU9BRf0SY1xaBr%2Fs%2FNxsTMa9MfykI%2BhDRLsvXrmfuiuCXmZZol1ocW8recxFWibFqBouhYPsFAkfmj5mB5jmXWAVlsKQOkFpM7ejJxxIvIKaIMO991disiSy%2FxsxNrzFMcRh3iQxML%2BIx0z1AHVkD1x%2BAL%2BvIlABJiDMllC1j649jdbSlMzVvsZEXLWUrI2IJGk4Zw3HE6iZezF0QuUVM6%2B70FOq5ae6SXsJRoO%2FRri2uZ1FHZeuk2zusunxbrH4xwZt%2FNcUtNS7D0%2F7Auen63rvjhHOPQskgbOJeWyxuhvbNBDPpXfba0lUErx495QuIN0G998jb1H4zC48Jc8FGVPeQLeak4mTzMWjYWVd958AIcM27KaACgzxr%2FIM0QJI4Wmm9MtLxumVEofrXzqHTAtFKKV8x7YWKEFkFQhOG%2Bw6An8F2msSLOBo1%2FFYi13n2GrleesTmfm6mFBr7inqeMgS6NGTafzrX2WRgzcaWbfSdV4jf0%2BDBT0TPfDAgBo%2B8KUNmEO%2FgpLjjnx%2Fbk3E6zxWIXssOe84b8jWzwfR3VhcuR1X5f%2FeariWyyV1oZwknu4HFZyCt%2FIyWtq9MCmj7gK%2BJ0lTGgu7YZjtHqkr9m2abvb7gocKjmvfbNQj55PPkppGX5f21Sw7DirzCIqrmPQxYP3jt%2FNuRPw7RQxAiPz9otXjn6AOf9zAqgn1icTzCX1H25SxFO%2Fx1YLdUqg4QDMkRFdOYdFR04J5VyV98BwhfxY8i8iDrOqVYkIiVN%2Fkkb1gpbgR345RnJ8gmpeTBua9gBoEpr1koyfy3L7nZbfZV%2Bi%2Bxy5YEVo%2F6%2BS6FPSRkD6Fi53NbTX0qfIsHc0DWKqhlecPTvPfFjdlDJDXPiJTiJzv5jsRAow8IjyvwY6sAE766TC1IvSczg0cafyoEapmtAAJJYsfJXuj2blT8oFemSgmJ1iSXo6Qoa2C%2FONTdDy4jxZt4tnZcgS
MhlDAdScWX12elU%2BIL7Ql%2B04U%2BKfqJvqFJ1%2BbmeDvewqFhqN4Vne28LZrTtzUpJrRxYqqWSkpBvN%2FNmT1RA%2FJd%2BLn5ROuBoic%2BCVg02%2BZ45CWMyLEu51i%2BeBWx%2B%2F7R3p1m3i5LL94V1EaeHml55tScHYszv99w%3D%3D&X-Amz-Signature=4b4a34a6238ddb7e578fdbf80104bbbf6ec9a9ce5a8ea81dfa5ab4f81b216029\",\n", " \"credits\": 2\n", "}\n" ] } ], "source": [ "results = bulk_generate(n_questions=2,\n", " question_categorizations=[question_formulation_categorization, premise_categorization, answer_type_categorization],\n", " user_categorizations=[user_expertise_categorization]\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The API response includes a link to the file where the generated results are stored. \n", "This link is valid for 12 hours. If you need to access the file after that time, you can call `fetch_generation_results` again with the original `request_id` to receive a new link.\n", "\n", "Let's retrieve the data from the file and print the results." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "response = requests.get(results[\"file\"])\n", "qa_pairs = [json.loads(line) for line in response.text.splitlines()]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'question': 'How do the heroes Aeneas and Jason compare in their relationships with foreign populations?',\n", " 'answer': 'Aeneas and Jason represent different models of interaction with foreign populations. Aeneas arrives as a refugee seeking to establish a new civilization in Italy, where he must deal with local populations through both conflict and alliance, ultimately leading to a mixed Roman-Italian people as decreed by Jupiter and Juno. Jason and the Argonauts, on the other hand, represent Greek colonial interaction with native populations, as shown in their visits to places like Cyzicus, Heraclea Pontica, and Cyrene. 
While their actions create models of Greek-native interaction, these encounters often result in Greeks either absorbing local customs or suppressing them, reflecting a more colonial approach to foreign relations.',\n", " 'context': ['by Robin Mitchell-Boyask, Temple University, with sections adapted from Jim O\\'Hara, UNC Chapel Hill\\nAs you read Vergil, try to notice which scenes or characters have been adapted from which part of Homer, and similarly for Apollonius\\' influence.\\nYour Mandelbaum trans. has more verses than the original; the numbers atthe side of the page are his verses; at the top of the page are the line numbers in the Latin. On this sheet I use M\\'s numbers.\\nThe Aeneid tells the story of how a band of refugees from Troy found a new civilization in Italy which leads to the Roman empire. Vergil, a highly learned poet, draws on the full range of mythological and literary traditions to represent this process. Books 1-6 recall (and transform) the Odyssey, while 7-12 the Iliad.\\nConcentrate on Books 1, 2, 4, 6, 7, 12.\\nIn Book 1, what \"is\" Juno? What is her role; with what principles/ideas is she associated? Are these gods \"real\"? Notice anything in Book 1 that involves control of anger/violence/disorder.\\nCf. Aeneas\\' speech at 1.276ff with Jupiter\\'s at 1.357ff. What is the purpose of each speech? Who is the audience?\\nWhat is Aeneas like? What kind of hero? What is he doing when you first see him? Vergil invites us to see Aeneas as Odysseus, often in situations but in profoundly different ways. Aeneas has a mission stated in the proem and announced by Jupiter to Venus. Fighting against this mission is Juno, still angry at the Trojans. Is Aeneas up to the challenge?\\nAfter meeting his mother, Aeneas arrives at Carthage to find, like Odysseus, that he has become a legend, represented here in heroic murals. What odyssean characters does Dido resemble?\\nWhat themes and images are prominent in Book 2 (the Fall of Troy)? 
Why does Troy fall?\\nNote the Greek treachery, Trojan gullibility and the fate of Laocoon.\\nWhat does Aeneas want to do as Troy falls? Why can\\'t he?\\nHow is Aeneas\\' wife Creusa lost? Note his devotion to his family. How is A. like Hector?\\nWhat does Aeneas want to do in Book 3? Why does Vergil have him visit the \"Toy Troy\" at 3.446ff.?\\nBook 4 is one section where Apollonius is an important influence. If this section is somehow modeled on Book 3 of the Argononautica, then how does this knowledge affect your reading of Aeneas?\\nBe prepared to discuss the following views of Dido at some length.\\nWhat evidence in the text or what arguments can be used to support the following views of what\\'s going on in Aeneid 4?\\n4.274ff How does Iarbas, Dido\\'s rejected suitor, characterize Aeneas when he prays to Jupiter? Why does Vergil have him say this?\\nCompare Dido\\'s curse for Aeneas to Jupiter\\'s prophecy in Book 1, and what Anchises says in the underworld in Book 6.\\nThe funeral games in Book 5 may seem a little slow; it\\'s OK to skim the boat race.\\nWhat does Aeneas learn about the future in the underworld.\\nHow is Aeneas\\' katabasis similar to and different from Odysseus\\'\\nWhat impact does Dido\\'s snub of Aeneas have on you and him? Does it remind you of anything?\\nWhat is Anchises\\' message for Aeneas? How does history \"work\" for Rome? What does Anchises say are Rome\\'s \"arts\"?\\nThe Marcellus whose death is lamented near the end of Book 6 is the nephew and heir of Augustus, Vergil\\'s patron. What is the impact of the lamentation for his death here?\\nAt the end of the book, why does Aeneas exit throught the gate of false dream or false shades?\\nHow does the war in Italy start, and who\\'s at fault?\\nWhat is Turnus like? How is he introduced? Is he admirable, sympathetic? What happens to him in the Allecto scene, and will he be responsible for his own actions? What is Italy like before the arrival of Aeneas? 
Look both at what Vergil says, and what characters (Latinus, Evander) say. How is Italy\\'s past like its future? What do Latinus and Evander say about the age of Saturn, and what came after it?\\nThe end of Book 7 is a longish list, in which only the first couple and last couple of figures will be significant. Pay attention when Aeneas or Turnus interprets omens. What is Vergil doing?\\nWhen Aeneas visits Evander, he\\'s on the future site of what?\\nWhat is the point of the story about Hercules and Cacus in Book 8?\\nBook 8 will end with a description of Aeneas\\' shield, on which is depicted another version of the Roman future. How easy is it to tell the good guys from bad guys in 1) the Hercules-Cacus story 2) the shield 3) the rest of the poem?\\nWhat is the reader\\'s reaction to the Nisus and Euryalus story in the middle of Book 9?\\nWhat is the point of Numanus\\' speech at 9.798ff.(M\\'s numbers)? What does he think of the Trojans, and why does Vergil have him say this?\\nIn Books 7-12, drawing upon both your extensive knowledge of Homer\\'s Iliad and Odyssey, and what the characters themselves say, try to figure out what characters in this war are reprising roles from Homer: Who plays the role of Greeks, Trojans, Paris, Helen, Priam, Hector, Patroclus, Achilles? Do the roles ever change?\\nIn Book 10, notice how and why Turnus kills Pallas, and how Aeneas reacts, and how Aeneas kills Lausus. How are Aeneas and Turnus alike or different?\\nWhat happens in this poem to characters about your age? Why?\\nWhat\\'s carved on Pallas\\' belt? In what two scenes is this mentioned?\\nWhat does Aeneas say to his son Ascanius in Book 12 lines 586ff. (his only direct words to him quoted in the poem)?\\nWhat are we to think of what happens to Mezentius at the end of Book 10 and Camilla at the end of 11? Are these sympathetic characters, or not?\\nWhat takes place in the agreement between Jupiter and Juno near the end of Book 12? Is this what we expected? 
Compare this conversation to that between Jupiter and Venus in 1. Who will be the ancestors of the Roman readers of the poem? In what proportion? How might this be a good thing, or a bad thing?\\nWhat is happening to the main characters during this book?\\nWhat function does Juturna serve?\\nEnd of the poem: how does Aeneas respond to Turnus\\' final request? Why does he then do what he does? Should he? Think about:what Anchises said in the underworld,\\nWhat does Vergil think of his society, of its history, of its strengths and weaknesses?',\n", " 'Bryn Mawr Classical Review 2011.12.63\\nWilliam G. Thalmann, Apollonius of Rhodes and the Spaces of Hellenism. Classical Culture and Society. Oxford; New York: Oxford University Press, 2011. Pp. xix, 262. ISBN 9780199731572. $65.00.\\nReviewed by Félix Racine, University of St Andrews (firstname.lastname@example.org)\\nNow is a good time for the reading or rereading of ancient epics previously deemed derivative or artificial. The rehabilitation of Lucan and Statius was achieved a while ago, scholars no longer flee at the mention of Aratus and opinion is slowly warming to the late antique masterpieces of Prudentius and Nonnus.1 Apollonius\\' Argonautica, of course, has never gone unread, but a series of recent publications have brought to light the author’s originality and, tentatively, his engagement with the social reality of third-century BC Alexandria. William Thalmann’s Apollonius Rhodius and the Spaces of Hellenism carries forward this new reading of Apollonius by putting the emphasis back on the central theme of the Argonautica: the voyage of the ship Argo across the Black Sea and much of the Mediterranean. 
By analyzing the spatial aspect of the Argo’s journey, Thalmann outlines the ways Apollonius explores questions of Greek identity and relationships with foreign populations, and how the poem plays out socio-cultural issues pertinent to Hellenistic Alexandria.\\nThe justification for a spatial analysis of the Argonautica is laid out in the first two chapters, which respectively outline a theoretical framework for the study of spatiality in narratives, and trace the ways Apollonius defines and explores space through the Argo’s journey. Unlike past studies distinguishing between real and mythological aspects of the Argonautica\\'s geography, Thalmann judiciously starts from the observation that the poem does not in fact engage in geographical descriptions but rather constructs space through the travels and experiences of the Argonauts. By focusing on Apollonius’ poetic construction of space, Thalmann builds upon recent work on the Argonautica (notably by Santiago Rubio and Richard Hunter2) but seeks a different vantage point by anchoring his analysis on theories elaborated by social scientists and cultural geographers, most prominently Henri Lefebvre\\'s work on the spatial embodiment of social relations, Christopher Tilley\\'s theorization of the experience of space through movement, and Yi Fu Tuan\\'s idea of the transformation of spaces into places through narrative or experience. This cluster of theories has been formerly applied to geographical depictions in Greco-Roman historiography,3 but it is particularly promising in a poetic context, and forms a solid basis for Thalmann to explore Apollonius\\' construction of a Mediterranean space through the narrative of Greek journey.\\nHowever, this theoretical framework may distract from the intellectual context of Apollonius’ poem. 
A reader will emerge from Thalmann\\'s introductory chapter fully equipped with modern refinements on the concepts of place and space, but with little sense of their meaning for Apollonius and his readers. The author leaves unexamined terms for space and place such as chōra/chōros, used by Apollonius (e.g. 1.371, 2.929, 1117, 3.981, 1164) and subject to much scrutiny by philosophers from Plato to the contemporary Stoics and Epicurians. These terms may provide a much more immediate context for a spatial reading of the Argonautica, as there is evidence that it is precisely during the Hellenistic era that philosophers made the first serious effort to isolate and define space as a concept.4 If, as is likely, Thalmann is right to see Apollonius as a full participant in intellectual trends in third- century BC Alexandria (as he discusses in his conclusion), Apollonius\\' poetic preoccupation with space should be read within the context of these ancient thinkers\\' efforts to grapple with issues of space.\\nChapter 2 follows up these theoretical considerations by examining Apollonius’ creation of space through narrative. Offering a sophisticated analysis of the concept of pathways (poroi) in Greek culture and in the Argonautica, Thalmann outlines the Argonauts’ role in ordering space and linking places through the establishment of navigable pathways. A mandatory discussion of aitia in the Argonautica brings out the stratified time of the poem, which asks readers to think simultaneously about the journey of the Argonauts and about the present-day Mediterranean world, where signs of the Argo are still visible. The chapter ends on a programmatic statement that the Argonautica represents both a Greek appropriation of space justifying the domination of foreign peoples, and a questioning of Greek identity through the exploration of boundaries between categories.\\nThis Greek exploration and appropriation of space is explored in the following five chapters. 
Chapter 3 focuses on Greece as physical and symbolic center of the Argonauts’ voyage, as revealed by their movements through most of the first book of the Argonautica. Thalmann perceptively sees the Catalogue of Argonauts at 1.23-227 as (among other roles) defining Greece as a network of places represented by emblematic heroes gathering at Iolcus, but given the length and importance of the passage this spatial analysis is frustratingly short. Various scenes of departure from Iolcus are explored more fully and more fruitfully: the tension between home and journey emerges from Jason’s leave of the city; the communal choice of a leader, the soothing of quarrels and the cooperative establishment of an altar on the shore all point to future elements of the Greek polis; and the Argonauts’ encounter with the Lemnian women affirms but also questions Greek gender norms. Despite the extensive scrutiny these episodes have already received from other scholars, Thalmann’s spatial focus helps bring to the fore their exploration of Hellenic identity.\\nSpatial analysis comes into sharper focus in chapter 4, “Colonial spaces”, which examines the narrative construction of future sites of Greek colonization visited by the Argonauts, through the examples of Cyzicus (book 1), Heraclea Pontica (book 2) and Cyrene (book 4). In each of these locales, the actions of the Argonauts offer models of interaction between Greek colonists and native populations, and they are furthermore anchored in the landscape through a number of aitia transforming these locales into places of Greek memory (or, looking forward from the time of the Argonauts, places with a Greek future). Aitia at Cyzicus acknowledge in cult form the conflict between Greek newcomers and the local population, which puts into question the cultural superiority of the Greek aggressors. 
At Heraclea Pontica, a city with strong Ptolemaic ties, Apollonius evokes the cooperation between the Greek newcomers and the local Mariandynians; however, the Greeks end up absorbing and suppressing local customs. Finally, at Cyrene, the Argonauts’ passage through a formless desert decisively marks out the landscape as a Greek space.\\nChapter 5, “Contact”, examines Apollonius’ construction of Colchis as both a familiar and alien space. Thalmann finds the rationale for this ambiguous description of space in Herodotus’ assertion that the Colchidians hail from Egypt, unfortunately not explored further here, which enabled Apollonius to play out in the Caucasus problems of cultural contact faced by Greeks in Ptolemaic Egypt. The main evidence for this reading of Colchis as both barbarian and Greek are the many poetic connections established between Colchis and Greece in book 3 of the Argonautica, but also in the layout of Aietes’ palace, which combines exotic as well as familiar elements of Greek domestic spatial organization.\\nOne of Thalmann’s major concerns is to bring to light the means by which Apollonius imposes order on landscapes and to trace the limits of this spatial definition. This comes fully to the fore in chapter 6, “Rivers, Shores, Margins and Boundaries,” which explores the role of rivers in the narrative structure of the Argonautica. Attention is first paid to rivers successively sighted by the Argonauts in the Black Sea region in book 2. These rivers are not mere physical landmarks but are also imbued with cultural significance, being associated with episodes of the Argonauts’ journey or with characteristics of local populations, e.g. the disorderly Thermodon flows through the territory of the unruly Amazons. 
The point that culturally meaningful rivers frame the Argonautic landscape is well taken, but Thalmann’s suggestion that Apollonius’ Ptolemaic readers might have known these rivers from a common source (or first-hand-experience) is too optimistic. Turning to the Argo\\'s European journey in book 4 along the Istrus, the Eridanus and the Rhodanus, Thalmann takes the lack of form and features of these pathways as an acknowledgement of their location outside of a Greek system of space.\\nThe narrative of the Argonauts’ return from Colchis in book 4 of the Argonautica is notoriously convoluted and erratic. Thalmann argues in chapter 7, “The Roundabout Homecoming,” that this randomness is deceptive and hides a masterful blending of different traditions on the voyage of the Argo, crafted to create a picture of the Mediterranean seen from a traditional Greek viewpoint, while tracing the limits of the Greek mastery of space. The Argonauts’ sub-journey in the Adriatic (4.323-506, 982-1222) acquires here a special importance in Thalmann’s interpretation of the Argonautica as a text concerned with the experience of displaced Greeks in Ptolemaic Egypt, as it further develops models of colonial interaction between settlers and local populations already outlined in chapter 4.\\nHints to the importance of Apollonius\\' Alexandrian context are peppered throughout the book (e.g. pp. 9, 35, 51, 121, 167). This theme is belatedly addressed in the conclusion (chapter 8), which situates Apollonius within contemporary trends in Alexandrian poetics and attempts a reading of the Argonautica that takes into account the juxtaposition of cultural elements in Alexandria\\'s public spaces. 
In a move that complements Daniel Selden\\'s and Susan Stephens\\' studies of Greek poetic responses to Egyptian culture,5 Thalmann presents the Argonautica as an alternative response to the displacement of Greeks in Egypt, offering Greeks in Alexandria and elsewhere a common identifying myth by integrating culturally different places in a coherent Greek-centered narrative. Comparing and linking Apollonius\\', Callimachus\\', Theocritus\\' and Posidippus\\' engagement with spatial representation is a worthwhile achievement of this analysis and will be of interest to other scholars of Hellenistic poetry. More tentative and less conclusive is Thalmann\\'s evocation of the prominent place of Egyptian architecture and statuary in Alexandria as a context for the Argonautica\\'s juxtaposition of cultures.\\nApollonius of Rhodes and the Spaces of Hellenism is a welcome book. Thalmann presents new readings of the Argonautica and a valuable theoretical framework for the investigation of Apollonius\\' work, which might also be applied to other spatial epics. Well-written and evocative, it should help readers unfamiliar with modern theorizations of space to approach them through a well-known but still underestimated text.\\n1. The final frontier in ancient epic may now be late antique paraphrases of the Bible in hexameters and the little-read Latin translations of Dionysius Periegetes and other Greek didactic epic.\\n2. S. Rubio, Geography and the Representation of Space in the Argonautica of Apollonius Rhodius, diss. UC San Diego (1992); R. Hunter, “The Divine and Human Map of the Argonautica,” SyllClass 6 (1995), 13- 27.\\n3. E.g. K. Clarke, Between Geography and History: Hellenistic Constructions of the Roman World (Oxford, 1999).\\n4. See e.g. the evidence laid out in K. Algra, Concepts of Space in Greek Thought (Leiden, 1994), 37-38.\\n5. D. Selden, \"Alibis,\" ClAnt 17 (1998): 289-412; S. 
Stephens, Seeing Double: Intercultural Poetics in Ptolemaic Alexandria (Berkeley, 2003).'],\n", " 'question_categories': [{'categorization_name': 'question-formulation',\n", " 'category_name': 'concise and natural'},\n", " {'categorization_name': 'premise-categorization',\n", " 'category_name': 'without premise'},\n", " {'categorization_name': 'answer-type', 'category_name': 'comparison'}],\n", " 'user_categories': [{'categorization_name': 'user-expertise',\n", " 'category_name': 'novice'}],\n", " 'document_ids': ['',\n", " ''],\n", " 'error': None}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qa_pairs[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each generated result includes: \n", "\n", "- The generated **question** \n", "- The generated **answer** \n", "- The **context** (FineWeb documents) the question is based on \n", "- The **IDs** of those documents \n", "- The **question categories** used during generation \n", "- The **user categories** used during generation " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How to find information about past requests\n", "\n", "You can retrieve information about all your past requests (such as request ID, status, and configuration) using the `get_all_requests` endpoint."
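, "\n", "As a minimal sketch, assuming the response has the shape that `print_request_summary` below relies on (a `data` list whose entries carry `request_id` and `status`), the hypothetical helper `group_requests_by_status` collects request IDs by status. This makes it easy to spot failed requests that you may want to resubmit. The helper is an illustration only and is not part of the DataMorgana API." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from collections import defaultdict\n", "from typing import Dict, List\n", "\n", "\n", "def group_requests_by_status(all_requests: Dict) -> Dict[str, List[str]]:\n", "    # Hypothetical helper (not part of the API): map each status, e.g. \"completed\" or \"failed\",\n", "    # to the list of request IDs that currently have that status.\n", "    # Assumes the response shape {\"data\": [{\"request_id\": ..., \"status\": ...}, ...]}.\n", "    groups: Dict[str, List[str]] = defaultdict(list)\n", "    for request in all_requests.get(\"data\", []):\n", "        groups[request[\"status\"]].append(request[\"request_id\"])\n", "    return dict(groups)\n", "\n", "\n", "# Example usage, once `get_all_requests` below has been defined:\n", "# failed_ids = group_requests_by_status(get_all_requests()).get(\"failed\", [])"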
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_all_requests():\n", "    # Fetch the details (ID, status, configuration, ...) of all your requests\n", "    resp = requests.get(\n", "        f\"{BASE_URL}get_all_requests\",\n", "        headers={\"Authorization\": f\"Bearer {API_KEY}\"},\n", "    )\n", "    resp.raise_for_status()\n", "    return resp.json()\n", "\n", "def print_request_summary(all_requests):\n", "    # The parameter is not called `requests`, so the `requests` module is not shadowed\n", "    if not all_requests.get('data'):\n", "        print('There are no requests')\n", "        return\n", "    for request in all_requests['data']:\n", "        print(f\"{request['request_id']} : {request['status']}\")" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0fe377ae-2dd0-41ae-b3c3-680caa4b17f5 : completed\n", "114a0e5d-3598-4e35-9105-0791fb542ef1 : completed\n", "1e15a423-e38c-4f04-aa41-1768d55aa5f8 : completed\n", "3d2f2208-acd1-4b38-bda7-e452532eef55 : completed\n", "5d27a4f3-4031-4952-9a86-937e767ad095 : completed\n", "c31818e9-795d-40ef-b0fc-b9b017ba0f80 : failed\n", "c43e53e0-8baf-4a49-8eb0-1cebda7245e8 : completed\n", "dbca2e71-d61d-4977-b0f3-ed6902ebfebf : completed\n", "ed90803a-d3fc-4a16-94f7-51bdbc8fb8a2 : completed\n", "ef2ddf55-0f4b-4604-8c74-5ca324362231 : completed\n", "ef358e16-a95e-43aa-a491-fa9a63c873e0 : completed\n", "f313a110-c596-4cda-a990-4a487aa3da2d : completed\n", "f561077a-3bce-42d0-b143-0054eb0a5fd4 : completed\n", "f747bac0-111d-4754-af22-58f99318a959 : completed\n", "fc9a52b6-b7f2-4d3e-a9ef-441c603beb3f : completed\n" ] } ], "source": [ "all_requests = get_all_requests()\n", "print_request_summary(all_requests)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" } }, 
"nbformat": 4, "nbformat_minor": 2 }