{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Welcome to the Second Lab - Week 1, Day 3\n",
"\n",
"Today we will work with lots of models! This is a way to get comfortable with APIs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Start with imports - ask ChatGPT to explain any package that you don't know\n",
"\n",
"import os\n",
"import json\n",
"from dotenv import load_dotenv\n",
"from openai import OpenAI, AsyncOpenAI\n",
"from IPython.display import Markdown, display\n",
"import asyncio\n",
"from functools import partial"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Always remember to do this!\n",
"load_dotenv(override=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print the key prefixes to help with any debugging\n",
"\n",
"openai_api_key = os.getenv('OPENAI_API_KEY')\n",
"google_api_key = os.getenv('GOOGLE_API_KEY')\n",
"groq_api_key = os.getenv('GROQ_API_KEY')\n",
"\n",
"if openai_api_key:\n",
" print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n",
"else:\n",
" print(\"OpenAI API Key not set\")\n",
"\n",
"\n",
"if google_api_key:\n",
" print(f\"Google API Key exists and begins {google_api_key[:2]}\")\n",
"else:\n",
" print(\"Google API Key not set (and this is optional)\")\n",
"\n",
"if groq_api_key:\n",
" print(f\"Groq API Key exists and begins {groq_api_key[:4]}\")\n",
"else:\n",
" print(\"Groq API Key not set (and this is optional)\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"request = \"Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. \"\n",
"request += \"Answer only with the question, no explanation.\"\n",
"messages = [{\"role\": \"user\", \"content\": request}]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"openai = AsyncOpenAI()\n",
"response = await openai.chat.completions.create(\n",
" model=\"gpt-4o-mini\",\n",
" messages=messages,\n",
")\n",
"question = response.choices[0].message.content\n",
"print(question)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"messages = [{\"role\": \"user\", \"content\": question}]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dataclasses import dataclass\n",
"\n",
"@dataclass\n",
"class LLMResource:\n",
" api_key: str\n",
" model: str\n",
" url: str = None # optional otherwise NOone\n",
"\n",
"llm_resources = [\n",
" LLMResource(api_key=openai_api_key, model=\"gpt-4o-mini\"),\n",
" LLMResource(api_key=google_api_key, model=\"gemini-2.5-flash\", url=\"https://generativelanguage.googleapis.com/v1beta/openai/\"),\n",
" LLMResource(api_key=groq_api_key, model=\"qwen/qwen3-32b\", url=\"https://api.groq.com/openai/v1\"),\n",
" LLMResource(api_key=\"ollama\", model=\"deepseek-r1:1.5b\", url=\"http://localhost:11434/v1\" )\n",
"]\n"
]
},
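{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Google and Groq keys are optional, but the client calls below will fail for any resource whose key is missing. The guard in the next cell is an optional addition (not part of the original flow) that drops those resources before we fan out the calls."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional guard: skip providers whose API key is not set,\n",
"# so the concurrent calls below don't fail on a missing key\n",
"llm_resources = [res for res in llm_resources if res.api_key]\n",
"print(f\"Calling {len(llm_resources)} models: {[res.model for res in llm_resources]}\")"
]
},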
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"async def llm_call(key, model_name, url, messages) -> tuple:\n",
" if url is None:\n",
" llm = AsyncOpenAI(api_key=key)\n",
" else: \n",
" llm = AsyncOpenAI(base_url=url,api_key=key)\n",
" \n",
" response = await llm.chat.completions.create(\n",
" model=model_name, messages=messages)\n",
" \n",
" answer = (model_name, response.choices[0].message.content)\n",
"\n",
" return answer #returns tuple of modle and response from LLM\n",
"\n",
"llm_callable = partial(llm_call, messages=messages) #prefill with messages\n",
"# Always remember to do this!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#gather all responses concurrently\n",
"tasks = [llm_callable(res.api_key,res.model,res.url) for res in llm_resources]\n",
"results = await asyncio.gather(*tasks)\n",
"together = [f'Response from competitor {model}:{answer}' for model,answer in results]#gather results once all model finish running\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"judge = f\"\"\"You are judging a competition between {len(llm_resources)} competitors.\n",
"Each model has been given this question:\n",
"\n",
"{request}\n",
"\n",
"Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.\n",
"Respond with JSON, and only JSON, with the following format:\n",
"{{\"results\": [\"best competitor number\", \"second best competitor number\", \"third best competitor number\", ...]}}\n",
"\n",
"Here are the responses from each competitor:\n",
"\n",
"{together} # all responses\n",
"\n",
"Now respond with the JSON with the ranked order of the competitors name, nothing else. Do not include markdown formatting or code blocks.\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(judge)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"judge_messages = [{\"role\": \"user\", \"content\": judge}]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Judgement time!\n",
"\n",
"openai = OpenAI()\n",
"response = openai.chat.completions.create(\n",
" model=\"o3-mini\",\n",
" messages=judge_messages,\n",
")\n",
"results = response.choices[0].message.content\n",
"print(results)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# OK let's turn this into results!\n",
"\n",
"results_dict = json.loads(results)\n",
"\n",
"ranks = results_dict[\"results\"]\n",
"\n",
"for index, result in enumerate(ranks):\n",
" print(f\"Rank {index+1}: {result}\")"
]
},
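{
"cell_type": "markdown",
"metadata": {},
"source": [
"Models occasionally wrap JSON in markdown fences despite being told not to. The helper below is an optional, defensive sketch: it strips a fence before parsing, so `json.loads` is less likely to fail on the judge's reply."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional defensive parser: strip a markdown code fence if the judge added one\n",
"def parse_judge_json(text: str) -> dict:\n",
"    cleaned = text.strip()\n",
"    if cleaned.startswith(\"```\"):\n",
"        cleaned = cleaned.split(\"\\n\", 1)[1]   # drop the opening fence line\n",
"        cleaned = cleaned.rsplit(\"```\", 1)[0]  # drop the closing fence\n",
"    return json.loads(cleaned)\n",
"\n",
"for index, result in enumerate(parse_judge_json(results)[\"results\"]):\n",
"    print(f\"Rank {index+1}: {result}\")"
]
},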
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
\n",
" \n",
" \n",
" \n",
" | \n",
" \n",
" Exercise\n",
" Which pattern(s) did this use? Try updating this to add another Agentic design pattern.\n",
" \n",
" | \n",
"
\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
" \n",
" \n",
" \n",
" | \n",
" \n",
" Commercial implications\n",
" These kinds of patterns - to send a task to multiple models, and evaluate results,\n",
" are common where you need to improve the quality of your LLM response. This approach can be universally applied\n",
" to business projects where accuracy is critical.\n",
" \n",
" | \n",
"
\n",
"
"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}