---
title: ML6-Gemini-Demo
app_file: src/app.py
sdk: gradio
sdk_version: 5.23.0
---
# Gemini Voice Agent Demo

This repo contains a demo that uses the Gemini Multimodal API to build a voice-based agent that conducts professional technical screening interviews.


## Technical Overview

The system uses FastRTC and Gradio to provide a real-time voice UI.

### About the modality

You can configure the output modality:

- If set to AUDIO
    - The agent responds with audio.
    - There is no text output, so no transcription is available.
- If set to TEXT
    - The agent responds with text.
    - The text output is converted to audio using the TTS API.
    - Transcriptions are available.
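
The consequences of each modality setting can be summarised in a small lookup. This is a minimal sketch, not the demo's actual code: `ModalityPlan` and `plan_for` are hypothetical names introduced here for illustration, and the flags only restate the bullets above.

```python
from dataclasses import dataclass

# The two output modalities described above.
VALID_MODALITIES = ("AUDIO", "TEXT")


@dataclass(frozen=True)
class ModalityPlan:
    """Hypothetical summary of what each modality implies for the pipeline."""

    response_modality: str
    needs_tts: bool        # text replies must be synthesised to speech
    has_transcript: bool   # a text transcript of the reply is available


def plan_for(modality: str) -> ModalityPlan:
    """Map a configured modality to its downstream consequences."""
    normalised = modality.strip().upper()
    if normalised not in VALID_MODALITIES:
        raise ValueError(f"modality must be one of {VALID_MODALITIES}, got {modality!r}")
    is_text = normalised == "TEXT"
    # AUDIO: the model speaks directly, so no TTS step and no transcript.
    # TEXT: the reply is text, so it needs TTS and doubles as the transcript.
    return ModalityPlan(normalised, needs_tts=is_text, has_transcript=is_text)
```

For example, `plan_for("TEXT").needs_tts` is `True`, while `plan_for("AUDIO").has_transcript` is `False`.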

### Function Calling

The agent can call two functions:
- Answer validation
    - Checks the answer type against the expected type.
    - Stores the answer.
- Log Input
    - Logs the user input.
    - This serves as a transcription of the incoming audio.
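
The two functions could look roughly like the sketch below. It is illustrative only: the function names, the `EXPECTED_TYPES` table, the question ids, and the return shapes are all assumptions, not the demo's actual implementation.

```python
from typing import Any, Dict, List

# Hypothetical table of expected answer types per interview question.
EXPECTED_TYPES: Dict[str, type] = {
    "years_experience": int,
    "primary_language": str,
}

answers: Dict[str, Any] = {}   # validated answers, keyed by question id
input_log: List[str] = []      # what the user said, as reported by the model


def validate_answer(question_id: str, answer: Any) -> Dict[str, Any]:
    """Check the answer type against the expected type, then store it."""
    expected = EXPECTED_TYPES.get(question_id)
    if expected is None:
        return {"ok": False, "reason": f"unknown question {question_id!r}"}
    # Exact type match, so that True/False is not accepted where an int is expected.
    if type(answer) is not expected:
        return {"ok": False, "reason": f"expected {expected.__name__}"}
    answers[question_id] = answer
    return {"ok": True}


def log_input(text: str) -> Dict[str, int]:
    """Record the user's utterance; this acts as a transcript of the audio."""
    input_log.append(text)
    return {"logged": len(input_log)}
```

In this sketch each function returns a small dict, so the result can be handed back to the model as the function response.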

## Getting Started

To run the application, follow these steps:

1. Install uv (if not already installed):
`curl -LsSf https://astral.sh/uv/install.sh | sh`

2. Install dependencies:
`uv sync`

3. Set up the environment variables for either GenAI or VertexAI (see below)

4. Run the application:
`python src/app.py`

5. Visit `http://127.0.0.1:7860` in your browser to interact with the voice agent.


### GenAI vs VertexAI

"gemini-2.0-flash-exp" can be used in both GenAI and VertexAI. [more info](https://github.com/heiko-hotz/gemini-multimodal-live-dev-guide?tab=readme-ov-file)

- GenAI requires just a GEMINI_API_KEY environment variable [link](https://ai.google.dev/gemini-api/docs/api-key)
- VertexAI requires a GCP project and the following environment variables:
```
export GOOGLE_CLOUD_PROJECT=YOUR_PROJECT_ID
export GOOGLE_CLOUD_LOCATION=europe-west4
export GOOGLE_GENAI_USE_VERTEXAI=True
```

Depending on the `GOOGLE_GENAI_USE_VERTEXAI` flag, this demo uses either GenAI or VertexAI.
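
The backend choice can be sketched as a small helper that inspects the environment. This is an assumption-laden illustration rather than the demo's code: `select_backend` is a hypothetical name, and treating `true` and `1` (case-insensitive) as the enabling values is an assumption about how the flag is interpreted.

```python
import os
from typing import Dict, Mapping


def select_backend(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the settings needed for the backend selected by the environment."""
    # Assumption: "true" or "1" (any case) enables VertexAI; anything else means GenAI.
    flag = env.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip().lower()
    if flag in ("true", "1"):
        required = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION")
    else:
        required = ("GEMINI_API_KEY",)

    # Fail early with a clear message if a required variable is missing.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))

    backend = "vertexai" if len(required) == 2 else "genai"
    settings = {"backend": backend}
    settings.update({name: env[name] for name in required})
    return settings
```

Calling `select_backend()` with no arguments reads the real process environment; passing a plain dict makes the helper easy to test.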

### Note

The gradio-webrtc install fails unless ffmpeg@6 is installed. On macOS:

```
brew uninstall ffmpeg
brew install ffmpeg@6
brew link ffmpeg@6
```
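
To check which ffmpeg major version is on your `PATH` before installing, a small script like the following may help. It is a sketch under one assumption: that the first line of `ffmpeg -version` begins with `ffmpeg version <major>.` (optionally prefixed with `n`), which holds for common release builds but not necessarily for every build. The helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_ffmpeg_major(version_output: str) -> Optional[int]:
    """Extract the major version from `ffmpeg -version` output, if recognisable."""
    match = re.search(r"ffmpeg version n?(\d+)", version_output)
    return int(match.group(1)) if match else None


def installed_ffmpeg_major() -> Optional[int]:
    """Return the major version of the ffmpeg on PATH, or None if not found."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    return parse_ffmpeg_major(result.stdout)


if __name__ == "__main__":
    major = installed_ffmpeg_major()
    if major == 6:
        print("ffmpeg 6 found")
    else:
        print(f"ffmpeg major version is {major}; ffmpeg@6 is expected")
```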