mgbam commited on
Commit
17602ca
·
verified ·
1 Parent(s): 3539a49

Update README.md

Files changed (1)
  1. README.md +163 -96
README.md CHANGED
@@ -11,51 +11,72 @@ pinned: false
11
  short_description: Streamlit template space
12
  ---
13
 
14
- # 🎬 CineGen AI: Cinematic Video Generator 🚀
15
-
16
- CineGen AI is your AI-powered pocket film studio, designed to transform your textual ideas into compelling cinematic concepts, storyboards, and preliminary animatics. Leveraging the multimodal understanding and generation capabilities of Google's Gemini, CineGen AI assists creators in rapidly visualizing narratives.
17
-
18
- **Current Stage:** Alpha/Prototyping (Uses placeholder visuals, focuses on script and visual prompt generation with interactive editing)
19
-
20
- ## ✨ Key Features
21
-
22
- * **AI Story Generation:** Input a core idea, genre, and mood, and let Gemini craft a multi-scene story breakdown.
23
- * **Intelligent Scene Detailing:** Each scene includes:
24
- * Setting descriptions
25
- * Characters involved
26
- * Key actions & dialogue snippets
27
- * Visual style and camera angle suggestions
28
- * Emotional beats
29
- * **Visual Concept Generation:** For each scene, CineGen AI (via Gemini) generates detailed prompts suitable for advanced AI image generators. (Currently visualizes these as placeholder images with text).
30
- * **Interactive Storyboarding:**
31
- * **Script Regeneration:** Modify scene details (action, dialogue, mood) and have Gemini rewrite that specific part of the script.
32
- * **Visual Regeneration:** Provide feedback on visual concepts, and Gemini will refine the image generation prompt.
33
- * **Conceptual Advanced Controls:**
34
- * **Character Consistency (Foundation):** Define character descriptions to guide visual generation (prompt-based).
35
- * **Style Transfer (Textual Foundation):** Apply textual descriptions of artistic styles to influence visuals.
36
- * **Camera Angle Selection:** Basic camera angle choices to guide prompt generation.
37
- * **Animatic Video Creation:** Stitches generated (placeholder) images into a simple video sequence with `moviepy`.
38
- * **Modular Architecture:** Built with Python, Streamlit, and a clear separation of concerns for easier expansion.
39
- * **Dockerized:** Ready for deployment on platforms like Hugging Face Spaces.
40
-
41
- ## 🛠️ Tech Stack
42
-
43
- * **Core Logic:** Python
44
- * **LLM Backend:** Google Gemini API (via `google-generativeai`)
45
- * **UI Framework:** Streamlit
46
- * **Image Handling:** Pillow
47
- * **Video Assembly:** MoviePy
48
- * **Containerization:** Docker (for Hugging Face Spaces / portability)
49
-
50
- ## ⚙️ Setup & Installation
51
 
52
  1. **Clone the Repository:**
53
  ```bash
54
- git clone <your-repo-url>
55
- cd cinegen-ai
56
  ```
57
 
58
- 2. **Create Python Virtual Environment (Recommended):**
59
  ```bash
60
  python -m venv venv
61
  source venv/bin/activate # On Windows: venv\Scripts\activate
@@ -65,70 +86,116 @@ CineGen AI is your AI-powered pocket film studio, designed to transform your tex
65
  ```bash
66
  pip install -r requirements.txt
67
  ```
 
68
 
69
- 4. **Set Up API Key:**
70
- * Create a `.streamlit/secrets.toml` file in the root of the project.
71
- * Add your Google Gemini API key:
72
  ```toml
73
- GEMINI_API_KEY = "YOUR_ACTUAL_GEMINI_API_KEY"
74
  ```
75
- * **IMPORTANT:** Do NOT commit your `secrets.toml` file if your repository is public. Add `.streamlit/secrets.toml` to your `.gitignore` file.
76
 
77
- 5. **Font Requirement (for Placeholder Images):**
78
- * The `visual_engine.py` currently tries to use "arial.ttf".
79
- * Ensure this font is available on your system, or modify the `ImageFont.truetype("arial.ttf", 24)` line in `core/visual_engine.py` to point to a valid `.ttf` font file on your system, or let it fall back to `ImageFont.load_default()`.
80
- * On Debian/Ubuntu, you can install common Microsoft fonts: `sudo apt-get update && sudo apt-get install ttf-mscorefonts-installer`
81
 
82
- 6. **Run the Streamlit App:**
83
- ```bash
84
- streamlit run app.py
85
- ```
86
 
87
- ## 🐳 Docker & Hugging Face Spaces Deployment
88
 
89
- 1. Ensure Docker is installed and running.
90
- 2. Build the Docker image (optional, for local testing):
91
- ```bash
92
- docker build -t cinegen-ai .
93
- ```
94
- 3. Run the Docker container (optional, for local testing):
95
  ```bash
96
- docker run -p 8501:8501 -e GEMINI_API_KEY="YOUR_ACTUAL_GEMINI_API_KEY" cinegen-ai
97
  ```
98
- (Note: Passing API key as env var for local Docker run. For HF Spaces, use their secrets management).
99
- 4. **For Hugging Face Spaces:**
100
- * Push your code (including the `Dockerfile`) to a GitHub repository.
101
- * Create a new Space on Hugging Face, selecting "Docker" as the SDK.
102
- * Link it to your GitHub repository.
103
- * In the Space settings, add `GEMINI_API_KEY` as a secret.
104
- * The Space will build and deploy your application.
105
-
106
- ## 🚀 Usage
107
-
108
- 1. Open the app in your browser (usually `http://localhost:8501`).
109
- 2. Use the sidebar to input your story idea, genre, mood, and number of scenes.
110
- 3. Click "Generate Full Story Concept."
111
- 4. Review the generated scenes and visual concepts.
112
- 5. Use the "Edit Scene Script" and "Edit Scene Visuals" popovers within each scene to interactively refine content.
113
- 6. (Optional) Define characters or styles in the sidebar for more guided generation.
114
- 7. Once satisfied, click "Assemble Animatic Video."
115
-
116
- ## 🔮 Future Enhancements (Roadmap to "Wow")
117
-
118
- * **True AI Image Generation:** Integrate state-of-the-art text-to-image models (e.g., Stable Diffusion, DALL-E 3, Midjourney API) to replace placeholders.
119
- * **Advanced Character Consistency:** Implement techniques like LoRAs, textual inversion, or re-identification models for visually consistent characters across scenes.
120
- * **Image-Based Style Transfer:** Allow users to upload reference images to define artistic styles.
121
- * **AI Sound Design:** Generate or suggest sound effects and background music.
122
- * **Direct Video Snippets:** Integrate text-to-video models for dynamic short clips.
123
- * **Enhanced Camera Controls & Shot Design:** More granular control over virtual cinematography.
124
- * **User Accounts & Project Management.**
125
- * **Export Options:** PDF storyboards, FDX/script formats.
126
-
127
- ## 📄 License
128
-
129
- Consider a license like MIT or Apache 2.0 if you plan for open collaboration or wish to be permissive. If this is a commercial product, consult with legal counsel for appropriate licensing.
130
- For now, let's assume:
131
- **MIT License** (Add the full MIT License text if you choose this)
132
 
133
  ---
134
- Copyright (c) [Year] [Your Name/Company Name]
11
  short_description: Streamlit template space
12
  ---
13
 
14
+ # CineGen AI Ultra+ 🎬✨
15
+
16
+ **Visionary Cinematic Pre-Production Powered by AI**
17
+
18
+ CineGen AI Ultra+ is a Streamlit web application designed to accelerate the creative pre-production process for films, animations, and other visual storytelling projects. It leverages the power of Large Language Models (Google's Gemini) and other AI tools to transform a simple story idea into a rich cinematic treatment, complete with scene breakdowns, visual style suggestions, AI-generated concept art/video clips, and a narrated animatic.
19
+
20
+ ## Features
21
+
22
+ * **AI Creative Director:** Input a core story idea, genre, and mood.
23
+ * **Cinematic Treatment Generation:**
24
+ * Gemini generates a detailed multi-scene treatment.
25
+ * Each scene includes:
26
+ * Title, Emotional Beat, Setting Description
27
+ * Characters Involved, Character Focus Moment
28
+ * Key Plot Beat, Suggested Dialogue Hook
29
+ * **Proactive Director's Suggestions (Gamdok, Korean for "director"):** Visual Style, Camera Work, Sound Design.
30
+ * **Asset Generation Aids:** Suggested Asset Type (Image/Video Clip), Video Motion Description & Duration, Image/Video Keywords, Pexels Search Queries.
31
+ * **Visual Asset Generation:**
32
+ * **Image Generation:** Utilizes DALL-E 3 (via OpenAI API) to generate concept art for scenes based on derived prompts.
33
+ * **Stock Footage Fallback:** Uses Pexels API for relevant stock images/videos if AI generation is disabled or fails.
34
+ * **Video Clip Generation (Placeholder):** Integrated structure for text-to-video/image-to-video generation using RunwayML API (requires user to implement actual API calls in `core/visual_engine.py`). Placeholder generates dummy video clips.
35
+ * **Character Definition:** Define key characters with visual descriptions for more consistent AI-generated visuals.
36
+ * **Global Style Overrides:** Apply global stylistic keywords (e.g., "Hyper-Realistic Gritty Noir," "Vintage Analog Sci-Fi") to influence visual generation.
37
+ * **AI-Powered Narration:**
38
+ * Gemini crafts a narration script based on the generated treatment.
39
+ * ElevenLabs API synthesizes the narration into natural-sounding audio.
40
+ * Customizable voice ID and narration style.
41
+ * **Iterative Refinement:**
42
+ * Edit scene treatments and regenerate them with AI assistance.
43
+ * Refine DALL-E prompts based on feedback and regenerate visuals.
44
+ * **Cinematic Animatic Assembly:**
45
+ * Combines generated visual assets (images/video clips) and the synthesized narration into a downloadable `.mp4` animatic.
46
+ * Customizable per-scene duration for pacing control.
47
+ * Ken Burns effect on still images and text overlays for scene context.
48
+ * **Secrets Management:** Securely loads API keys from Streamlit secrets or environment variables.
49
+
50
+ ## Project Structure
51
+ ```
+ CineGenAI/
+ ├── .streamlit/
+ │   └── secrets.toml            # API keys and configuration (DO NOT COMMIT if public)
+ ├── assets/
+ │   └── fonts/
+ │       └── arial.ttf           # Example font file (ensure it's available or update path)
+ ├── core/
+ │   ├── __init__.py
+ │   ├── gemini_handler.py       # Manages interactions with the Gemini API
+ │   ├── visual_engine.py        # Image/video generation (DALL-E, Pexels, RunwayML placeholder) and video assembly
+ │   └── prompt_engineering.py   # Crafts detailed prompts for Gemini
+ ├── temp_cinegen_media/         # Temporary directory for generated media (add to .gitignore)
+ ├── app.py                      # Main Streamlit application script
+ ├── Dockerfile                  # For containerizing the application
+ ├── Dockerfile.test             # (Optional) For testing
+ ├── requirements.txt            # Python dependencies
+ ├── README.md                   # This file
+ └── .gitattributes              # For Git LFS if handling large font files
+ ```
71
+ ## Setup and Installation
72
 
73
  1. **Clone the Repository:**
74
  ```bash
75
+ git clone <repository_url>
76
+ cd CineGenAI
77
  ```
78
 
79
+ 2. **Create a Virtual Environment (Recommended):**
80
  ```bash
81
  python -m venv venv
82
  source venv/bin/activate # On Windows: venv\Scripts\activate
 
86
  ```bash
87
  pip install -r requirements.txt
88
  ```
89
+ *Note: `MoviePy` might require `ffmpeg` to be installed on your system. On Debian/Ubuntu: `sudo apt-get install ffmpeg`. On macOS with Homebrew: `brew install ffmpeg`.*
90
 
91
+ 4. **Set Up API Keys:**
92
+ You need API keys for the following services:
93
+ * Google Gemini API
94
+ * OpenAI API (for DALL-E)
95
+ * ElevenLabs API (and optionally a specific Voice ID)
96
+ * Pexels API
97
+ * RunwayML API (if implementing full video generation)
98
+
99
+ Store these keys securely. You have two primary options:
100
+
101
+ * **Streamlit Secrets (Recommended for Hugging Face Spaces / Streamlit Cloud):**
102
+ Create a file `.streamlit/secrets.toml` (make sure this file is in your `.gitignore` if your repository is public!) with the following format:
103
  ```toml
104
+ GEMINI_API_KEY = "your_gemini_api_key"
105
+ OPENAI_API_KEY = "your_openai_api_key"
106
+ ELEVENLABS_API_KEY = "your_elevenlabs_api_key"
107
+ PEXELS_API_KEY = "your_pexels_api_key"
108
+ ELEVENLABS_VOICE_ID = "your_elevenlabs_voice_id" # e.g., "Rachel" or a custom ID
109
+ RUNWAY_API_KEY = "your_runwayml_api_key"
110
  ```
 
111
 
112
+ * **Environment Variables (for local development):**
113
+ Set the environment variables directly in your shell, or put them in a `.env` file loaded with a library such as `python-dotenv` (not included in `requirements.txt` by default). The application falls back to these if Streamlit secrets are not found.
114
+ ```bash
115
+ export GEMINI_API_KEY="your_gemini_api_key"
116
+ export OPENAI_API_KEY="your_openai_api_key"
117
+ # ... and so on for other keys
118
+ ```
119
 
120
+ 5. **Font:**
121
+ Ensure the font file specified in `core/visual_engine.py` (e.g., `arial.ttf`) is accessible. The script tries common system paths, but you can place it in `assets/fonts/` and adjust the path in `visual_engine.py` if needed. If using Docker, ensure the font is copied into the image (see `Dockerfile`).
 
 
122
 
123
+ 6. **RunwayML Implementation (Important):**
124
+ The current integration for RunwayML in `core/visual_engine.py` (method `_generate_video_clip_with_runwayml`) is a **placeholder**. You will need to:
125
+ * Install the official RunwayML SDK if available.
126
+ * Implement the actual API calls to RunwayML for text-to-video or image-to-video generation.
127
+ * The placeholder currently creates a dummy video clip using MoviePy.
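The dispatch you would replace could have roughly this shape (a sketch only, with hypothetical names, not the actual `visual_engine.py` code):

```python
def _make_dummy_clip(prompt: str, duration_s: int) -> str:
    """Stand-in for the MoviePy placeholder: returns where the dummy clip
    would be written (the real placeholder renders a clip to this path)."""
    return f"temp_cinegen_media/dummy_{abs(hash(prompt)) % 10000}_{duration_s}s.mp4"

def generate_video_clip(prompt: str, duration_s: int, use_runway: bool = False) -> str:
    if use_runway:
        # Replace with a real RunwayML text-to-video / image-to-video call.
        raise NotImplementedError("RunwayML API call not yet implemented")
    return _make_dummy_clip(prompt, duration_s)
```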
128
 
129
+ ## Running the Application
130
+
131
+ 1. **Local Development:**
132
  ```bash
133
+ streamlit run app.py
134
  ```
135
+ The application should open in your web browser.
136
+
137
+ 2. **Using Docker (Optional):**
138
+ * Build the Docker image:
139
+ ```bash
140
+ docker build -t cinegen-ai .
141
+ ```
142
+ * Run the Docker container, passing API keys as environment variables (or mounting a secrets file); do not bake keys into the image:
143
+ ```bash
144
+ docker run -p 8501:8501 \
145
+ -e GEMINI_API_KEY="your_key" \
146
+ -e OPENAI_API_KEY="your_key" \
147
+ # ... other env vars ...
148
+ cinegen-ai
149
+ ```
150
+ Access the app at `http://localhost:8501`.
151
+
152
+ ## How to Use
153
+
154
+ 1. **Input Creative Seed:** Provide your core story idea, select a genre, mood, number of scenes, and AI director style in the sidebar.
155
+ 2. **Generate Treatment:** Click "🌌 Generate Cinematic Treatment". The AI will produce a multi-scene breakdown.
156
+ 3. **Review & Refine:**
157
+ * Examine each scene's details, including AI-generated visuals (or placeholders).
158
+ * Use the "✏️ Edit Scene X Treatment" and "🎨 Edit Scene X Visual Prompt" popovers to provide feedback and regenerate specific parts of the treatment or visuals.
159
+ * Adjust per-scene "Dominant Shot Type" and "Scene Duration" for the animatic.
160
+ 4. **Fine-Tuning (Sidebar):**
161
+ * Define characters with visual descriptions.
162
+ * Apply global style overrides.
163
+ * Set narrator voice ID and narration style.
164
+ 5. **Assemble Animatic:** Once you're satisfied with the treatment, visuals, and narration script (generated automatically), click "🎬 Assemble Narrated Cinematic Animatic".
165
+ 6. **View & Download:** The generated animatic video will appear, and you can download it.
166
+
167
+ ## Key Technologies
168
+
169
+ * **Python**
170
+ * **Streamlit:** Web application framework.
171
+ * **Google Gemini API:** For core text generation (treatment, narration script, prompt refinement).
172
+ * **OpenAI API (DALL-E 3):** For AI image generation.
173
+ * **ElevenLabs API:** For text-to-speech narration.
174
+ * **Pexels API:** For stock image/video fallbacks.
175
+ * **RunwayML API (Placeholder):** For AI video clip generation.
176
+ * **MoviePy:** For video processing and animatic assembly.
177
+ * **Pillow (PIL):** For image manipulation.
178
+
179
+ ## Future Enhancements / To-Do
180
+
181
+ * Implement full, robust RunwayML API integration.
182
+ * Option to upload custom seed images for image-to-video generation.
183
+ * More sophisticated control over Ken Burns effect (pan direction, zoom intensity).
184
+ * Allow users to upload their own audio for narration or background music.
185
+ * Advanced shot list generation and export.
186
+ * Integration with other AI video/image models.
187
+ * User accounts and project saving.
188
+ * More granular error handling and user feedback in the UI.
189
+ * Refine JSON cleaning from Gemini to be even more robust.
190
+
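The JSON-cleaning item in the to-do list usually means stripping the Markdown code fences Gemini tends to wrap around structured output before parsing. A minimal version (hypothetical helper name) might be:

```python
import json
import re

def clean_gemini_json(raw: str):
    """Strip optional ```json fences around a model response, then parse."""
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    return json.loads(text)
```

A more robust variant would also handle truncated responses and trailing commentary after the closing fence.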
191
+ ## Contributing
192
+
193
+ Contributions, bug reports, and feature requests are welcome! Please open an issue or submit a pull request.
194
+
195
+ ## License
196
+
197
+ This project is currently under [Specify License Here - e.g., MIT, Apache 2.0, or state as private/proprietary].
198
 
199
  ---
200
+
201
+ *This README provides a comprehensive overview. Ensure all paths, commands, and API key instructions match your specific project setup.*