---
title: CineGen AI
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: AI-powered cinematic pre-production
---

# CineGen AI Ultra+ 🎬✨

*Visionary Cinematic Pre-Production Powered by AI*

CineGen AI Ultra+ is a Streamlit web application designed to accelerate the creative pre-production process for films, animations, and other visual storytelling projects. It leverages the power of Large Language Models (Google's Gemini) and other AI tools to transform a simple story idea into a rich cinematic treatment, complete with scene breakdowns, visual style suggestions, AI-generated concept art/video clips, and a narrated animatic.

## Features

- **AI Creative Director:** Input a core story idea, genre, and mood.
- **Cinematic Treatment Generation:**
  - Gemini generates a detailed multi-scene treatment.
  - Each scene includes:
    - Title, Emotional Beat, Setting Description
    - Characters Involved, Character Focus Moment
    - Key Plot Beat, Suggested Dialogue Hook
    - Proactive Director's Suggestions (*gamdok*, Korean for "director"): Visual Style, Camera Work, Sound Design
    - Asset Generation Aids: Suggested Asset Type (Image/Video Clip), Video Motion Description & Duration, Image/Video Keywords, Pexels Search Queries
- **Visual Asset Generation:**
  - **Image Generation:** Uses DALL-E 3 (via the OpenAI API) to generate concept art for scenes from derived prompts.
  - **Stock Footage Fallback:** Uses the Pexels API for relevant stock images/videos if AI generation is disabled or fails.
  - **Video Clip Generation (Placeholder):** Integrated structure for text-to-video/image-to-video generation via the RunwayML API (you must implement the actual API calls in `core/visual_engine.py`). The placeholder generates dummy video clips.
- **Character Definition:** Define key characters with visual descriptions for more consistent AI-generated visuals.
- **Global Style Overrides:** Apply global stylistic keywords (e.g., "Hyper-Realistic Gritty Noir," "Vintage Analog Sci-Fi") to influence visual generation.
- **AI-Powered Narration:**
  - Gemini crafts a narration script based on the generated treatment.
  - The ElevenLabs API synthesizes the narration into natural-sounding audio.
  - Customizable voice ID and narration style.
- **Iterative Refinement:**
  - Edit scene treatments and regenerate them with AI assistance.
  - Refine DALL-E prompts based on feedback and regenerate visuals.
- **Cinematic Animatic Assembly:**
  - Combines generated visual assets (images/video clips) and the synthesized narration into a downloadable `.mp4` animatic.
  - Customizable per-scene duration for pacing control.
  - Ken Burns effect on still images, plus text overlays for scene context.
- **Secrets Management:** Securely loads API keys from Streamlit secrets or environment variables (a minimal loading sketch follows this list).
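
For illustration, such a lookup might be implemented like this (a minimal sketch; `get_api_key` is a hypothetical helper, not necessarily the app's actual function):

```python
import os

import streamlit as st

def get_api_key(name: str):
    """Return a key from st.secrets if present, else from the environment."""
    # st.secrets is dict-like, but raises if no secrets.toml exists, so guard it.
    try:
        if name in st.secrets:
            return st.secrets[name]
    except Exception:
        pass
    return os.getenv(name)

GEMINI_API_KEY = get_api_key("GEMINI_API_KEY")
```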

## Project Structure

```
CineGenAI/
├── .streamlit/
│   └── secrets.toml          # API keys and configuration (DO NOT COMMIT if public)
├── assets/
│   └── fonts/
│       └── arial.ttf         # Example font file (ensure it's available or update the path)
├── core/
│   ├── __init__.py
│   ├── gemini_handler.py     # Manages interactions with the Gemini API
│   ├── visual_engine.py      # Image/video generation (DALL-E, Pexels, RunwayML placeholder) and video assembly
│   └── prompt_engineering.py # Functions that craft detailed prompts for Gemini
├── temp_cinegen_media/       # Temporary directory for generated media (add to .gitignore)
├── app.py                    # Main Streamlit application script
├── Dockerfile                # For containerizing the application
├── Dockerfile.test           # (Optional) For testing
├── requirements.txt          # Python dependencies
├── README.md                 # This file
└── .gitattributes            # For Git LFS if handling large font files
```

## Setup and Installation

1. **Clone the repository:**

   ```bash
   git clone <repository_url>
   cd CineGenAI
   ```

2. **Create a virtual environment (recommended):**

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

   **Note:** MoviePy may require `ffmpeg` to be installed on your system. On Debian/Ubuntu: `sudo apt-get install ffmpeg`. On macOS with Homebrew: `brew install ffmpeg`.
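
   To confirm MoviePy can find it, check that `ffmpeg` is on your `PATH`:

   ```bash
   ffmpeg -version
   ```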

4. **Set up API keys:** You need API keys for the following services:

   - Google Gemini API
   - OpenAI API (for DALL-E)
   - ElevenLabs API (and optionally a specific Voice ID)
   - Pexels API
   - RunwayML API (if implementing full video generation)

   Store these keys securely. You have two primary options:

   - **Streamlit secrets (recommended for Hugging Face Spaces / Streamlit Cloud):** Create a file `.streamlit/secrets.toml` (make sure this file is in your `.gitignore` if your repository is public!) with the following format:

     ```toml
     GEMINI_API_KEY = "your_gemini_api_key"
     OPENAI_API_KEY = "your_openai_api_key"
     ELEVENLABS_API_KEY = "your_elevenlabs_api_key"
     PEXELS_API_KEY = "your_pexels_api_key"
     ELEVENLABS_VOICE_ID = "your_elevenlabs_voice_id" # e.g., "Rachel" or a custom ID
     RUNWAY_API_KEY = "your_runwayml_api_key"
     ```
   - **Environment variables (for local development):** Export the variables in your terminal, or put them in a `.env` file loaded with a library such as `python-dotenv` (not included by default). The application falls back to these when Streamlit secrets are not found.

     ```bash
     export GEMINI_API_KEY="your_gemini_api_key"
     export OPENAI_API_KEY="your_openai_api_key"
     # ... and so on for the other keys
     ```
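
     If you take the `.env` route, a minimal loading sketch with `python-dotenv` looks like this (a hypothetical snippet, since the library is not in `requirements.txt`):

     ```python
     import os

     from dotenv import load_dotenv  # pip install python-dotenv

     load_dotenv()  # reads key=value pairs from a local .env file
     gemini_key = os.getenv("GEMINI_API_KEY")
     ```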
      
5. **Font:** Ensure the font file specified in `core/visual_engine.py` (e.g., `arial.ttf`) is accessible. The script tries common system paths, but you can place the file in `assets/fonts/` and adjust the path in `visual_engine.py` if needed. If using Docker, ensure the font is copied into the image (see `Dockerfile`).
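
   For illustration, a Pillow font-fallback strategy along these lines is common (`load_font` and the candidate paths are hypothetical, not the project's exact code):

   ```python
   from PIL import ImageFont

   # Candidate paths, most specific first; adjust to match your system.
   CANDIDATE_FONTS = [
       "assets/fonts/arial.ttf",
       "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf",  # common Linux path
   ]

   def load_font(size: int = 24):
       for path in CANDIDATE_FONTS:
           try:
               return ImageFont.truetype(path, size)
           except OSError:
               continue
       return ImageFont.load_default()  # last-resort built-in bitmap font
   ```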

6. **RunwayML implementation (important):** The current RunwayML integration in `core/visual_engine.py` (method `_generate_video_clip_with_runwayml`) is a placeholder. You will need to:

   - Install the official RunwayML SDK, if available.
   - Implement the actual API calls to RunwayML for text-to-video or image-to-video generation.

   Until then, the placeholder creates a dummy video clip using MoviePy, as sketched below.
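
   For reference, such a fallback can be produced with MoviePy in a few lines (a sketch only; the real placeholder's name and signature may differ):

   ```python
   from moviepy.editor import ColorClip

   def generate_placeholder_clip(out_path: str, duration: float = 4.0):
       """Write a dark, silent stand-in clip so the assembly pipeline can run."""
       clip = ColorClip(size=(1280, 720), color=(16, 16, 16), duration=duration)
       clip.write_videofile(out_path, fps=24, codec="libx264", audio=False)
   ```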

## Running the Application

1. **Local development:**

   ```bash
   streamlit run app.py
   ```

   The application should open in your web browser.

2. **Using Docker (optional):**

   - Build the Docker image:

     ```bash
     docker build -t cinegen-ai .
     ```

   - Run the container, passing API keys as environment variables (or via mounted secrets) rather than baking them into the image:

     ```bash
     # Add further -e flags for ELEVENLABS_API_KEY, PEXELS_API_KEY, etc.
     docker run -p 8501:8501 \
       -e GEMINI_API_KEY="your_key" \
       -e OPENAI_API_KEY="your_key" \
       cinegen-ai
     ```

     Access the app at http://localhost:8501.
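
     Alternatively, keep the keys in a local `.env` file and pass them all at once with Docker's `--env-file` flag:

     ```bash
     docker run -p 8501:8501 --env-file .env cinegen-ai
     ```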

## How to Use

1. **Input Creative Seed:** Provide your core story idea, and select a genre, mood, number of scenes, and AI director style in the sidebar.
2. **Generate Treatment:** Click "🌌 Generate Cinematic Treatment". The AI produces a multi-scene breakdown.
3. **Review & Refine:**
   - Examine each scene's details, including AI-generated visuals (or placeholders).
   - Use the "✏️ Edit Scene X Treatment" and "🎨 Edit Scene X Visual Prompt" popovers to provide feedback and regenerate specific parts of the treatment or visuals.
   - Adjust the per-scene "Dominant Shot Type" and "Scene Duration" for the animatic.
4. **Fine-Tuning (Sidebar):**
   - Define characters with visual descriptions.
   - Apply global style overrides.
   - Set the narrator voice ID and narration style.
5. **Assemble Animatic:** Once you're satisfied with the treatment, visuals, and narration script (generated automatically), click "🎬 Assemble Narrated Cinematic Animatic".
6. **View & Download:** The generated animatic video appears in the app, ready to download.

## Key Technologies

- **Python**
- **Streamlit:** Web application framework.
- **Google Gemini API:** Core text generation (treatment, narration script, prompt refinement).
- **OpenAI API (DALL-E 3):** AI image generation.
- **ElevenLabs API:** Text-to-speech narration.
- **Pexels API:** Stock image/video fallbacks.
- **RunwayML API (placeholder):** AI video clip generation.
- **MoviePy:** Video processing and animatic assembly.
- **Pillow (PIL):** Image manipulation.
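
As an illustration of the MoviePy-based assembly, a Ken Burns-style push-in on a still image can be written in a few lines (a sketch under assumed parameters, not the project's actual implementation in `core/visual_engine.py`):

```python
from moviepy.editor import ImageClip

def ken_burns(image_path: str, duration: float = 5.0, zoom_rate: float = 0.04):
    """Return a clip that slowly zooms into a still image."""
    clip = ImageClip(image_path, duration=duration)
    # Scale the frame up over time: 1.0 at t=0, growing linearly with t.
    return clip.resize(lambda t: 1 + zoom_rate * t)
```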

## Future Enhancements / To-Do

- Implement full, robust RunwayML API integration.
- Option to upload custom seed images for image-to-video generation.
- More sophisticated control over the Ken Burns effect (pan direction, zoom intensity).
- Allow users to upload their own audio for narration or background music.
- Advanced shot-list generation and export.
- Integration with other AI video/image models.
- User accounts and project saving.
- More granular error handling and user feedback in the UI.
- Refine JSON cleaning of Gemini output to be even more robust.
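
On the last point, one common hardening step is to strip the Markdown code fences that LLMs often wrap around JSON before parsing (a sketch; the project's actual cleaning logic may differ):

```python
import json
import re

def parse_llm_json(raw: str):
    """Strip optional Markdown code fences, then parse as JSON."""
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    return json.loads(cleaned)
```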

## Contributing

Contributions, bug reports, and feature requests are welcome! Please open an issue or submit a pull request.

## License

This project is currently under [Specify License Here - e.g., MIT, Apache 2.0, or state as private/proprietary].

