# core/prompt_engineering.py
import json

def create_cinematic_treatment_prompt(user_idea, genre, mood, num_scenes=3, creative_guidance="standard"):
    """
    Generates a prompt for Gemini to create a full cinematic treatment, including
    proactive suggestions for visual style, camera, sound, thematic elements,
    and whether a scene is better as an image or a short video clip.
    creative_guidance: "standard", "more_artistic", "experimental_narrative"
    """
    guidance_options = {
        "standard": "Provide solid, genre-appropriate suggestions. Recommend 'image' for most asset types unless strong motion is implied.",
        "more_artistic": "Lean into more artistic, unconventional, and visually striking suggestions for style and camera. Suggest unique color palettes or lighting. Consider where a 'video_clip' might offer more impact.",
        "experimental_narrative": "Feel free to suggest a minor unexpected narrative twist or a symbolic visual motif. If a scene has significant implied motion or transformation, recommend 'video_clip'."
    }
    # Fall back to "standard" instead of raising KeyError on an unrecognized level.
    guidance_detail = guidance_options.get(creative_guidance, guidance_options["standard"])

    # The prompt enumerates every field requested from Gemini, followed by one example object.
    return f"""
    You are an AI Creative Director and Master Storyteller, collaborating on a cinematic concept.
    Base Idea: "{user_idea}"
    Genre: "{genre}"
    Mood: "{mood}"
    Number of Key Scenes: {num_scenes}
    Creative Guidance Level: {creative_guidance} ({guidance_detail})

    Task: Develop a rich cinematic treatment. For EACH of the {num_scenes} key scenes, provide the following 16 fields EXACTLY as named:
    1.  `scene_number` (int): Sequential.
    2.  `scene_title` (str): A short, evocative title (e.g., "The Neon Rains of Sector 7").
    3.  `emotional_beat` (str): The core emotion or feeling this scene should evoke (e.g., "Tension and Suspense", "Hopeful Discovery", "Tragic Realization").
    4.  `setting_description` (str): Vivid, sensory details (sight, sound, atmosphere). Where are we? What makes it unique? (40-60 words).
    5.  `characters_involved` (list of str): Names of characters central to this scene. If non-speaking or an entity (e.g., "Scavenger Drone"), list them.
    6.  `character_focus_moment` (str): For primary character(s), describe a key internal thought, expression, or micro-action revealing their state or arc. If no specific character focus, describe the general atmosphere's impact.
    7.  `key_plot_beat` (str): Critical plot development or character action (1-2 sentences). Suitable for brief video overlay.
    8.  `suggested_dialogue_hook` (str): One potent line of dialogue. (If no dialogue, state "Silent scene" or describe key non-verbal communication).
    9.  `PROACTIVE_visual_style_감독` (str): Your detailed suggestion for this scene's visual style. Specific art movements, film references, color theory, lighting (e.g., "Dutch angles, chiaroscuro, desaturated palette with cyan highlights, Tarkovsky-esque cyberpunk").
    10. `PROACTIVE_camera_work_감독` (str): Your suggestion for impactful camera work. Describe a specific shot or short sequence (e.g., "Slow dolly zoom into protagonist's eyes, whip pan to reveal threat").
    11. `PROACTIVE_sound_design_감독` (str): Key ambient sounds, SFX, and musical mood/instrumentation (e.g., "Ambient: City hum, dripping water. SFX: Glitching spark. Music: Ominous synth pads, detuned piano motif").
    12. `suggested_asset_type_감독` (str): Your recommendation for the primary visual asset for this scene: "image" (for a still) or "video_clip" (for a short ~3-7 second generated video). Default to "image" unless strong motion is described or implied.
    13. `video_clip_motion_description_감독` (str): If `suggested_asset_type_감독` is "video_clip", describe the primary motion in the scene (e.g., "Protagonist slowly turns to face the camera", "Raindrops striking neon puddles, camera pans up", "Spaceship flies past from left to right"). Otherwise, "N/A".
    14. `video_clip_duration_estimate_secs_감독` (int): If `suggested_asset_type_감독` is "video_clip", provide an estimated duration in seconds (e.g., 3, 5, 7). Otherwise, 0.
    15. `image_generation_keywords_감독` (str): A concise list of 5-8 powerful keywords extracted from all above details (setting, characters, action, style, camera, motion if video), for generating a visual asset (image OR video). Focus on nouns, strong adjectives, artistic styles, and key motion verbs if applicable. (e.g., "cyberpunk alleyway, neon rain, lone figure Jax, glowing data streams, high contrast shadows, cinematic low-angle, slow pan").
    16. `pexels_search_query_감독` (str): A concise, effective search query (2-4 words) for Pexels for a background or atmospheric shot (e.g., "rainy neon city," "vast desert landscape," "dark server room").

    If `creative_guidance` is "experimental_narrative", for ONLY ONE scene, you may subtly alter `key_plot_beat` or add a symbolic element to `setting_description` for an unexpected twist. If so, add `director_note` (str) to THAT SCENE ONLY, explaining the choice.

    Output ONLY a valid JSON list of these scene objects. Ensure all field names are exactly as specified (the 감독 suffix, Korean for 'director', denotes your proactive directorial input).
    Example for one scene object (ensure all 16 fields are present for every scene, plus optional director_note):
    {{
        "scene_number": 1,
        "scene_title": "Sun-Bleached Mirage",
        "emotional_beat": "Desperate Survival",
        "setting_description": "Relentless sun on endless rust-colored dunes, shimmering with heat haze. Skeletal remains of colossal, forgotten machinery litter the landscape. Air heavy with dust and decay.",
        "characters_involved": ["Anya"],
        "character_focus_moment": "Anya's sunburnt hand reaches for her nearly empty water canteen. Doubt flickers, then masked by determination.",
        "key_plot_beat": "Anya navigates treacherous dunes, guided by a tattered map, avoiding mechanical scavengers.",
        "suggested_dialogue_hook": "(Anya, raspy whisper) 'Almost... just a little further.'",
        "PROACTIVE_visual_style_감독": "Widescreen anamorphic, sun-bleached desaturated palette, metallic glints. HDR, harsh sun, deep shadows. 'Mad Max: Fury Road' meets 'Dune (2021)'. Visible heat distortion.",
        "PROACTIVE_camera_work_감독": "Extreme long shot of Anya in vast desert, then gritty close-up on her face. Slow, deliberate tracking.",
        "PROACTIVE_sound_design_감독": "Ambient: Howling wind, distant creaks. SFX: Sand crunch, strained breathing. Music: Sparse, atmospheric synth drones, percussive hits.",
        "suggested_asset_type_감독": "image",
        "video_clip_motion_description_감독": "N/A",
        "video_clip_duration_estimate_secs_감독": 0,
        "image_generation_keywords_감독": "post-apocalyptic desert, lone wanderer Anya, rust dunes, colossal wrecks, heat haze, cinematic widescreen, sun-bleached, determined expression",
        "pexels_search_query_감독": "vast desert sun"
    }}
    """

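# --- Illustrative sketch (not part of the original module) ---------------------
# The treatment prompt above asks Gemini for exactly 16 named fields per scene.
# A caller may want to check a parsed scene dict against that contract before
# building image or video prompts from it. REQUIRED_SCENE_FIELDS and
# find_scene_field_problems are hypothetical names introduced for this example,
# and the specific checks are assumptions about what a caller might validate.
REQUIRED_SCENE_FIELDS = (
    "scene_number", "scene_title", "emotional_beat", "setting_description",
    "characters_involved", "character_focus_moment", "key_plot_beat",
    "suggested_dialogue_hook", "PROACTIVE_visual_style_감독",
    "PROACTIVE_camera_work_감독", "PROACTIVE_sound_design_감독",
    "suggested_asset_type_감독", "video_clip_motion_description_감독",
    "video_clip_duration_estimate_secs_감독", "image_generation_keywords_감독",
    "pexels_search_query_감독",
)


def find_scene_field_problems(scene):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [f"missing field: {name}" for name in REQUIRED_SCENE_FIELDS if name not in scene]
    asset_type = scene.get("suggested_asset_type_감독")
    if asset_type is not None and asset_type not in ("image", "video_clip"):
        problems.append(f"unexpected asset type: {asset_type!r}")
    # The prompt asks for a duration of 0 whenever the asset is a still image.
    if asset_type == "image" and scene.get("video_clip_duration_estimate_secs_감독", 0) != 0:
        problems.append("image scene should have duration 0")
    return problems

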
def construct_dalle_prompt(scene_data, character_definitions=None, global_style_additions=""):
    """
    Constructs the final DALL-E prompt for an IMAGE, using keywords from Gemini's treatment,
    injecting character details, and global style preferences.
    """
    scene_title = scene_data.get('scene_title', 'A dramatic moment')
    # Use the more generic keyword field, suitable for images
    base_keywords = scene_data.get('image_generation_keywords_감독', 'cinematic scene, highly detailed')
    setting_desc_context = scene_data.get('setting_description', '')
    action_desc_context = scene_data.get('key_plot_beat', '')
    director_visual_style = scene_data.get('PROACTIVE_visual_style_감독', '')
    director_camera = scene_data.get('PROACTIVE_camera_work_감독', '')
    emotional_beat_context = scene_data.get('emotional_beat', scene_title)

    current_scene_character_details = []
    characters_involved_in_scene = scene_data.get('characters_involved', [])
    if characters_involved_in_scene:
        for char_name_from_scene in characters_involved_in_scene:
            char_name_clean = char_name_from_scene.strip()
            char_lookup_key = char_name_clean.lower()
            if character_definitions and char_lookup_key in character_definitions:
                char_visual_desc = character_definitions[char_lookup_key]
                current_scene_character_details.append(f"{char_name_clean} (depicted as: {char_visual_desc})")
            else:
                current_scene_character_details.append(char_name_clean)

    character_narrative = ""
    if current_scene_character_details:
        if len(current_scene_character_details) == 1:
            character_narrative = f"The scene focuses on {current_scene_character_details[0]}."
        else:
            character_narrative = f"The scene prominently features {', '.join(current_scene_character_details[:-1])} and {current_scene_character_details[-1]}."

    final_style_directive = director_visual_style
    if global_style_additions:
        # No trailing period here: the prompt template appends one after this directive.
        final_style_directive = f"{director_visual_style}. Additional global style notes: {global_style_additions}"

    prompt = (
        f"Create an ultra-detailed, photorealistic, and intensely cinematic still image (digital painting or high-fidelity concept art). "
        f"The image must visually embody the scene titled: '{scene_title}'. "
        f"Core visual elements and keywords: {base_keywords}. "
        f"{character_narrative} "
        f"Contextual narrative for the visual: The setting is '{setting_desc_context}'. The key moment is '{action_desc_context}'. "
        f"Artistic Direction -- Overall Visual Style and Mood: {final_style_directive}. "
        f"Cinematography -- Camera Framing and Perspective for a still image: {director_camera}. " # Emphasize still image context
        f"Emotional Impact: Convey a strong sense of '{emotional_beat_context}'. "
        f"Technical Execution: Render with extreme detail, sophisticated lighting (e.g., dramatic rim lighting, soft diffused light, harsh contrasts), rich textures, palpable atmospheric effects (e.g., mist, dust, lens flares, rain). "
        f"The final image should be of exceptional quality, suitable for a major film production's visual development. "
        f"Ensure all specified characters are distinct and adhere to their descriptions if provided. Still image, no motion implied unless essential to the pose."
    )
    return " ".join(prompt.split()) # Normalize whitespace
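

# --- Illustrative sketch (not part of the original module) ---------------------
# construct_dalle_prompt looks characters up by their stripped, lower-cased name,
# so character_definitions is expected to use lower-case keys. The hypothetical
# helper below isolates that lookup to make the contract explicit. The name
# describe_scene_characters and the default label are assumptions for this example.
def describe_scene_characters(character_names, character_definitions=None, label="depicted as"):
    """Return display strings for characters, adding a visual description when one is defined."""
    definitions = character_definitions or {}
    details = []
    for raw_name in character_names or []:
        name = raw_name.strip()
        key = name.lower()
        if key in definitions:
            details.append(f"{name} ({label}: {definitions[key]})")
        else:
            details.append(name)
    return details


# Usage: only "Anya" has a definition, so only her entry gains a description.
# describe_scene_characters([" Anya ", "Scavenger Drone"], {"anya": "weathered scout in a dust cloak"})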


def construct_text_to_video_prompt(scene_data, character_definitions=None, global_style_additions="", seed_image_path=None):
    """
    Constructs a prompt for a text-to-video (or image-to-video) generation model
    like Runway Gen-1/Gen-2.
    """
    scene_title = scene_data.get('scene_title', 'A dynamic scene')
    # Keywords can be used for styling and core elements
    base_keywords = scene_data.get('image_generation_keywords_감독', 'cinematic video clip')
    setting_desc = scene_data.get('setting_description', '')
    plot_beat = scene_data.get('key_plot_beat', '')
    motion_desc = scene_data.get('video_clip_motion_description_감독', 'subtle ambient motion')
    visual_style = scene_data.get('PROACTIVE_visual_style_감독', '')
    camera_work = scene_data.get('PROACTIVE_camera_work_감독', 'dynamic camera movement')
    emotional_beat = scene_data.get('emotional_beat', scene_title)

    current_scene_character_details = []
    characters_involved = scene_data.get('characters_involved', [])
    if characters_involved:
        for char_name in characters_involved:
            char_lookup_key = char_name.strip().lower()
            if character_definitions and char_lookup_key in character_definitions:
                current_scene_character_details.append(f"{char_name.strip()} (as: {character_definitions[char_lookup_key]})")
            else:
                current_scene_character_details.append(char_name.strip())
    
    character_narrative = ""
    if current_scene_character_details:
        character_narrative = f"Characters involved: {', '.join(current_scene_character_details)}. Their appearance and actions should be central."

    final_style_directive = visual_style
    if global_style_additions:
        # No trailing period here: the prompt template appends one after this directive.
        final_style_directive = f"{visual_style}. Global style notes: {global_style_additions}"

    prompt_parts = [
        f"Generate a highly cinematic video clip for a scene titled '{scene_title}'.",
        f"Setting: {setting_desc}.",
        f"Key moment: {plot_beat}.",
        character_narrative if character_narrative else "Focus on the environment and atmosphere if no specific characters are detailed.",
        f"Primary Motion: {motion_desc}. This motion should be the central dynamic element of the clip.",
        f"Visual Style & Mood: {final_style_directive}. Infuse with a strong sense of '{emotional_beat}'.",
        f"Cinematography: Implement camera work described as '{camera_work}'. If specific shots like 'dolly zoom' or 'tracking shot' are mentioned, execute them clearly.",
        f"Core visual keywords for styling and content: {base_keywords}.",
        "The video should be photorealistic, with extreme detail, sophisticated lighting, rich textures, and palpable atmospheric effects.",
        "Ensure high fidelity and smooth motion. The clip should feel like a shot from a major film production."
    ]

    if seed_image_path:
        prompt_parts.append(f"Use the provided seed image at '{seed_image_path}' to guide the visual style, composition, and content of the video, while incorporating the described motion.")
    else:
        prompt_parts.append("Generate the video from text description only, creating all visual elements based on the prompt.")

    return " ".join(" ".join(prompt_parts).split()) # Normalize whitespace
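

# --- Illustrative sketch (not part of the original module) ---------------------
# The treatment marks each scene as "image" or "video_clip", and this module has
# one prompt builder for each. A caller might route on that field as below.
# select_asset_kind is a hypothetical name. Falling back to "image" for missing
# or unrecognized values is an assumption that mirrors the prompt's stated default.
def select_asset_kind(scene_data):
    """Return "video_clip" only when the scene explicitly asks for it, else "image"."""
    requested = str(scene_data.get("suggested_asset_type_감독", "image")).strip().lower()
    return "video_clip" if requested == "video_clip" else "image"


# Usage: pick the matching builder for a scene.
# builder = construct_text_to_video_prompt if select_asset_kind(scene) == "video_clip" else construct_dalle_prompt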


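def create_narration_script_prompt_enhanced(story_scenes_data, overall_mood, overall_genre, voice_style="cinematic_trailer"):
    """
    Builds a prompt asking the model for a voiceover narration script that spans all scenes.
    voice_style: "cinematic_trailer", "documentary_neutral", "introspective_character"
    """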
    scenes_summary = []
    for i, scene in enumerate(story_scenes_data):
        scenes_summary.append(
            f"Scene {scene.get('scene_number', i+1)} (Title: '{scene.get('scene_title','Untitled')}', Beat: '{scene.get('emotional_beat','N/A')}', Asset: {scene.get('suggested_asset_type_감독','image')}):\n"
            f"  Setting: {scene.get('setting_description','')}\n"
            f"  Plot Beat: {scene.get('key_plot_beat','')}\n"
            f"  Character Focus: {scene.get('character_focus_moment','(general atmosphere)')}\n"
            f"  Dialogue Hook: {scene.get('suggested_dialogue_hook','(none)')}\n"
            f"  Director's Sound Hint: {scene.get('PROACTIVE_sound_design_감독','')}"
        )
    full_summary_text = "\n\n".join(scenes_summary)

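    voice_style_options = {
        "cinematic_trailer": "deep, resonant, slightly epic, building anticipation, and authoritative.",
        "documentary_neutral": "clear, informative, objective, and well-paced.",
        "introspective_character": "reflective, personal, possibly first-person, echoing the thoughts of a key character."
    }
    # Fall back to the default style instead of raising KeyError on an unrecognized value.
    voice_style_description = voice_style_options.get(voice_style, voice_style_options["cinematic_trailer"])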

    prompt = f"""
    You are an award-winning voiceover scriptwriter for a cinematic animatic.
    The animatic uses a mix of still images and short video clips, based on these scene treatments:

    --- SCENE TREATMENTS ---
    {full_summary_text}
    --- END SCENE TREATMENTS ---

    Overall Genre: {overall_genre}
    Overall Mood: {overall_mood}
    Desired Voiceover Style: {voice_style} (Characteristics: {voice_style_description})

    Your narration script should:
    - Weave a cohesive narrative through all scenes.
    - Enhance emotional impact, drawing on each scene's Beat, Character Focus, and Director's Sound Hint.
    - Be concise: 1-3 impactful sentences per scene. Total for {len(story_scenes_data)} scenes: approx {len(story_scenes_data) * 15}-{len(story_scenes_data) * 25} words.
    - Transcend simple description; offer insight, build tension/emotion, evoke thematic depth.
    - If 'introspective_character', write from one prominent character's perspective.
    - The output MUST be ONLY the narration script text, ready for text-to-speech. No scene numbers, titles, or directives like "(Voiceover)".

    Example (different story):
    "Dust motes danced in the lone shaft of light. Each step echoed, a countdown. The air grew colder, heavy with ozone and ancient despair..."

    Craft your narration.
    """
    return " ".join(prompt.split())


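def create_scene_regeneration_prompt(original_scene_data, user_feedback, full_story_context=None):
    """
    Creates a prompt asking the model to regenerate a single scene's JSON object
    based on user feedback, optionally with an abbreviated summary of the full story.
    """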
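    # ensure_ascii=False keeps the 감독 field names readable instead of \uXXXX escapes.
    context_str = f"Original scene (Scene Number {original_scene_data.get('scene_number')} - Title: {original_scene_data.get('scene_title')}):\n{json.dumps(original_scene_data, indent=2, ensure_ascii=False)}\n\n"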
    if full_story_context:
        context_str += "Full story context for reference (abbreviated):\n"
        for i, scene_ctx in enumerate(full_story_context):
            context_str += f"  Scene {scene_ctx.get('scene_number', i+1)} Title: {scene_ctx.get('scene_title', 'Untitled')}, Plot: {scene_ctx.get('key_plot_beat', '')[:50]}...\n"
        context_str += "\n"


    return f"""
    You are an AI Script Supervisor and Creative Consultant.
    {context_str}
    User Feedback for this scene: "{user_feedback}"

    Regenerate ONLY the JSON object for this single scene, incorporating the feedback.
    Maintain the exact 16-field structure:
    (scene_number, scene_title, emotional_beat, setting_description, characters_involved, character_focus_moment, key_plot_beat, suggested_dialogue_hook, PROACTIVE_visual_style_감독, PROACTIVE_camera_work_감독, PROACTIVE_sound_design_감독, suggested_asset_type_감독, video_clip_motion_description_감독, video_clip_duration_estimate_secs_감독, image_generation_keywords_감독, pexels_search_query_감독).

    Key considerations:
    - 'scene_number' MUST NOT change.
    - 'key_plot_beat' should be a concise descriptive sentence (max 15-20 words).
    - 'image_generation_keywords_감독' should be updated to reflect any visual changes.
    - 'pexels_search_query_감독' should also be updated if the setting or mood changes significantly.
    - If feedback implies changes to motion or asset type, update 'suggested_asset_type_감독', 'video_clip_motion_description_감독', and 'video_clip_duration_estimate_secs_감독' accordingly. For 'video_clip_motion_description_감독', if type is 'image', set to "N/A". For 'video_clip_duration_estimate_secs_감독', if type is 'image', set to 0.
    - If the user's feedback implies experimental narrative changes and the original scene had a `director_note`, you may update or remove it. If introducing a new experimental twist, add/update the `director_note` field.

    Output only the single, updated JSON object for this scene.
    """

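# --- Illustrative sketch (not part of the original module) ---------------------
# The regeneration prompt above states that 'scene_number' MUST NOT change, but a
# model reply may not honor that. A caller could enforce it after parsing the reply.
# merge_regenerated_scene is a hypothetical name, and restoring the number on the
# caller's side is an assumption about how the reply would be consumed.
def merge_regenerated_scene(original_scene_data, regenerated_scene_data):
    """Return a copy of the regenerated scene with the original scene_number restored."""
    merged = dict(regenerated_scene_data)
    merged["scene_number"] = original_scene_data.get("scene_number")
    return merged

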
def create_visual_regeneration_prompt(original_dalle_prompt, user_feedback, scene_data, character_definitions=None, global_style_additions=""):
    """
    Creates a prompt for Gemini to refine an existing DALL-E prompt for an IMAGE.
    """
    characters_involved_in_scene = scene_data.get('characters_involved', [])
    current_scene_character_details = []
    if characters_involved_in_scene:
        for char_name_from_scene in characters_involved_in_scene:
            char_name_clean = char_name_from_scene.strip()
            char_lookup_key = char_name_clean.lower()
            if character_definitions and char_lookup_key in character_definitions:
                current_scene_character_details.append(f"{char_name_clean} (described as: {character_definitions[char_lookup_key]})")
            else:
                current_scene_character_details.append(char_name_clean)
    characters_narrative = f" Characters to feature: {', '.join(current_scene_character_details) if current_scene_character_details else 'None specifically detailed'}."

    full_prompt_for_gemini = f"""
    You are an AI Art Director specializing in refining DALL-E 3 prompts for cinematic STILL IMAGES.
    The goal is to update an image prompt based on user feedback.

    Scene Context:
    - Title: "{scene_data.get('scene_title', '')}"
    - Setting: "{scene_data.get('setting_description', '')}"
    - Key Plot Beat: "{scene_data.get('key_plot_beat', '')}"
    - {characters_narrative}
    - Director's Suggested Visual Style: "{scene_data.get('PROACTIVE_visual_style_감독', '')}"
    - Director's Suggested Camera: "{scene_data.get('PROACTIVE_camera_work_감독', '')}"
    - Current Global Style Additions: "{global_style_additions}"

    The PREVIOUS DALL-E 3 prompt (that generated the image the user wants to change) was:
    "{original_dalle_prompt}"

    User Feedback on the visual generated by the previous prompt:
    "{user_feedback}"

    Your Task: Generate a NEW, revised DALL-E 3 prompt specifically for a STILL IMAGE.
    This new prompt must incorporate the user's feedback to achieve the desired visual changes.
    It should remain ultra-detailed, photorealistic, and highly cinematic.
    The prompt should guide DALL-E 3 to create a stunning still image suitable for a film's concept art.
    Maintain core scene elements (setting, characters, plot beat) unless feedback explicitly asks to change them.
    Translate feedback into concrete visual descriptions for a static image (lighting, color, composition, character appearance/pose, atmosphere).
    Reinforce character descriptions from the context if they are relevant to the feedback.
    The prompt should be a single block of text.

    Output ONLY the new, revised DALL-E 3 prompt string. Do not add any other text before or after the prompt.
    """
    return " ".join(full_prompt_for_gemini.split())