ladybug11 committed
Commit 9abead4 · 1 Parent(s): 97f8628
Files changed (2):
  1. README.md (+108 −2)
  2. app.py (+92 −91)
README.md CHANGED
@@ -10,7 +10,113 @@ pinned: false
 license: apache-2.0
 short_description: AI-powered tool that automatically generates quote video
 tags:
-  - mcp-in-action-track-creative
+  - mcp-in-action-track-consumer
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+## 🎬 AI Quote Clip Generator
+
+Autonomous MCP Agent • Trend-Aware Quote Studio • Multimodal Generation
+
+AI Quote Clip Generator is an MCP-powered autonomous system that creates aesthetic, trend-aware quote videos for TikTok, Instagram Reels, and Shorts.
+It combines Gemini + OpenAI + ElevenLabs + Modal + Pexels into a single intelligent pipeline that plans, generates, narrates, and renders short-form content automatically.
+
+This project was built for the MCP 1st Birthday Hackathon – Track 2 (MCP in Action / Productivity).
+
+### 🔮 Live Demo
+
+## 🚀 What It Does
+
+With a single click, the system:
+
+- Generates non-repetitive Gemini-powered quotes
+- Applies a persona style (Coach, Philosopher, Poet, Mentor)
+- Incorporates trend-aware context for modern content themes
+- Creates voice-over explanations using OpenAI + ElevenLabs
+- Retrieves cinematic vertical stock footage from Pexels
+- Renders 7–20 second short-form videos via Modal
+- Saves the results to a live gallery inside the app
+- Displays a full agent activity log for each step
+
+This turns the tool into a full AI content studio optimized for social platforms.
+
+## 🛠️ MCP Tools Used
+
+The project exposes multiple tools via MCP:
+
+| Tool | Description |
+|------|-------------|
+| **generate_quote_tool** | Produces unique, trend-aware quotes using Gemini with per-niche memory |
+| **search_pexels_video_tool** | Retrieves aesthetic background videos from Pexels |
+| **create_quote_video_tool** | Sends jobs to Modal to render final 7–20s clips |
+| *(internal)* `generate_voice_commentary` | Generates 25–35 word explanations (OpenAI + ElevenLabs) |
+
+These tools are orchestrated autonomously through a multi-step agent chain.
+
+---
+
+## 📊 Agent Pipeline Overview
+
+1. Build context → niche + persona + trend theme
+2. Generate quote (Gemini primary, OpenAI fallback)
+3. Create voice-over commentary (OpenAI + ElevenLabs)
+4. Retrieve video footage (Pexels)
+5. Render the final video (Modal)
+6. Save and display in the gallery
+
+---
+
+## 🧩 Core Components
+
+### 1. Autonomous MCP Agent Pipeline
+
+A multi-step reasoning pipeline built with smolagents that orchestrates the full workflow:
+trend-aware context building → quote generation → narration → video retrieval → rendering → gallery update.
+
+### 2. Gemini-Enhanced Quote Generator (Variety Safe)
+
+A hybrid Gemini/OpenAI system with per-niche memory and variety tracking that ensures every quote is unique, non-repetitive, and aligned with current social trends.
+
+### 3. Trend-Aware Mini-RAG Engine
+
+A lightweight "mini-RAG" system embeds niche-specific trend intelligence (e.g., Soft Life, Discipline Era, Glow-Up, Reset Culture). The agent retrieves and fuses these trend insights (hooks, metaphors, persona voice) into quotes and commentaries for contextual freshness.
+
+### 4. ElevenLabs Voice Studio
+
+Automatically generates a voice-over explanation for every video, using OpenAI for spoken-style commentary creation and ElevenLabs for lifelike narration. Provides a selection of realistic voices.
+
+### 5. Modal Render Engine (Fast Video Processing)
+
+All final short-form clips are rendered through a Modal cloud function, synchronizing narration length, animated text, and cinematic video overlays for rapid production.
+
+### 6. Pexels Multimodal Search Tool
+
+Harnesses the Pexels video API via an agent tool to fetch vertical cinematic backgrounds tailored to each niche, persona, and trending topic (e.g., “soft morning light,” “discipline era routines”).
+
+### 7. Dynamic Aesthetic Text Layouts
+
+Offers three distinct text styles (Classic Center, Lower-Third Serif, and Typewriter Top) based on high-performing TikTok aesthetics, optimizing for visual variety.
+
+### 8. Persistent Video Gallery
+
+Saves every generated video to a scrollable gallery inside the app, letting creators browse their entire history of AI-generated clips.
+
+### 🧑‍💻 Authors
+
+- Meheret Egzerab
+
+---
+
+### 📝 License
+
+This project is licensed under the Apache-2.0 License.
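The six-step pipeline listed in the README can be sketched as a plain chain of functions. This is an illustrative skeleton only: every function name and body below is invented for the sketch and is not the Space's actual code; the real steps call Gemini, OpenAI, ElevenLabs, Pexels, and Modal.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the real MCP tools (names invented for illustration).
def build_context(niche: str, persona: str) -> Dict[str, str]:
    return {"niche": niche, "persona": persona, "trend": "glow-up"}

def generate_quote(ctx: Dict[str, str]) -> str:
    return f"A {ctx['persona']} quote about {ctx['trend']} for {ctx['niche']}"

def generate_commentary(quote: str) -> str:
    return f"Voice-over: {quote}"

def fetch_footage(ctx: Dict[str, str]) -> str:
    return f"pexels://{ctx['niche']}-vertical.mp4"

def render_video(quote: str, audio: str, footage: str) -> str:
    return f"rendered({footage})"

def run_pipeline(niche: str, persona: str) -> Tuple[str, List[str]]:
    """Each step feeds the next; the log mirrors the README's six stages."""
    log: List[str] = []
    ctx = build_context(niche, persona)
    log.append("context")
    quote = generate_quote(ctx)
    log.append("quote")
    audio = generate_commentary(quote)
    log.append("voice")
    footage = fetch_footage(ctx)
    log.append("footage")
    video = render_video(quote, audio, footage)
    log.append("render")
    log.append("gallery")  # save + display in the gallery
    return video, log

video, log = run_pipeline("motivation", "Coach")
```

The point of the shape is that each stage only consumes outputs of earlier stages, so the agent can log and recover per step, as the app's `status_log` does.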
app.py CHANGED
@@ -204,7 +204,6 @@ def get_trend_insights(niche: str) -> Dict[str, Any]:
         },
     }
 
-    # Default fallback
     default = {
         "label": "modern glow-up & gentle discipline",
         "summary": (
@@ -520,14 +519,46 @@ agent, agent_error = initialize_agent()
 # ==== VOICE GENERATION (OpenAI explanation + ElevenLabs TTS) ==================
 
 
+def get_voice_config(voice_profile: str) -> Tuple[str, VoiceSettings]:
+    """
+    Map a human-readable voice profile to an ElevenLabs voice_id + settings.
+    """
+    vp = (voice_profile or "").lower()
+
+    # Calm female (Rachel)
+    if "rachel" in vp or "female" in vp:
+        return (
+            "21m00Tcm4TlvDq8ikWAM",  # Rachel (from ElevenLabs docs)
+            VoiceSettings(
+                stability=0.5,
+                similarity_boost=0.9,
+                style=0.4,
+                use_speaker_boost=True,
+            ),
+        )
+
+    # Warm male (Adam)
+    return (
+        "pNInz6obpgDQGcFmaJgB",  # Adam
+        VoiceSettings(
+            stability=0.6,
+            similarity_boost=0.8,
+            style=0.5,
+            use_speaker_boost=True,
+        ),
+    )
+
+
 def generate_voice_commentary(
     quote_text: str,
     niche: str,
     persona: str,
     trend_label: str,
+    voice_profile: str,
 ) -> Tuple[str, str]:
     """
     Generate a short explanatory commentary + ElevenLabs audio (as base64).
+    Voice is always generated if ElevenLabs is available.
 
     Returns:
         (commentary_text, audio_b64) – audio_b64 may be "" if error.
@@ -577,16 +608,13 @@ Return ONLY the commentary text, nothing else.
 
         # 2) ElevenLabs TTS
         try:
+            voice_id, voice_settings = get_voice_config(voice_profile)
+
             audio_stream = elevenlabs_client.text_to_speech.convert(
                 text=commentary,
-                voice_id="pNInz6obpgDQGcFmaJgB",  # Adam
+                voice_id=voice_id,
                 model_id="eleven_multilingual_v2",
-                voice_settings=VoiceSettings(
-                    stability=0.6,
-                    similarity_boost=0.8,
-                    style=0.6,
-                    use_speaker_boost=True,
-                ),
+                voice_settings=voice_settings,
             )
 
             audio_bytes = b"".join(chunk for chunk in audio_stream)
@@ -605,15 +633,15 @@ def mcp_agent_pipeline(
     style: str,
     persona: str,
     text_style: str,
+    voice_profile: str,
     num_variations: int = 1,
-    voice_enabled: bool = True,
 ) -> Tuple[str, List[str]]:
     """
     MCP-flavored autonomous pipeline with:
     - Context engineering (persona, trends)
    - Trend-informed 'RAG' context injection
    - Quote generation via hybrid Gemini/OpenAI
-    - Optional ElevenLabs narration
+    - ElevenLabs narration (always on if available)
    - Modal-based video creation (1–3 variations)
    """
@@ -630,7 +658,7 @@ def mcp_agent_pipeline(
     status_log.append(f" • Visual style: `{style}`")
     status_log.append(f" • Persona: `{persona}`")
     status_log.append(f" • Text layout: `{text_style}`")
-    status_log.append(f" • Voice-over: {'ON' if voice_enabled else 'OFF'}\n")
+    status_log.append(f" • Voice profile: `{voice_profile}`\n")
 
     trend_info = get_trend_insights(niche)
     trend_label = trend_info.get("label", "")
@@ -659,24 +687,21 @@ def mcp_agent_pipeline(
     preview = quote if len(quote) <= 140 else quote[:140] + "..."
     status_log.append(f" ✅ Quote: “{preview}”\n")
 
-    # STEP 3: Optional voice commentary
-    audio_b64 = ""
-    if voice_enabled and elevenlabs_client:
-        status_log.append("🔊 **Step 3 – Generating voice-over explanation (OpenAI + ElevenLabs)**")
-        commentary, audio_b64 = generate_voice_commentary(
-            quote_text=quote,
-            niche=niche,
-            persona=persona,
-            trend_label=trend_label,
-        )
-        if audio_b64:
-            status_log.append(" ✅ Voice-over created and encoded as base64")
-        else:
-            status_log.append(" ⚠️ Voice generation failed; continuing without audio")
-        if commentary:
-            status_log.append(f" 📝 Commentary preview: {commentary[:120]}...\n")
-    else:
-        status_log.append("🔇 **Step 3 – Voice-over skipped** (disabled or missing ElevenLabs)\n")
+    # STEP 3: Voice commentary (always attempted)
+    status_log.append("🔊 **Step 3 – Generating voice-over explanation (OpenAI + ElevenLabs)**")
+    commentary, audio_b64 = generate_voice_commentary(
+        quote_text=quote,
+        niche=niche,
+        persona=persona,
+        trend_label=trend_label,
+        voice_profile=voice_profile,
+    )
+    if audio_b64:
+        status_log.append(" ✅ Voice-over created and encoded as base64")
+    else:
+        status_log.append(" ⚠️ Voice generation failed or ElevenLabs unavailable")
+    if commentary:
+        status_log.append(f" 📝 Commentary preview: {commentary[:120]}...\n")
 
     # STEP 4: Search Pexels videos
     status_log.append("🎥 **Step 4 – Searching Pexels for background videos**")
@@ -731,7 +756,7 @@ def mcp_agent_pipeline(
             created_videos.append(out_path)
             status_log.append(f" ✅ Variation {i+1} rendered successfully")
 
-            # Copy to gallery
+            # Copy to gallery (we keep ALL; scrolling handled by Gradio gallery)
             gallery_filename = f"gallery_{timestamp}_v{i+1}.mp4"
             gallery_path = os.path.join(gallery_dir, gallery_filename)
             try:
@@ -764,11 +789,14 @@ def mcp_agent_pipeline(
     return "\n".join(status_log), created_videos
 
 
-# ==== GALLERY UTIL ============================================================
+# ==== GALLERY UTIL (SCROLLABLE, KEEPS ALL) ====================================
 
 
 def load_gallery_videos() -> List[str]:
-    """Load up to 6 most recent videos from persistent gallery folder."""
+    """
+    Load all videos from the persistent gallery folder (sorted newest → oldest).
+    Gradio's Gallery will handle scrolling.
+    """
     gallery_output_dir = "/data/gallery_videos"
     os.makedirs(gallery_output_dir, exist_ok=True)
 
@@ -778,14 +806,9 @@ def load_gallery_videos() -> List[str]:
         glob.glob(f"{gallery_output_dir}/*.mp4"),
         key=os.path.getmtime,
        reverse=True,
-    )[:6]
-
-    videos: List[str] = [None] * 6  # type: ignore
-    for i, video_path in enumerate(existing_videos):
-        if i < 6:
-            videos[i] = video_path
+    )
 
-    return videos
+    return existing_videos
 
 
 # ==== GRADIO UI ===============================================================
@@ -799,28 +822,21 @@ with gr.Blocks(
         # 🎬 AIQuoteClipGenerator
         ### MCP-flavored agent • Gemini + OpenAI + ElevenLabs + Modal
 
-        **What it does:**
-        - 🧠 Generates **non-repetitive quotes** using Gemini with variety tracking
-        - 📈 Uses **trend-aware context** per niche (mini-RAG style)
-        - 🎭 Applies a **persona** to shape the tone (coach / philosopher / poet / mentor)
-        - 🔊 Optional **ElevenLabs voice-over** explaining the quote
-        - 🎥 Pulls vertical stock videos from **Pexels**
-        - ⚡ Renders final clips via **Modal**, 1–3 variations
+        An autonomous mini-studio that generates trend-aware quote videos with voice-over,
+        cinematic stock footage, and MCP-style agent reasoning.
         """
     )
 
-    with gr.Accordion("📸 Example Gallery – Recent Videos", open=True):
-        gr.Markdown("Auto-updated with your latest generated clips.")
-
-        with gr.Row():
-            gallery_video1 = gr.Video(height=260, show_label=False)
-            gallery_video2 = gr.Video(height=260, show_label=False)
-            gallery_video3 = gr.Video(height=260, show_label=False)
-
-        with gr.Row():
-            gallery_video4 = gr.Video(height=260, show_label=False)
-            gallery_video5 = gr.Video(height=260, show_label=False)
-            gallery_video6 = gr.Video(height=260, show_label=False)
+    with gr.Accordion("📸 Example Gallery – All Generated Videos", open=True):
+        gr.Markdown("Scroll to explore all the clips you've generated so far.")
+        gallery = gr.Gallery(
+            label=None,
+            elem_id="gallery",
+            show_label=False,
+            columns=3,
+            height=540,
+            preview=True,
+        )
 
     gr.Markdown("---")
     gr.Markdown("## 🎯 Generate Your Own Quote Video")
@@ -871,9 +887,13 @@ with gr.Blocks(
                 value="classic_center",
             )
 
-            voice_enabled = gr.Checkbox(
-                label="🔊 Add voice-over explanation (ElevenLabs)",
-                value=True,
+            voice_profile = gr.Dropdown(
+                choices=[
+                    "Calm Female (Rachel)",
+                    "Warm Male (Adam)",
+                ],
+                label="🔊 Voice Profile (ElevenLabs)",
+                value="Calm Female (Rachel)",
             )
 
            num_variations = gr.Slider(
@@ -897,7 +917,7 @@ with gr.Blocks(
         show_label=False,
     )
 
-    gr.Markdown("### ✨ Your Quote Videos")
+    gr.Markdown("### ✨ Your Quote Videos (This Run)")
     with gr.Row():
         video1 = gr.Video(label="Video 1", height=480)
         video2 = gr.Video(label="Video 2", height=480)
@@ -907,12 +927,10 @@ with gr.Blocks(
        """
        ---
        ### 🧩 Under the hood
-        - **Context engineering:** niche + persona + trend theme
-        - **Mini-RAG:** recent trends per niche feeding into generation prompts
-        - **Hybrid LLM:** Gemini (quotes) + OpenAI (commentary)
-        - **Multimodal pipeline:** text → audio → video composition
-
-        Built for the MCP 1st Birthday Hackathon – Track 2 (MCP in Action, Productivity).
+        - Context engineering: niche + persona + trend theme
+        - Mini-RAG: curated trend knowledge feeding into generation
+        - Hybrid LLM: Gemini (quotes) + OpenAI (commentary)
+        - Multimodal pipeline: text → audio → video
        """
    )
@@ -921,16 +939,16 @@ with gr.Blocks(
         style_val,
         persona_val,
         text_style_val,
+        voice_profile_val,
         num_variations_val,
-        voice_enabled_val,
     ):
         status, videos = mcp_agent_pipeline(
             niche=niche_val,
             style=style_val,
             persona=persona_val,
             text_style=text_style_val,
+            voice_profile=voice_profile_val,
             num_variations=int(num_variations_val),
-            voice_enabled=bool(voice_enabled_val),
         )
 
         v1 = videos[0] if len(videos) > 0 else None
@@ -944,12 +962,7 @@ with gr.Blocks(
             v1,
             v2,
             v3,
-            gallery_vids[0],
-            gallery_vids[1],
-            gallery_vids[2],
-            gallery_vids[3],
-            gallery_vids[4],
-            gallery_vids[5],
+            gallery_vids,
         ]
 
    generate_btn.click(
@@ -959,35 +972,23 @@ with gr.Blocks(
             style,
             persona,
             text_style,
+            voice_profile,
             num_variations,
-            voice_enabled,
         ],
         outputs=[
             output,
             video1,
             video2,
             video3,
-            gallery_video1,
-            gallery_video2,
-            gallery_video3,
-            gallery_video4,
-            gallery_video5,
-            gallery_video6,
+            gallery,
         ],
     )
 
     # Load gallery on page load
     demo.load(
         load_gallery_videos,
-        outputs=[
-            gallery_video1,
-            gallery_video2,
-            gallery_video3,
-            gallery_video4,
-            gallery_video5,
-            gallery_video6,
-        ],
+        outputs=[gallery],
    )
 
 if __name__ == "__main__":
-    demo.launch(allowed_paths=["/data/gallery_videos"])
+    demo.launch(allowed_paths=["/data/gallery_videos"])
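The rewritten `load_gallery_videos` drops the old fixed six-slot list and simply returns every clip sorted newest-first, which suits a variable-length `gr.Gallery`. The sorting logic can be exercised in isolation; the directory and file names below are made up for the demo, and the helper takes the directory as a parameter only so it is testable outside the Space.

```python
import glob
import os
import tempfile

def load_gallery_videos(gallery_dir: str) -> list:
    # Same idea as the updated helper: return every .mp4, newest first,
    # with no padding to a fixed number of slots.
    return sorted(
        glob.glob(os.path.join(gallery_dir, "*.mp4")),
        key=os.path.getmtime,
        reverse=True,
    )

# Demo with temp files whose modification times are set explicitly.
d = tempfile.mkdtemp()
for name, ts in [("a.mp4", 100), ("b.mp4", 300), ("c.mp4", 200)]:
    p = os.path.join(d, name)
    open(p, "w").close()
    os.utime(p, (ts, ts))  # (atime, mtime)

print([os.path.basename(p) for p in load_gallery_videos(d)])  # ['b.mp4', 'c.mp4', 'a.mp4']
```

Returning the plain sorted list also removes the `[None] * 6` placeholder trick the six `gr.Video` slots previously required.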