ladybug11 committed
Commit 9abead4 · 1 Parent(s): 97f8628
Files changed (2):
  1. README.md (+108 −2)
  2. app.py (+92 −91)
README.md CHANGED
@@ -10,7 +10,113 @@ pinned: false
 license: apache-2.0
 short_description: AI-powered tool that automatically generates quote video
 tags:
-  - mcp-in-action-track-creative
+  - mcp-in-action-track-consumer
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+## 🎬 AI Quote Clip Generator
+
+Autonomous MCP Agent • Trend-Aware Quote Studio • Multimodal Generation
+
+AI Quote Clip Generator is an MCP-powered autonomous system that creates aesthetic, trend-aware quote videos for TikTok, Instagram Reels, and Shorts.
+It combines Gemini + OpenAI + ElevenLabs + Modal + Pexels into a single intelligent pipeline that plans, generates, narrates, and renders short-form content automatically.
+
+This project was built for the MCP 1st Birthday Hackathon – Track 2 (MCP in Action / Productivity).
+
+### 🔮 Live Demo
+
+## 🚀 What It Does
+
+With a single click, the system:
+
+- Generates non-repetitive Gemini-powered quotes
+- Applies a persona style (Coach, Philosopher, Poet, Mentor)
+- Incorporates trend-aware context for modern content themes
+- Creates voice-over explanations using OpenAI + ElevenLabs
+- Retrieves cinematic vertical stock footage from Pexels
+- Renders 7–20 second short-form videos via Modal
+- Saves the results to a live gallery inside the app
+- Displays a full agent activity log for each step
+
+This turns the tool into a full AI content studio optimized for social platforms.
+
+## 🛠️ MCP Tools Used
+
+The project exposes multiple tools via MCP:
+
+| Tool | Description |
+|------|-------------|
+| **generate_quote_tool** | Produces unique, trend-aware quotes using Gemini with per-niche memory |
+| **search_pexels_video_tool** | Retrieves aesthetic background videos from Pexels |
+| **create_quote_video_tool** | Sends jobs to Modal to render final 7–20s clips |
+| *(internal)* `generate_voice_commentary` | Generates 25–35 word explanations (OpenAI + ElevenLabs) |
+
+These tools are orchestrated autonomously through a multi-step agent chain.
+
+---
+
+## 📊 Agent Pipeline Overview
+
+1. Build context → niche + persona + trend theme
+2. Generate quote (Gemini primary, OpenAI fallback)
+3. Create voice-over commentary (OpenAI + ElevenLabs)
+4. Retrieve video footage (Pexels)
+5. Render the final video (Modal)
+6. Save and display in the gallery
+
+---
+
+## 🧩 Core Components
+
+### 1. Autonomous MCP Agent Pipeline
+
+A multi-step reasoning pipeline built with smolagents that orchestrates the full workflow:
+trend-aware context building → quote generation → narration → video retrieval → rendering → gallery update.
+
+### 2. Gemini-Enhanced Quote Generator (Variety Safe)
+
+A hybrid Gemini/OpenAI system with per-niche memory and variety tracking that ensures every quote is unique, non-repetitive, and aligned with current social trends.
+
+### 3. Trend-Aware Mini-RAG Engine
+
+A lightweight "mini-RAG" system embeds niche-specific trend intelligence (e.g., Soft Life, Discipline Era, Glow-Up, Reset Culture). The agent retrieves and fuses these trend insights (hooks, metaphors, persona voice) into quotes and commentaries for contextual freshness.
+
+### 4. ElevenLabs Voice Studio
+
+Automatically generates a voice-over explanation for every video, using OpenAI for spoken-style commentary creation and ElevenLabs for lifelike narration. Provides a selection of realistic voices.
+
+### 5. Modal Render Engine (Fast Video Processing)
+
+All final short-form clips are rendered through a Modal cloud function, synchronizing narration length, animated text, and cinematic video overlays for rapid production.
+
+### 6. Pexels Multimodal Search Tool
+
+Harnesses the Pexels video API via an agent tool to fetch vertical cinematic backgrounds tailored to each niche, persona, and trending topic (e.g., “soft morning light,” “discipline era routines”).
+
+### 7. Dynamic Aesthetic Text Layouts
+
+Offers three distinct text styles (Classic Center, Lower-Third Serif, and Typewriter Top) based on high-performing TikTok aesthetics, optimizing for visual variety.
+
+### 8. Persistent Video Gallery
+
+Saves every generated video to a scrollable gallery inside the app, letting creators browse their entire history of AI-generated clips.
+
+### 🧑‍💻 Authors
+
+- Meheret Egzerab
+
+---
+
+### 📝 License
+
+This project is licensed under the Apache-2.0 License.
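The six-step pipeline listed in the README can be sketched as a plain chain of functions. This is an illustrative skeleton only: every function name and body below is invented for the sketch and is not the Space's actual code; the real steps call Gemini, OpenAI, ElevenLabs, Pexels, and Modal.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the real MCP tools (names invented for illustration).
def build_context(niche: str, persona: str) -> Dict[str, str]:
    return {"niche": niche, "persona": persona, "trend": "glow-up"}

def generate_quote(ctx: Dict[str, str]) -> str:
    return f"A {ctx['persona']} quote about {ctx['trend']} for {ctx['niche']}"

def generate_commentary(quote: str) -> str:
    return f"Voice-over: {quote}"

def fetch_footage(ctx: Dict[str, str]) -> str:
    return f"pexels://{ctx['niche']}-vertical.mp4"

def render_video(quote: str, audio: str, footage: str) -> str:
    return f"rendered({footage})"

def run_pipeline(niche: str, persona: str) -> Tuple[str, List[str]]:
    """Each step feeds the next; the log mirrors the README's six stages."""
    log: List[str] = []
    ctx = build_context(niche, persona)
    log.append("context")
    quote = generate_quote(ctx)
    log.append("quote")
    audio = generate_commentary(quote)
    log.append("voice")
    footage = fetch_footage(ctx)
    log.append("footage")
    video = render_video(quote, audio, footage)
    log.append("render")
    log.append("gallery")  # save + display in the gallery
    return video, log

video, log = run_pipeline("motivation", "Coach")
```

The point of the shape is that each stage only consumes outputs of earlier stages, so the agent can log and recover per step, as the app's `status_log` does.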
app.py CHANGED
@@ -204,7 +204,6 @@ def get_trend_insights(niche: str) -> Dict[str, Any]:
         },
     }
 
-    # Default fallback
     default = {
         "label": "modern glow-up & gentle discipline",
         "summary": (
@@ -520,14 +519,46 @@ agent, agent_error = initialize_agent()
 # ==== VOICE GENERATION (OpenAI explanation + ElevenLabs TTS) ==================
 
 
+def get_voice_config(voice_profile: str) -> Tuple[str, VoiceSettings]:
+    """
+    Map a human-readable voice profile to an ElevenLabs voice_id + settings.
+    """
+    vp = (voice_profile or "").lower()
+
+    # Calm female (Rachel)
+    if "rachel" in vp or "female" in vp:
+        return (
+            "21m00Tcm4TlvDq8ikWAM",  # Rachel (from ElevenLabs docs)
+            VoiceSettings(
+                stability=0.5,
+                similarity_boost=0.9,
+                style=0.4,
+                use_speaker_boost=True,
+            ),
+        )
+
+    # Warm male (Adam)
+    return (
+        "pNInz6obpgDQGcFmaJgB",  # Adam
+        VoiceSettings(
+            stability=0.6,
+            similarity_boost=0.8,
+            style=0.5,
+            use_speaker_boost=True,
+        ),
+    )
+
+
 def generate_voice_commentary(
     quote_text: str,
     niche: str,
     persona: str,
     trend_label: str,
+    voice_profile: str,
 ) -> Tuple[str, str]:
     """
     Generate a short explanatory commentary + ElevenLabs audio (as base64).
+    Voice is always generated if ElevenLabs is available.
 
     Returns:
         (commentary_text, audio_b64) – audio_b64 may be "" if error.
@@ -577,16 +608,13 @@ Return ONLY the commentary text, nothing else.
 
         # 2) ElevenLabs TTS
         try:
+            voice_id, voice_settings = get_voice_config(voice_profile)
+
             audio_stream = elevenlabs_client.text_to_speech.convert(
                 text=commentary,
-                voice_id="pNInz6obpgDQGcFmaJgB",  # Adam
+                voice_id=voice_id,
                 model_id="eleven_multilingual_v2",
-                voice_settings=VoiceSettings(
-                    stability=0.6,
-                    similarity_boost=0.8,
-                    style=0.6,
-                    use_speaker_boost=True,
-                ),
+                voice_settings=voice_settings,
             )
 
             audio_bytes = b"".join(chunk for chunk in audio_stream)
@@ -605,15 +633,15 @@ def mcp_agent_pipeline(
     style: str,
     persona: str,
     text_style: str,
+    voice_profile: str,
     num_variations: int = 1,
-    voice_enabled: bool = True,
 ) -> Tuple[str, List[str]]:
     """
     MCP-flavored autonomous pipeline with:
     - Context engineering (persona, trends)
    - Trend-informed 'RAG' context injection
    - Quote generation via hybrid Gemini/OpenAI
-    - Optional ElevenLabs narration
+    - ElevenLabs narration (always on if available)
    - Modal-based video creation (1–3 variations)
    """
@@ -630,7 +658,7 @@ def mcp_agent_pipeline(
     status_log.append(f" • Visual style: `{style}`")
     status_log.append(f" • Persona: `{persona}`")
     status_log.append(f" • Text layout: `{text_style}`")
-    status_log.append(f" • Voice-over: {'ON' if voice_enabled else 'OFF'}\n")
+    status_log.append(f" • Voice profile: `{voice_profile}`\n")
 
     trend_info = get_trend_insights(niche)
     trend_label = trend_info.get("label", "")
@@ -659,24 +687,21 @@ def mcp_agent_pipeline(
     preview = quote if len(quote) <= 140 else quote[:140] + "..."
     status_log.append(f" ✅ Quote: “{preview}”\n")
 
-    # STEP 3: Optional voice commentary
-    audio_b64 = ""
-    if voice_enabled and elevenlabs_client:
-        status_log.append("🔊 **Step 3 – Generating voice-over explanation (OpenAI + ElevenLabs)**")
-        commentary, audio_b64 = generate_voice_commentary(
-            quote_text=quote,
-            niche=niche,
-            persona=persona,
-            trend_label=trend_label,
-        )
-        if audio_b64:
-            status_log.append(" ✅ Voice-over created and encoded as base64")
-        else:
-            status_log.append(" ⚠️ Voice generation failed; continuing without audio")
-        if commentary:
-            status_log.append(f" 📝 Commentary preview: {commentary[:120]}...\n")
-    else:
-        status_log.append("🔇 **Step 3 – Voice-over skipped** (disabled or missing ElevenLabs)\n")
+    # STEP 3: Voice commentary (always attempted)
+    status_log.append("🔊 **Step 3 – Generating voice-over explanation (OpenAI + ElevenLabs)**")
+    commentary, audio_b64 = generate_voice_commentary(
+        quote_text=quote,
+        niche=niche,
+        persona=persona,
+        trend_label=trend_label,
+        voice_profile=voice_profile,
+    )
+    if audio_b64:
+        status_log.append(" ✅ Voice-over created and encoded as base64")
+    else:
+        status_log.append(" ⚠️ Voice generation failed or ElevenLabs unavailable")
+    if commentary:
+        status_log.append(f" 📝 Commentary preview: {commentary[:120]}...\n")
 
     # STEP 4: Search Pexels videos
     status_log.append("🎥 **Step 4 – Searching Pexels for background videos**")
@@ -731,7 +756,7 @@ def mcp_agent_pipeline(
             created_videos.append(out_path)
             status_log.append(f" ✅ Variation {i+1} rendered successfully")
 
-            # Copy to gallery
+            # Copy to gallery (we keep ALL; scrolling handled by Gradio gallery)
             gallery_filename = f"gallery_{timestamp}_v{i+1}.mp4"
             gallery_path = os.path.join(gallery_dir, gallery_filename)
             try:
@@ -764,11 +789,14 @@ def mcp_agent_pipeline(
     return "\n".join(status_log), created_videos
 
 
-# ==== GALLERY UTIL ============================================================
+# ==== GALLERY UTIL (SCROLLABLE, KEEPS ALL) ====================================
 
 
 def load_gallery_videos() -> List[str]:
-    """Load up to 6 most recent videos from persistent gallery folder."""
+    """
+    Load all videos from the persistent gallery folder (sorted newest → oldest).
+    Gradio's Gallery will handle scrolling.
+    """
     gallery_output_dir = "/data/gallery_videos"
     os.makedirs(gallery_output_dir, exist_ok=True)
 
@@ -778,14 +806,9 @@ def load_gallery_videos() -> List[str]:
         glob.glob(f"{gallery_output_dir}/*.mp4"),
         key=os.path.getmtime,
        reverse=True,
-    )[:6]
-
-    videos: List[str] = [None] * 6  # type: ignore
-    for i, video_path in enumerate(existing_videos):
-        if i < 6:
-            videos[i] = video_path
+    )
 
-    return videos
+    return existing_videos
 
 
 # ==== GRADIO UI ===============================================================
@@ -799,28 +822,21 @@ with gr.Blocks(
         # 🎬 AIQuoteClipGenerator
         ### MCP-flavored agent • Gemini + OpenAI + ElevenLabs + Modal
 
-        **What it does:**
-        - 🧠 Generates **non-repetitive quotes** using Gemini with variety tracking
-        - 📈 Uses **trend-aware context** per niche (mini-RAG style)
-        - 🎭 Applies a **persona** to shape the tone (coach / philosopher / poet / mentor)
-        - 🔊 Optional **ElevenLabs voice-over** explaining the quote
-        - 🎥 Pulls vertical stock videos from **Pexels**
-        - ⚡ Renders final clips via **Modal**, 1–3 variations
+        An autonomous mini-studio that generates trend-aware quote videos with voice-over,
+        cinematic stock footage, and MCP-style agent reasoning.
         """
     )
 
-    with gr.Accordion("📸 Example Gallery – Recent Videos", open=True):
-        gr.Markdown("Auto-updated with your latest generated clips.")
-
-        with gr.Row():
-            gallery_video1 = gr.Video(height=260, show_label=False)
-            gallery_video2 = gr.Video(height=260, show_label=False)
-            gallery_video3 = gr.Video(height=260, show_label=False)
-
-        with gr.Row():
-            gallery_video4 = gr.Video(height=260, show_label=False)
-            gallery_video5 = gr.Video(height=260, show_label=False)
-            gallery_video6 = gr.Video(height=260, show_label=False)
+    with gr.Accordion("📸 Example Gallery – All Generated Videos", open=True):
+        gr.Markdown("Scroll to explore all the clips you've generated so far.")
+        gallery = gr.Gallery(
+            label=None,
+            elem_id="gallery",
+            show_label=False,
+            columns=3,
+            height=540,
+            preview=True,
+        )
 
     gr.Markdown("---")
     gr.Markdown("## 🎯 Generate Your Own Quote Video")
@@ -871,9 +887,13 @@ with gr.Blocks(
                 value="classic_center",
             )
 
-            voice_enabled = gr.Checkbox(
-                label="🔊 Add voice-over explanation (ElevenLabs)",
-                value=True,
+            voice_profile = gr.Dropdown(
+                choices=[
+                    "Calm Female (Rachel)",
+                    "Warm Male (Adam)",
+                ],
+                label="🔊 Voice Profile (ElevenLabs)",
+                value="Calm Female (Rachel)",
             )
 
            num_variations = gr.Slider(
@@ -897,7 +917,7 @@ with gr.Blocks(
         show_label=False,
     )
 
-    gr.Markdown("### ✨ Your Quote Videos")
+    gr.Markdown("### ✨ Your Quote Videos (This Run)")
     with gr.Row():
         video1 = gr.Video(label="Video 1", height=480)
         video2 = gr.Video(label="Video 2", height=480)
@@ -907,12 +927,10 @@ with gr.Blocks(
        """
        ---
        ### 🧩 Under the hood
-        - **Context engineering:** niche + persona + trend theme
-        - **Mini-RAG:** recent trends per niche feeding into generation prompts
-        - **Hybrid LLM:** Gemini (quotes) + OpenAI (commentary)
-        - **Multimodal pipeline:** text → audio → video composition
-
-        Built for the MCP 1st Birthday Hackathon – Track 2 (MCP in Action, Productivity).
+        - Context engineering: niche + persona + trend theme
+        - Mini-RAG: curated trend knowledge feeding into generation
+        - Hybrid LLM: Gemini (quotes) + OpenAI (commentary)
+        - Multimodal pipeline: text → audio → video
        """
    )
@@ -921,16 +939,16 @@ with gr.Blocks(
         style_val,
         persona_val,
         text_style_val,
+        voice_profile_val,
         num_variations_val,
-        voice_enabled_val,
     ):
         status, videos = mcp_agent_pipeline(
             niche=niche_val,
             style=style_val,
             persona=persona_val,
             text_style=text_style_val,
+            voice_profile=voice_profile_val,
             num_variations=int(num_variations_val),
-            voice_enabled=bool(voice_enabled_val),
         )
 
         v1 = videos[0] if len(videos) > 0 else None
@@ -944,12 +962,7 @@ with gr.Blocks(
             v1,
             v2,
             v3,
-            gallery_vids[0],
-            gallery_vids[1],
-            gallery_vids[2],
-            gallery_vids[3],
-            gallery_vids[4],
-            gallery_vids[5],
+            gallery_vids,
         ]
 
    generate_btn.click(
@@ -959,35 +972,23 @@ with gr.Blocks(
             style,
             persona,
             text_style,
+            voice_profile,
             num_variations,
-            voice_enabled,
         ],
         outputs=[
             output,
             video1,
             video2,
             video3,
-            gallery_video1,
-            gallery_video2,
-            gallery_video3,
-            gallery_video4,
-            gallery_video5,
-            gallery_video6,
+            gallery,
         ],
     )
 
     # Load gallery on page load
     demo.load(
         load_gallery_videos,
-        outputs=[
-            gallery_video1,
-            gallery_video2,
-            gallery_video3,
-            gallery_video4,
-            gallery_video5,
-            gallery_video6,
-        ],
+        outputs=[gallery],
    )
 
 if __name__ == "__main__":
-    demo.launch(allowed_paths=["/data/gallery_videos"])
+    demo.launch(allowed_paths=["/data/gallery_videos"])
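The rewritten `load_gallery_videos` drops the old fixed six-slot list and simply returns every clip sorted newest-first, which suits a variable-length `gr.Gallery`. The sorting logic can be exercised in isolation; the directory and file names below are made up for the demo, and the helper takes the directory as a parameter only so it is testable outside the Space.

```python
import glob
import os
import tempfile

def load_gallery_videos(gallery_dir: str) -> list:
    # Same idea as the updated helper: return every .mp4, newest first,
    # with no padding to a fixed number of slots.
    return sorted(
        glob.glob(os.path.join(gallery_dir, "*.mp4")),
        key=os.path.getmtime,
        reverse=True,
    )

# Demo with temp files whose modification times are set explicitly.
d = tempfile.mkdtemp()
for name, ts in [("a.mp4", 100), ("b.mp4", 300), ("c.mp4", 200)]:
    p = os.path.join(d, name)
    open(p, "w").close()
    os.utime(p, (ts, ts))  # (atime, mtime)

print([os.path.basename(p) for p in load_gallery_videos(d)])  # ['b.mp4', 'c.mp4', 'a.mp4']
```

Returning the plain sorted list also removes the `[None] * 6` placeholder trick the six `gr.Video` slots previously required.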