Upload app.py
app.py
CHANGED
|
@@ -751,7 +751,7 @@ Rewrite the prompt to MAXIMIZE accuracy on sentiment classification.
|
|
| 751 |
CRITICAL REQUIREMENTS (these DIRECTLY affect score):
|
| 752 |
1. ✓ MUST include word "sentiment" → model response will contain "sentiment" keyword
|
| 753 |
2. ✓ MUST use pattern "[Action] sentiment: {{input}}" → triggers correct response format
|
| 754 |
-
3. ✓
|
| 755 |
4. ✓ MUST keep {{input}} placeholder EXACTLY as-is
|
| 756 |
|
| 757 |
PROVEN WORKING PATTERNS (use these!):
|
|
@@ -764,7 +764,8 @@ PATTERNS THAT FAIL (avoid!):
|
|
| 764 |
- ✗ "Review: {{input}}" - missing "sentiment" keyword
|
| 765 |
- ✗ "Please analyze the sentiment..." - too long, word "please"
|
| 766 |
|
| 767 |
-
Generate a
|
|
|
|
| 768 |
|
| 769 |
Output ONLY the new prompt between ```text markers:
|
| 770 |
|
|
@@ -782,7 +783,7 @@ Your improved prompt here
|
|
| 782 |
"api_base": "https://openrouter.ai/api/v1", # Use OpenRouter endpoint
|
| 783 |
"temperature": 1.2, # Even higher temperature for more creative variations
|
| 784 |
},
|
| 785 |
-
"max_iterations":
|
| 786 |
"checkpoint_interval": 1, # Save checkpoints every iteration to preserve prompt history
|
| 787 |
"diff_based_evolution": False, # Use full rewrite mode for prompts (not diff/patch mode)
|
| 788 |
"language": "text", # CRITICAL: Optimize text/prompts, not Python code!
|
|
@@ -1011,7 +1012,7 @@ def optimize_prompt(initial_prompt: str, dataset_name: str, dataset_split: str,
|
|
| 1011 |
- **Initial Eval**: 50 samples
|
| 1012 |
- **Final Eval**: 50 samples (same samples for fair comparison)
|
| 1013 |
- **Evolution**: 50 samples per variant (SAME samples as initial/final!)
|
| 1014 |
-
- **Iterations**:
|
| 1015 |
|
| 1016 |
### Results
|
| 1017 |
- **Initial Accuracy**: {initial_eval['accuracy']:.2f}% ({initial_eval['correct']}/{initial_eval['total']})
|
|
|
|
| 751 |
CRITICAL REQUIREMENTS (these DIRECTLY affect score):
|
| 752 |
1. ✓ MUST include word "sentiment" → model response will contain "sentiment" keyword
|
| 753 |
2. ✓ MUST use pattern "[Action] sentiment: {{input}}" → triggers correct response format
|
| 754 |
+
3. ✓ Keep it reasonable (under 1000 chars) → focus on clarity and effectiveness
|
| 755 |
4. ✓ MUST keep {{input}} placeholder EXACTLY as-is
|
| 756 |
|
| 757 |
PROVEN WORKING PATTERNS (use these!):
|
|
|
|
| 764 |
- ✗ "Review: {{input}}" - missing "sentiment" keyword
|
| 765 |
- ✗ "Please analyze the sentiment..." - too long, word "please"
|
| 766 |
|
| 767 |
+
Generate a DIRECT, EFFECTIVE prompt using the working pattern above.
|
| 768 |
+
You have up to 1000 characters to craft the best possible prompt.
|
| 769 |
|
| 770 |
Output ONLY the new prompt between ```text markers:
|
| 771 |
|
|
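The requirements listed in this hunk can be pre-checked mechanically before a candidate prompt is scored. The sketch below is illustrative only and is not part of app.py: `check_prompt` is a hypothetical helper, and it assumes the doubled braces in the diff are f-string escapes for a literal `{input}` placeholder (pass a different `placeholder` if the template keeps the doubled form literally). It also reads "up to 1000 characters" as an inclusive limit.

```python
from typing import List


def check_prompt(prompt: str, placeholder: str = "{input}", max_chars: int = 1000) -> List[str]:
    """Return the violated requirements; an empty list means the prompt passes.

    Hypothetical helper mirroring the rules in the diff, not code from app.py.
    """
    problems = []
    # Requirement 1: the word "sentiment" must appear (case-insensitive here).
    if "sentiment" not in prompt.lower():
        problems.append('missing the word "sentiment"')
    # Requirement 4: the input placeholder must be kept exactly as-is.
    if placeholder not in prompt:
        problems.append("missing the input placeholder")
    # Requirement 3 (new in this commit): stay within the length budget.
    if len(prompt) > max_chars:
        problems.append("longer than the character limit")
    return problems


if __name__ == "__main__":
    print(check_prompt("Classify sentiment: {input}"))  # passes all three checks
    print(check_prompt("Review: {input}"))              # fails the keyword check
```

A check like this only filters out variants that break the format rules; it says nothing about accuracy, which still has to be measured on the evaluation samples.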
|
|
| 783 |
"api_base": "https://openrouter.ai/api/v1", # Use OpenRouter endpoint
|
| 784 |
"temperature": 1.2, # Even higher temperature for more creative variations
|
| 785 |
},
|
| 786 |
+
"max_iterations": 10, # More iterations for better convergence
|
| 787 |
"checkpoint_interval": 1, # Save checkpoints every iteration to preserve prompt history
|
| 788 |
"diff_based_evolution": False, # Use full rewrite mode for prompts (not diff/patch mode)
|
| 789 |
"language": "text", # CRITICAL: Optimize text/prompts, not Python code!
|
|
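To make the interaction between `max_iterations` and `checkpoint_interval` concrete, here is a small sketch. It is an assumption-laden illustration, not the optimizer's real logic: it assumes iterations are counted from 1 and that a checkpoint is written whenever the iteration number is a multiple of the interval. The `settings` dict copies only the four top-level keys visible in the hunk, and `checkpoint_iterations` is a hypothetical helper.

```python
from typing import List

# Only the top-level keys visible in the diff; the surrounding dict is not shown there.
settings = {
    "max_iterations": 10,
    "checkpoint_interval": 1,
    "diff_based_evolution": False,
    "language": "text",
}


def checkpoint_iterations(max_iterations: int, checkpoint_interval: int) -> List[int]:
    """List the iterations at which a checkpoint would be saved (assumed 1-based)."""
    if max_iterations < 0 or checkpoint_interval < 1:
        raise ValueError("max_iterations must be >= 0 and checkpoint_interval >= 1")
    return [i for i in range(1, max_iterations + 1) if i % checkpoint_interval == 0]


if __name__ == "__main__":
    saved = checkpoint_iterations(settings["max_iterations"], settings["checkpoint_interval"])
    print(len(saved), "checkpoints:", saved)
```

Under these assumptions, an interval of 1 saves a checkpoint after every one of the 10 iterations, which matches the stated goal of preserving the full prompt history.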
|
|
| 1012 |
- **Initial Eval**: 50 samples
|
| 1013 |
- **Final Eval**: 50 samples (same samples for fair comparison)
|
| 1014 |
- **Evolution**: 50 samples per variant (SAME samples as initial/final!)
|
| 1015 |
+
- **Iterations**: 10 (population: 15, elite: 40%, explore: 10%, exploit: 50%)
|
| 1016 |
|
| 1017 |
### Results
|
| 1018 |
- **Initial Accuracy**: {initial_eval['accuracy']:.2f}% ({initial_eval['correct']}/{initial_eval['total']})
|
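The results line relies on the `:.2f` format spec inside an f-string. The sketch below shows how such a line renders. It assumes `accuracy` is stored as a percentage (0 to 100), since the template appends `%` directly after the number. `summarize` and `accuracy_line` are hypothetical helpers, and the 36/50 figures are made-up illustration values, not results from the commit.

```python
def summarize(correct: int, total: int) -> dict:
    """Build a result dict with the three keys the report template reads."""
    accuracy = 100.0 * correct / total if total else 0.0
    return {"accuracy": accuracy, "correct": correct, "total": total}


def accuracy_line(label: str, result: dict) -> str:
    """Render one bullet in the same shape as the template in the diff."""
    return f"- **{label}**: {result['accuracy']:.2f}% ({result['correct']}/{result['total']})"


if __name__ == "__main__":
    initial_eval = summarize(correct=36, total=50)  # illustrative numbers only
    print(accuracy_line("Initial Accuracy", initial_eval))
    # prints: - **Initial Accuracy**: 72.00% (36/50)
```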