Integrate Novita AI as exclusive inference provider

- Add Novita AI API integration with DeepSeek-R1-Distill-Qwen-7B model
- Remove all local model dependencies
- Optimize token allocation for user inputs and context
- Add Anaconda environment setup files
- Add comprehensive test scripts and documentation
927854c
# Novita AI Implementation Summary

## ✅ Implementation Complete

All changes have been implemented to switch from local models to the Novita AI API as the only inference source.

## Files Modified
### 1. ✅ `src/config.py`

- Added Novita AI configuration section (sketched below) with:
  - `novita_api_key` (required, validated)
  - `novita_base_url` (default: https://api.novita.ai/dedicated/v1/openai)
  - `novita_model` (default: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2)
  - `deepseek_r1_temperature` (default: 0.6, validated to the 0.5-0.7 range)
  - `deepseek_r1_force_reasoning` (default: True)
- Added token allocation configuration:
  - `user_input_max_tokens` (default: 8000)
  - `context_preparation_budget` (default: 28000)
  - `context_pruning_threshold` (default: 28000)
  - `prioritize_user_input` (default: True)
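For orientation, here is a minimal sketch of how these fields could be loaded and validated. The field names, defaults, and validation rules come from this summary; the dataclass/`from_env` structure is illustrative, not necessarily how `src/config.py` is actually written:

```python
import os
from dataclasses import dataclass


@dataclass
class NovitaConfig:
    # Field names and defaults follow this summary; structure is illustrative.
    novita_api_key: str
    novita_base_url: str = "https://api.novita.ai/dedicated/v1/openai"
    novita_model: str = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2"
    deepseek_r1_temperature: float = 0.6
    deepseek_r1_force_reasoning: bool = True
    user_input_max_tokens: int = 8000
    context_preparation_budget: int = 28000
    context_pruning_threshold: int = 28000
    prioritize_user_input: bool = True

    @classmethod
    def from_env(cls) -> "NovitaConfig":
        # Reads the variables documented in "Environment Variables Required" below.
        key = os.getenv("NOVITA_API_KEY", "")
        if not key:
            raise ValueError("NOVITA_API_KEY is required")
        cfg = cls(
            novita_api_key=key,
            novita_base_url=os.getenv("NOVITA_BASE_URL", cls.novita_base_url),
            novita_model=os.getenv("NOVITA_MODEL", cls.novita_model),
            deepseek_r1_temperature=float(os.getenv("DEEPSEEK_R1_TEMPERATURE", "0.6")),
            deepseek_r1_force_reasoning=os.getenv("DEEPSEEK_R1_FORCE_REASONING", "True").lower() == "true",
            user_input_max_tokens=int(os.getenv("USER_INPUT_MAX_TOKENS", "8000")),
            context_preparation_budget=int(os.getenv("CONTEXT_PREPARATION_BUDGET", "28000")),
            context_pruning_threshold=int(os.getenv("CONTEXT_PRUNING_THRESHOLD", "28000")),
            prioritize_user_input=os.getenv("PRIORITIZE_USER_INPUT", "True").lower() == "true",
        )
        # Temperature is validated to the 0.5-0.7 range recommended for DeepSeek-R1.
        if not 0.5 <= cfg.deepseek_r1_temperature <= 0.7:
            raise ValueError("DEEPSEEK_R1_TEMPERATURE must be within the 0.5-0.7 range")
        return cfg
```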
### 2. ✅ `requirements.txt`

- Added the `openai>=1.0.0` package
### 3. ✅ `src/models_config.py`

- Changed `primary_provider` from "local" to "novita_api"
- Updated all model IDs to the Novita model ID
- Added DeepSeek-R1 optimized parameters (illustrated below):
  - Temperature: 0.6 for reasoning, 0.5 for classification/safety
  - Top_p: 0.95 for reasoning, 0.9 for classification
  - `force_reasoning_prefix: True` for reasoning tasks
- Removed all local model configuration (quantization, fallbacks)
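As an illustration of the parameter layout described above: the model ID, temperatures, top_p values, and output limits come from this document, but the dict structure and helper name are assumptions, not the actual contents of `src/models_config.py`:

```python
# Illustrative layout only; the real src/models_config.py may be structured differently.
NOVITA_MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2"

MODELS_CONFIG = {
    "primary_provider": "novita_api",  # was "local"
    "tasks": {
        "reasoning": {
            "model": NOVITA_MODEL_ID,
            "temperature": 0.6,
            "top_p": 0.95,
            "max_tokens": 4096,
            "force_reasoning_prefix": True,
        },
        "synthesis": {
            "model": NOVITA_MODEL_ID,
            "temperature": 0.6,
            "top_p": 0.95,
            "max_tokens": 2000,
        },
        "classification": {
            "model": NOVITA_MODEL_ID,
            "temperature": 0.5,
            "top_p": 0.9,
            "max_tokens": 512,
        },
        "safety": {
            "model": NOVITA_MODEL_ID,
            "temperature": 0.5,
            "top_p": 0.9,
            # Output limit for safety checks is not specified in this summary.
        },
    },
    # No local-model settings (quantization, fallbacks) remain.
}
```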
### 4. ✅ `src/llm_router.py` (Complete Rewrite)

- Removed all local model loading code
- Removed `LocalModelLoader` dependencies
- Added OpenAI client initialization
- Implemented the `_call_novita_api()` method (see the sketch after this list)
- Added DeepSeek-R1 optimizations:
  - `_format_deepseek_r1_prompt()` - reasoning trigger and math directives
  - `_is_math_query()` - automatic math detection
  - `_clean_reasoning_tags()` - response cleanup
- Updated `prepare_context_for_llm()` with:
  - User input priority (never truncated)
  - Dedicated 8K-token budget for user input
  - 28K-token context preparation budget
  - Dynamic context allocation
- Updated `health_check()` for the Novita API
- Removed all remaining local model methods
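A condensed sketch of the call path described above, using the OpenAI-compatible client that `openai>=1.0.0` provides (Novita exposes an OpenAI-style endpoint). The method names follow this summary; the bodies, including the math heuristic, the `<think>` handling, and the directive wording, are illustrative guesses rather than the actual `src/llm_router.py` code:

```python
import re

from openai import OpenAI


class LLMRouter:
    def __init__(self, config):
        # Reuse the stock OpenAI client against Novita's OpenAI-compatible base URL.
        self.config = config
        self.client = OpenAI(api_key=config.novita_api_key, base_url=config.novita_base_url)

    def _is_math_query(self, text: str) -> bool:
        # Very rough math detection; the real heuristic may differ.
        return bool(re.search(r"\d\s*[\+\-\*/=]\s*\d|\b(solve|integral|equation)\b", text, re.I))

    def _format_deepseek_r1_prompt(self, prompt: str, force_reasoning: bool) -> str:
        # DeepSeek-R1 guidance: no system prompt; put all directives in the user turn,
        # and optionally nudge the model to open its reasoning block.
        if self._is_math_query(prompt):
            prompt += "\nPlease reason step by step, and put your final answer within \\boxed{}."
        if force_reasoning:
            prompt += "\n<think>\n"
        return prompt

    def _clean_reasoning_tags(self, text: str) -> str:
        # Strip the <think>...</think> reasoning block from the visible answer.
        return re.sub(r"<think>.*?</think>", "", text, flags=re.S).strip()

    def _call_novita_api(self, prompt: str, *, temperature: float = 0.6,
                         top_p: float = 0.95, max_tokens: int = 4096) -> str:
        formatted = self._format_deepseek_r1_prompt(prompt, self.config.deepseek_r1_force_reasoning)
        response = self.client.chat.completions.create(
            model=self.config.novita_model,
            messages=[{"role": "user", "content": formatted}],  # no system message
            temperature=temperature,
            top_p=top_p,
            max_tokens=max_tokens,
        )
        return self._clean_reasoning_tags(response.choices[0].message.content)
```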
### 5. ✅ `flask_api_standalone.py`

- Updated `initialize_orchestrator()`:
  - Changed to "Novita AI API Only" mode
  - Removed the HF_TOKEN dependency
  - Set `use_local_models=False`
- Updated error handling for configuration errors
- Increased `MAX_MESSAGE_LENGTH` from 10KB to 100KB (see the sketch below)
- Updated logging messages
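For illustration only, the enlarged request-size guard might look roughly like this; the route name, payload key, and error shape are assumptions, not the exact code in `flask_api_standalone.py`:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
MAX_MESSAGE_LENGTH = 100 * 1024  # 100KB, raised from 10KB


@app.route("/api/chat", methods=["POST"])
def chat():
    message = (request.get_json(silent=True) or {}).get("message", "")
    if len(message.encode("utf-8")) > MAX_MESSAGE_LENGTH:
        # Reject oversize payloads with a rough token estimate in the error message.
        return jsonify({
            "error": f"Message too large (~{len(message) // 4} tokens estimated); limit is 100KB"
        }), 413
    # Placeholder: the real handler passes the message to the orchestrator / LLM router.
    return jsonify({"status": "accepted"})
```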
### 6. ✅ `src/context_manager.py`

- Updated `prune_context()` to use the configured threshold of 28000 tokens (sketched below)
- Increased user input storage from 500 to 5000 characters
- Increased system response storage from 1000 to 2000 characters
- Updated interaction context generation to use more of the user input
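A rough sketch of the storage and pruning behaviour described above, assuming a simple list-of-turns history and a caller-supplied token estimator; the real `prune_context()` may track tokens differently:

```python
MAX_USER_INPUT_CHARS = 5000   # raised from 500
MAX_RESPONSE_CHARS = 2000     # raised from 1000


def store_interaction(history, user_input, system_response):
    # Keep more of the raw user input and system response than before.
    history.append({
        "user": user_input[:MAX_USER_INPUT_CHARS],
        "system": system_response[:MAX_RESPONSE_CHARS],
    })


def prune_context(history, estimate_tokens, threshold=28000):
    # Drop the oldest turns until the running total fits under the configured threshold.
    def total():
        return sum(estimate_tokens(t["user"]) + estimate_tokens(t["system"]) for t in history)

    while history and total() > threshold:
        history.pop(0)
    return history
```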
## Environment Variables Required

Create a `.env` file with the following (see `.env.example` for the full template):

```bash
# REQUIRED - Novita AI Configuration
NOVITA_API_KEY=your_api_key_here
NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2

# DeepSeek-R1 Optimized Settings
DEEPSEEK_R1_TEMPERATURE=0.6
DEEPSEEK_R1_FORCE_REASONING=True

# Token Allocation (optional - defaults provided)
USER_INPUT_MAX_TOKENS=8000
CONTEXT_PREPARATION_BUDGET=28000
CONTEXT_PRUNING_THRESHOLD=28000
PRIORITIZE_USER_INPUT=True
```
## Installation Steps

1. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

2. **Create a `.env` file:**

   ```bash
   cp .env.example .env
   # Edit .env and add your NOVITA_API_KEY
   ```

3. **Set environment variables:**

   ```bash
   export NOVITA_API_KEY=your_api_key_here
   export NOVITA_BASE_URL=https://api.novita.ai/dedicated/v1/openai
   export NOVITA_MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2
   ```

4. **Start the application:**

   ```bash
   python flask_api_standalone.py
   ```
## Key Features Implemented

### DeepSeek-R1 Optimizations

- ✅ Temperature set to 0.6 (recommended range 0.5-0.7)
- ✅ Reasoning trigger (`<think>` prefix) for reasoning tasks
- ✅ Automatic math directive detection
- ✅ No system prompts (all instructions go in the user prompt)

### Token Allocation

- ✅ User input: 8K-token dedicated budget (never truncated)
- ✅ Context preparation: 28K-token total budget (worked example below)
- ✅ Context pruning: 28K-token threshold
- ✅ User input always prioritized over historical context
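As a concrete example of the split: a 6K-token user message inside the 28K preparation budget leaves about 22K tokens for historical context, and the user message itself is never cut. A tiny illustrative helper (not the actual router code):

```python
def remaining_context_budget(user_tokens: int, total_budget: int = 28000) -> int:
    # The user input always fits in full (8K is merely the reserved share);
    # whatever remains of the 28K preparation budget goes to historical context.
    return max(0, total_budget - user_tokens)


print(remaining_context_budget(6000))   # 22000 tokens left for context
print(remaining_context_budget(12000))  # 16000 tokens left for context
```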
### API Improvements

- ✅ Message length limit: 100KB (increased from 10KB)
- ✅ Better error messages with token estimates
- ✅ Configuration validation with helpful error messages

### Database Storage

- ✅ User input storage: 5000 characters (increased from 500)
- ✅ System response storage: 2000 characters (increased from 1000)
## Testing Checklist

- [ ] Test the API health check endpoint (see the smoke-test sketch below)
- [ ] Test a simple inference request
- [ ] Test a large user input (5K+ tokens)
- [ ] Test reasoning tasks (should see the reasoning trigger)
- [ ] Test math queries (should see the math directive)
- [ ] Test context preparation (user input should not be truncated)
- [ ] Test error handling (missing API key, invalid endpoint)
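A minimal smoke test for the first two items, assuming the Flask app is listening on its default port and exposes the `/api/health` and `/api/chat` routes named in this document; the `message` payload key is an assumption, and `requests` is not in `requirements.txt`, so install it separately if you use this:

```python
import requests

BASE = "http://127.0.0.1:5000"  # adjust if the Flask app binds elsewhere

# 1. Health check should confirm the Novita client is configured.
health = requests.get(f"{BASE}/api/health", timeout=10)
print(health.status_code, health.json())

# 2. Simple inference request routed through the Novita-backed router.
chat = requests.post(
    f"{BASE}/api/chat",
    json={"message": "What is 17 * 24? Reason step by step."},
    timeout=120,
)
print(chat.status_code, chat.json())
```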
## Expected Behavior

1. **Startup:**
   - System initializes the Novita AI client
   - Validates that the API key is present
   - Logs the Novita AI configuration
2. **Inference:**
   - All requests are routed to the Novita AI API
   - DeepSeek-R1 optimizations are applied automatically
   - User input is prioritized in context preparation
3. **Error Handling:**
   - Clear error messages if the API key is missing
   - Helpful guidance for configuration issues
   - Graceful handling of API failures
## Troubleshooting

### Issue: "NOVITA_API_KEY is required"

**Solution:** Set the environment variable:

```bash
export NOVITA_API_KEY=your_key_here
```

### Issue: "openai package not available"

**Solution:** Install dependencies:

```bash
pip install -r requirements.txt
```

### Issue: API connection errors

**Solution:**

- Verify the API key is correct
- Check that the base URL matches your endpoint
- Verify the model ID matches your deployment
## Configuration Reference

### Model Configuration

- **Model ID:** `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:de-1a706eeafbf3ebc2`
- **Context Window:** 131,072 tokens (131K)
- **Optimized Settings:** Temperature 0.6, Top_p 0.95

### Token Allocation

- **User Input:** 8,000 tokens (dedicated, never truncated)
- **Context Budget:** 28,000 tokens (includes user input + context)
- **Output Limits:**
  - Reasoning: 4,096 tokens
  - Synthesis: 2,000 tokens
  - Classification: 512 tokens
## Next Steps

1. Set your `NOVITA_API_KEY` in environment variables
2. Test the health check endpoint: `GET /api/health`
3. Send a test request: `POST /api/chat`
4. Monitor logs for Novita AI API calls
5. Verify the DeepSeek-R1 optimizations are working
## Notes

- All local model code has been removed
- The system now depends entirely on the Novita AI API
- No GPU/quantization configuration is needed
- No model downloading is required
- Faster startup (no model loading)