dify/scripts/stress-test/README.md (new file, 521 lines)
@@ -0,0 +1,521 @@
# Dify Stress Test Suite

A high-performance stress test suite for Dify workflow execution using **Locust** - optimized for measuring Server-Sent Events (SSE) streaming performance.

## Key Metrics Tracked

The stress test focuses on four critical SSE performance indicators:

1. **Active SSE Connections** - Real-time count of open SSE connections
1. **New Connection Rate** - Connections per second (conn/sec)
1. **Time to First Event (TTFE)** - Latency until the first SSE event arrives
1. **Event Throughput** - Events per second (events/sec)

## Features

- **True SSE Support**: Properly handles Server-Sent Events streaming without premature connection closure
- **Real-time Metrics**: Live reporting every 5 seconds during tests
- **Comprehensive Tracking**:
  - Active connection monitoring
  - Connection establishment rate
  - Event processing throughput
  - TTFE distribution analysis
- **Multiple Interfaces**:
  - Web UI for real-time monitoring (<http://localhost:8089>)
  - Headless mode with periodic console updates
- **Detailed Reports**: Final statistics with overall rates and averages
- **Easy Configuration**: Uses the existing API key configuration from setup

## What Gets Measured

The stress test focuses on SSE streaming performance with these key metrics:

### Primary Endpoint: `/v1/workflows/run`

The stress test exercises a single endpoint with comprehensive SSE metrics tracking:

- **Request Type**: POST request to the workflow execution API
- **Response Type**: Server-Sent Events (SSE) stream
- **Payload**: Random questions from a configurable pool
- **Concurrency**: Configurable from 1 to 1000+ simultaneous users
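To make this concrete, here is a minimal sketch of what a Locust user for this endpoint can look like. It is illustrative only: the shipped implementation lives in `sse_benchmark.py`, and the class name, payload layout, and `DIFY_API_TOKEN` environment variable are assumptions made for the sketch.

```python
import os
import random
import time

from locust import HttpUser, between, task


class WorkflowUser(HttpUser):
    """Hypothetical SSE user: POST a workflow run and consume the event stream."""

    wait_time = between(0.5, 2.0)
    questions = ["What is Dify?", "Explain SSE in one sentence."]

    @task
    def run_workflow(self):
        payload = {
            "inputs": {"question": random.choice(self.questions)},
            "response_mode": "streaming",
            "user": "stress-test",
        }
        headers = {"Authorization": f"Bearer {os.environ.get('DIFY_API_TOKEN', '')}"}

        start = time.perf_counter()
        first_event_at = None
        events_seen = 0

        # stream=True keeps the connection open so SSE events can be read as they arrive
        with self.client.post(
            "/v1/workflows/run",
            json=payload,
            headers=headers,
            stream=True,
            catch_response=True,
        ) as response:
            for line in response.iter_lines():
                if line and line.startswith(b"data:"):
                    events_seen += 1
                    if first_event_at is None:
                        first_event_at = time.perf_counter()
            response.success()

        if first_event_at is not None:
            ttfe_ms = (first_event_at - start) * 1000
            # the real script would report ttfe_ms and events_seen as custom metrics
```

Reading the stream line by line, instead of waiting for the whole body, is what makes TTFE and per-event throughput measurable.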
### Key Performance Metrics

#### 1. **Active Connections**

- **What it measures**: Number of concurrent SSE connections open at any moment
- **Why it matters**: Shows the system's ability to handle parallel streams
- **Good values**: Should remain stable under load without drops

#### 2. **Connection Rate (conn/sec)**

- **What it measures**: How fast new SSE connections are established
- **Why it matters**: Indicates the system's ability to handle connection spikes
- **Good values**:
  - Light load: 5-10 conn/sec
  - Medium load: 20-50 conn/sec
  - Heavy load: 100+ conn/sec

#### 3. **Time to First Event (TTFE)**

- **What it measures**: Latency from request sent to first SSE event received
- **Why it matters**: Critical for user experience - faster TTFE = better perceived performance
- **Good values**:
  - Excellent: < 50ms
  - Good: 50-100ms
  - Acceptable: 100-500ms
  - Poor: > 500ms
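TTFE is not something Locust tracks out of the box, so it has to be recorded explicitly. One common pattern, and an assumption about how `sse_benchmark.py` does it, is to fire a synthetic request event when the first SSE event arrives so that TTFE shows up as its own row in the Locust statistics and CSV output (the name below matches the row that `run_locust_stress_test.sh` looks for when summarizing results):

```python
from locust import events


def report_ttfe(start: float, first_event: float) -> None:
    """Record time-to-first-event as a synthetic Locust request (milliseconds)."""
    events.request.fire(
        request_type="SSE",
        name="Time to First Event",
        response_time=(first_event - start) * 1000,
        response_length=0,
        exception=None,
        context={},
    )


# Inside a task, right after the first "data:" line is read:
# report_ttfe(start, time.perf_counter())
```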
#### 4. **Event Throughput (events/sec)**

- **What it measures**: Rate of SSE events being delivered across all connections
- **Why it matters**: Shows actual data delivery performance
- **Expected values**: Depends on workflow complexity and number of connections
  - Single connection: 10-20 events/sec
  - 10 connections: 50-100 events/sec
  - 100 connections: 200-500 events/sec

#### 5. **Request/Response Times**

- **P50 (Median)**: 50% of requests complete within this time
- **P95**: 95% of requests complete within this time
- **P99**: 99% of requests complete within this time
- **Min/Max**: Best and worst case response times
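Locust computes these percentiles for you; as a quick illustration of what the numbers mean, here is how P50/P95/P99 can be derived from a list of raw response times with the standard library (the sample values are made up):

```python
import statistics

# Response times in milliseconds collected during a run (sample data)
samples = [18, 22, 25, 31, 38, 41, 47, 63, 95, 192]

# quantiles(n=100) returns the 1st..99th percentile cut points; index = percentile - 1
cuts = statistics.quantiles(samples, n=100)
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"P50={p50:.0f}ms  P95={p95:.0f}ms  P99={p99:.0f}ms")
```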
## Prerequisites

1. **Dependencies are automatically installed** when running setup:

   - Locust (load testing framework)
   - sseclient-py (SSE client library)

1. **Complete Dify setup**:

   ```bash
   # Run the complete setup
   python scripts/stress-test/setup_all.py
   ```

1. **Ensure services are running**:

   **IMPORTANT**: For accurate stress testing, run the API server with Gunicorn in production mode:

   ```bash
   # Run from the api directory
   cd api
   uv run gunicorn \
     --bind 0.0.0.0:5001 \
     --workers 4 \
     --worker-class gevent \
     --timeout 120 \
     --keep-alive 5 \
     --log-level info \
     --access-logfile - \
     --error-logfile - \
     app:app
   ```

   **Configuration options explained**:

   - `--workers 4`: Number of worker processes (adjust based on CPU cores)
   - `--worker-class gevent`: Async worker for handling concurrent connections
   - `--timeout 120`: Worker timeout for long-running requests
   - `--keep-alive 5`: Keep connections alive for SSE streaming

   **NOT RECOMMENDED for stress testing**:

   ```bash
   # Debug mode - DO NOT use for stress testing (slow performance)
   ./dev/start-api  # This runs Flask in debug mode with single-threaded execution
   ```

   **Also start the Mock OpenAI server**:

   ```bash
   python scripts/stress-test/setup/mock_openai_server.py
   ```

## Running the Stress Test

```bash
# Run with default configuration (headless mode)
./scripts/stress-test/run_locust_stress_test.sh

# Or run directly with uv
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001

# Run with Web UI (access at http://localhost:8089)
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001 --web-port 8089
```

The script will:

1. Validate that all required services are running
1. Check API token availability
1. Execute the Locust stress test with SSE support
1. Generate comprehensive reports in the `reports/` directory

## Configuration

The stress test configuration is in `locust.conf`:

```ini
users = 10        # Number of concurrent users
spawn-rate = 2    # Users spawned per second
run-time = 1m     # Test duration (30s, 5m, 1h)
headless = true   # Run without web UI
```

### Custom Question Sets

Modify the questions list in `sse_benchmark.py`:

```python
self.questions = [
    "Your custom question 1",
    "Your custom question 2",
    # Add more questions...
]
```
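If you would rather not edit the script for every run, the pool can also be loaded from a file at start-up. This is a sketch only; a `questions.json` file is not part of the suite:

```python
import json
from pathlib import Path


def load_questions(path: str = "scripts/stress-test/questions.json") -> list[str]:
    """Load the question pool from a JSON array of strings (hypothetical file)."""
    questions_file = Path(path)
    if questions_file.exists():
        return json.loads(questions_file.read_text())
    return ["What is Dify?"]  # fallback so the test can still run


# e.g. in the Locust user's on_start():
# self.questions = load_questions()
```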
## Understanding the Results

### Report Structure

After running the stress test, you'll find these files in the `reports/` directory:

- `locust_summary_YYYYMMDD_HHMMSS.txt` - Complete console output with metrics
- `locust_report_YYYYMMDD_HHMMSS.html` - Interactive HTML report with charts
- `locust_YYYYMMDD_HHMMSS_stats.csv` - CSV with detailed statistics
- `locust_YYYYMMDD_HHMMSS_stats_history.csv` - Time-series data

### Key Metrics

**Requests Per Second (RPS)**:

- **Excellent**: > 50 RPS
- **Good**: 20-50 RPS
- **Acceptable**: 10-20 RPS
- **Needs Improvement**: < 10 RPS

**Response Time Percentiles**:

- **P50 (Median)**: 50% of requests complete within this time
- **P95**: 95% of requests complete within this time
- **P99**: 99% of requests complete within this time

**Success Rate**:

- Should be > 99% for production readiness
- Lower rates indicate errors or timeouts

### Example Output

```text
============================================================
                   DIFY SSE STRESS TEST
============================================================

[2025-09-12 15:45:44,468] Starting test run with 10 users at 2 users/sec

============================================================
SSE Metrics | Active: 8 | Total Conn: 142 | Events: 2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
============================================================

Type     Name                      # reqs   # fails  |    Avg    Min    Max    Med  |  req/s  failures/s
---------|--------------------------|--------|----------|-------|------|-------|------|--------|-----------
POST     /v1/workflows/run             142   0(0.00%) |     41     18    192     38 |   2.37        0.00
---------|--------------------------|--------|----------|-------|------|-------|------|--------|-----------
         Aggregated                    142   0(0.00%) |     41     18    192     38 |   2.37        0.00

============================================================
                      FINAL RESULTS
============================================================
Total Connections: 142
Total Events:      2841
Average TTFE:      43 ms
============================================================
```
### How to Read the Results

**Live SSE Metrics Box (updates every 10 seconds):**

```text
SSE Metrics | Active: 8 | Total Conn: 142 | Events: 2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
```

- **Active**: Current number of open SSE connections
- **Total Conn**: Cumulative connections established
- **Events**: Total SSE events received
- **conn/s**: Connection establishment rate
- **events/s**: Event delivery rate
- **TTFE**: Average time to first event

**Standard Locust Table:**

```text
Type     Name                      # reqs   # fails  |    Avg    Min    Max    Med  |  req/s
POST     /v1/workflows/run             142   0(0.00%) |     41     18    192     38 |   2.37
```

- **Type**: Always POST for our SSE requests
- **Name**: The API endpoint being tested
- **# reqs**: Total requests made
- **# fails**: Failed requests (should be 0)
- **Avg/Min/Max/Med**: Response time statistics (ms)
- **req/s**: Request throughput

**Performance Targets:**

✅ **Good Performance**:

- Zero failures (0.00%)
- TTFE < 100ms
- Stable active connections
- Consistent event throughput

⚠️ **Warning Signs**:

- Failures > 1%
- TTFE > 500ms
- Dropping active connections
- Declining event rate over time

## Test Scenarios

### Light Load

```yaml
concurrency: 10
iterations: 100
```

### Normal Load

```yaml
concurrency: 100
iterations: 1000
```

### Heavy Load

```yaml
concurrency: 500
iterations: 5000
```

### Stress Test

```yaml
concurrency: 1000
iterations: 10000
```

## Performance Tuning

### API Server Optimization

**Gunicorn Tuning for Different Load Levels**:

```bash
# Light load (10-50 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 2 --worker-class gevent app:app

# Medium load (50-200 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent --worker-connections 1000 app:app

# Heavy load (200-1000 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 8 --worker-class gevent --worker-connections 2000 --max-requests 1000 app:app
```

**Worker calculation formula**:

- Workers = (2 × CPU cores) + 1
- For SSE/WebSocket: use the gevent worker class
- For CPU-bound tasks: use sync workers
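A quick way to apply the formula on the machine you are tuning (a sketch, not part of the suite):

```python
import multiprocessing

cpu_cores = multiprocessing.cpu_count()
workers = 2 * cpu_cores + 1  # the (2 x CPU cores) + 1 rule of thumb
print(f"uv run gunicorn --bind 0.0.0.0:5001 --workers {workers} --worker-class gevent app:app")
```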
### Database Optimization

**PostgreSQL Connection Pool Tuning**:

For high-concurrency stress testing, increase the PostgreSQL max connections in `docker/middleware.env`:

```bash
# Edit docker/middleware.env
POSTGRES_MAX_CONNECTIONS=200  # Default is 100

# Recommended values for different load levels:
# Light load (10-50 users): 100 (default)
# Medium load (50-200 users): 200
# Heavy load (200-1000 users): 500
```

After changing, restart the PostgreSQL container:

```bash
docker compose -f docker/docker-compose.middleware.yaml down db
docker compose -f docker/docker-compose.middleware.yaml up -d db
```

**Note**: Each connection uses ~10MB of RAM. Ensure your database server has sufficient memory:

- 100 connections: ~1GB RAM
- 200 connections: ~2GB RAM
- 500 connections: ~5GB RAM

### System Optimizations

1. **Increase file descriptor limits**:

   ```bash
   ulimit -n 65536
   ```

1. **TCP tuning for high concurrency** (Linux):

   ```bash
   # Increase TCP buffer sizes
   sudo sysctl -w net.core.rmem_max=134217728
   sudo sysctl -w net.core.wmem_max=134217728

   # Enable TCP fast open
   sudo sysctl -w net.ipv4.tcp_fastopen=3
   ```

1. **macOS specific**:

   ```bash
   # Increase maximum connections
   sudo sysctl -w kern.ipc.somaxconn=2048
   ```

## Troubleshooting

### Common Issues

1. **"ModuleNotFoundError: No module named 'locust'"**:

   ```bash
   # Dependencies are installed automatically, but if needed:
   uv --project api add --dev locust sseclient-py
   ```

1. **"API key configuration not found"**:

   ```bash
   # Run setup
   python scripts/stress-test/setup_all.py
   ```

1. **Services not running**:

   ```bash
   # Start Dify API with Gunicorn (production mode)
   cd api
   uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent app:app

   # Start Mock OpenAI server
   python scripts/stress-test/setup/mock_openai_server.py
   ```

1. **High error rate**:

   - Reduce the concurrency level
   - Check system resources (CPU, memory)
   - Review API server logs for errors
   - Increase timeout values if needed

1. **Permission denied running script**:

   ```bash
   chmod +x scripts/stress-test/run_locust_stress_test.sh
   ```

## Advanced Usage

### Running Multiple Iterations

```bash
# Run stress test 3 times with 60-second intervals
for i in {1..3}; do
    echo "Run $i of 3"
    ./run_locust_stress_test.sh
    sleep 60
done
```

### Custom Locust Options

Run Locust directly with custom options:

```bash
# With specific user count and spawn rate
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
    --host http://localhost:5001 --users 50 --spawn-rate 5

# Generate CSV reports
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
    --host http://localhost:5001 --csv reports/results

# Run for specific duration
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
    --host http://localhost:5001 --run-time 5m --headless
```

### Comparing Results

```bash
# Compare the summaries of the most recent stress test runs
ls -la reports/locust_summary_*.txt | tail -5
```
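For a more structured comparison, the `*_stats.csv` files from two runs can be diffed programmatically. A minimal sketch (`compare_runs.py` is a hypothetical helper; the column names are the ones Locust writes and that `run_locust_stress_test.sh` already parses):

```python
import csv
import sys


def aggregated_row(path: str) -> dict[str, str]:
    """Return the 'Aggregated' row from a Locust *_stats.csv file."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row.get("Name") == "Aggregated":
                return row
    raise ValueError(f"No Aggregated row in {path}")


if __name__ == "__main__":
    old, new = aggregated_row(sys.argv[1]), aggregated_row(sys.argv[2])
    for column in ("Requests/s", "Median Response Time", "95%", "Failure Count"):
        print(f"{column}: {old.get(column)} -> {new.get(column)}")
```

Run it as `python compare_runs.py reports/locust_<old>_stats.csv reports/locust_<new>_stats.csv`.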
## Interpreting Performance Issues

### High Response Times

Possible causes:

- Database query performance
- External API latency
- Insufficient server resources
- Network congestion

### Low Throughput (RPS < 10)

Check for:

- CPU bottlenecks
- Memory constraints
- Database connection pooling
- API rate limiting

### High Error Rate

Investigate:

- Server error logs
- Resource exhaustion
- Timeout configurations
- Connection limits

## Why Locust?

Locust was chosen over Drill for this stress test because:

1. **Proper SSE Support**: Correctly handles streaming responses without premature closure
1. **Custom Metrics**: Can track SSE-specific metrics like TTFE and stream duration
1. **Web UI**: Real-time monitoring and control via web interface
1. **Python Integration**: Seamlessly integrates with existing Python setup code
1. **Extensibility**: Easy to customize for specific testing scenarios

## Contributing

To improve the stress test suite:

1. Edit `locust.conf` for configuration changes
1. Modify `run_locust_stress_test.sh` for workflow improvements
1. Update question sets for better coverage
1. Add new metrics or analysis features

dify/scripts/stress-test/cleanup.py (new file, 88 lines)
@@ -0,0 +1,88 @@
#!/usr/bin/env python3

import shutil
import sys
from pathlib import Path

from common import Logger


def cleanup() -> None:
    """Clean up all configuration files and reports created during setup and stress testing."""

    log = Logger("Cleanup")
    log.header("Stress Test Cleanup")

    config_dir = Path(__file__).parent / "setup" / "config"
    reports_dir = Path(__file__).parent / "reports"

    dirs_to_clean = []
    if config_dir.exists():
        dirs_to_clean.append(config_dir)
    if reports_dir.exists():
        dirs_to_clean.append(reports_dir)

    if not dirs_to_clean:
        log.success("No directories to clean. Everything is already clean.")
        return

    log.info("Cleaning up stress test data...")
    log.info("This will remove:")
    for dir_path in dirs_to_clean:
        log.list_item(str(dir_path))

    # List files that will be deleted
    log.separator()
    if config_dir.exists():
        config_files = list(config_dir.glob("*.json"))
        if config_files:
            log.info("Config files to be removed:")
            for file in config_files:
                log.list_item(file.name)

    if reports_dir.exists():
        report_files = list(reports_dir.glob("*"))
        if report_files:
            log.info("Report files to be removed:")
            for file in report_files:
                log.list_item(file.name)

    # Ask for confirmation if running interactively
    if sys.stdin.isatty():
        log.separator()
        log.warning("This action cannot be undone!")
        confirmation = input("Are you sure you want to remove all config and report files? (yes/no): ")

        if confirmation.lower() not in ["yes", "y"]:
            log.error("Cleanup cancelled.")
            return

    try:
        # Remove directories and all their contents
        for dir_path in dirs_to_clean:
            shutil.rmtree(dir_path)
            log.success(f"{dir_path.name} directory removed successfully!")

        log.separator()
        log.info("To run the setup again, execute:")
        log.list_item("python setup_all.py")
        log.info("Or run scripts individually in this order:")
        log.list_item("python setup/mock_openai_server.py (in a separate terminal)")
        log.list_item("python setup/setup_admin.py")
        log.list_item("python setup/login_admin.py")
        log.list_item("python setup/install_openai_plugin.py")
        log.list_item("python setup/configure_openai_plugin.py")
        log.list_item("python setup/import_workflow_app.py")
        log.list_item("python setup/create_api_key.py")
        log.list_item("python setup/publish_workflow.py")
        log.list_item("python setup/run_workflow.py")

    except PermissionError as e:
        log.error(f"Permission denied: {e}")
        log.info("Try running with appropriate permissions.")
    except Exception as e:
        log.error(f"An error occurred during cleanup: {e}")


if __name__ == "__main__":
    cleanup()

dify/scripts/stress-test/common/__init__.py (new file, 6 lines)
@@ -0,0 +1,6 @@
"""Common utilities for Dify benchmark suite."""
|
||||
|
||||
from .config_helper import config_helper
|
||||
from .logger_helper import Logger, ProgressLogger
|
||||
|
||||
__all__ = ["Logger", "ProgressLogger", "config_helper"]
|
||||

dify/scripts/stress-test/common/config_helper.py (new file, 240 lines)
@@ -0,0 +1,240 @@
#!/usr/bin/env python3
|
||||
|
||||
import json
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
|
||||
class ConfigHelper:
|
||||
"""Helper class for reading and writing configuration files."""
|
||||
|
||||
def __init__(self, base_dir: Path | None = None):
|
||||
"""Initialize ConfigHelper with base directory.
|
||||
|
||||
Args:
|
||||
base_dir: Base directory for config files. If None, uses setup/config
|
||||
"""
|
||||
if base_dir is None:
|
||||
# Default to config directory in setup folder
|
||||
base_dir = Path(__file__).parent.parent / "setup" / "config"
|
||||
self.base_dir = base_dir
|
||||
self.state_file = "stress_test_state.json"
|
||||
|
||||
def ensure_config_dir(self) -> None:
|
||||
"""Ensure the config directory exists."""
|
||||
self.base_dir.mkdir(exist_ok=True, parents=True)
|
||||
|
||||
def get_config_path(self, filename: str) -> Path:
|
||||
"""Get the full path for a config file.
|
||||
|
||||
Args:
|
||||
filename: Name of the config file (e.g., 'admin_config.json')
|
||||
|
||||
Returns:
|
||||
Full path to the config file
|
||||
"""
|
||||
if not filename.endswith(".json"):
|
||||
filename += ".json"
|
||||
return self.base_dir / filename
|
||||
|
||||
def read_config(self, filename: str) -> dict[str, Any] | None:
|
||||
"""Read a configuration file.
|
||||
|
||||
DEPRECATED: Use read_state() or get_state_section() for new code.
|
||||
This method provides backward compatibility.
|
||||
|
||||
Args:
|
||||
filename: Name of the config file to read
|
||||
|
||||
Returns:
|
||||
Dictionary containing config data, or None if file doesn't exist
|
||||
"""
|
||||
# Provide backward compatibility for old config names
|
||||
if filename in ["admin_config", "token_config", "app_config", "api_key_config"]:
|
||||
section_map = {
|
||||
"admin_config": "admin",
|
||||
"token_config": "auth",
|
||||
"app_config": "app",
|
||||
"api_key_config": "api_key",
|
||||
}
|
||||
return self.get_state_section(section_map[filename])
|
||||
|
||||
config_path = self.get_config_path(filename)
|
||||
|
||||
if not config_path.exists():
|
||||
return None
|
||||
|
||||
try:
|
||||
with open(config_path) as f:
|
||||
return json.load(f)
|
||||
except (OSError, json.JSONDecodeError) as e:
|
||||
print(f"❌ Error reading {filename}: {e}")
|
||||
return None
|
||||
|
||||
def write_config(self, filename: str, data: dict[str, Any]) -> bool:
|
||||
"""Write data to a configuration file.
|
||||
|
||||
DEPRECATED: Use write_state() or update_state_section() for new code.
|
||||
This method provides backward compatibility.
|
||||
|
||||
Args:
|
||||
filename: Name of the config file to write
|
||||
data: Dictionary containing data to save
|
||||
|
||||
Returns:
|
||||
True if successful, False otherwise
|
||||
"""
|
||||
# Provide backward compatibility for old config names
|
||||
if filename in ["admin_config", "token_config", "app_config", "api_key_config"]:
|
||||
section_map = {
|
||||
"admin_config": "admin",
|
||||
"token_config": "auth",
|
||||
"app_config": "app",
|
||||
"api_key_config": "api_key",
|
||||
}
|
||||
return self.update_state_section(section_map[filename], data)
|
||||
|
||||
self.ensure_config_dir()
|
||||
config_path = self.get_config_path(filename)
|
||||
|
||||
try:
|
||||
with open(config_path, "w") as f:
|
||||
json.dump(data, f, indent=2)
|
||||
return True
|
||||
except OSError as e:
|
||||
print(f"❌ Error writing {filename}: {e}")
|
||||
return False
|
||||
|
||||
def config_exists(self, filename: str) -> bool:
|
||||
"""Check if a config file exists.
|
||||
|
||||
Args:
|
||||
filename: Name of the config file to check
|
||||
|
||||
Returns:
|
||||
True if file exists, False otherwise
|
||||
"""
|
||||
return self.get_config_path(filename).exists()
|
||||
|
||||
def delete_config(self, filename: str) -> bool:
|
||||
"""Delete a configuration file.
|
||||
|
||||
Args:
|
||||
filename: Name of the config file to delete
|
||||
|
||||
Returns:
|
||||
True if successful, False otherwise
|
||||
"""
|
||||
config_path = self.get_config_path(filename)
|
||||
|
||||
if not config_path.exists():
|
||||
return True # Already doesn't exist
|
||||
|
||||
try:
|
||||
config_path.unlink()
|
||||
return True
|
||||
except OSError as e:
|
||||
print(f"❌ Error deleting {filename}: {e}")
|
||||
return False
|
||||
|
||||
def read_state(self) -> dict[str, Any] | None:
|
||||
"""Read the entire stress test state.
|
||||
|
||||
Returns:
|
||||
Dictionary containing all state data, or None if file doesn't exist
|
||||
"""
|
||||
state_path = self.get_config_path(self.state_file)
|
||||
if not state_path.exists():
|
||||
return None
|
||||
|
||||
try:
|
||||
with open(state_path) as f:
|
||||
return json.load(f)
|
||||
except (OSError, json.JSONDecodeError) as e:
|
||||
print(f"❌ Error reading {self.state_file}: {e}")
|
||||
return None
|
||||
|
||||
def write_state(self, data: dict[str, Any]) -> bool:
|
||||
"""Write the entire stress test state.
|
||||
|
||||
Args:
|
||||
data: Dictionary containing all state data to save
|
||||
|
||||
Returns:
|
||||
True if successful, False otherwise
|
||||
"""
|
||||
self.ensure_config_dir()
|
||||
state_path = self.get_config_path(self.state_file)
|
||||
|
||||
try:
|
||||
with open(state_path, "w") as f:
|
||||
json.dump(data, f, indent=2)
|
||||
return True
|
||||
except OSError as e:
|
||||
print(f"❌ Error writing {self.state_file}: {e}")
|
||||
return False
|
||||
|
||||
def update_state_section(self, section: str, data: dict[str, Any]) -> bool:
|
||||
"""Update a specific section of the stress test state.
|
||||
|
||||
Args:
|
||||
section: Name of the section to update (e.g., 'admin', 'auth', 'app', 'api_key')
|
||||
data: Dictionary containing section data to save
|
||||
|
||||
Returns:
|
||||
True if successful, False otherwise
|
||||
"""
|
||||
state = self.read_state() or {}
|
||||
state[section] = data
|
||||
return self.write_state(state)
|
||||
|
||||
def get_state_section(self, section: str) -> dict[str, Any] | None:
|
||||
"""Get a specific section from the stress test state.
|
||||
|
||||
Args:
|
||||
section: Name of the section to get (e.g., 'admin', 'auth', 'app', 'api_key')
|
||||
|
||||
Returns:
|
||||
Dictionary containing section data, or None if not found
|
||||
"""
|
||||
state = self.read_state()
|
||||
if state:
|
||||
return state.get(section)
|
||||
return None
|
||||
|
||||
def get_token(self) -> str | None:
|
||||
"""Get the access token from auth section.
|
||||
|
||||
Returns:
|
||||
Access token string or None if not found
|
||||
"""
|
||||
auth = self.get_state_section("auth")
|
||||
if auth:
|
||||
return auth.get("access_token")
|
||||
return None
|
||||
|
||||
def get_app_id(self) -> str | None:
|
||||
"""Get the app ID from app section.
|
||||
|
||||
Returns:
|
||||
App ID string or None if not found
|
||||
"""
|
||||
app = self.get_state_section("app")
|
||||
if app:
|
||||
return app.get("app_id")
|
||||
return None
|
||||
|
||||
def get_api_key(self) -> str | None:
|
||||
"""Get the API key token from api_key section.
|
||||
|
||||
Returns:
|
||||
API key token string or None if not found
|
||||
"""
|
||||
api_key = self.get_state_section("api_key")
|
||||
if api_key:
|
||||
return api_key.get("token")
|
||||
return None
|
||||
|
||||
|
||||
# Create a default instance for convenience
|
||||
config_helper = ConfigHelper()
|
||||

dify/scripts/stress-test/common/logger_helper.py (new file, 218 lines)
@@ -0,0 +1,218 @@
#!/usr/bin/env python3
|
||||
|
||||
import sys
|
||||
import time
|
||||
from enum import Enum
|
||||
|
||||
|
||||
class LogLevel(Enum):
|
||||
"""Log levels with associated colors and symbols."""
|
||||
|
||||
DEBUG = ("🔍", "\033[90m") # Gray
|
||||
INFO = ("ℹ️ ", "\033[94m") # Blue
|
||||
SUCCESS = ("✅", "\033[92m") # Green
|
||||
WARNING = ("⚠️ ", "\033[93m") # Yellow
|
||||
ERROR = ("❌", "\033[91m") # Red
|
||||
STEP = ("🚀", "\033[96m") # Cyan
|
||||
PROGRESS = ("📋", "\033[95m") # Magenta
|
||||
|
||||
|
||||
class Logger:
|
||||
"""Logger class for formatted console output."""
|
||||
|
||||
def __init__(self, name: str | None = None, use_colors: bool = True):
|
||||
"""Initialize logger.
|
||||
|
||||
Args:
|
||||
name: Optional name for the logger (e.g., script name)
|
||||
use_colors: Whether to use ANSI color codes
|
||||
"""
|
||||
self.name = name
|
||||
self.use_colors = use_colors and sys.stdout.isatty()
|
||||
self._reset_color = "\033[0m" if self.use_colors else ""
|
||||
|
||||
def _format_message(self, level: LogLevel, message: str, indent: int = 0) -> str:
|
||||
"""Format a log message with level, color, and indentation.
|
||||
|
||||
Args:
|
||||
level: Log level
|
||||
message: Message to log
|
||||
indent: Number of spaces to indent
|
||||
|
||||
Returns:
|
||||
Formatted message string
|
||||
"""
|
||||
symbol, color = level.value
|
||||
color = color if self.use_colors else ""
|
||||
reset = self._reset_color
|
||||
|
||||
prefix = " " * indent
|
||||
|
||||
if self.name and level in [LogLevel.STEP, LogLevel.ERROR]:
|
||||
return f"{prefix}{color}{symbol} [{self.name}] {message}{reset}"
|
||||
else:
|
||||
return f"{prefix}{color}{symbol} {message}{reset}"
|
||||
|
||||
def debug(self, message: str, indent: int = 0) -> None:
|
||||
"""Log debug message."""
|
||||
print(self._format_message(LogLevel.DEBUG, message, indent))
|
||||
|
||||
def info(self, message: str, indent: int = 0) -> None:
|
||||
"""Log info message."""
|
||||
print(self._format_message(LogLevel.INFO, message, indent))
|
||||
|
||||
def success(self, message: str, indent: int = 0) -> None:
|
||||
"""Log success message."""
|
||||
print(self._format_message(LogLevel.SUCCESS, message, indent))
|
||||
|
||||
def warning(self, message: str, indent: int = 0) -> None:
|
||||
"""Log warning message."""
|
||||
print(self._format_message(LogLevel.WARNING, message, indent))
|
||||
|
||||
def error(self, message: str, indent: int = 0) -> None:
|
||||
"""Log error message."""
|
||||
print(self._format_message(LogLevel.ERROR, message, indent), file=sys.stderr)
|
||||
|
||||
def step(self, message: str, indent: int = 0) -> None:
|
||||
"""Log a step in a process."""
|
||||
print(self._format_message(LogLevel.STEP, message, indent))
|
||||
|
||||
def progress(self, message: str, indent: int = 0) -> None:
|
||||
"""Log progress information."""
|
||||
print(self._format_message(LogLevel.PROGRESS, message, indent))
|
||||
|
||||
def separator(self, char: str = "-", length: int = 60) -> None:
|
||||
"""Print a separator line."""
|
||||
print(char * length)
|
||||
|
||||
def header(self, title: str, width: int = 60) -> None:
|
||||
"""Print a formatted header."""
|
||||
if self.use_colors:
|
||||
print(f"\n\033[1m{'=' * width}\033[0m") # Bold
|
||||
print(f"\033[1m{title.center(width)}\033[0m")
|
||||
print(f"\033[1m{'=' * width}\033[0m\n")
|
||||
else:
|
||||
print(f"\n{'=' * width}")
|
||||
print(title.center(width))
|
||||
print(f"{'=' * width}\n")
|
||||
|
||||
def box(self, title: str, width: int = 60) -> None:
|
||||
"""Print a title in a box."""
|
||||
border = "═" * (width - 2)
|
||||
if self.use_colors:
|
||||
print(f"\033[1m╔{border}╗\033[0m")
|
||||
print(f"\033[1m║{title.center(width - 2)}║\033[0m")
|
||||
print(f"\033[1m╚{border}╝\033[0m")
|
||||
else:
|
||||
print(f"╔{border}╗")
|
||||
print(f"║{title.center(width - 2)}║")
|
||||
print(f"╚{border}╝")
|
||||
|
||||
def list_item(self, item: str, indent: int = 2) -> None:
|
||||
"""Print a list item."""
|
||||
prefix = " " * indent
|
||||
print(f"{prefix}• {item}")
|
||||
|
||||
def key_value(self, key: str, value: str, indent: int = 2) -> None:
|
||||
"""Print a key-value pair."""
|
||||
prefix = " " * indent
|
||||
if self.use_colors:
|
||||
print(f"{prefix}\033[1m{key}:\033[0m {value}")
|
||||
else:
|
||||
print(f"{prefix}{key}: {value}")
|
||||
|
||||
def spinner_start(self, message: str) -> None:
|
||||
"""Start a spinner (simple implementation)."""
|
||||
sys.stdout.write(f"\r{message}... ")
|
||||
sys.stdout.flush()
|
||||
|
||||
def spinner_stop(self, success: bool = True, message: str | None = None) -> None:
|
||||
"""Stop the spinner and show result."""
|
||||
if success:
|
||||
symbol = "✅" if message else "Done"
|
||||
sys.stdout.write(f"\r{symbol} {message or ''}\n")
|
||||
else:
|
||||
symbol = "❌" if message else "Failed"
|
||||
sys.stdout.write(f"\r{symbol} {message or ''}\n")
|
||||
sys.stdout.flush()
|
||||
|
||||
|
||||
class ProgressLogger:
|
||||
"""Logger for tracking progress through multiple steps."""
|
||||
|
||||
def __init__(self, total_steps: int, logger: Logger | None = None):
|
||||
"""Initialize progress logger.
|
||||
|
||||
Args:
|
||||
total_steps: Total number of steps
|
||||
logger: Logger instance to use (creates new if None)
|
||||
"""
|
||||
self.total_steps = total_steps
|
||||
self.current_step = 0
|
||||
self.logger = logger or Logger()
|
||||
self.start_time = time.time()
|
||||
|
||||
def next_step(self, description: str) -> None:
|
||||
"""Move to next step and log it."""
|
||||
self.current_step += 1
|
||||
elapsed = time.time() - self.start_time
|
||||
|
||||
if self.logger.use_colors:
|
||||
progress_bar = self._create_progress_bar()
|
||||
print(f"\n\033[1m[Step {self.current_step}/{self.total_steps}]\033[0m {progress_bar}")
|
||||
self.logger.step(f"{description} (Elapsed: {elapsed:.1f}s)")
|
||||
else:
|
||||
print(f"\n[Step {self.current_step}/{self.total_steps}]")
|
||||
self.logger.step(f"{description} (Elapsed: {elapsed:.1f}s)")
|
||||
|
||||
def _create_progress_bar(self, width: int = 20) -> str:
|
||||
"""Create a simple progress bar."""
|
||||
filled = int(width * self.current_step / self.total_steps)
|
||||
bar = "█" * filled + "░" * (width - filled)
|
||||
percentage = int(100 * self.current_step / self.total_steps)
|
||||
return f"[{bar}] {percentage}%"
|
||||
|
||||
def complete(self) -> None:
|
||||
"""Mark progress as complete."""
|
||||
elapsed = time.time() - self.start_time
|
||||
self.logger.success(f"All steps completed! Total time: {elapsed:.1f}s")
|
||||
|
||||
|
||||
# Create default logger instance
|
||||
logger = Logger()
|
||||
|
||||
|
||||
# Convenience functions using default logger
|
||||
def debug(message: str, indent: int = 0) -> None:
|
||||
"""Log debug message using default logger."""
|
||||
logger.debug(message, indent)
|
||||
|
||||
|
||||
def info(message: str, indent: int = 0) -> None:
|
||||
"""Log info message using default logger."""
|
||||
logger.info(message, indent)
|
||||
|
||||
|
||||
def success(message: str, indent: int = 0) -> None:
|
||||
"""Log success message using default logger."""
|
||||
logger.success(message, indent)
|
||||
|
||||
|
||||
def warning(message: str, indent: int = 0) -> None:
|
||||
"""Log warning message using default logger."""
|
||||
logger.warning(message, indent)
|
||||
|
||||
|
||||
def error(message: str, indent: int = 0) -> None:
|
||||
"""Log error message using default logger."""
|
||||
logger.error(message, indent)
|
||||
|
||||
|
||||
def step(message: str, indent: int = 0) -> None:
|
||||
"""Log step using default logger."""
|
||||
logger.step(message, indent)
|
||||
|
||||
|
||||
def progress(message: str, indent: int = 0) -> None:
|
||||
"""Log progress using default logger."""
|
||||
logger.progress(message, indent)
|
||||

dify/scripts/stress-test/locust.conf (new file, 37 lines)
@@ -0,0 +1,37 @@
# Locust configuration file for Dify SSE benchmark

# Target host
host = http://localhost:5001

# Number of users to simulate
users = 10

# Spawn rate (users per second)
spawn-rate = 2

# Run time (use format like 30s, 5m, 1h)
run-time = 1m

# Locustfile to use
locustfile = scripts/stress-test/sse_benchmark.py

# Headless mode (no web UI)
headless = true

# Print stats in the console
print-stats = true

# Only print summary stats
only-summary = false

# Reset statistics after ramp-up
reset-stats = false

# Log level
loglevel = INFO

# CSV output (uncomment to enable)
# csv = reports/locust_results

# HTML report (uncomment to enable)
# html = reports/locust_report.html

dify/scripts/stress-test/run_locust_stress_test.sh (new file, 202 lines)
@@ -0,0 +1,202 @@
#!/bin/bash
|
||||
|
||||
# Run Dify SSE Stress Test using Locust
|
||||
|
||||
set -e
|
||||
|
||||
# Get the directory where this script is located
|
||||
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
|
||||
# Go to project root first, then to script dir
|
||||
PROJECT_ROOT="$( cd "${SCRIPT_DIR}/../.." && pwd )"
|
||||
cd "${PROJECT_ROOT}"
|
||||
STRESS_TEST_DIR="scripts/stress-test"
|
||||
|
||||
# Colors for output
|
||||
RED='\033[0;31m'
|
||||
GREEN='\033[0;32m'
|
||||
YELLOW='\033[1;33m'
|
||||
BLUE='\033[0;34m'
|
||||
CYAN='\033[0;36m'
|
||||
NC='\033[0m' # No Color
|
||||
|
||||
# Configuration
|
||||
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
|
||||
REPORT_DIR="${STRESS_TEST_DIR}/reports"
|
||||
CSV_PREFIX="${REPORT_DIR}/locust_${TIMESTAMP}"
|
||||
HTML_REPORT="${REPORT_DIR}/locust_report_${TIMESTAMP}.html"
|
||||
SUMMARY_REPORT="${REPORT_DIR}/locust_summary_${TIMESTAMP}.txt"
|
||||
|
||||
# Create reports directory if it doesn't exist
|
||||
mkdir -p "${REPORT_DIR}"
|
||||
|
||||
echo -e "${BLUE}╔════════════════════════════════════════════════════════════════╗${NC}"
|
||||
echo -e "${BLUE}║ DIFY SSE WORKFLOW STRESS TEST (LOCUST) ║${NC}"
|
||||
echo -e "${BLUE}╚════════════════════════════════════════════════════════════════╝${NC}"
|
||||
echo
|
||||
|
||||
# Check if services are running
|
||||
echo -e "${YELLOW}Checking services...${NC}"
|
||||
|
||||
# Check Dify API
|
||||
if curl -s -f http://localhost:5001/health > /dev/null 2>&1; then
|
||||
echo -e "${GREEN}✓ Dify API is running${NC}"
|
||||
|
||||
# Warn if running in debug mode (check for werkzeug in process)
|
||||
if ps aux | grep -v grep | grep -q "werkzeug.*5001\|flask.*run.*5001"; then
|
||||
echo -e "${YELLOW}⚠ WARNING: API appears to be running in debug mode (Flask development server)${NC}"
|
||||
echo -e "${YELLOW} This will give inaccurate benchmark results!${NC}"
|
||||
echo -e "${YELLOW} For accurate benchmarking, restart with Gunicorn:${NC}"
|
||||
echo -e "${CYAN} cd api && uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent app:app${NC}"
|
||||
echo
|
||||
echo -n "Continue anyway? (not recommended) [y/N]: "
|
||||
read -t 10 continue_debug || continue_debug="n"
|
||||
if [ "$continue_debug" != "y" ] && [ "$continue_debug" != "Y" ]; then
|
||||
echo -e "${RED}Benchmark cancelled. Please restart API with Gunicorn.${NC}"
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
else
|
||||
echo -e "${RED}✗ Dify API is not running on port 5001${NC}"
|
||||
echo -e "${YELLOW} Start it with Gunicorn for accurate benchmarking:${NC}"
|
||||
echo -e "${CYAN} cd api && uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent app:app${NC}"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check Mock OpenAI server
|
||||
if curl -s -f http://localhost:5004/v1/models > /dev/null 2>&1; then
|
||||
echo -e "${GREEN}✓ Mock OpenAI server is running${NC}"
|
||||
else
|
||||
echo -e "${RED}✗ Mock OpenAI server is not running on port 5004${NC}"
|
||||
echo -e "${YELLOW} Start it with: python scripts/stress-test/setup/mock_openai_server.py${NC}"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check API token exists
|
||||
if [ ! -f "${STRESS_TEST_DIR}/setup/config/stress_test_state.json" ]; then
|
||||
echo -e "${RED}✗ Stress test configuration not found${NC}"
|
||||
echo -e "${YELLOW} Run setup first: python scripts/stress-test/setup_all.py${NC}"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
API_TOKEN=$(python3 -c "import json; state = json.load(open('${STRESS_TEST_DIR}/setup/config/stress_test_state.json')); print(state.get('api_key', {}).get('token', ''))" 2>/dev/null)
|
||||
if [ -z "$API_TOKEN" ]; then
|
||||
echo -e "${RED}✗ Failed to read API token from stress test state${NC}"
|
||||
exit 1
|
||||
fi
|
||||
echo -e "${GREEN}✓ API token found: ${API_TOKEN:0:10}...${NC}"
|
||||
|
||||
echo
|
||||
echo -e "${CYAN}═══════════════════════════════════════════════════════════════${NC}"
|
||||
echo -e "${CYAN} STRESS TEST PARAMETERS ${NC}"
|
||||
echo -e "${CYAN}═══════════════════════════════════════════════════════════════${NC}"
|
||||
|
||||
# Parse configuration
|
||||
USERS=$(grep "^users" ${STRESS_TEST_DIR}/locust.conf | cut -d'=' -f2 | tr -d ' ')
|
||||
SPAWN_RATE=$(grep "^spawn-rate" ${STRESS_TEST_DIR}/locust.conf | cut -d'=' -f2 | tr -d ' ')
|
||||
RUN_TIME=$(grep "^run-time" ${STRESS_TEST_DIR}/locust.conf | cut -d'=' -f2 | tr -d ' ')
|
||||
|
||||
echo -e " ${YELLOW}Users:${NC} $USERS concurrent users"
|
||||
echo -e " ${YELLOW}Spawn Rate:${NC} $SPAWN_RATE users/second"
|
||||
echo -e " ${YELLOW}Duration:${NC} $RUN_TIME"
|
||||
echo -e " ${YELLOW}Mode:${NC} SSE Streaming"
|
||||
echo
|
||||
|
||||
# Ask user for run mode
|
||||
echo -e "${YELLOW}Select run mode:${NC}"
|
||||
echo " 1) Headless (CLI only) - Default"
|
||||
echo " 2) Web UI (http://localhost:8089)"
|
||||
echo -n "Choice [1]: "
|
||||
read -t 10 choice || choice="1"
|
||||
echo
|
||||
|
||||
# Use SSE stress test script
|
||||
LOCUST_SCRIPT="${STRESS_TEST_DIR}/sse_benchmark.py"
|
||||
|
||||
# Prepare Locust command
|
||||
if [ "$choice" = "2" ]; then
|
||||
echo -e "${BLUE}Starting Locust with Web UI...${NC}"
|
||||
echo -e "${YELLOW}Access the web interface at: ${CYAN}http://localhost:8089${NC}"
|
||||
echo
|
||||
|
||||
# Run with web UI
|
||||
uv --project api run locust \
|
||||
-f ${LOCUST_SCRIPT} \
|
||||
--host http://localhost:5001 \
|
||||
--web-port 8089
|
||||
else
|
||||
echo -e "${BLUE}Starting stress test in headless mode...${NC}"
|
||||
echo
|
||||
|
||||
# Run in headless mode with CSV output
|
||||
uv --project api run locust \
|
||||
-f ${LOCUST_SCRIPT} \
|
||||
--host http://localhost:5001 \
|
||||
--users $USERS \
|
||||
--spawn-rate $SPAWN_RATE \
|
||||
--run-time $RUN_TIME \
|
||||
--headless \
|
||||
--print-stats \
|
||||
--csv=$CSV_PREFIX \
|
||||
--html=$HTML_REPORT \
|
||||
2>&1 | tee $SUMMARY_REPORT
|
||||
|
||||
echo
|
||||
echo -e "${GREEN}═══════════════════════════════════════════════════════════════${NC}"
|
||||
echo -e "${GREEN} STRESS TEST COMPLETE ${NC}"
|
||||
echo -e "${GREEN}═══════════════════════════════════════════════════════════════${NC}"
|
||||
echo
|
||||
echo -e "${BLUE}Reports generated:${NC}"
|
||||
echo -e " ${YELLOW}Summary:${NC} $SUMMARY_REPORT"
|
||||
echo -e " ${YELLOW}HTML Report:${NC} $HTML_REPORT"
|
||||
echo -e " ${YELLOW}CSV Stats:${NC} ${CSV_PREFIX}_stats.csv"
|
||||
echo -e " ${YELLOW}CSV History:${NC} ${CSV_PREFIX}_stats_history.csv"
|
||||
echo
|
||||
echo -e "${CYAN}View HTML report:${NC}"
|
||||
echo " open $HTML_REPORT # macOS"
|
||||
echo " xdg-open $HTML_REPORT # Linux"
|
||||
echo
|
||||
|
||||
# Parse and display key metrics
|
||||
echo -e "${CYAN}═══════════════════════════════════════════════════════════════${NC}"
|
||||
echo -e "${CYAN} KEY METRICS ${NC}"
|
||||
echo -e "${CYAN}═══════════════════════════════════════════════════════════════${NC}"
|
||||
|
||||
if [ -f "${CSV_PREFIX}_stats.csv" ]; then
|
||||
python3 - <<EOF
|
||||
import csv
|
||||
import sys
|
||||
|
||||
csv_file = "${CSV_PREFIX}_stats.csv"
|
||||
|
||||
try:
|
||||
with open(csv_file, 'r') as f:
|
||||
reader = csv.DictReader(f)
|
||||
rows = list(reader)
|
||||
|
||||
# Find the aggregated row
|
||||
for row in rows:
|
||||
if row.get('Name') == 'Aggregated':
|
||||
print(f" Total Requests: {row.get('Request Count', 'N/A')}")
|
||||
print(f" Failure Rate: {row.get('Failure Count', '0')} failures")
|
||||
print(f" Median Response: {row.get('Median Response Time', 'N/A')} ms")
|
||||
print(f" 95%ile Response: {row.get('95%', 'N/A')} ms")
|
||||
print(f" 99%ile Response: {row.get('99%', 'N/A')} ms")
|
||||
print(f" RPS: {row.get('Requests/s', 'N/A')}")
|
||||
break
|
||||
|
||||
# Show SSE-specific metrics
|
||||
print()
|
||||
print("SSE Streaming Metrics:")
|
||||
for row in rows:
|
||||
if 'Time to First Event' in row.get('Name', ''):
|
||||
print(f" Time to First Event: {row.get('Median Response Time', 'N/A')} ms (median)")
|
||||
elif 'Stream Duration' in row.get('Name', ''):
|
||||
print(f" Stream Duration: {row.get('Median Response Time', 'N/A')} ms (median)")
|
||||
|
||||
except Exception as e:
|
||||
print(f"Could not parse metrics: {e}")
|
||||
EOF
|
||||
fi
|
||||
|
||||
echo -e "${CYAN}═══════════════════════════════════════════════════════════════${NC}"
|
||||
fi
|
||||

dify/scripts/stress-test/setup/configure_openai_plugin.py (new file, 97 lines)
@@ -0,0 +1,97 @@
#!/usr/bin/env python3
|
||||
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.append(str(Path(__file__).parent.parent))
|
||||
|
||||
import httpx
|
||||
from common import Logger, config_helper
|
||||
|
||||
|
||||
def configure_openai_plugin() -> None:
|
||||
"""Configure OpenAI plugin with mock server credentials."""
|
||||
|
||||
log = Logger("ConfigPlugin")
|
||||
log.header("Configuring OpenAI Plugin")
|
||||
|
||||
# Read token from config
|
||||
access_token = config_helper.get_token()
|
||||
if not access_token:
|
||||
log.error("No access token found in config")
|
||||
log.info("Please run login_admin.py first to get access token")
|
||||
return
|
||||
|
||||
log.step("Configuring OpenAI plugin with mock server...")
|
||||
|
||||
# API endpoint for plugin configuration
|
||||
base_url = "http://localhost:5001"
|
||||
config_endpoint = f"{base_url}/console/api/workspaces/current/model-providers/langgenius/openai/openai/credentials"
|
||||
|
||||
# Configuration payload with mock server
|
||||
config_payload = {
|
||||
"credentials": {
|
||||
"openai_api_key": "apikey",
|
||||
"openai_organization": None,
|
||||
"openai_api_base": "http://host.docker.internal:5004",
|
||||
}
|
||||
}
|
||||
|
||||
headers = {
|
||||
"Accept": "*/*",
|
||||
"Accept-Language": "en-US,en;q=0.9",
|
||||
"Cache-Control": "no-cache",
|
||||
"Connection": "keep-alive",
|
||||
"DNT": "1",
|
||||
"Origin": "http://localhost:3000",
|
||||
"Pragma": "no-cache",
|
||||
"Referer": "http://localhost:3000/",
|
||||
"Sec-Fetch-Dest": "empty",
|
||||
"Sec-Fetch-Mode": "cors",
|
||||
"Sec-Fetch-Site": "same-site",
|
||||
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36",
|
||||
"authorization": f"Bearer {access_token}",
|
||||
"content-type": "application/json",
|
||||
"sec-ch-ua": '"Not;A=Brand";v="99", "Google Chrome";v="139", "Chromium";v="139"',
|
||||
"sec-ch-ua-mobile": "?0",
|
||||
"sec-ch-ua-platform": '"macOS"',
|
||||
}
|
||||
|
||||
cookies = {"locale": "en-US"}
|
||||
|
||||
try:
|
||||
# Make the configuration request
|
||||
with httpx.Client() as client:
|
||||
response = client.post(
|
||||
config_endpoint,
|
||||
json=config_payload,
|
||||
headers=headers,
|
||||
cookies=cookies,
|
||||
)
|
||||
|
||||
if response.status_code == 200:
|
||||
log.success("OpenAI plugin configured successfully!")
|
||||
log.key_value("API Base", config_payload["credentials"]["openai_api_base"])
|
||||
log.key_value("API Key", config_payload["credentials"]["openai_api_key"])
|
||||
|
||||
elif response.status_code == 201:
|
||||
log.success("OpenAI plugin credentials created successfully!")
|
||||
log.key_value("API Base", config_payload["credentials"]["openai_api_base"])
|
||||
log.key_value("API Key", config_payload["credentials"]["openai_api_key"])
|
||||
|
||||
elif response.status_code == 401:
|
||||
log.error("Configuration failed: Unauthorized")
|
||||
log.info("Token may have expired. Please run login_admin.py again")
|
||||
else:
|
||||
log.error(f"Configuration failed with status code: {response.status_code}")
|
||||
log.debug(f"Response: {response.text}")
|
||||
|
||||
except httpx.ConnectError:
|
||||
log.error("Could not connect to Dify API at http://localhost:5001")
|
||||
log.info("Make sure the API server is running with: ./dev/start-api")
|
||||
except Exception as e:
|
||||
log.error(f"An error occurred: {e}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
configure_openai_plugin()
|
||||

dify/scripts/stress-test/setup/create_api_key.py (new file, 113 lines)
@@ -0,0 +1,113 @@
#!/usr/bin/env python3
|
||||
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.append(str(Path(__file__).parent.parent))
|
||||
|
||||
import json
|
||||
|
||||
import httpx
|
||||
from common import Logger, config_helper
|
||||
|
||||
|
||||
def create_api_key() -> None:
|
||||
"""Create API key for the imported app."""
|
||||
|
||||
log = Logger("CreateAPIKey")
|
||||
log.header("Creating API Key")
|
||||
|
||||
# Read token from config
|
||||
access_token = config_helper.get_token()
|
||||
if not access_token:
|
||||
log.error("No access token found in config")
|
||||
return
|
||||
|
||||
# Read app_id from config
|
||||
app_id = config_helper.get_app_id()
|
||||
if not app_id:
|
||||
log.error("No app_id found in config")
|
||||
log.info("Please run import_workflow_app.py first to import the app")
|
||||
return
|
||||
|
||||
log.step(f"Creating API key for app: {app_id}")
|
||||
|
||||
# API endpoint for creating API key
|
||||
base_url = "http://localhost:5001"
|
||||
api_key_endpoint = f"{base_url}/console/api/apps/{app_id}/api-keys"
|
||||
|
||||
headers = {
|
||||
"Accept": "*/*",
|
||||
"Accept-Language": "en-US,en;q=0.9",
|
||||
"Cache-Control": "no-cache",
|
||||
"Connection": "keep-alive",
|
||||
"Content-Length": "0",
|
||||
"DNT": "1",
|
||||
"Origin": "http://localhost:3000",
|
||||
"Pragma": "no-cache",
|
||||
"Referer": "http://localhost:3000/",
|
||||
"Sec-Fetch-Dest": "empty",
|
||||
"Sec-Fetch-Mode": "cors",
|
||||
"Sec-Fetch-Site": "same-site",
|
||||
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36",
|
||||
"authorization": f"Bearer {access_token}",
|
||||
"content-type": "application/json",
|
||||
"sec-ch-ua": '"Not;A=Brand";v="99", "Google Chrome";v="139", "Chromium";v="139"',
|
||||
"sec-ch-ua-mobile": "?0",
|
||||
"sec-ch-ua-platform": '"macOS"',
|
||||
}
|
||||
|
||||
cookies = {"locale": "en-US"}
|
||||
|
||||
try:
|
||||
# Make the API key creation request
|
||||
with httpx.Client() as client:
|
||||
response = client.post(
|
||||
api_key_endpoint,
|
||||
headers=headers,
|
||||
cookies=cookies,
|
||||
)
|
||||
|
||||
if response.status_code == 200 or response.status_code == 201:
|
||||
response_data = response.json()
|
||||
|
||||
api_key_id = response_data.get("id")
|
||||
api_key_token = response_data.get("token")
|
||||
|
||||
if api_key_token:
|
||||
log.success("API key created successfully!")
|
||||
log.key_value("Key ID", api_key_id)
|
||||
log.key_value("Token", api_key_token)
|
||||
log.key_value("Type", response_data.get("type"))
|
||||
|
||||
# Save API key to config
|
||||
api_key_config = {
|
||||
"id": api_key_id,
|
||||
"token": api_key_token,
|
||||
"type": response_data.get("type"),
|
||||
"app_id": app_id,
|
||||
"created_at": response_data.get("created_at"),
|
||||
}
|
||||
|
||||
if config_helper.write_config("api_key_config", api_key_config):
|
||||
log.info(f"API key saved to: {config_helper.get_config_path('benchmark_state')}")
|
||||
else:
|
||||
log.error("No API token received")
|
||||
log.debug(f"Response: {json.dumps(response_data, indent=2)}")
|
||||
|
||||
elif response.status_code == 401:
|
||||
log.error("API key creation failed: Unauthorized")
|
||||
log.info("Token may have expired. Please run login_admin.py again")
|
||||
else:
|
||||
log.error(f"API key creation failed with status code: {response.status_code}")
|
||||
log.debug(f"Response: {response.text}")
|
||||
|
||||
except httpx.ConnectError:
|
||||
log.error("Could not connect to Dify API at http://localhost:5001")
|
||||
log.info("Make sure the API server is running with: ./dev/start-api")
|
||||
except Exception as e:
|
||||
log.error(f"An error occurred: {e}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
create_api_key()
|
||||

dify/scripts/stress-test/setup/dsl/workflow_llm.yml (new file, 176 lines)
@@ -0,0 +1,176 @@
app:
|
||||
description: ''
|
||||
icon: 🤖
|
||||
icon_background: '#FFEAD5'
|
||||
mode: workflow
|
||||
name: workflow_llm
|
||||
use_icon_as_answer_icon: false
|
||||
dependencies:
|
||||
- current_identifier: null
|
||||
type: marketplace
|
||||
value:
|
||||
marketplace_plugin_unique_identifier: langgenius/openai:0.2.5@373362a028986aae53a7baf73a7f11991ba3c22c69eaf97d6cde048cfd4a9f98
|
||||
kind: app
|
||||
version: 0.4.0
|
||||
workflow:
|
||||
conversation_variables: []
|
||||
environment_variables: []
|
||||
features:
|
||||
file_upload:
|
||||
allowed_file_extensions:
|
||||
- .JPG
|
||||
- .JPEG
|
||||
- .PNG
|
||||
- .GIF
|
||||
- .WEBP
|
||||
- .SVG
|
||||
allowed_file_types:
|
||||
- image
|
||||
allowed_file_upload_methods:
|
||||
- local_file
|
||||
- remote_url
|
||||
enabled: false
|
||||
fileUploadConfig:
|
||||
audio_file_size_limit: 50
|
||||
batch_count_limit: 5
|
||||
file_size_limit: 15
|
||||
image_file_size_limit: 10
|
||||
video_file_size_limit: 100
|
||||
workflow_file_upload_limit: 10
|
||||
image:
|
||||
enabled: false
|
||||
number_limits: 3
|
||||
transfer_methods:
|
||||
- local_file
|
||||
- remote_url
|
||||
number_limits: 3
|
||||
opening_statement: ''
|
||||
retriever_resource:
|
||||
enabled: true
|
||||
sensitive_word_avoidance:
|
||||
enabled: false
|
||||
speech_to_text:
|
||||
enabled: false
|
||||
suggested_questions: []
|
||||
suggested_questions_after_answer:
|
||||
enabled: false
|
||||
text_to_speech:
|
||||
enabled: false
|
||||
language: ''
|
||||
voice: ''
|
||||
graph:
|
||||
edges:
|
||||
- data:
|
||||
isInIteration: false
|
||||
isInLoop: false
|
||||
sourceType: start
|
||||
targetType: llm
|
||||
id: 1757611990947-source-1757611992921-target
|
||||
source: '1757611990947'
|
||||
sourceHandle: source
|
||||
target: '1757611992921'
|
||||
targetHandle: target
|
||||
type: custom
|
||||
zIndex: 0
|
||||
- data:
|
||||
isInIteration: false
|
||||
isInLoop: false
|
||||
sourceType: llm
|
||||
targetType: end
|
||||
id: 1757611992921-source-1757611996447-target
|
||||
source: '1757611992921'
|
||||
sourceHandle: source
|
||||
target: '1757611996447'
|
||||
targetHandle: target
|
||||
type: custom
|
||||
zIndex: 0
|
||||
nodes:
|
||||
- data:
|
||||
desc: ''
|
||||
selected: false
|
||||
title: Start
|
||||
type: start
|
||||
variables:
|
||||
- label: question
|
||||
max_length: null
|
||||
options: []
|
||||
required: true
|
||||
type: text-input
|
||||
variable: question
|
||||
height: 90
|
||||
id: '1757611990947'
|
||||
position:
|
||||
x: 30
|
||||
y: 245
|
||||
positionAbsolute:
|
||||
x: 30
|
||||
y: 245
|
||||
selected: false
|
||||
sourcePosition: right
|
||||
targetPosition: left
|
||||
type: custom
|
||||
width: 244
|
||||
- data:
|
||||
context:
|
||||
enabled: false
|
||||
variable_selector: []
|
||||
desc: ''
|
||||
model:
|
||||
completion_params:
|
||||
temperature: 0.7
|
||||
mode: chat
|
||||
name: gpt-4o
|
||||
provider: langgenius/openai/openai
|
||||
prompt_template:
|
||||
- id: c165fcb6-f1f0-42f2-abab-e81982434deb
|
||||
role: system
|
||||
text: ''
|
||||
- role: user
|
||||
text: '{{#1757611990947.question#}}'
|
||||
selected: false
|
||||
title: LLM
|
||||
type: llm
|
||||
variables: []
|
||||
vision:
|
||||
enabled: false
|
||||
height: 90
|
||||
id: '1757611992921'
|
||||
position:
|
||||
x: 334
|
||||
y: 245
|
||||
positionAbsolute:
|
||||
x: 334
|
||||
y: 245
|
||||
selected: false
|
||||
sourcePosition: right
|
||||
targetPosition: left
|
||||
type: custom
|
||||
width: 244
|
||||
- data:
|
||||
desc: ''
|
||||
outputs:
|
||||
- value_selector:
|
||||
- '1757611992921'
|
||||
- text
|
||||
value_type: string
|
||||
variable: answer
|
||||
selected: false
|
||||
title: End
|
||||
type: end
|
||||
height: 90
|
||||
id: '1757611996447'
|
||||
position:
|
||||
x: 638
|
||||
y: 245
|
||||
positionAbsolute:
|
||||
x: 638
|
||||
y: 245
|
||||
selected: true
|
||||
sourcePosition: right
|
||||
targetPosition: left
|
||||
type: custom
|
||||
width: 244
|
||||
viewport:
|
||||
x: 0
|
||||
y: 0
|
||||
zoom: 0.7
|
||||
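The DSL above wires a three-node graph (start → llm → end) against the marketplace OpenAI plugin. Before importing a modified DSL it can help to list its nodes and edges; a minimal sketch, assuming PyYAML is installed:

# Sketch: inspect workflow_llm.yml before importing it.
from pathlib import Path

import yaml  # PyYAML, assumed available

dsl = yaml.safe_load(Path("scripts/stress-test/setup/dsl/workflow_llm.yml").read_text())
graph = dsl["workflow"]["graph"]

for node in graph["nodes"]:
    print(f"node {node['id']}: {node['data']['type']} ({node['data'].get('title', '')})")
for edge in graph["edges"]:
    print(f"edge {edge['source']} -> {edge['target']}")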
128 dify/scripts/stress-test/setup/import_workflow_app.py Normal file
@@ -0,0 +1,128 @@
|
||||
#!/usr/bin/env python3
|
||||
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.append(str(Path(__file__).parent.parent))
|
||||
|
||||
import json
|
||||
|
||||
import httpx
|
||||
from common import Logger, config_helper # type: ignore[import]
|
||||
|
||||
|
||||
def import_workflow_app() -> None:
|
||||
"""Import workflow app from DSL file and save app_id."""
|
||||
|
||||
log = Logger("ImportApp")
|
||||
log.header("Importing Workflow Application")
|
||||
|
||||
# Read token from config
|
||||
access_token = config_helper.get_token()
|
||||
if not access_token:
|
||||
log.error("No access token found in config")
|
||||
log.info("Please run login_admin.py first to get access token")
|
||||
return
|
||||
|
||||
# Read workflow DSL file
|
||||
dsl_path = Path(__file__).parent / "dsl" / "workflow_llm.yml"
|
||||
|
||||
if not dsl_path.exists():
|
||||
log.error(f"DSL file not found: {dsl_path}")
|
||||
return
|
||||
|
||||
with open(dsl_path) as f:
|
||||
yaml_content = f.read()
|
||||
|
||||
log.step("Importing workflow app from DSL...")
|
||||
log.key_value("DSL file", dsl_path.name)
|
||||
|
||||
# API endpoint for app import
|
||||
base_url = "http://localhost:5001"
|
||||
import_endpoint = f"{base_url}/console/api/apps/imports"
|
||||
|
||||
# Import payload
|
||||
import_payload = {"mode": "yaml-content", "yaml_content": yaml_content}
|
||||
|
||||
headers = {
|
||||
"Accept": "*/*",
|
||||
"Accept-Language": "en-US,en;q=0.9",
|
||||
"Cache-Control": "no-cache",
|
||||
"Connection": "keep-alive",
|
||||
"DNT": "1",
|
||||
"Origin": "http://localhost:3000",
|
||||
"Pragma": "no-cache",
|
||||
"Referer": "http://localhost:3000/",
|
||||
"Sec-Fetch-Dest": "empty",
|
||||
"Sec-Fetch-Mode": "cors",
|
||||
"Sec-Fetch-Site": "same-site",
|
||||
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36",
|
||||
"authorization": f"Bearer {access_token}",
|
||||
"content-type": "application/json",
|
||||
"sec-ch-ua": '"Not;A=Brand";v="99", "Google Chrome";v="139", "Chromium";v="139"',
|
||||
"sec-ch-ua-mobile": "?0",
|
||||
"sec-ch-ua-platform": '"macOS"',
|
||||
}
|
||||
|
||||
cookies = {"locale": "en-US"}
|
||||
|
||||
try:
|
||||
# Make the import request
|
||||
with httpx.Client() as client:
|
||||
response = client.post(
|
||||
import_endpoint,
|
||||
json=import_payload,
|
||||
headers=headers,
|
||||
cookies=cookies,
|
||||
)
|
||||
|
||||
if response.status_code == 200:
|
||||
response_data = response.json()
|
||||
|
||||
# Check import status
|
||||
if response_data.get("status") == "completed":
|
||||
app_id = response_data.get("app_id")
|
||||
|
||||
if app_id:
|
||||
log.success("Workflow app imported successfully!")
|
||||
log.key_value("App ID", app_id)
|
||||
log.key_value("App Mode", response_data.get("app_mode"))
|
||||
log.key_value("DSL Version", response_data.get("imported_dsl_version"))
|
||||
|
||||
# Save app_id to config
|
||||
app_config = {
|
||||
"app_id": app_id,
|
||||
"app_mode": response_data.get("app_mode"),
|
||||
"app_name": "workflow_llm",
|
||||
"dsl_version": response_data.get("imported_dsl_version"),
|
||||
}
|
||||
|
||||
if config_helper.write_config("app_config", app_config):
|
||||
log.info(f"App config saved to: {config_helper.get_config_path('benchmark_state')}")
|
||||
else:
|
||||
log.error("Import completed but no app_id received")
|
||||
log.debug(f"Response: {json.dumps(response_data, indent=2)}")
|
||||
|
||||
elif response_data.get("status") == "failed":
|
||||
log.error("Import failed")
|
||||
log.error(f"Error: {response_data.get('error')}")
|
||||
else:
|
||||
log.warning(f"Import status: {response_data.get('status')}")
|
||||
log.debug(f"Response: {json.dumps(response_data, indent=2)}")
|
||||
|
||||
elif response.status_code == 401:
|
||||
log.error("Import failed: Unauthorized")
|
||||
log.info("Token may have expired. Please run login_admin.py again")
|
||||
else:
|
||||
log.error(f"Import failed with status code: {response.status_code}")
|
||||
log.debug(f"Response: {response.text}")
|
||||
|
||||
except httpx.ConnectError:
|
||||
log.error("Could not connect to Dify API at http://localhost:5001")
|
||||
log.info("Make sure the API server is running with: ./dev/start-api")
|
||||
except Exception as e:
|
||||
log.error(f"An error occurred: {e}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import_workflow_app()
|
||||
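The import script replays a full set of captured browser headers. For the import call itself, only the bearer token and JSON content type should matter; the Sec-Fetch/sec-ch-ua headers are assumed to be optional. A trimmed-down sketch of the same request:

# Sketch: minimal DSL import without the captured browser headers (assumption: they are not enforced).
from pathlib import Path

import httpx

from common import config_helper

yaml_content = Path("scripts/stress-test/setup/dsl/workflow_llm.yml").read_text()
resp = httpx.post(
    "http://localhost:5001/console/api/apps/imports",
    json={"mode": "yaml-content", "yaml_content": yaml_content},
    headers={
        "Authorization": f"Bearer {config_helper.get_token()}",
        "Content-Type": "application/json",
    },
)
data = resp.json()
print(data.get("status"), data.get("app_id"))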
157 dify/scripts/stress-test/setup/install_openai_plugin.py Normal file
@@ -0,0 +1,157 @@
|
||||
#!/usr/bin/env python3
|
||||
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.append(str(Path(__file__).parent.parent))
|
||||
|
||||
import time
|
||||
|
||||
import httpx
|
||||
from common import Logger, config_helper
|
||||
|
||||
|
||||
def install_openai_plugin() -> None:
|
||||
"""Install OpenAI plugin using saved access token."""
|
||||
|
||||
log = Logger("InstallPlugin")
|
||||
log.header("Installing OpenAI Plugin")
|
||||
|
||||
# Read token from config
|
||||
access_token = config_helper.get_token()
|
||||
if not access_token:
|
||||
log.error("No access token found in config")
|
||||
log.info("Please run login_admin.py first to get access token")
|
||||
return
|
||||
|
||||
log.step("Installing OpenAI plugin...")
|
||||
|
||||
# API endpoint for plugin installation
|
||||
base_url = "http://localhost:5001"
|
||||
install_endpoint = f"{base_url}/console/api/workspaces/current/plugin/install/marketplace"
|
||||
|
||||
# Plugin identifier
|
||||
plugin_payload = {
|
||||
"plugin_unique_identifiers": [
|
||||
"langgenius/openai:0.2.5@373362a028986aae53a7baf73a7f11991ba3c22c69eaf97d6cde048cfd4a9f98"
|
||||
]
|
||||
}
|
||||
|
||||
headers = {
|
||||
"Accept": "*/*",
|
||||
"Accept-Language": "en-US,en;q=0.9",
|
||||
"Cache-Control": "no-cache",
|
||||
"Connection": "keep-alive",
|
||||
"DNT": "1",
|
||||
"Origin": "http://localhost:3000",
|
||||
"Pragma": "no-cache",
|
||||
"Referer": "http://localhost:3000/",
|
||||
"Sec-Fetch-Dest": "empty",
|
||||
"Sec-Fetch-Mode": "cors",
|
||||
"Sec-Fetch-Site": "same-site",
|
||||
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36",
|
||||
"authorization": f"Bearer {access_token}",
|
||||
"content-type": "application/json",
|
||||
"sec-ch-ua": '"Not;A=Brand";v="99", "Google Chrome";v="139", "Chromium";v="139"',
|
||||
"sec-ch-ua-mobile": "?0",
|
||||
"sec-ch-ua-platform": '"macOS"',
|
||||
}
|
||||
|
||||
cookies = {"locale": "en-US"}
|
||||
|
||||
try:
|
||||
# Make the installation request
|
||||
with httpx.Client() as client:
|
||||
response = client.post(
|
||||
install_endpoint,
|
||||
json=plugin_payload,
|
||||
headers=headers,
|
||||
cookies=cookies,
|
||||
)
|
||||
|
||||
if response.status_code == 200:
|
||||
response_data = response.json()
|
||||
task_id = response_data.get("task_id")
|
||||
|
||||
if not task_id:
|
||||
log.error("No task ID received from installation request")
|
||||
return
|
||||
|
||||
log.progress(f"Installation task created: {task_id}")
|
||||
log.info("Polling for task completion...")
|
||||
|
||||
# Poll for task completion
|
||||
task_endpoint = f"{base_url}/console/api/workspaces/current/plugin/tasks/{task_id}"
|
||||
|
||||
max_attempts = 30 # 30 attempts with 2 second delay = 60 seconds max
|
||||
attempt = 0
|
||||
|
||||
log.spinner_start("Installing plugin")
|
||||
|
||||
while attempt < max_attempts:
|
||||
attempt += 1
|
||||
time.sleep(2) # Wait 2 seconds between polls
|
||||
|
||||
task_response = client.get(
|
||||
task_endpoint,
|
||||
headers=headers,
|
||||
cookies=cookies,
|
||||
)
|
||||
|
||||
if task_response.status_code != 200:
|
||||
log.spinner_stop(
|
||||
success=False,
|
||||
message=f"Failed to get task status: {task_response.status_code}",
|
||||
)
|
||||
return
|
||||
|
||||
task_data = task_response.json()
|
||||
task_info = task_data.get("task", {})
|
||||
status = task_info.get("status")
|
||||
|
||||
if status == "success":
|
||||
log.spinner_stop(success=True, message="Plugin installed!")
|
||||
log.success("OpenAI plugin installed successfully!")
|
||||
|
||||
# Display plugin info
|
||||
plugins = task_info.get("plugins", [])
|
||||
if plugins:
|
||||
plugin_info = plugins[0]
|
||||
log.key_value("Plugin ID", plugin_info.get("plugin_id"))
|
||||
log.key_value("Message", plugin_info.get("message"))
|
||||
break
|
||||
|
||||
elif status == "failed":
|
||||
log.spinner_stop(success=False, message="Installation failed")
|
||||
log.error("Plugin installation failed")
|
||||
plugins = task_info.get("plugins", [])
|
||||
if plugins:
|
||||
for plugin in plugins:
|
||||
log.list_item(f"{plugin.get('plugin_id')}: {plugin.get('message')}")
|
||||
break
|
||||
|
||||
# Continue polling if status is "pending" or other
|
||||
|
||||
else:
|
||||
log.spinner_stop(success=False, message="Installation timed out")
|
||||
log.error("Installation timed out after 60 seconds")
|
||||
|
||||
elif response.status_code == 401:
|
||||
log.error("Installation failed: Unauthorized")
|
||||
log.info("Token may have expired. Please run login_admin.py again")
|
||||
elif response.status_code == 409:
|
||||
log.warning("Plugin may already be installed")
|
||||
log.debug(f"Response: {response.text}")
|
||||
else:
|
||||
log.error(f"Installation failed with status code: {response.status_code}")
|
||||
log.debug(f"Response: {response.text}")
|
||||
|
||||
except httpx.ConnectError:
|
||||
log.error("Could not connect to Dify API at http://localhost:5001")
|
||||
log.info("Make sure the API server is running with: ./dev/start-api")
|
||||
except Exception as e:
|
||||
log.error(f"An error occurred: {e}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
install_openai_plugin()
|
||||
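The fixed-attempt poll loop above is a pattern several setup steps could reuse. A generic helper, sketched under the assumption that every plugin task endpoint returns a {"task": {"status": ...}} payload like the one handled here:

# Sketch: reusable poll-until-terminal helper for plugin task endpoints.
import time

import httpx


def wait_for_task(client: httpx.Client, url: str, headers: dict[str, str],
                  timeout_s: float = 60.0, interval_s: float = 2.0) -> str:
    """Poll a task endpoint until its status leaves 'pending' or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = client.get(url, headers=headers).json().get("task", {}).get("status", "")
        if status in ("success", "failed"):
            return status
        time.sleep(interval_s)
    return "timeout"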
101 dify/scripts/stress-test/setup/login_admin.py Normal file
@@ -0,0 +1,101 @@
#!/usr/bin/env python3

import sys
from pathlib import Path

sys.path.append(str(Path(__file__).parent.parent))

import json

import httpx
from common import Logger, config_helper


def login_admin() -> None:
    """Login with admin account and save access token."""

    log = Logger("Login")
    log.header("Admin Login")

    # Read admin credentials from config
    admin_config = config_helper.read_config("admin_config")

    if not admin_config:
        log.error("Admin config not found")
        log.info("Please run setup_admin.py first to create the admin account")
        return

    log.info(f"Logging in with email: {admin_config['email']}")

    # API login endpoint
    base_url = "http://localhost:5001"
    login_endpoint = f"{base_url}/console/api/login"

    # Prepare login payload
    login_payload = {
        "email": admin_config["email"],
        "password": admin_config["password"],
        "remember_me": True,
    }

    try:
        # Make the login request
        with httpx.Client() as client:
            response = client.post(
                login_endpoint,
                json=login_payload,
                headers={"Content-Type": "application/json"},
            )

            if response.status_code == 200:
                log.success("Login successful!")

                # Extract token from response
                response_data = response.json()

                # Check if login was successful
                if response_data.get("result") != "success":
                    log.error(f"Login failed: {response_data}")
                    return

                # Extract tokens from data field
                token_data = response_data.get("data", {})
                access_token = token_data.get("access_token", "")
                refresh_token = token_data.get("refresh_token", "")

                if not access_token:
                    log.error("No access token found in response")
                    log.debug(f"Full response: {json.dumps(response_data, indent=2)}")
                    return

                # Save token to config file
                token_config = {
                    "email": admin_config["email"],
                    "access_token": access_token,
                    "refresh_token": refresh_token,
                }

                # Save token config
                if config_helper.write_config("token_config", token_config):
                    log.info(f"Token saved to: {config_helper.get_config_path('benchmark_state')}")

                # Show truncated token for verification
                token_display = f"{access_token[:20]}..." if len(access_token) > 20 else "Token saved"
                log.key_value("Access token", token_display)

            elif response.status_code == 401:
                log.error("Login failed: Invalid credentials")
                log.debug(f"Response: {response.text}")
            else:
                log.error(f"Login failed with status code: {response.status_code}")
                log.debug(f"Response: {response.text}")

    except httpx.ConnectError:
        log.error("Could not connect to Dify API at http://localhost:5001")
        log.info("Make sure the API server is running with: ./dev/start-api")
    except Exception as e:
        log.error(f"An error occurred: {e}")


if __name__ == "__main__":
    login_admin()
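Every later console call repeats the same Authorization header. One way to cut that boilerplate is a preconfigured httpx.Client built from the saved token; a small sketch (hypothetical helper, not part of the scripts):

# Sketch: shared console client that reuses the saved access token.
import httpx

from common import config_helper


def console_client(base_url: str = "http://localhost:5001") -> httpx.Client:
    """Return an httpx.Client with the Bearer token and JSON content type preset."""
    return httpx.Client(
        base_url=base_url,
        headers={
            "Authorization": f"Bearer {config_helper.get_token()}",
            "Content-Type": "application/json",
        },
        cookies={"locale": "en-US"},
    )


# Usage: console_client().post("/console/api/apps/imports", json=...)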
205 dify/scripts/stress-test/setup/mock_openai_server.py Normal file
@@ -0,0 +1,205 @@
|
||||
#!/usr/bin/env python3
|
||||
|
||||
import json
|
||||
import time
|
||||
import uuid
|
||||
from collections.abc import Iterator
|
||||
from typing import Any
|
||||
|
||||
from flask import Flask, Response, jsonify, request
|
||||
|
||||
app = Flask(__name__)
|
||||
|
||||
# Mock models list
|
||||
MODELS = [
|
||||
{
|
||||
"id": "gpt-3.5-turbo",
|
||||
"object": "model",
|
||||
"created": 1677649963,
|
||||
"owned_by": "openai",
|
||||
},
|
||||
{"id": "gpt-4", "object": "model", "created": 1687882411, "owned_by": "openai"},
|
||||
{
|
||||
"id": "text-embedding-ada-002",
|
||||
"object": "model",
|
||||
"created": 1671217299,
|
||||
"owned_by": "openai-internal",
|
||||
},
|
||||
]
|
||||
|
||||
|
||||
@app.route("/v1/models", methods=["GET"])
|
||||
def list_models() -> Any:
|
||||
"""List available models."""
|
||||
return jsonify({"object": "list", "data": MODELS})
|
||||
|
||||
|
||||
@app.route("/v1/chat/completions", methods=["POST"])
|
||||
def chat_completions() -> Any:
|
||||
"""Handle chat completions."""
|
||||
data = request.json or {}
|
||||
model = data.get("model", "gpt-3.5-turbo")
|
||||
messages = data.get("messages", [])
|
||||
stream = data.get("stream", False)
|
||||
|
||||
# Generate mock response
|
||||
response_content = "This is a mock response from the OpenAI server."
|
||||
if messages:
|
||||
last_message = messages[-1].get("content", "")
|
||||
response_content = f"Mock response to: {last_message[:100]}..."
|
||||
|
||||
if stream:
|
||||
# Streaming response
|
||||
def generate() -> Iterator[str]:
|
||||
# Send initial chunk
|
||||
chunk = {
|
||||
"id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
|
||||
"object": "chat.completion.chunk",
|
||||
"created": int(time.time()),
|
||||
"model": model,
|
||||
"choices": [
|
||||
{
|
||||
"index": 0,
|
||||
"delta": {"role": "assistant", "content": ""},
|
||||
"finish_reason": None,
|
||||
}
|
||||
],
|
||||
}
|
||||
yield f"data: {json.dumps(chunk)}\n\n"
|
||||
|
||||
# Send content in chunks
|
||||
words = response_content.split()
|
||||
for word in words:
|
||||
chunk = {
|
||||
"id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
|
||||
"object": "chat.completion.chunk",
|
||||
"created": int(time.time()),
|
||||
"model": model,
|
||||
"choices": [
|
||||
{
|
||||
"index": 0,
|
||||
"delta": {"content": word + " "},
|
||||
"finish_reason": None,
|
||||
}
|
||||
],
|
||||
}
|
||||
yield f"data: {json.dumps(chunk)}\n\n"
|
||||
time.sleep(0.05) # Simulate streaming delay
|
||||
|
||||
# Send final chunk
|
||||
chunk = {
|
||||
"id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
|
||||
"object": "chat.completion.chunk",
|
||||
"created": int(time.time()),
|
||||
"model": model,
|
||||
"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
|
||||
}
|
||||
yield f"data: {json.dumps(chunk)}\n\n"
|
||||
yield "data: [DONE]\n\n"
|
||||
|
||||
return Response(generate(), mimetype="text/event-stream")
|
||||
else:
|
||||
# Non-streaming response
|
||||
return jsonify(
|
||||
{
|
||||
"id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
|
||||
"object": "chat.completion",
|
||||
"created": int(time.time()),
|
||||
"model": model,
|
||||
"choices": [
|
||||
{
|
||||
"index": 0,
|
||||
"message": {"role": "assistant", "content": response_content},
|
||||
"finish_reason": "stop",
|
||||
}
|
||||
],
|
||||
"usage": {
|
||||
"prompt_tokens": len(str(messages)),
|
||||
"completion_tokens": len(response_content.split()),
|
||||
"total_tokens": len(str(messages)) + len(response_content.split()),
|
||||
},
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
@app.route("/v1/completions", methods=["POST"])
|
||||
def completions() -> Any:
|
||||
"""Handle text completions."""
|
||||
data = request.json or {}
|
||||
model = data.get("model", "gpt-3.5-turbo-instruct")
|
||||
prompt = data.get("prompt", "")
|
||||
|
||||
response_text = f"Mock completion for prompt: {prompt[:100]}..."
|
||||
|
||||
return jsonify(
|
||||
{
|
||||
"id": f"cmpl-{uuid.uuid4().hex[:8]}",
|
||||
"object": "text_completion",
|
||||
"created": int(time.time()),
|
||||
"model": model,
|
||||
"choices": [
|
||||
{
|
||||
"text": response_text,
|
||||
"index": 0,
|
||||
"logprobs": None,
|
||||
"finish_reason": "stop",
|
||||
}
|
||||
],
|
||||
"usage": {
|
||||
"prompt_tokens": len(prompt.split()),
|
||||
"completion_tokens": len(response_text.split()),
|
||||
"total_tokens": len(prompt.split()) + len(response_text.split()),
|
||||
},
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
@app.route("/v1/embeddings", methods=["POST"])
|
||||
def embeddings() -> Any:
|
||||
"""Handle embeddings requests."""
|
||||
data = request.json or {}
|
||||
model = data.get("model", "text-embedding-ada-002")
|
||||
input_text = data.get("input", "")
|
||||
|
||||
# Generate mock embedding (1536 dimensions for ada-002)
|
||||
mock_embedding = [0.1] * 1536
|
||||
|
||||
return jsonify(
|
||||
{
|
||||
"object": "list",
|
||||
"data": [{"object": "embedding", "embedding": mock_embedding, "index": 0}],
|
||||
"model": model,
|
||||
"usage": {
|
||||
"prompt_tokens": len(input_text.split()),
|
||||
"total_tokens": len(input_text.split()),
|
||||
},
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
@app.route("/v1/models/<model_id>", methods=["GET"])
|
||||
def get_model(model_id: str) -> tuple[Any, int] | Any:
|
||||
"""Get specific model details."""
|
||||
for model in MODELS:
|
||||
if model["id"] == model_id:
|
||||
return jsonify(model)
|
||||
|
||||
return jsonify({"error": "Model not found"}), 404
|
||||
|
||||
|
||||
@app.route("/health", methods=["GET"])
|
||||
def health() -> Any:
|
||||
"""Health check endpoint."""
|
||||
return jsonify({"status": "healthy"})
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
print("🚀 Starting Mock OpenAI Server on http://localhost:5004")
|
||||
print("Available endpoints:")
|
||||
print(" - GET /v1/models")
|
||||
print(" - POST /v1/chat/completions")
|
||||
print(" - POST /v1/completions")
|
||||
print(" - POST /v1/embeddings")
|
||||
print(" - GET /v1/models/<model_id>")
|
||||
print(" - GET /health")
|
||||
app.run(host="0.0.0.0", port=5004, debug=True)
|
||||
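Before pointing Dify at the mock server it can be checked directly; the sketch below streams one chat completion from port 5004 and prints the deltas as they arrive:

# Sketch: consume a streaming completion from the mock OpenAI server.
import json

import httpx

with httpx.Client(timeout=30.0) as client:
    with client.stream(
        "POST",
        "http://localhost:5004/v1/chat/completions",
        json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hello"}], "stream": True},
    ) as resp:
        for line in resp.iter_lines():
            if not line.startswith("data: "):
                continue
            payload = line[6:]
            if payload == "[DONE]":
                break
            print(json.loads(payload)["choices"][0]["delta"].get("content", ""), end="", flush=True)
print()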
105 dify/scripts/stress-test/setup/publish_workflow.py Normal file
@@ -0,0 +1,105 @@
|
||||
#!/usr/bin/env python3
|
||||
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.append(str(Path(__file__).parent.parent))
|
||||
|
||||
import json
|
||||
|
||||
import httpx
|
||||
from common import Logger, config_helper
|
||||
|
||||
|
||||
def publish_workflow() -> None:
|
||||
"""Publish the imported workflow app."""
|
||||
|
||||
log = Logger("PublishWorkflow")
|
||||
log.header("Publishing Workflow")
|
||||
|
||||
# Read token from config
|
||||
access_token = config_helper.get_token()
|
||||
if not access_token:
|
||||
log.error("No access token found in config")
|
||||
return
|
||||
|
||||
# Read app_id from config
|
||||
app_id = config_helper.get_app_id()
|
||||
if not app_id:
|
||||
log.error("No app_id found in config")
|
||||
return
|
||||
|
||||
log.step(f"Publishing workflow for app: {app_id}")
|
||||
|
||||
# API endpoint for publishing workflow
|
||||
base_url = "http://localhost:5001"
|
||||
publish_endpoint = f"{base_url}/console/api/apps/{app_id}/workflows/publish"
|
||||
|
||||
# Publish payload
|
||||
publish_payload = {"marked_name": "", "marked_comment": ""}
|
||||
|
||||
headers = {
|
||||
"Accept": "*/*",
|
||||
"Accept-Language": "en-US,en;q=0.9",
|
||||
"Cache-Control": "no-cache",
|
||||
"Connection": "keep-alive",
|
||||
"DNT": "1",
|
||||
"Origin": "http://localhost:3000",
|
||||
"Pragma": "no-cache",
|
||||
"Referer": "http://localhost:3000/",
|
||||
"Sec-Fetch-Dest": "empty",
|
||||
"Sec-Fetch-Mode": "cors",
|
||||
"Sec-Fetch-Site": "same-site",
|
||||
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36",
|
||||
"authorization": f"Bearer {access_token}",
|
||||
"content-type": "application/json",
|
||||
"sec-ch-ua": '"Not;A=Brand";v="99", "Google Chrome";v="139", "Chromium";v="139"',
|
||||
"sec-ch-ua-mobile": "?0",
|
||||
"sec-ch-ua-platform": '"macOS"',
|
||||
}
|
||||
|
||||
cookies = {"locale": "en-US"}
|
||||
|
||||
try:
|
||||
# Make the publish request
|
||||
with httpx.Client() as client:
|
||||
response = client.post(
|
||||
publish_endpoint,
|
||||
json=publish_payload,
|
||||
headers=headers,
|
||||
cookies=cookies,
|
||||
)
|
||||
|
||||
if response.status_code == 200 or response.status_code == 201:
|
||||
log.success("Workflow published successfully!")
|
||||
log.key_value("App ID", app_id)
|
||||
|
||||
# Try to parse response if it has JSON content
|
||||
if response.text:
|
||||
try:
|
||||
response_data = response.json()
|
||||
if response_data:
|
||||
log.debug(f"Response: {json.dumps(response_data, indent=2)}")
|
||||
except json.JSONDecodeError:
|
||||
# Response might be empty or non-JSON
|
||||
pass
|
||||
|
||||
elif response.status_code == 401:
|
||||
log.error("Workflow publish failed: Unauthorized")
|
||||
log.info("Token may have expired. Please run login_admin.py again")
|
||||
elif response.status_code == 404:
|
||||
log.error("Workflow publish failed: App not found")
|
||||
log.info("Make sure the app was imported successfully")
|
||||
else:
|
||||
log.error(f"Workflow publish failed with status code: {response.status_code}")
|
||||
log.debug(f"Response: {response.text}")
|
||||
|
||||
except httpx.ConnectError:
|
||||
log.error("Could not connect to Dify API at http://localhost:5001")
|
||||
log.info("Make sure the API server is running with: ./dev/start-api")
|
||||
except Exception as e:
|
||||
log.error(f"An error occurred: {e}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
publish_workflow()
|
||||
161 dify/scripts/stress-test/setup/run_workflow.py Normal file
@@ -0,0 +1,161 @@
|
||||
#!/usr/bin/env python3
|
||||
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.append(str(Path(__file__).parent.parent))
|
||||
|
||||
import json
|
||||
|
||||
import httpx
|
||||
from common import Logger, config_helper
|
||||
|
||||
|
||||
def run_workflow(question: str = "fake question", streaming: bool = True) -> None:
|
||||
"""Run the workflow app with a question."""
|
||||
|
||||
log = Logger("RunWorkflow")
|
||||
log.header("Running Workflow")
|
||||
|
||||
# Read API key from config
|
||||
api_token = config_helper.get_api_key()
|
||||
if not api_token:
|
||||
log.error("No API token found in config")
|
||||
log.info("Please run create_api_key.py first to create an API key")
|
||||
return
|
||||
|
||||
log.key_value("Question", question)
|
||||
log.key_value("Mode", "Streaming" if streaming else "Blocking")
|
||||
log.separator()
|
||||
|
||||
# API endpoint for running workflow
|
||||
base_url = "http://localhost:5001"
|
||||
run_endpoint = f"{base_url}/v1/workflows/run"
|
||||
|
||||
# Run payload
|
||||
run_payload = {
|
||||
"inputs": {"question": question},
|
||||
"user": "default user",
|
||||
"response_mode": "streaming" if streaming else "blocking",
|
||||
}
|
||||
|
||||
headers = {
|
||||
"Authorization": f"Bearer {api_token}",
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
|
||||
try:
|
||||
# Make the run request
|
||||
with httpx.Client(timeout=30.0) as client:
|
||||
if streaming:
|
||||
# Handle streaming response
|
||||
with client.stream(
|
||||
"POST",
|
||||
run_endpoint,
|
||||
json=run_payload,
|
||||
headers=headers,
|
||||
) as response:
|
||||
if response.status_code == 200:
|
||||
log.success("Workflow started successfully!")
|
||||
log.separator()
|
||||
log.step("Streaming response:")
|
||||
|
||||
for line in response.iter_lines():
|
||||
if line.startswith("data: "):
|
||||
data_str = line[6:] # Remove "data: " prefix
|
||||
if data_str == "[DONE]":
|
||||
log.success("Workflow completed!")
|
||||
break
|
||||
try:
|
||||
data = json.loads(data_str)
|
||||
event = data.get("event")
|
||||
|
||||
if event == "workflow_started":
|
||||
log.progress(f"Workflow started: {data.get('data', {}).get('id')}")
|
||||
elif event == "node_started":
|
||||
node_data = data.get("data", {})
|
||||
log.progress(
|
||||
f"Node started: {node_data.get('node_type')} - {node_data.get('title')}"
|
||||
)
|
||||
elif event == "node_finished":
|
||||
node_data = data.get("data", {})
|
||||
log.progress(
|
||||
f"Node finished: {node_data.get('node_type')} - {node_data.get('title')}"
|
||||
)
|
||||
|
||||
# Print output if it's the LLM node
|
||||
outputs = node_data.get("outputs", {})
|
||||
if outputs.get("text"):
|
||||
log.separator()
|
||||
log.info("💬 LLM Response:")
|
||||
log.info(outputs.get("text"), indent=2)
|
||||
log.separator()
|
||||
|
||||
elif event == "workflow_finished":
|
||||
workflow_data = data.get("data", {})
|
||||
outputs = workflow_data.get("outputs", {})
|
||||
if outputs.get("answer"):
|
||||
log.separator()
|
||||
log.info("📤 Final Answer:")
|
||||
log.info(outputs.get("answer"), indent=2)
|
||||
log.separator()
|
||||
log.key_value(
|
||||
"Total tokens",
|
||||
str(workflow_data.get("total_tokens", 0)),
|
||||
)
|
||||
log.key_value(
|
||||
"Total steps",
|
||||
str(workflow_data.get("total_steps", 0)),
|
||||
)
|
||||
|
||||
elif event == "error":
|
||||
log.error(f"Error: {data.get('message')}")
|
||||
|
||||
except json.JSONDecodeError:
|
||||
# Some lines might not be JSON
|
||||
pass
|
||||
else:
|
||||
log.error(f"Workflow run failed with status code: {response.status_code}")
|
||||
log.debug(f"Response: {response.text}")
|
||||
else:
|
||||
# Handle blocking response
|
||||
response = client.post(
|
||||
run_endpoint,
|
||||
json=run_payload,
|
||||
headers=headers,
|
||||
)
|
||||
|
||||
if response.status_code == 200:
|
||||
log.success("Workflow completed successfully!")
|
||||
response_data = response.json()
|
||||
|
||||
log.separator()
|
||||
log.debug(f"Full response: {json.dumps(response_data, indent=2)}")
|
||||
|
||||
# Extract the answer if available
|
||||
outputs = response_data.get("data", {}).get("outputs", {})
|
||||
if outputs.get("answer"):
|
||||
log.separator()
|
||||
log.info("📤 Final Answer:")
|
||||
log.info(outputs.get("answer"), indent=2)
|
||||
else:
|
||||
log.error(f"Workflow run failed with status code: {response.status_code}")
|
||||
log.debug(f"Response: {response.text}")
|
||||
|
||||
except httpx.ConnectError:
|
||||
log.error("Could not connect to Dify API at http://localhost:5001")
|
||||
log.info("Make sure the API server is running with: ./dev/start-api")
|
||||
except httpx.TimeoutException:
|
||||
log.error("Request timed out")
|
||||
except Exception as e:
|
||||
log.error(f"An error occurred: {e}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Allow passing question as command line argument
|
||||
if len(sys.argv) > 1:
|
||||
question = " ".join(sys.argv[1:])
|
||||
else:
|
||||
question = "What is the capital of France?"
|
||||
|
||||
run_workflow(question=question, streaming=True)
|
||||
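The streaming loop above is also where the benchmark's headline metric comes from: time to first event (TTFE) is the gap between sending the request and the first data: line. A minimal measurement sketch using the same endpoint and saved API key:

# Sketch: measure time-to-first-event (TTFE) for a single workflow run.
import time

import httpx

from common import config_helper

headers = {
    "Authorization": f"Bearer {config_helper.get_api_key()}",
    "Content-Type": "application/json",
}
payload = {"inputs": {"question": "ping"}, "response_mode": "streaming", "user": "ttfe-probe"}

start = time.perf_counter()
with httpx.Client(timeout=30.0) as client:
    with client.stream("POST", "http://localhost:5001/v1/workflows/run", json=payload, headers=headers) as resp:
        for line in resp.iter_lines():
            if line.startswith("data: "):
                print(f"TTFE: {(time.perf_counter() - start) * 1000:.1f} ms")
                break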
71 dify/scripts/stress-test/setup/setup_admin.py Normal file
@@ -0,0 +1,71 @@
#!/usr/bin/env python3

import sys
from pathlib import Path

sys.path.append(str(Path(__file__).parent.parent))

import httpx
from common import Logger, config_helper


def setup_admin_account() -> None:
    """Setup Dify API with an admin account."""

    log = Logger("SetupAdmin")
    log.header("Setting up Admin Account")

    # Admin account credentials
    admin_config = {
        "email": "test@dify.ai",
        "username": "dify",
        "password": "password123",
    }

    # Save credentials to config file
    if config_helper.write_config("admin_config", admin_config):
        log.info(f"Admin credentials saved to: {config_helper.get_config_path('benchmark_state')}")

    # API setup endpoint
    base_url = "http://localhost:5001"
    setup_endpoint = f"{base_url}/console/api/setup"

    # Prepare setup payload
    setup_payload = {
        "email": admin_config["email"],
        "name": admin_config["username"],
        "password": admin_config["password"],
    }

    log.step("Configuring Dify with admin account...")

    try:
        # Make the setup request
        with httpx.Client() as client:
            response = client.post(
                setup_endpoint,
                json=setup_payload,
                headers={"Content-Type": "application/json"},
            )

            if response.status_code == 201:
                log.success("Admin account created successfully!")
                log.key_value("Email", admin_config["email"])
                log.key_value("Username", admin_config["username"])

            elif response.status_code == 400:
                log.warning("Setup may have already been completed or invalid data provided")
                log.debug(f"Response: {response.text}")
            else:
                log.error(f"Setup failed with status code: {response.status_code}")
                log.debug(f"Response: {response.text}")

    except httpx.ConnectError:
        log.error("Could not connect to Dify API at http://localhost:5001")
        log.info("Make sure the API server is running with: ./dev/start-api")
    except Exception as e:
        log.error(f"An error occurred: {e}")


if __name__ == "__main__":
    setup_admin_account()
162 dify/scripts/stress-test/setup_all.py Normal file
@@ -0,0 +1,162 @@
|
||||
#!/usr/bin/env python3
|
||||
|
||||
import socket
|
||||
import subprocess
|
||||
import sys
|
||||
import time
|
||||
from pathlib import Path
|
||||
|
||||
from common import Logger, ProgressLogger
|
||||
|
||||
|
||||
def run_script(script_name: str, description: str) -> bool:
|
||||
"""Run a Python script and return success status."""
|
||||
script_path = Path(__file__).parent / "setup" / script_name
|
||||
|
||||
if not script_path.exists():
|
||||
print(f"❌ Script not found: {script_path}")
|
||||
return False
|
||||
|
||||
print(f"\n{'=' * 60}")
|
||||
print(f"🚀 {description}")
|
||||
print(f" Running: {script_name}")
|
||||
print(f"{'=' * 60}")
|
||||
|
||||
try:
|
||||
result = subprocess.run(
|
||||
[sys.executable, str(script_path)],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
check=False,
|
||||
)
|
||||
|
||||
# Print output
|
||||
if result.stdout:
|
||||
print(result.stdout)
|
||||
if result.stderr:
|
||||
print(result.stderr, file=sys.stderr)
|
||||
|
||||
if result.returncode != 0:
|
||||
print(f"❌ Script failed with exit code: {result.returncode}")
|
||||
return False
|
||||
|
||||
print(f"✅ {script_name} completed successfully")
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ Error running {script_name}: {e}")
|
||||
return False
|
||||
|
||||
|
||||
def check_port(host: str, port: int, service_name: str) -> bool:
|
||||
"""Check if a service is running on the specified port."""
|
||||
try:
|
||||
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
|
||||
sock.settimeout(2)
|
||||
result = sock.connect_ex((host, port))
|
||||
sock.close()
|
||||
|
||||
if result == 0:
|
||||
Logger().success(f"{service_name} is running on port {port}")
|
||||
return True
|
||||
else:
|
||||
Logger().error(f"{service_name} is not accessible on port {port}")
|
||||
return False
|
||||
except Exception as e:
|
||||
Logger().error(f"Error checking {service_name}: {e}")
|
||||
return False
|
||||
|
||||
|
||||
def main() -> None:
|
||||
"""Run all setup scripts in order."""
|
||||
|
||||
log = Logger("Setup")
|
||||
log.box("Dify Stress Test Setup - Full Installation")
|
||||
|
||||
# Check if required services are running
|
||||
log.step("Checking required services...")
|
||||
log.separator()
|
||||
|
||||
dify_running = check_port("localhost", 5001, "Dify API server")
|
||||
if not dify_running:
|
||||
log.info("To start Dify API server:")
|
||||
log.list_item("Run: ./dev/start-api")
|
||||
|
||||
mock_running = check_port("localhost", 5004, "Mock OpenAI server")
|
||||
if not mock_running:
|
||||
log.info("To start Mock OpenAI server:")
|
||||
log.list_item("Run: python scripts/stress-test/setup/mock_openai_server.py")
|
||||
|
||||
if not dify_running or not mock_running:
|
||||
print("\n⚠️ Both services must be running before proceeding.")
|
||||
retry = input("\nWould you like to check again? (yes/no): ")
|
||||
if retry.lower() in ["yes", "y"]:
|
||||
return main() # Recursively call main to check again
|
||||
else:
|
||||
print("❌ Setup cancelled. Please start the required services and try again.")
|
||||
sys.exit(1)
|
||||
|
||||
log.success("All required services are running!")
|
||||
input("\nPress Enter to continue with setup...")
|
||||
|
||||
# Define setup steps
|
||||
setup_steps = [
|
||||
("setup_admin.py", "Creating admin account"),
|
||||
("login_admin.py", "Logging in and getting access token"),
|
||||
("install_openai_plugin.py", "Installing OpenAI plugin"),
|
||||
("configure_openai_plugin.py", "Configuring OpenAI plugin with mock server"),
|
||||
("import_workflow_app.py", "Importing workflow application"),
|
||||
("create_api_key.py", "Creating API key for the app"),
|
||||
("publish_workflow.py", "Publishing the workflow"),
|
||||
]
|
||||
|
||||
# Create progress logger
|
||||
progress = ProgressLogger(len(setup_steps), log)
|
||||
failed_step = None
|
||||
|
||||
for script, description in setup_steps:
|
||||
progress.next_step(description)
|
||||
success = run_script(script, description)
|
||||
|
||||
if not success:
|
||||
failed_step = script
|
||||
break
|
||||
|
||||
# Small delay between steps
|
||||
time.sleep(1)
|
||||
|
||||
log.separator()
|
||||
|
||||
if failed_step:
|
||||
log.error(f"Setup failed at: {failed_step}")
|
||||
log.separator()
|
||||
log.info("Troubleshooting:")
|
||||
log.list_item("Check if the Dify API server is running (./dev/start-api)")
|
||||
log.list_item("Check if the mock OpenAI server is running (port 5004)")
|
||||
log.list_item("Review the error messages above")
|
||||
log.list_item("Run cleanup.py and try again")
|
||||
sys.exit(1)
|
||||
else:
|
||||
progress.complete()
|
||||
log.separator()
|
||||
log.success("Setup completed successfully!")
|
||||
log.info("Next steps:")
|
||||
log.list_item("Test the workflow:")
|
||||
log.info(
|
||||
' python scripts/stress-test/setup/run_workflow.py "Your question here"',
|
||||
indent=4,
|
||||
)
|
||||
log.list_item("To clean up and start over:")
|
||||
log.info(" python scripts/stress-test/cleanup.py", indent=4)
|
||||
|
||||
# Optionally run a test
|
||||
log.separator()
|
||||
test_input = input("Would you like to run a test workflow now? (yes/no): ")
|
||||
|
||||
if test_input.lower() in ["yes", "y"]:
|
||||
log.step("Running test workflow...")
|
||||
run_script("run_workflow.py", "Testing workflow with default question")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
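The service check above prompts the operator and retries by calling main() recursively. A non-interactive alternative is to poll the ports until they accept connections; a sketch of such a helper (hypothetical, not part of the script):

# Sketch: block until a TCP port accepts connections instead of prompting the operator.
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0) -> bool:
    """Return True once host:port accepts a TCP connection, False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(2)
            if sock.connect_ex((host, port)) == 0:
                return True
        time.sleep(1)
    return False


# Usage: wait_for_port("localhost", 5001) and wait_for_port("localhost", 5004)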
750 dify/scripts/stress-test/sse_benchmark.py Normal file
@@ -0,0 +1,750 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
SSE (Server-Sent Events) Stress Test for Dify Workflow API
|
||||
|
||||
This script stress tests the streaming performance of Dify's workflow execution API,
|
||||
measuring key metrics like connection rate, event throughput, and time to first event (TTFE).
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import random
|
||||
import statistics
|
||||
import sys
|
||||
import threading
|
||||
import time
|
||||
from collections import deque
|
||||
from dataclasses import asdict, dataclass
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Literal, TypeAlias, TypedDict
|
||||
|
||||
import requests.exceptions
|
||||
from locust import HttpUser, between, constant, events, task
|
||||
|
||||
# Add the stress-test directory to path to import common modules
|
||||
sys.path.insert(0, str(Path(__file__).parent))
|
||||
from common.config_helper import ConfigHelper # type: ignore[import-not-found]
|
||||
|
||||
# Configure logging
|
||||
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Configuration from environment
|
||||
WORKFLOW_PATH = os.getenv("WORKFLOW_PATH", "/v1/workflows/run")
|
||||
CONNECT_TIMEOUT = float(os.getenv("CONNECT_TIMEOUT", "10"))
|
||||
READ_TIMEOUT = float(os.getenv("READ_TIMEOUT", "60"))
|
||||
TERMINAL_EVENTS = [e.strip() for e in os.getenv("TERMINAL_EVENTS", "workflow_finished,error").split(",") if e.strip()]
|
||||
QUESTIONS_FILE = os.getenv("QUESTIONS_FILE", "")
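# Illustrative aside (not part of the locustfile): these env vars are read at import
# time, so they must be set before Locust starts. A sketch of a headless launch with
# standard Locust CLI flags; user count, spawn rate, and run time are placeholders.
def _example_headless_launch() -> None:
    import subprocess

    env = os.environ.copy()
    env.update({
        "WORKFLOW_PATH": "/v1/workflows/run",
        "READ_TIMEOUT": "120",
        "QUESTIONS_FILE": "questions.txt",  # optional: one question per line
    })
    subprocess.run(
        [
            "locust",
            "-f", "scripts/stress-test/sse_benchmark.py",
            "--headless",
            "-H", "http://localhost:5001",
            "-u", "50",   # concurrent users
            "-r", "10",   # spawn rate per second
            "-t", "2m",   # run time
        ],
        env=env,
        check=False,
    )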
|
||||
|
||||
|
||||
# Type definitions
|
||||
ErrorType: TypeAlias = Literal[
|
||||
"connection_error",
|
||||
"timeout",
|
||||
"invalid_json",
|
||||
"http_4xx",
|
||||
"http_5xx",
|
||||
"early_termination",
|
||||
"invalid_response",
|
||||
]
|
||||
|
||||
|
||||
class ErrorCounts(TypedDict):
|
||||
"""Error count tracking"""
|
||||
|
||||
connection_error: int
|
||||
timeout: int
|
||||
invalid_json: int
|
||||
http_4xx: int
|
||||
http_5xx: int
|
||||
early_termination: int
|
||||
invalid_response: int
|
||||
|
||||
|
||||
class SSEEvent(TypedDict):
|
||||
"""Server-Sent Event structure"""
|
||||
|
||||
data: str
|
||||
event: str
|
||||
id: str | None
|
||||
|
||||
|
||||
class WorkflowInputs(TypedDict):
|
||||
"""Workflow input structure"""
|
||||
|
||||
question: str
|
||||
|
||||
|
||||
class WorkflowRequestData(TypedDict):
|
||||
"""Workflow request payload"""
|
||||
|
||||
inputs: WorkflowInputs
|
||||
response_mode: Literal["streaming"]
|
||||
user: str
|
||||
|
||||
|
||||
class ParsedEventData(TypedDict, total=False):
|
||||
"""Parsed event data from SSE stream"""
|
||||
|
||||
event: str
|
||||
task_id: str
|
||||
workflow_run_id: str
|
||||
data: object # For dynamic content
|
||||
created_at: int
|
||||
|
||||
|
||||
class LocustStats(TypedDict):
|
||||
"""Locust statistics structure"""
|
||||
|
||||
total_requests: int
|
||||
total_failures: int
|
||||
avg_response_time: float
|
||||
min_response_time: float
|
||||
max_response_time: float
|
||||
|
||||
|
||||
class ReportData(TypedDict):
|
||||
"""JSON report structure"""
|
||||
|
||||
timestamp: str
|
||||
duration_seconds: float
|
||||
metrics: dict[str, object] # Metrics as dict for JSON serialization
|
||||
locust_stats: LocustStats | None
|
||||
|
||||
|
||||
@dataclass
|
||||
class StreamMetrics:
|
||||
"""Metrics for a single stream"""
|
||||
|
||||
stream_duration: float
|
||||
events_count: int
|
||||
bytes_received: int
|
||||
ttfe: float
|
||||
inter_event_times: list[float]
|
||||
|
||||
|
||||
@dataclass
|
||||
class MetricsSnapshot:
|
||||
"""Snapshot of current metrics state"""
|
||||
|
||||
active_connections: int
|
||||
total_connections: int
|
||||
total_events: int
|
||||
connection_rate: float
|
||||
event_rate: float
|
||||
overall_conn_rate: float
|
||||
overall_event_rate: float
|
||||
ttfe_avg: float
|
||||
ttfe_min: float
|
||||
ttfe_max: float
|
||||
ttfe_p50: float
|
||||
ttfe_p95: float
|
||||
ttfe_samples: int
|
||||
ttfe_total_samples: int # Total TTFE samples collected (not limited by window)
|
||||
error_counts: ErrorCounts
|
||||
stream_duration_avg: float
|
||||
stream_duration_p50: float
|
||||
stream_duration_p95: float
|
||||
events_per_stream_avg: float
|
||||
inter_event_latency_avg: float
|
||||
inter_event_latency_p50: float
|
||||
inter_event_latency_p95: float
|
||||
|
||||
|
||||
class MetricsTracker:
|
||||
def __init__(self) -> None:
|
||||
self.lock = threading.Lock()
|
||||
self.active_connections = 0
|
||||
self.total_connections = 0
|
||||
self.total_events = 0
|
||||
self.start_time = time.time()
|
||||
|
||||
# Enhanced metrics with memory limits
|
||||
self.max_samples = 10000 # Prevent unbounded growth
|
||||
self.ttfe_samples: deque[float] = deque(maxlen=self.max_samples)
|
||||
self.ttfe_total_count = 0 # Track total TTFE samples collected
|
||||
|
||||
# For rate calculations - no maxlen to avoid artificial limits
|
||||
self.connection_times: deque[float] = deque()
|
||||
self.event_times: deque[float] = deque()
|
||||
self.last_stats_time = time.time()
|
||||
self.last_total_connections = 0
|
||||
self.last_total_events = 0
|
||||
self.stream_metrics: deque[StreamMetrics] = deque(maxlen=self.max_samples)
|
||||
self.error_counts: ErrorCounts = ErrorCounts(
|
||||
connection_error=0,
|
||||
timeout=0,
|
||||
invalid_json=0,
|
||||
http_4xx=0,
|
||||
http_5xx=0,
|
||||
early_termination=0,
|
||||
invalid_response=0,
|
||||
)
|
||||
|
||||
def connection_started(self) -> None:
|
||||
with self.lock:
|
||||
self.active_connections += 1
|
||||
self.total_connections += 1
|
||||
self.connection_times.append(time.time())
|
||||
|
||||
def connection_ended(self) -> None:
|
||||
with self.lock:
|
||||
self.active_connections -= 1
|
||||
|
||||
def event_received(self) -> None:
|
||||
with self.lock:
|
||||
self.total_events += 1
|
||||
self.event_times.append(time.time())
|
||||
|
||||
def record_ttfe(self, ttfe_ms: float) -> None:
|
||||
with self.lock:
|
||||
self.ttfe_samples.append(ttfe_ms) # deque handles maxlen
|
||||
self.ttfe_total_count += 1 # Increment total counter
|
||||
|
||||
def record_stream_metrics(self, metrics: StreamMetrics) -> None:
|
||||
with self.lock:
|
||||
self.stream_metrics.append(metrics) # deque handles maxlen
|
||||
|
||||
def record_error(self, error_type: ErrorType) -> None:
|
||||
with self.lock:
|
||||
self.error_counts[error_type] += 1
|
||||
|
||||
def get_stats(self) -> MetricsSnapshot:
|
||||
with self.lock:
|
||||
current_time = time.time()
|
||||
time_window = 10.0 # 10 second window for rate calculation
|
||||
|
||||
# Clean up old timestamps outside the window
|
||||
cutoff_time = current_time - time_window
|
||||
while self.connection_times and self.connection_times[0] < cutoff_time:
|
||||
self.connection_times.popleft()
|
||||
while self.event_times and self.event_times[0] < cutoff_time:
|
||||
self.event_times.popleft()
|
||||
|
||||
# Calculate rates based on actual window or elapsed time
|
||||
window_duration = min(time_window, current_time - self.start_time)
|
||||
if window_duration > 0:
|
||||
conn_rate = len(self.connection_times) / window_duration
|
||||
event_rate = len(self.event_times) / window_duration
|
||||
else:
|
||||
conn_rate = 0
|
||||
event_rate = 0
|
||||
|
||||
# Calculate TTFE statistics
|
||||
if self.ttfe_samples:
|
||||
avg_ttfe = statistics.mean(self.ttfe_samples)
|
||||
min_ttfe = min(self.ttfe_samples)
|
||||
max_ttfe = max(self.ttfe_samples)
|
||||
p50_ttfe = statistics.median(self.ttfe_samples)
|
||||
if len(self.ttfe_samples) >= 2:
|
||||
quantiles = statistics.quantiles(self.ttfe_samples, n=20, method="inclusive")
|
||||
p95_ttfe = quantiles[18] # 19th of 19 quantiles = 95th percentile
|
||||
else:
|
||||
p95_ttfe = max_ttfe
|
||||
else:
|
||||
avg_ttfe = min_ttfe = max_ttfe = p50_ttfe = p95_ttfe = 0
|
||||
|
||||
# Calculate stream metrics
|
||||
if self.stream_metrics:
|
||||
durations = [m.stream_duration for m in self.stream_metrics]
|
||||
events_per_stream = [m.events_count for m in self.stream_metrics]
|
||||
stream_duration_avg = statistics.mean(durations)
|
||||
stream_duration_p50 = statistics.median(durations)
|
||||
stream_duration_p95 = (
|
||||
statistics.quantiles(durations, n=20, method="inclusive")[18]
|
||||
if len(durations) >= 2
|
||||
else max(durations)
|
||||
if durations
|
||||
else 0
|
||||
)
|
||||
events_per_stream_avg = statistics.mean(events_per_stream) if events_per_stream else 0
|
||||
|
||||
# Calculate inter-event latency statistics
|
||||
all_inter_event_times = []
|
||||
for m in self.stream_metrics:
|
||||
all_inter_event_times.extend(m.inter_event_times)
|
||||
|
||||
if all_inter_event_times:
|
||||
inter_event_latency_avg = statistics.mean(all_inter_event_times)
|
||||
inter_event_latency_p50 = statistics.median(all_inter_event_times)
|
||||
inter_event_latency_p95 = (
|
||||
statistics.quantiles(all_inter_event_times, n=20, method="inclusive")[18]
|
||||
if len(all_inter_event_times) >= 2
|
||||
else max(all_inter_event_times)
|
||||
)
|
||||
else:
|
||||
inter_event_latency_avg = inter_event_latency_p50 = inter_event_latency_p95 = 0
|
||||
else:
|
||||
stream_duration_avg = stream_duration_p50 = stream_duration_p95 = events_per_stream_avg = 0
|
||||
inter_event_latency_avg = inter_event_latency_p50 = inter_event_latency_p95 = 0
|
||||
|
||||
# Also calculate overall average rates
|
||||
total_elapsed = current_time - self.start_time
|
||||
overall_conn_rate = self.total_connections / total_elapsed if total_elapsed > 0 else 0
|
||||
overall_event_rate = self.total_events / total_elapsed if total_elapsed > 0 else 0
|
||||
|
||||
return MetricsSnapshot(
|
||||
active_connections=self.active_connections,
|
||||
total_connections=self.total_connections,
|
||||
total_events=self.total_events,
|
||||
connection_rate=conn_rate,
|
||||
event_rate=event_rate,
|
||||
overall_conn_rate=overall_conn_rate,
|
||||
overall_event_rate=overall_event_rate,
|
||||
ttfe_avg=avg_ttfe,
|
||||
ttfe_min=min_ttfe,
|
||||
ttfe_max=max_ttfe,
|
||||
ttfe_p50=p50_ttfe,
|
||||
ttfe_p95=p95_ttfe,
|
||||
ttfe_samples=len(self.ttfe_samples),
|
||||
ttfe_total_samples=self.ttfe_total_count, # Return total count
|
||||
error_counts=ErrorCounts(**self.error_counts),
|
||||
stream_duration_avg=stream_duration_avg,
|
||||
stream_duration_p50=stream_duration_p50,
|
||||
stream_duration_p95=stream_duration_p95,
|
||||
events_per_stream_avg=events_per_stream_avg,
|
||||
inter_event_latency_avg=inter_event_latency_avg,
|
||||
inter_event_latency_p50=inter_event_latency_p50,
|
||||
inter_event_latency_p95=inter_event_latency_p95,
|
||||
)
|
||||
|
||||
|
||||
# Global metrics instance
|
||||
metrics = MetricsTracker()
|
||||
|
||||
|
||||
class SSEParser:
|
||||
"""Parser for Server-Sent Events according to W3C spec"""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self.data_buffer: list[str] = []
|
||||
self.event_type: str | None = None
|
||||
self.event_id: str | None = None
|
||||
|
||||
def parse_line(self, line: str) -> SSEEvent | None:
|
||||
"""Parse a single SSE line and return event if complete"""
|
||||
# Empty line signals end of event
|
||||
if not line:
|
||||
if self.data_buffer:
|
||||
event = SSEEvent(
|
||||
data="\n".join(self.data_buffer),
|
||||
event=self.event_type or "message",
|
||||
id=self.event_id,
|
||||
)
|
||||
self.data_buffer = []
|
||||
self.event_type = None
|
||||
self.event_id = None
|
||||
return event
|
||||
return None
|
||||
|
||||
# Comment line
|
||||
if line.startswith(":"):
|
||||
return None
|
||||
|
||||
# Parse field
|
||||
if ":" in line:
|
||||
field, value = line.split(":", 1)
|
||||
value = value.lstrip()
|
||||
|
||||
if field == "data":
|
||||
self.data_buffer.append(value)
|
||||
elif field == "event":
|
||||
self.event_type = value
|
||||
elif field == "id":
|
||||
self.event_id = value
|
||||
|
||||
return None
|
||||
|
||||
|
||||
# Note: SSEClient removed - we'll handle SSE parsing directly in the task for better Locust integration
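# Illustrative aside (not part of the benchmark): how SSEParser is driven. Feeding it
# raw SSE lines yields a complete event once the blank separator line arrives.
def _sse_parser_example() -> None:
    parser = SSEParser()
    for raw in ['event: message', 'data: {"event": "workflow_started"}', ""]:
        event = parser.parse_line(raw)
        if event:
            print(event["event"], event["data"])  # -> message {"event": "workflow_started"}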
|
||||
|
||||
|
||||
class DifyWorkflowUser(HttpUser):
|
||||
"""Locust user for testing Dify workflow SSE endpoints"""
|
||||
|
||||
# Use constant wait for streaming workloads
|
||||
wait_time = constant(0) if os.getenv("WAIT_TIME", "0") == "0" else between(1, 3)
|
||||
|
||||
def __init__(self, *args: object, **kwargs: object) -> None:
|
||||
super().__init__(*args, **kwargs) # type: ignore[arg-type]
|
||||
|
||||
# Load API configuration
|
||||
config_helper = ConfigHelper()
|
||||
self.api_token = config_helper.get_api_key()
|
||||
|
||||
if not self.api_token:
|
||||
raise ValueError("API key not found. Please run setup_all.py first.")
|
||||
|
||||
# Load questions from file or use defaults
|
||||
if QUESTIONS_FILE and os.path.exists(QUESTIONS_FILE):
|
||||
with open(QUESTIONS_FILE) as f:
|
||||
self.questions = [line.strip() for line in f if line.strip()]
|
||||
else:
|
||||
self.questions = [
|
||||
"What is artificial intelligence?",
|
||||
"Explain quantum computing",
|
||||
"What is machine learning?",
|
||||
"How do neural networks work?",
|
||||
"What is renewable energy?",
|
||||
]
|
||||
|
||||
self.user_counter = 0
|
||||
|
||||
def on_start(self) -> None:
|
||||
"""Called when a user starts"""
|
||||
self.user_counter = 0
|
||||
|
||||
@task
|
||||
def test_workflow_stream(self) -> None:
|
||||
"""Test workflow SSE streaming endpoint"""
|
||||
|
||||
question = random.choice(self.questions)
|
||||
self.user_counter += 1
|
||||
|
||||
headers = {
|
||||
"Authorization": f"Bearer {self.api_token}",
|
||||
"Content-Type": "application/json",
|
||||
"Accept": "text/event-stream",
|
||||
"Cache-Control": "no-cache",
|
||||
}
|
||||
|
||||
data = WorkflowRequestData(
|
||||
inputs=WorkflowInputs(question=question),
|
||||
response_mode="streaming",
|
||||
user=f"user_{self.user_counter}",
|
||||
)
|
||||
|
||||
start_time = time.time()
|
||||
first_event_time = None
|
||||
event_count = 0
|
||||
inter_event_times: list[float] = []
|
||||
last_event_time = None
|
||||
ttfe = 0
|
||||
request_success = False
|
||||
bytes_received = 0
|
||||
|
||||
metrics.connection_started()
|
||||
|
||||
# Use catch_response context manager directly
|
||||
with self.client.request(
|
||||
method="POST",
|
||||
url=WORKFLOW_PATH,
|
||||
headers=headers,
|
||||
json=data,
|
||||
stream=True,
|
||||
catch_response=True,
|
||||
timeout=(CONNECT_TIMEOUT, READ_TIMEOUT),
|
||||
name="/v1/workflows/run", # Name for Locust stats
|
||||
) as response:
|
||||
try:
|
||||
# Validate response
|
||||
if response.status_code >= 400:
|
||||
error_type: ErrorType = "http_4xx" if response.status_code < 500 else "http_5xx"
|
||||
metrics.record_error(error_type)
|
||||
response.failure(f"HTTP {response.status_code}")
|
||||
return
|
||||
|
||||
content_type = response.headers.get("Content-Type", "")
|
||||
if "text/event-stream" not in content_type and "application/json" not in content_type:
|
||||
logger.error(f"Expected text/event-stream, got: {content_type}")
|
||||
metrics.record_error("invalid_response")
|
||||
response.failure(f"Invalid content type: {content_type}")
|
||||
return
|
||||
|
||||
# Parse SSE events
|
||||
parser = SSEParser()
|
||||
|
||||
for line in response.iter_lines(decode_unicode=True):
|
||||
# Check if runner is stopping
|
||||
if getattr(self.environment.runner, "state", "") in (
|
||||
"stopping",
|
||||
"stopped",
|
||||
):
|
||||
logger.debug("Runner stopping, breaking streaming loop")
|
||||
break
|
||||
|
||||
if line is not None:
|
||||
bytes_received += len(line.encode("utf-8"))
|
||||
|
||||
# Parse SSE line
|
||||
event = parser.parse_line(line if line is not None else "")
|
||||
if event:
|
||||
event_count += 1
|
||||
current_time = time.time()
|
||||
metrics.event_received()
|
||||
|
||||
# Track inter-event timing
|
||||
if last_event_time:
|
||||
inter_event_times.append((current_time - last_event_time) * 1000)
|
||||
last_event_time = current_time
|
||||
|
||||
if first_event_time is None:
|
||||
first_event_time = current_time
|
||||
ttfe = (first_event_time - start_time) * 1000
|
||||
metrics.record_ttfe(ttfe)
|
||||
|
||||
try:
|
||||
# Parse event data
|
||||
event_data = event.get("data", "")
|
||||
if event_data:
|
||||
if event_data == "[DONE]":
|
||||
logger.debug("Received [DONE] sentinel")
|
||||
request_success = True
|
||||
break
|
||||
|
||||
try:
|
||||
parsed_event: ParsedEventData = json.loads(event_data)
|
||||
# Check for terminal events
|
||||
if parsed_event.get("event") in TERMINAL_EVENTS:
|
||||
logger.debug(f"Received terminal event: {parsed_event.get('event')}")
|
||||
request_success = True
|
||||
break
|
||||
except json.JSONDecodeError as e:
|
||||
logger.debug(f"JSON decode error: {e} for data: {event_data[:100]}")
|
||||
metrics.record_error("invalid_json")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error processing event: {e}")
|
||||
|
||||
# Mark success only if terminal condition was met or events were received
|
||||
if request_success:
|
||||
response.success()
|
||||
elif event_count > 0:
|
||||
# Got events but no proper terminal condition
|
||||
metrics.record_error("early_termination")
|
||||
response.failure("Stream ended without terminal event")
|
||||
else:
|
||||
response.failure("No events received")
|
||||
|
||||
except (
|
||||
requests.exceptions.ConnectTimeout,
|
||||
requests.exceptions.ReadTimeout,
|
||||
) as e:
|
||||
metrics.record_error("timeout")
|
||||
response.failure(f"Timeout: {e}")
|
||||
except (
|
||||
requests.exceptions.ConnectionError,
|
||||
requests.exceptions.RequestException,
|
||||
) as e:
|
||||
metrics.record_error("connection_error")
|
||||
response.failure(f"Connection error: {e}")
|
||||
except Exception as e:
|
||||
response.failure(str(e))
|
||||
raise
|
||||
finally:
|
||||
metrics.connection_ended()
|
||||
|
||||
# Record stream metrics
|
||||
if event_count > 0:
|
||||
stream_duration = (time.time() - start_time) * 1000
|
||||
stream_metrics = StreamMetrics(
|
||||
stream_duration=stream_duration,
|
||||
events_count=event_count,
|
||||
bytes_received=bytes_received,
|
||||
ttfe=ttfe,
|
||||
inter_event_times=inter_event_times,
|
||||
)
|
||||
metrics.record_stream_metrics(stream_metrics)
|
||||
logger.debug(
|
||||
f"Stream completed: {event_count} events, {stream_duration:.1f}ms, success={request_success}"
|
||||
)
|
||||
else:
|
||||
logger.warning("No events received in stream")
|
||||
|
||||
|
||||
# Event handlers
|
||||
@events.test_start.add_listener # type: ignore[misc]
|
||||
def on_test_start(environment: object, **kwargs: object) -> None:
|
||||
    logger.info("=" * 80)
    logger.info(" " * 25 + "DIFY SSE BENCHMARK - REAL-TIME METRICS")
    logger.info("=" * 80)
    logger.info(f"Started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    logger.info("=" * 80)

    # Periodic stats reporting
    def report_stats() -> None:
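        """Log a metrics snapshot every 5 seconds until the runner stops."""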
        if not hasattr(environment, "runner"):
            return
        runner = environment.runner
        while hasattr(runner, "state") and runner.state not in ["stopped", "stopping"]:
            time.sleep(5)  # Report every 5 seconds
            if hasattr(runner, "state") and runner.state == "running":
                stats = metrics.get_stats()

                # Only log on master node in distributed mode
                is_master = (
                    not getattr(environment.runner, "worker_id", None) if hasattr(environment, "runner") else True
                )
                if is_master:
                    # Clear previous lines and show updated stats
                    logger.info("\n" + "=" * 80)
                    logger.info(
                        f"{'METRIC':<25} {'CURRENT':>15} {'RATE (10s)':>15} {'AVG (overall)':>15} {'TOTAL':>12}"
                    )
                    logger.info("-" * 80)

                    # Active SSE Connections
                    logger.info(
                        f"{'Active SSE Connections':<25} {stats.active_connections:>15,d} {'-':>15} {'-':>15} {'-':>12}"
                    )

                    # New Connection Rate
                    logger.info(
                        f"{'New Connections':<25} {'-':>15} {stats.connection_rate:>13.2f}/s {stats.overall_conn_rate:>13.2f}/s {stats.total_connections:>12,d}"
                    )

                    # Event Throughput
                    logger.info(
                        f"{'Event Throughput':<25} {'-':>15} {stats.event_rate:>13.2f}/s {stats.overall_event_rate:>13.2f}/s {stats.total_events:>12,d}"
                    )

                    logger.info("-" * 80)
                    logger.info(
                        f"{'TIME TO FIRST EVENT':<25} {'AVG':>15} {'P50':>10} {'P95':>10} {'MIN':>10} {'MAX':>10}"
                    )
                    logger.info(
                        f"{'(TTFE in ms)':<25} {stats.ttfe_avg:>15.1f} {stats.ttfe_p50:>10.1f} {stats.ttfe_p95:>10.1f} {stats.ttfe_min:>10.1f} {stats.ttfe_max:>10.1f}"
                    )
                    logger.info(
                        f"{'Window Samples':<25} {stats.ttfe_samples:>15,d} (last {min(10000, stats.ttfe_total_samples):,d} samples)"
                    )
                    logger.info(f"{'Total Samples':<25} {stats.ttfe_total_samples:>15,d}")

                    # Inter-event latency
                    if stats.inter_event_latency_avg > 0:
                        logger.info("-" * 80)
                        logger.info(f"{'INTER-EVENT LATENCY':<25} {'AVG':>15} {'P50':>10} {'P95':>10}")
                        logger.info(
                            f"{'(ms between events)':<25} {stats.inter_event_latency_avg:>15.1f} {stats.inter_event_latency_p50:>10.1f} {stats.inter_event_latency_p95:>10.1f}"
                        )

                    # Error stats
                    if any(stats.error_counts.values()):
                        logger.info("-" * 80)
                        logger.info(f"{'ERROR TYPE':<25} {'COUNT':>15}")
                        for error_type, count in stats.error_counts.items():
                            if isinstance(count, int) and count > 0:
                                logger.info(f"{error_type:<25} {count:>15,d}")

                    logger.info("=" * 80)

                    # Show Locust stats summary
                    if hasattr(environment, "stats") and hasattr(environment.stats, "total"):
                        total = environment.stats.total
                        if hasattr(total, "num_requests") and total.num_requests > 0:
                            logger.info(
                                f"{'LOCUST STATS':<25} {'Requests':>12} {'Fails':>8} {'Avg (ms)':>12} {'Min':>8} {'Max':>8}"
                            )
                            logger.info("-" * 80)
                            logger.info(
                                f"{'Aggregated':<25} {total.num_requests:>12,d} "
                                f"{total.num_failures:>8,d} "
                                f"{total.avg_response_time:>12.1f} "
                                f"{total.min_response_time:>8.0f} "
                                f"{total.max_response_time:>8.0f}"
                            )
                            logger.info("=" * 80)

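    # Daemon thread: it is killed automatically when the process exits,
    # so the reporter never blocks Locust shutdown.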
    threading.Thread(target=report_stats, daemon=True).start()


@events.test_stop.add_listener  # type: ignore[misc]
def on_test_stop(environment: object, **kwargs: object) -> None:
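    """Log the final benchmark summary and export the JSON report on the master node."""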
    stats = metrics.get_stats()
    test_duration = time.time() - metrics.start_time

    # Log final results
    logger.info("\n" + "=" * 80)
    logger.info(" " * 30 + "FINAL BENCHMARK RESULTS")
    logger.info("=" * 80)
    logger.info(f"Test Duration: {test_duration:.1f} seconds")
    logger.info("-" * 80)

    logger.info("")
    logger.info("CONNECTIONS")
    logger.info(f" {'Total Connections:':<30} {stats.total_connections:>10,d}")
    logger.info(f" {'Final Active:':<30} {stats.active_connections:>10,d}")
    logger.info(f" {'Average Rate:':<30} {stats.overall_conn_rate:>10.2f} conn/s")

    logger.info("")
    logger.info("EVENTS")
    logger.info(f" {'Total Events Received:':<30} {stats.total_events:>10,d}")
    logger.info(f" {'Average Throughput:':<30} {stats.overall_event_rate:>10.2f} events/s")
    logger.info(f" {'Final Rate (10s window):':<30} {stats.event_rate:>10.2f} events/s")

    logger.info("")
    logger.info("STREAM METRICS")
    logger.info(f" {'Avg Stream Duration:':<30} {stats.stream_duration_avg:>10.1f} ms")
    logger.info(f" {'P50 Stream Duration:':<30} {stats.stream_duration_p50:>10.1f} ms")
    logger.info(f" {'P95 Stream Duration:':<30} {stats.stream_duration_p95:>10.1f} ms")
    logger.info(f" {'Avg Events per Stream:':<30} {stats.events_per_stream_avg:>10.1f}")

    logger.info("")
    logger.info("INTER-EVENT LATENCY")
    logger.info(f" {'Average:':<30} {stats.inter_event_latency_avg:>10.1f} ms")
    logger.info(f" {'Median (P50):':<30} {stats.inter_event_latency_p50:>10.1f} ms")
    logger.info(f" {'95th Percentile:':<30} {stats.inter_event_latency_p95:>10.1f} ms")

    logger.info("")
    logger.info("TIME TO FIRST EVENT (ms)")
    logger.info(f" {'Average:':<30} {stats.ttfe_avg:>10.1f} ms")
    logger.info(f" {'Median (P50):':<30} {stats.ttfe_p50:>10.1f} ms")
    logger.info(f" {'95th Percentile:':<30} {stats.ttfe_p95:>10.1f} ms")
    logger.info(f" {'Minimum:':<30} {stats.ttfe_min:>10.1f} ms")
    logger.info(f" {'Maximum:':<30} {stats.ttfe_max:>10.1f} ms")
    logger.info(
        f" {'Window Samples:':<30} {stats.ttfe_samples:>10,d} (last {min(10000, stats.ttfe_total_samples):,d})"
    )
    logger.info(f" {'Total Samples:':<30} {stats.ttfe_total_samples:>10,d}")

    # Error summary
    if any(stats.error_counts.values()):
        logger.info("")
        logger.info("ERRORS")
        for error_type, count in stats.error_counts.items():
            if isinstance(count, int) and count > 0:
                logger.info(f" {error_type:<30} {count:>10,d}")

    logger.info("=" * 80 + "\n")

    # Export machine-readable report (only on master node)
    is_master = not getattr(environment.runner, "worker_id", None) if hasattr(environment, "runner") else True
    if is_master:
        export_json_report(stats, test_duration, environment)


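# Rough shape of the exported report (keys mirror ReportData below; the values here
# are illustrative, not real results):
# {
#   "timestamp": "2025-01-01T12:00:00",
#   "duration_seconds": 300.0,
#   "metrics": {...},          # asdict(MetricsSnapshot)
#   "locust_stats": {"total_requests": 1234, "total_failures": 0, ...}
# }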
def export_json_report(stats: MetricsSnapshot, duration: float, environment: object) -> None:
    """Export metrics to JSON file for CI/CD analysis"""

    reports_dir = Path(__file__).parent / "reports"
    reports_dir.mkdir(exist_ok=True)

    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    report_file = reports_dir / f"sse_metrics_{timestamp}.json"

    # Access environment.stats.total attributes safely
    locust_stats: LocustStats | None = None
    if hasattr(environment, "stats") and hasattr(environment.stats, "total"):
        total = environment.stats.total
        if hasattr(total, "num_requests") and total.num_requests > 0:
            locust_stats = LocustStats(
                total_requests=total.num_requests,
                total_failures=total.num_failures,
                avg_response_time=total.avg_response_time,
                min_response_time=total.min_response_time,
                max_response_time=total.max_response_time,
            )

    report_data = ReportData(
        timestamp=datetime.now().isoformat(),
        duration_seconds=duration,
        metrics=asdict(stats),  # type: ignore[arg-type]
        locust_stats=locust_stats,
    )

    with open(report_file, "w") as f:
        json.dump(report_data, f, indent=2)

    logger.info(f"Exported metrics to {report_file}")