dify

2025-12-01 17:21:38 +08:00
parent 32fee2b8ab
commit fab8c13cb3
7511 changed files with 996300 additions and 0 deletions
--- a/dify/scripts/stress-test/README.md
+++ b/dify/scripts/stress-test/README.md
@@ -0,0 +1,521 @@
+# Dify Stress Test Suite
+
+A stress test suite for Dify workflow execution, built on **Locust** and tuned for measuring Server-Sent Events (SSE) streaming performance.
+
+## Key Metrics Tracked
+
+The stress test focuses on four critical SSE performance indicators:
+
+1. **Active SSE Connections** - Real-time count of open SSE connections
+1. **New Connection Rate** - Connections per second (conn/sec)
+1. **Time to First Event (TTFE)** - Latency until first SSE event arrives
+1. **Event Throughput** - Events per second (events/sec)
+
+## Features
+
+- **True SSE Support**: Properly handles Server-Sent Events streaming without premature connection closure
+- **Real-time Metrics**: Live reporting every 5 seconds during tests
+- **Comprehensive Tracking**:
+  - Active connection monitoring
+  - Connection establishment rate
+  - Event processing throughput
+  - TTFE distribution analysis
+- **Multiple Interfaces**:
+  - Web UI for real-time monitoring (<http://localhost:8089>)
+  - Headless mode with periodic console updates
+- **Detailed Reports**: Final statistics with overall rates and averages
+- **Easy Configuration**: Uses existing API key configuration from setup
+
+## What Gets Measured
+
+The stress test focuses on SSE streaming performance with these key metrics:
+
+### Primary Endpoint: `/v1/workflows/run`
+
+The suite exercises a single endpoint with comprehensive SSE metrics tracking:
+
+- **Request Type**: POST request to workflow execution API
+- **Response Type**: Server-Sent Events (SSE) stream
+- **Payload**: Random questions from a configurable pool
+- **Concurrency**: Configurable from 1 to 1000+ simultaneous users
+
+### Key Performance Metrics
+
+#### 1. **Active Connections**
+
+- **What it measures**: Number of concurrent SSE connections open at any moment
+- **Why it matters**: Shows the system's ability to handle parallel streams
+- **Good values**: Should remain stable under load without drops
+
+#### 2. **Connection Rate (conn/sec)**
+
+- **What it measures**: How fast new SSE connections are established
+- **Why it matters**: Indicates the system's ability to handle connection spikes
+- **Good values**:
+  - Light load: 5-10 conn/sec
+  - Medium load: 20-50 conn/sec
+  - Heavy load: 100+ conn/sec
+
+#### 3. **Time to First Event (TTFE)**
+
+- **What it measures**: Latency from request sent to first SSE event received
+- **Why it matters**: Critical for user experience; a lower TTFE means better perceived performance
+- **Good values**:
+  - Excellent: < 50ms
+  - Good: 50-100ms
+  - Acceptable: 100-500ms
+  - Poor: > 500ms
+
+#### 4. **Event Throughput (events/sec)**
+
+- **What it measures**: Rate of SSE events being delivered across all connections
+- **Why it matters**: Shows actual data delivery performance
+- **Expected values**: Depends on workflow complexity and number of connections
+  - Single connection: 10-20 events/sec
+  - 10 connections: 50-100 events/sec
+  - 100 connections: 200-500 events/sec
+
+#### 5. **Request/Response Times**
+
+- **P50 (Median)**: 50% of requests complete within this time
+- **P95**: 95% of requests complete within this time
+- **P99**: 99% of requests complete within this time
+- **Min/Max**: Best and worst case response times
+
+## Prerequisites
+
+1. **Dependencies are automatically installed** when running setup:
+
+   - Locust (load testing framework)
+   - sseclient-py (SSE client library)
+
+1. **Complete Dify setup**:
+
+   ```bash
+   # Run the complete setup
+   python scripts/stress-test/setup_all.py
+   ```
+
+1. **Ensure services are running**:
+
+   **IMPORTANT**: For accurate stress testing, run the API server with Gunicorn in production mode:
+
+   ```bash
+   # Run from the api directory
+   cd api
+   uv run gunicorn \
+     --bind 0.0.0.0:5001 \
+     --workers 4 \
+     --worker-class gevent \
+     --timeout 120 \
+     --keep-alive 5 \
+     --log-level info \
+     --access-logfile - \
+     --error-logfile - \
+     app:app
+   ```
+
+   **Configuration options explained**:
+
+   - `--workers 4`: Number of worker processes (adjust based on CPU cores)
+   - `--worker-class gevent`: Async worker for handling concurrent connections
+   - `--timeout 120`: Seconds a worker may stay silent before it is killed and restarted
+   - `--keep-alive 5`: Seconds to wait for the next request on a keep-alive connection
+
+   **NOT RECOMMENDED for stress testing**:
+
+   ```bash
+   # Debug mode - DO NOT use for stress testing (slow performance)
+   ./dev/start-api  # This runs Flask in debug mode with single-threaded execution
+   ```
+
+   **Also start the Mock OpenAI server**:
+
+   ```bash
+   python scripts/stress-test/setup/mock_openai_server.py
+   ```
+
+## Running the Stress Test
+
+```bash
+# Run with default configuration (headless mode)
+./scripts/stress-test/run_locust_stress_test.sh
+
+# Or run directly with uv
+uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001
+
+# Run with Web UI (access at http://localhost:8089)
+uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001 --web-port 8089
+```
+
+The script will:
+
+1. Validate that all required services are running
+1. Check API token availability
+1. Execute the Locust stress test with SSE support
+1. Generate comprehensive reports in the `reports/` directory
+
+## Configuration
+
+The stress test configuration is in `locust.conf`:
+
+```ini
+users = 10           # Number of concurrent users
+spawn-rate = 2       # Users spawned per second
+run-time = 1m        # Test duration (30s, 5m, 1h)
+headless = true      # Run without web UI
+```
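+
+As a sketch of how the same file could also carry the target host and report paths, the fragment below uses option names that mirror Locust's long command-line flags. Whether the shipped `locust.conf` already sets these is not stated here, and the report paths are placeholders, not the timestamped names the wrapper script produces.
+
+```ini
+# Assumed additions to locust.conf; values are illustrative
+locustfile = scripts/stress-test/sse_benchmark.py
+host = http://localhost:5001
+users = 10
+spawn-rate = 2
+run-time = 1m
+headless = true
+# Placeholder output locations (not the script's timestamped names)
+csv = reports/results
+html = reports/report.html
+```
+
+Options passed on the command line take precedence over values in the config file.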
+
+### Custom Question Sets
+
+Modify the questions list in `sse_benchmark.py`:
+
+```python
+self.questions = [
+    "Your custom question 1",
+    "Your custom question 2",
+    # Add more questions...
+]
+```
+
+## Understanding the Results
+
+### Report Structure
+
+After running the stress test, you'll find these files in the `reports/` directory:
+
+- `locust_summary_YYYYMMDD_HHMMSS.txt` - Complete console output with metrics
+- `locust_report_YYYYMMDD_HHMMSS.html` - Interactive HTML report with charts
+- `locust_YYYYMMDD_HHMMSS_stats.csv` - CSV with detailed statistics
+- `locust_YYYYMMDD_HHMMSS_stats_history.csv` - Time-series data
+
+### Key Metrics
+
+**Requests Per Second (RPS)**:
+
+- **Excellent**: > 50 RPS
+- **Good**: 20-50 RPS
+- **Acceptable**: 10-20 RPS
+- **Needs Improvement**: < 10 RPS
+
+**Response Time Percentiles**:
+
+- **P50 (Median)**: 50% of requests complete within this time
+- **P95**: 95% of requests complete within this time
+- **P99**: 99% of requests complete within this time
+
+**Success Rate**:
+
+- Should be > 99% for production readiness
+- Lower rates indicate errors or timeouts
+
+### Example Output
+
+```text
+============================================================
+DIFY SSE STRESS TEST
+============================================================
+
+[2025-09-12 15:45:44,468] Starting test run with 10 users at 2 users/sec
+
+============================================================
+SSE Metrics | Active:   8 | Total Conn:   142 | Events:   2841
+Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
+============================================================
+
+Type     Name                          # reqs  # fails |    Avg     Min     Max    Med | req/s  failures/s
+---------|------------------------------|--------|--------|--------|--------|--------|--------|--------|-----------
+POST     /v1/workflows/run                  142   0(0.00%) |     41      18     192     38 |   2.37        0.00
+---------|------------------------------|--------|--------|--------|--------|--------|--------|--------|-----------
+         Aggregated                         142   0(0.00%) |     41      18     192     38 |   2.37        0.00
+
+============================================================
+FINAL RESULTS
+============================================================
+Total Connections: 142
+Total Events:      2841
+Average TTFE:      43 ms
+============================================================
+```
+
+### How to Read the Results
+
+**Live SSE Metrics Box (Updates every 10 seconds):**
+
+```text
+SSE Metrics | Active:   8 | Total Conn:   142 | Events:   2841
+Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
+```
+
+- **Active**: Current number of open SSE connections
+- **Total Conn**: Cumulative connections established
+- **Events**: Total SSE events received
+- **conn/s**: Connection establishment rate
+- **events/s**: Event delivery rate
+- **TTFE**: Average time to first event
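+
+As a consistency check on the sample above, and assuming the displayed rates are averages over the elapsed test time T, the totals and rates should imply roughly the same duration:
+
+```math
+T \approx \frac{142\ \text{conn}}{2.4\ \text{conn/s}} \approx 59\ \text{s}, \qquad T \approx \frac{2841\ \text{events}}{47.3\ \text{events/s}} \approx 60\ \text{s}
+```
+
+Both estimates agree with the `run-time = 1m` shown under Configuration, and the ratio 2841 / 142 gives about 20 events per connection.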
+
+**Standard Locust Table:**
+
+```text
+Type     Name                # reqs  # fails |    Avg     Min     Max    Med | req/s
+POST     /v1/workflows/run      142   0(0.00%) |     41      18     192     38 |   2.37
+```
+
+- **Type**: Always POST for our SSE requests
+- **Name**: The API endpoint being tested
+- **# reqs**: Total requests made
+- **# fails**: Failed requests (should be 0)
+- **Avg/Min/Max/Med**: Average, minimum, maximum, and median response time (ms)
+- **req/s**: Request throughput
+
+**Performance Targets:**
+
+✅ **Good Performance**:
+
+- Zero failures (0.00%)
+- TTFE < 100ms
+- Stable active connections
+- Consistent event throughput
+
+⚠️ **Warning Signs**:
+
+- Failures > 1%
+- TTFE > 500ms
+- Dropping active connections
+- Declining event rate over time
+
+## Test Scenarios
+
+### Light Load
+
+```yaml
+concurrency: 10
+iterations: 100
+```
+
+### Normal Load
+
+```yaml
+concurrency: 100
+iterations: 1000
+```
+
+### Heavy Load
+
+```yaml
+concurrency: 500
+iterations: 5000
+```
+
+### Stress Test
+
+```yaml
+concurrency: 1000
+iterations: 10000
+```
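+
+The `concurrency` and `iterations` keys above do not match the `locust.conf` options shown under Configuration. One possible translation of the Normal Load scenario is sketched below, assuming `concurrency` maps to `users`. The spawn rate and duration are assumed values, since the scenarios specify an iteration count and not a run time.
+
+```ini
+# Normal Load as locust.conf (assumed mapping: concurrency -> users)
+users = 100
+# Assumed ramp-up: reach 100 users in about 10 seconds
+spawn-rate = 10
+# Assumed duration; this run is bounded by time, not by iteration count
+run-time = 5m
+headless = true
+```
+
+The other scenarios would follow the same pattern with `users` set to 10, 500, or 1000.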
+
+## Performance Tuning
+
+### API Server Optimization
+
+**Gunicorn Tuning for Different Load Levels**:
+
+```bash
+# Light load (10-50 concurrent users)
+uv run gunicorn --bind 0.0.0.0:5001 --workers 2 --worker-class gevent app:app
+
+# Medium load (50-200 concurrent users)
+uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent --worker-connections 1000 app:app
+
+# Heavy load (200-1000 concurrent users)
+uv run gunicorn --bind 0.0.0.0:5001 --workers 8 --worker-class gevent --worker-connections 2000 --max-requests 1000 app:app
+```
+
+**Worker calculation formula**:
+
+- Workers = (2 × CPU cores) + 1
+- For SSE/WebSocket: Use gevent worker class
+- For CPU-bound tasks: Use sync workers
+
+### Database Optimization
+
+**PostgreSQL Connection Pool Tuning**:
+
+For high-concurrency stress testing, increase the PostgreSQL max connections in `docker/middleware.env`:
+
+```bash
+# Edit docker/middleware.env
+POSTGRES_MAX_CONNECTIONS=200  # Default is 100
+
+# Recommended values for different load levels:
+# Light load (10-50 users): 100 (default)
+# Medium load (50-200 users): 200
+# Heavy load (200-1000 users): 500
+```
+
+After changing, restart the PostgreSQL container:
+
+```bash
+docker compose -f docker/docker-compose.middleware.yaml down db
+docker compose -f docker/docker-compose.middleware.yaml up -d db
+```
+
+**Note**: Each connection uses ~10MB of RAM. Ensure your database server has sufficient memory:
+
+- 100 connections: ~1GB RAM
+- 200 connections: ~2GB RAM
+- 500 connections: ~5GB RAM
+
+### System Optimizations
+
+1. **Increase file descriptor limits**:
+
+   ```bash
+   ulimit -n 65536
+   ```
+
+1. **TCP tuning for high concurrency** (Linux):
+
+   ```bash
+   # Increase TCP buffer sizes
+   sudo sysctl -w net.core.rmem_max=134217728
+   sudo sysctl -w net.core.wmem_max=134217728
+
+   # Enable TCP fast open
+   sudo sysctl -w net.ipv4.tcp_fastopen=3
+   ```
+
+1. **macOS specific**:
+
+   ```bash
+   # Increase maximum connections
+   sudo sysctl -w kern.ipc.somaxconn=2048
+   ```
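+
+Values set with `sysctl -w` last only until the next reboot. On Linux they can be persisted in a drop-in file, sketched here with a hypothetical file name and the same values as the Linux commands above:
+
+```ini
+# /etc/sysctl.d/99-dify-stress-test.conf (file name is an assumption)
+# Larger TCP buffer ceilings
+net.core.rmem_max = 134217728
+net.core.wmem_max = 134217728
+# TCP Fast Open for both client and server sockets
+net.ipv4.tcp_fastopen = 3
+```
+
+Running `sudo sysctl --system` reloads all such files without a reboot.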
+
+## Troubleshooting
+
+### Common Issues
+
+1. **"ModuleNotFoundError: No module named 'locust'"**:
+
+   ```bash
+   # Dependencies are installed automatically, but if needed:
+   uv --project api add --dev locust sseclient-py
+   ```
+
+1. **"API key configuration not found"**:
+
+   ```bash
+   # Run setup
+   python scripts/stress-test/setup_all.py
+   ```
+
+1. **Services not running**:
+
+   ```bash
+   # Start Dify API with Gunicorn (production mode)
+   cd api
+   uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent app:app
+
+   # Start Mock OpenAI server
+   python scripts/stress-test/setup/mock_openai_server.py
+   ```
+
+1. **High error rate**:
+
+   - Reduce concurrency level
+   - Check system resources (CPU, memory)
+   - Review API server logs for errors
+   - Increase timeout values if needed
+
+1. **Permission denied running script**:
+
+   ```bash
+   chmod +x scripts/stress-test/run_locust_stress_test.sh
+   ```
+
+## Advanced Usage
+
+### Running Multiple Iterations
+
+```bash
+# Run stress test 3 times with 60-second intervals
+for i in {1..3}; do
+    echo "Run $i of 3"
+    ./scripts/stress-test/run_locust_stress_test.sh
+    sleep 60
+done
+```
+
+### Custom Locust Options
+
+Run Locust directly with custom options:
+
+```bash
+# With specific user count and spawn rate
+uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
+  --host http://localhost:5001 --users 50 --spawn-rate 5
+
+# Generate CSV reports
+uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
+  --host http://localhost:5001 --csv reports/results
+
+# Run for specific duration
+uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
+  --host http://localhost:5001 --run-time 5m --headless
+```
+
+### Comparing Results
+
+```bash
+# List the five most recent summary files to compare runs
+ls -la reports/locust_summary_*.txt | tail -5
+```
+
+## Interpreting Performance Issues
+
+### High Response Times
+
+Possible causes:
+
+- Database query performance
+- External API latency
+- Insufficient server resources
+- Network congestion
+
+### Low Throughput (RPS < 10)
+
+Check for:
+
+- CPU bottlenecks
+- Memory constraints
+- Database connection pooling
+- API rate limiting
+
+### High Error Rate
+
+Investigate:
+
+- Server error logs
+- Resource exhaustion
+- Timeout configurations
+- Connection limits
+
+## Why Locust?
+
+Locust was chosen over Drill for this stress test because:
+
+1. **Proper SSE Support**: Correctly handles streaming responses without premature closure
+1. **Custom Metrics**: Can track SSE-specific metrics like TTFE and stream duration
+1. **Web UI**: Real-time monitoring and control via web interface
+1. **Python Integration**: Seamlessly integrates with existing Python setup code
+1. **Extensibility**: Easy to customize for specific testing scenarios
+
+## Contributing
+
+To improve the stress test suite:
+
+1. Edit `locust.conf` for configuration changes
+1. Modify `run_locust_stress_test.sh` for workflow improvements
+1. Update question sets for better coverage
+1. Add new metrics or analysis features