# Dify Stress Test Suite

A high-performance stress test suite for Dify workflow execution using **Locust** - optimized for measuring Server-Sent Events (SSE) streaming performance.

## Key Metrics Tracked

The stress test focuses on four critical SSE performance indicators:

1. **Active SSE Connections** - Real-time count of open SSE connections
1. **New Connection Rate** - Connections per second (conn/sec)
1. **Time to First Event (TTFE)** - Latency until first SSE event arrives
1. **Event Throughput** - Events per second (events/sec)

## Features

- **True SSE Support**: Properly handles Server-Sent Events streaming without premature connection closure
- **Real-time Metrics**: Live reporting every 5 seconds during tests
- **Comprehensive Tracking**:
  - Active connection monitoring
  - Connection establishment rate
  - Event processing throughput
  - TTFE distribution analysis
- **Multiple Interfaces**:
  - Web UI for real-time monitoring (<http://localhost:8089>)
  - Headless mode with periodic console updates
- **Detailed Reports**: Final statistics with overall rates and averages
- **Easy Configuration**: Uses existing API key configuration from setup

## What Gets Measured

The stress test focuses on SSE streaming performance with these key metrics:

### Primary Endpoint: `/v1/workflows/run`

The suite exercises a single endpoint with comprehensive SSE metrics tracking:

- **Request Type**: POST request to workflow execution API
- **Response Type**: Server-Sent Events (SSE) stream
- **Payload**: Random questions from a configurable pool
- **Concurrency**: Configurable from 1 to 1000+ simultaneous users

### Key Performance Metrics

#### 1. **Active Connections**

- **What it measures**: Number of concurrent SSE connections open at any moment
- **Why it matters**: Shows system's ability to handle parallel streams
- **Good values**: Should remain stable under load without drops

#### 2. **Connection Rate (conn/sec)**

- **What it measures**: How fast new SSE connections are established
- **Why it matters**: Indicates system's ability to handle connection spikes
- **Good values**:
  - Light load: 5-10 conn/sec
  - Medium load: 20-50 conn/sec
  - Heavy load: 100+ conn/sec

#### 3. **Time to First Event (TTFE)**

- **What it measures**: Latency from request sent to first SSE event received
- **Why it matters**: Critical for user experience - faster TTFE = better perceived performance
- **Good values**:
  - Excellent: < 50ms
  - Good: 50-100ms
  - Acceptable: 100-500ms
  - Poor: > 500ms
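
As a rough illustration of how TTFE and event counts can be derived from a stream, the sketch below times the gap between starting to read a response and seeing its first `data:` line. The function name `measure_stream` and the injectable `clock` are assumptions made for this example; they are not taken from `sse_benchmark.py`.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    lines: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], int]:
    """Return (ttfe_ms, event_count) for one SSE response.

    `lines` is any iterable of decoded response lines. Every line that
    starts with "data:" counts as one event, which is a simplification
    (an event with several data lines is counted more than once).
    """
    start = clock()  # stand-in for the moment the request was sent
    ttfe_ms = None
    events = 0
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separators, comments and other fields
        if ttfe_ms is None:
            # First event seen: record the latency once.
            ttfe_ms = (clock() - start) * 1000.0
        events += 1
    return ttfe_ms, events
```

In a real run the start timestamp would be taken just before the POST is sent, so that connection setup and server processing are included in the measurement.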

#### 4. **Event Throughput (events/sec)**

- **What it measures**: Rate of SSE events being delivered across all connections
- **Why it matters**: Shows actual data delivery performance
- **Expected values**: Depends on workflow complexity and number of connections
  - Single connection: 10-20 events/sec
  - 10 connections: 50-100 events/sec
  - 100 connections: 200-500 events/sec

#### 5. **Request/Response Times**

- **P50 (Median)**: 50% of requests complete within this time
- **P95**: 95% of requests complete within this time
- **P99**: 99% of requests complete within this time
- **Min/Max**: Best and worst case response times
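
To make the percentile definitions concrete, here is a minimal nearest-rank sketch. Locust computes its own percentile estimates internally, so this only illustrates the definition; the helper name `percentile` is an assumption for this example.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

With response times of 1 ms through 100 ms, `percentile(times, 95)` returns 95: 95% of requests finished within 95 ms.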

## Prerequisites

1. **Dependencies are automatically installed** when running setup:

   - Locust (load testing framework)
   - sseclient-py (SSE client library)

1. **Complete Dify setup**:

   ```bash
   # Run the complete setup
   python scripts/stress-test/setup_all.py
   ```

1. **Ensure services are running**:

   **IMPORTANT**: For accurate stress testing, run the API server with Gunicorn in production mode:

   ```bash
   # Run from the api directory
   cd api
   uv run gunicorn \
     --bind 0.0.0.0:5001 \
     --workers 4 \
     --worker-class gevent \
     --timeout 120 \
     --keep-alive 5 \
     --log-level info \
     --access-logfile - \
     --error-logfile - \
     app:app
   ```

   **Configuration options explained**:

   - `--workers 4`: Number of worker processes (adjust based on CPU cores)
   - `--worker-class gevent`: Async worker for handling concurrent connections
   - `--timeout 120`: Worker timeout for long-running requests
   - `--keep-alive 5`: Seconds to wait for the next request on a keep-alive connection before closing it

   **NOT RECOMMENDED for stress testing**:

   ```bash
   # Debug mode - DO NOT use for stress testing (slow performance)
   ./dev/start-api  # This runs Flask in debug mode with single-threaded execution
   ```

   **Also start the Mock OpenAI server**:

   ```bash
   python scripts/stress-test/setup/mock_openai_server.py
   ```

## Running the Stress Test

```bash
# Run with default configuration (headless mode)
./scripts/stress-test/run_locust_stress_test.sh

# Or run directly with uv
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001

# Run with Web UI (access at http://localhost:8089)
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py --host http://localhost:5001 --web-port 8089
```

The script will:

1. Validate that all required services are running
1. Check API token availability
1. Execute the Locust stress test with SSE support
1. Generate comprehensive reports in the `reports/` directory

## Configuration

The stress test configuration is in `locust.conf`:

```ini
users = 10           # Number of concurrent users
spawn-rate = 2       # Users spawned per second
run-time = 1m        # Test duration (30s, 5m, 1h)
headless = true      # Run without web UI
```

### Custom Question Sets

Modify the questions list in `sse_benchmark.py`:

```python
self.questions = [
    "Your custom question 1",
    "Your custom question 2",
    # Add more questions...
]
```
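
For context, a question from the pool typically ends up inside the request body sent to `/v1/workflows/run`. The sketch below shows one way this could look. The input variable name `question`, the `user` value, and the helper `build_payload` are assumptions for illustration; check `sse_benchmark.py` and your workflow's start node for the real names.

```python
import random
from typing import Dict, List, Optional

QUESTIONS: List[str] = [
    "Your custom question 1",
    "Your custom question 2",
]


def build_payload(questions: List[str], rng: Optional[random.Random] = None) -> Dict:
    """Pick one question at random and wrap it in a streaming request body."""
    chooser = rng or random  # allow a seeded generator for repeatable runs
    return {
        # "question" is a hypothetical input variable name.
        "inputs": {"question": chooser.choice(questions)},
        "response_mode": "streaming",  # request an SSE response
        "user": "stress-test-user",    # hypothetical user identifier
    }
```

Passing a seeded `random.Random` makes the sequence of questions reproducible between runs, which can help when comparing results.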

## Understanding the Results

### Report Structure

After running the stress test, you'll find these files in the `reports/` directory:

- `locust_summary_YYYYMMDD_HHMMSS.txt` - Complete console output with metrics
- `locust_report_YYYYMMDD_HHMMSS.html` - Interactive HTML report with charts
- `locust_YYYYMMDD_HHMMSS_stats.csv` - CSV with detailed statistics
- `locust_YYYYMMDD_HHMMSS_stats_history.csv` - Time-series data

### Key Metrics

**Requests Per Second (RPS)**:

- **Excellent**: > 50 RPS
- **Good**: 20-50 RPS
- **Acceptable**: 10-20 RPS
- **Needs Improvement**: < 10 RPS

**Response Time Percentiles**:

- **P50 (Median)**: 50% of requests complete within this time
- **P95**: 95% of requests complete within this time
- **P99**: 99% of requests complete within this time

**Success Rate**:

- Should be > 99% for production readiness
- Lower rates indicate errors or timeouts
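
The success rate can be checked from the `_stats.csv` report. The sketch below assumes the usual Locust CSV header with `Name`, `Request Count`, and `Failure Count` columns; verify the column names against your own report before relying on it. The helper name `success_rates` is hypothetical.

```python
import csv
import io
from typing import Dict


def success_rates(csv_text: str) -> Dict[str, float]:
    """Map each row's Name to its success rate in percent."""
    rates: Dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        total = int(row["Request Count"])
        if total == 0:
            continue  # no requests recorded, rate is undefined
        failed = int(row["Failure Count"])
        rates[row["Name"]] = 100.0 * (total - failed) / total
    return rates
```

To use it on a real report, read the file contents first, for example `success_rates(open(path, newline="").read())`.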

### Example Output

```text
============================================================
DIFY SSE STRESS TEST
============================================================

[2025-09-12 15:45:44,468] Starting test run with 10 users at 2 users/sec

============================================================
SSE Metrics | Active:   8 | Total Conn:   142 | Events:   2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
============================================================

Type     Name                          # reqs  # fails |    Avg     Min     Max    Med | req/s  failures/s
---------|------------------------------|--------|--------|--------|--------|--------|--------|--------|-----------
POST     /v1/workflows/run                  142   0(0.00%) |     41      18     192     38 |   2.37        0.00
---------|------------------------------|--------|--------|--------|--------|--------|--------|--------|-----------
         Aggregated                         142   0(0.00%) |     41      18     192     38 |   2.37        0.00

============================================================
FINAL RESULTS
============================================================
Total Connections: 142
Total Events:      2841
Average TTFE:      43 ms
============================================================
```

### How to Read the Results

**Live SSE Metrics Box (Updates every 10 seconds):**

```text
SSE Metrics | Active:   8 | Total Conn:   142 | Events:   2841
Rates: 2.4 conn/s | 47.3 events/s | TTFE: 43ms
```

- **Active**: Current number of open SSE connections
- **Total Conn**: Cumulative connections established
- **Events**: Total SSE events received
- **conn/s**: Connection establishment rate
- **events/s**: Event delivery rate
- **TTFE**: Average time to first event
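
The rates in the metrics box are simple ratios of cumulative counters to elapsed time. A minimal sketch, assuming a 60-second run (consistent with the sample numbers, though the source does not state it); the class `SSETotals` and its field names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class SSETotals:
    connections: int    # cumulative connections established
    events: int         # cumulative SSE events received
    ttfe_sum_ms: float  # sum of all TTFE samples, in milliseconds
    elapsed_s: float    # seconds since the test started

    def conn_rate(self) -> float:
        """Connections per second over the whole run."""
        return self.connections / self.elapsed_s if self.elapsed_s > 0 else 0.0

    def event_rate(self) -> float:
        """Events per second over the whole run."""
        return self.events / self.elapsed_s if self.elapsed_s > 0 else 0.0

    def avg_ttfe_ms(self) -> float:
        """Mean TTFE, assuming one TTFE sample per connection."""
        return self.ttfe_sum_ms / self.connections if self.connections else 0.0
```

For example, `SSETotals(142, 2841, 142 * 43.0, 60.0)` gives about 2.4 conn/s, about 47 events/s, and an average TTFE of 43 ms.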

**Standard Locust Table:**

```text
Type     Name                # reqs  # fails |    Avg     Min     Max    Med | req/s
POST     /v1/workflows/run      142   0(0.00%) |     41      18     192     38 |   2.37
```

- **Type**: Always POST for our SSE requests
- **Name**: The API endpoint being tested
- **# reqs**: Total requests made
- **# fails**: Failed requests (should be 0)
- **Avg/Min/Max/Med**: Average, minimum, maximum, and median response times (ms)
- **req/s**: Request throughput

**Performance Targets:**

✅ **Good Performance**:

- Zero failures (0.00%)
- TTFE < 100ms
- Stable active connections
- Consistent event throughput

⚠️ **Warning Signs**:

- Failures > 1%
- TTFE > 500ms
- Dropping active connections
- Declining event rate over time
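
The warning signs above can be turned into an automated check. This is a sketch only: the function name `warning_signs` is hypothetical, and the rule for a "declining" event rate (last sample below 80% of the first) is an assumption, since the text does not define one.

```python
from typing import List


def warning_signs(
    fail_pct: float, avg_ttfe_ms: float, event_rates: List[float]
) -> List[str]:
    """Return the warning signs triggered by one run's numbers."""
    warnings: List[str] = []
    if fail_pct > 1.0:
        warnings.append("failures > 1%")
    if avg_ttfe_ms > 500.0:
        warnings.append("TTFE > 500ms")
    # Assumed rule: flag a decline when the last events/s sample is
    # below 80% of the first one.
    if len(event_rates) >= 2 and event_rates[-1] < 0.8 * event_rates[0]:
        warnings.append("declining event rate")
    return warnings
```

An empty list means none of these checks fired; it does not by itself prove that active connections stayed stable.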

## Test Scenarios

### Light Load

```yaml
concurrency: 10
iterations: 100
```

### Normal Load

```yaml
concurrency: 100
iterations: 1000
```

### Heavy Load

```yaml
concurrency: 500
iterations: 5000
```

### Stress Test

```yaml
concurrency: 1000
iterations: 10000
```

## Performance Tuning

### API Server Optimization

**Gunicorn Tuning for Different Load Levels**:

```bash
# Light load (10-50 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 2 --worker-class gevent app:app

# Medium load (50-200 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent --worker-connections 1000 app:app

# Heavy load (200-1000 concurrent users)
uv run gunicorn --bind 0.0.0.0:5001 --workers 8 --worker-class gevent --worker-connections 2000 --max-requests 1000 app:app
```

**Worker calculation formula**:

- Workers = (2 × CPU cores) + 1
- For SSE/WebSocket: Use gevent worker class
- For CPU-bound tasks: Use sync workers
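
The worker formula is easy to evaluate for the machine at hand. A small sketch (the helper name `suggested_workers` is hypothetical); treat the result as a starting point rather than a fixed rule:

```python
import os
from typing import Optional


def suggested_workers(cpu_cores: Optional[int] = None) -> int:
    """Apply Workers = (2 x CPU cores) + 1.

    Falls back to the local core count, or to 1 core if it cannot be
    determined.
    """
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return 2 * cores + 1
```

For a 4-core machine this suggests 9 workers, which can then be passed to Gunicorn via `--workers`.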

### Database Optimization

**PostgreSQL Connection Pool Tuning**:

For high-concurrency stress testing, increase the PostgreSQL max connections in `docker/middleware.env`:

```bash
# Edit docker/middleware.env
POSTGRES_MAX_CONNECTIONS=200  # Default is 100

# Recommended values for different load levels:
# Light load (10-50 users): 100 (default)
# Medium load (50-200 users): 200
# Heavy load (200-1000 users): 500
```

After changing, restart the PostgreSQL container:

```bash
docker compose -f docker/docker-compose.middleware.yaml down db
docker compose -f docker/docker-compose.middleware.yaml up -d db
```

**Note**: Each connection uses ~10MB of RAM. Ensure your database server has sufficient memory:

- 100 connections: ~1GB RAM
- 200 connections: ~2GB RAM
- 500 connections: ~5GB RAM
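
The memory figures follow directly from the ~10MB-per-connection estimate. A quick sketch for other connection limits, using 1000 MB per GB to match the rounded values listed here (the helper name `estimated_ram_gb` is hypothetical):

```python
def estimated_ram_gb(max_connections: int, mb_per_connection: float = 10.0) -> float:
    """Rough RAM needed for PostgreSQL connections, in GB.

    Uses 1 GB = 1000 MB because the per-connection figure is itself
    only an approximation.
    """
    if max_connections < 0:
        raise ValueError("max_connections must be non-negative")
    return max_connections * mb_per_connection / 1000.0
```

Actual usage depends on the queries being run and on PostgreSQL settings, so leave headroom beyond this estimate.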

### System Optimizations

1. **Increase file descriptor limits**:

   ```bash
   ulimit -n 65536
   ```

1. **TCP tuning for high concurrency** (Linux):

   ```bash
   # Increase TCP buffer sizes
   sudo sysctl -w net.core.rmem_max=134217728
   sudo sysctl -w net.core.wmem_max=134217728

   # Enable TCP fast open
   sudo sysctl -w net.ipv4.tcp_fastopen=3
   ```

1. **macOS specific**:

   ```bash
   # Increase maximum connections
   sudo sysctl -w kern.ipc.somaxconn=2048
   ```

## Troubleshooting

### Common Issues

1. **"ModuleNotFoundError: No module named 'locust'"**:

   ```bash
   # Dependencies are installed automatically, but if needed:
   uv --project api add --dev locust sseclient-py
   ```

1. **"API key configuration not found"**:

   ```bash
   # Run setup
   python scripts/stress-test/setup_all.py
   ```

1. **Services not running**:

   ```bash
   # Start Dify API with Gunicorn (production mode)
   cd api
   uv run gunicorn --bind 0.0.0.0:5001 --workers 4 --worker-class gevent app:app

   # Start Mock OpenAI server
   python scripts/stress-test/setup/mock_openai_server.py
   ```

1. **High error rate**:

   - Reduce concurrency level
   - Check system resources (CPU, memory)
   - Review API server logs for errors
   - Increase timeout values if needed

1. **Permission denied running script**:

   ```bash
   chmod +x scripts/stress-test/run_locust_stress_test.sh
   ```

## Advanced Usage

### Running Multiple Iterations

```bash
# Run stress test 3 times with 60-second intervals
for i in {1..3}; do
    echo "Run $i of 3"
    ./scripts/stress-test/run_locust_stress_test.sh
    sleep 60
done
```

### Custom Locust Options

Run Locust directly with custom options:

```bash
# With specific user count and spawn rate
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
  --host http://localhost:5001 --users 50 --spawn-rate 5

# Generate CSV reports
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
  --host http://localhost:5001 --csv reports/results

# Run for specific duration
uv run --project api python -m locust -f scripts/stress-test/sse_benchmark.py \
  --host http://localhost:5001 --run-time 5m --headless
```

### Comparing Results

```bash
# Compare multiple stress test runs
ls -la reports/locust_summary_*.txt | tail -5
```

## Interpreting Performance Issues

### High Response Times

Possible causes:

- Database query performance
- External API latency
- Insufficient server resources
- Network congestion

### Low Throughput (RPS < 10)

Check for:

- CPU bottlenecks
- Memory constraints
- Database connection pooling
- API rate limiting

### High Error Rate

Investigate:

- Server error logs
- Resource exhaustion
- Timeout configurations
- Connection limits

## Why Locust?

Locust was chosen over Drill for this stress test because:

1. **Proper SSE Support**: Correctly handles streaming responses without premature closure
1. **Custom Metrics**: Can track SSE-specific metrics like TTFE and stream duration
1. **Web UI**: Real-time monitoring and control via web interface
1. **Python Integration**: Seamlessly integrates with existing Python setup code
1. **Extensibility**: Easy to customize for specific testing scenarios

## Contributing

To improve the stress test suite:

1. Edit `locust.conf` for configuration changes
1. Modify `run_locust_stress_test.sh` for workflow improvements
1. Update question sets for better coverage
1. Add new metrics or analysis features