- Added DBUtils PooledDB for intelligent connection pooling - Created db_pool.py with lazy-initialized connection pool (max 20 connections) - Added db_connection_context() context manager for safe connection handling - Refactored all 19 database operations to use context manager pattern - Ensures proper connection cleanup and exception handling - Prevents connection exhaustion on POST requests - Added logging configuration for debugging Changes: - py_app/app/db_pool.py: New connection pool manager - py_app/app/logging_config.py: Centralized logging - py_app/app/__init__.py: Updated to use connection pool - py_app/app/routes.py: Refactored all DB operations to use context manager - py_app/app/settings.py: Updated settings handlers - py_app/requirements.txt: Added DBUtils dependency This solves the connection timeout issues experienced with the fgscan page.
140 lines
5.0 KiB
Markdown
140 lines
5.0 KiB
Markdown
# Database Connection Pool Fix - Session Timeout Resolution
|
|
|
|
## Problem Summary
|
|
User "calitate" experienced timeouts and loss of data after 20-30 minutes of using the fgscan page. The root cause was **database connection exhaustion** due to:
|
|
|
|
1. **No Connection Pooling**: Every database operation created a new MariaDB connection without reusing or limiting them
|
|
2. **Incomplete Connection Cleanup**: Connections were not always properly closed, especially in error scenarios
|
|
3. **Accumulation Over Time**: With auto-submit requests every ~30 seconds + multiple concurrent Gunicorn workers, the connection count would exceed MariaDB's `max_connections` limit
|
|
4. **Timeout Cascade**: When connections ran out, new requests would timeout waiting for available connections
|
|
|
|
## Solution Implemented
|
|
|
|
### 1. **Connection Pool Manager** (`app/db_pool.py`)
|
|
Created a new module using `DBUtils.PooledDB` to manage database connections:
|
|
- **Max Connections**: 20 (pool size limit)
|
|
- **Min Cached**: 3 (minimum idle connections to keep)
|
|
- **Max Cached**: 10 (maximum idle connections)
|
|
- **Shared Connections**: 5 (allows connection sharing between requests)
|
|
- **Health Check**: Ping connections on-demand to detect stale/dead connections
|
|
- **Blocking**: Requests block waiting for an available connection rather than failing
|
|
|
|
### 2. **Context Manager for Safe Connection Usage** (`db_connection_context()`)
|
|
Added proper exception handling and resource cleanup:
|
|
```python
|
|
@contextmanager
|
|
def db_connection_context():
|
|
"""Ensures connections are properly closed and committed/rolled back"""
|
|
conn = get_db_connection()
|
|
try:
|
|
yield conn
|
|
except Exception as e:
|
|
conn.rollback()
|
|
raise e
|
|
finally:
|
|
if conn:
|
|
conn.close()
|
|
```
|
|
|
|
### 3. **Updated Database Operations**
|
|
Modified database access patterns in:
|
|
- `app/routes.py` - Main application routes (login, scan, fg_scan, etc.)
|
|
- `app/settings.py` - Settings and permission management
|
|
|
|
**Before**:
|
|
```python
|
|
conn = get_db_connection()
|
|
cursor = conn.cursor()
|
|
cursor.execute(...)
|
|
conn.close() # Could be skipped if exception occurs
|
|
```
|
|
|
|
**After**:
|
|
```python
|
|
with db_connection_context() as conn:
|
|
cursor = conn.cursor()
|
|
cursor.execute(...) # Connection auto-closes on exit
|
|
```
|
|
|
|
### 4. **Dependencies Updated**
|
|
Added `DBUtils` to `requirements.txt` for connection pooling support.
|
|
|
|
## Benefits
|
|
|
|
1. **Connection Reuse**: Connections are pooled and reused, reducing overhead
|
|
2. **Automatic Cleanup**: Context managers ensure connections are always properly released
|
|
3. **Exception Handling**: Connections rollback on errors, preventing deadlocks
|
|
4. **Scalability**: Pool prevents exhaustion even under heavy concurrent load
|
|
5. **Health Monitoring**: Built-in health checks detect and replace dead connections
|
|
|
|
## Testing the Fix
|
|
|
|
1. **Rebuild the Docker container**:
|
|
```bash
|
|
docker compose down
|
|
docker compose build --no-cache
|
|
docker compose up -d
|
|
```
|
|
|
|
2. **Monitor connection usage**:
|
|
```bash
|
|
docker compose exec db mariadb -u root -p -e "SHOW PROCESSLIST;" | wc -l
|
|
```
|
|
|
|
3. **Load test the fgscan page**:
|
|
- Log in as a quality user
|
|
- Open fgscan page
|
|
- Simulate auto-submit requests for 30+ minutes
|
|
- Verify page remains responsive and data saves correctly
|
|
|
|
## Related Database Settings
|
|
|
|
Verify MariaDB is configured with reasonable connection limits:
|
|
```sql
|
|
-- Check current settings
|
|
SHOW VARIABLES LIKE 'max_connections';
|
|
SHOW VARIABLES LIKE 'max_connection_errors_per_host';
|
|
SHOW VARIABLES LIKE 'connect_timeout';
|
|
```
|
|
|
|
Recommended values (in docker-compose.yml environment):
|
|
- `MYSQL_MAX_CONNECTIONS`: 100 (allows pool of 20 + other services)
|
|
- Connection timeout: 10s (MySQL default)
|
|
- Wait timeout: 28800s (8 hours, MySQL default)
|
|
|
|
## Migration Notes
|
|
|
|
- **Backward Compatibility**: `get_external_db_connection()` in settings.py still works but returns pooled connections
|
|
- **No API Changes**: Existing code patterns with context managers are transparent
|
|
- **Gradual Rollout**: Continue monitoring connection usage after deployment
|
|
|
|
## Files Modified
|
|
|
|
1. `/srv/quality_app/py_app/app/db_pool.py` - NEW: Connection pool manager
|
|
2. `/srv/quality_app/py_app/app/routes.py` - Updated to use connection pool + context managers
|
|
3. `/srv/quality_app/py_app/app/settings.py` - Updated permission checks to use context managers
|
|
4. `/srv/quality_app/py_app/app/__init__.py` - Initialize pool on app startup
|
|
5. `/srv/quality_app/py_app/requirements.txt` - Added DBUtils dependency
|
|
|
|
## Monitoring Recommendations
|
|
|
|
1. **Monitor connection pool stats** (add later if needed):
|
|
```python
|
|
pool = get_db_pool()
|
|
print(f"Pool size: {pool.connection()._pool.qsize()}") # Available connections
|
|
```
|
|
|
|
2. **Log slow queries** in MariaDB for performance optimization
|
|
|
|
3. **Set up alerts** for:
|
|
- MySQL connection limit warnings
|
|
- Long-running queries
|
|
- Pool exhaustion events
|
|
|
|
## Future Improvements
|
|
|
|
1. Implement dynamic pool size scaling based on load
|
|
2. Add connection pool metrics/monitoring endpoint
|
|
3. Implement query-level timeouts for long-running operations
|
|
4. Consider migration to SQLAlchemy ORM for better database abstraction
|