Database Connection Pool Fix - Session Timeout Resolution

Problem Summary

User "calitate" experienced timeouts and loss of data after 20-30 minutes of using the fgscan page. The root cause was database connection exhaustion due to:

  1. No Connection Pooling: Every database operation created a new MariaDB connection without reusing or limiting them
  2. Incomplete Connection Cleanup: Connections were not always properly closed, especially in error scenarios
  3. Accumulation Over Time: With auto-submit requests every ~30 seconds combined with multiple concurrent Gunicorn workers, the connection count would eventually exceed MariaDB's max_connections limit
  4. Timeout Cascade: Once connections ran out, new requests would time out waiting for an available connection
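
For illustration (the leak rate is assumed, not measured): with four Gunicorn workers each leaking roughly one connection per minute, MariaDB's default max_connections of 151 is exhausted in under 40 minutes, consistent with the reported 20-30 minute failure window.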

Solution Implemented

1. Connection Pool Manager (app/db_pool.py)

Created a new module using DBUtils.PooledDB to manage database connections (a configuration sketch follows the list below):

  • Max Connections: 20 (pool size limit)
  • Min Cached: 3 (minimum idle connections to keep)
  • Max Cached: 10 (maximum idle connections)
  • Shared Connections: 5 (allows connection sharing between requests)
  • Health Check: Ping connections on-demand to detect stale/dead connections
  • Blocking: Requests block waiting for an available connection rather than failing
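
A minimal sketch of how db_pool.py might wire these settings together, assuming pymysql as the DB-API driver and placeholder credentials (both are assumptions, not the project's actual code):

import pymysql
from dbutils.pooled_db import PooledDB

_pool = None  # module-level pool, created lazily on first use


def get_db_pool():
    """Create the pool on first access (lazy initialization)."""
    global _pool
    if _pool is None:
        _pool = PooledDB(
            creator=pymysql,    # any DB-API 2 driver works as the creator
            maxconnections=20,  # hard cap on open connections
            mincached=3,        # minimum idle connections kept ready
            maxcached=10,       # idle connections beyond this are closed
            maxshared=5,        # ignored by drivers with threadsafety < 2 (e.g. pymysql)
            blocking=True,      # wait for a free connection instead of raising
            ping=1,             # ping a connection whenever it is fetched from the pool
            host="db", user="quality", password="...", database="quality",  # placeholders
        )
    return _pool


def get_db_connection():
    """Hand out a pooled connection; close() returns it to the pool."""
    return get_db_pool().connection()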

2. Context Manager for Safe Connection Usage (db_connection_context())

Added proper exception handling and resource cleanup:

from contextlib import contextmanager

@contextmanager
def db_connection_context():
    """Ensures connections are properly closed and committed/rolled back"""
    conn = get_db_connection()
    try:
        yield conn
        conn.commit()  # commit only if the block completed without error
    except Exception:
        conn.rollback()
        raise  # re-raise with the original traceback intact
    finally:
        conn.close()  # for a pooled connection this returns it to the pool

3. Updated Database Operations

Modified database access patterns in:

  • app/routes.py - Main application routes (login, scan, fg_scan, etc.)
  • app/settings.py - Settings and permission management

Before:

conn = get_db_connection()
cursor = conn.cursor()
cursor.execute(...)
conn.close()  # Could be skipped if exception occurs

After:

with db_connection_context() as conn:
    cursor = conn.cursor()
    cursor.execute(...)
# connection is committed and returned to the pool on exit; rolled back on error
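
As a fuller illustration, a refactored write handler might look like the sketch below; the route, table, and field names are hypothetical, and a Flask app object is assumed:

from flask import request

@app.route("/fg_scan", methods=["POST"])
def fg_scan():
    barcode = request.form["barcode"]  # hypothetical form field
    with db_connection_context() as conn:
        cursor = conn.cursor()
        cursor.execute(
            "INSERT INTO fg_scans (barcode) VALUES (%s)",  # hypothetical table
            (barcode,),
        )
    # commit, rollback, and cleanup are all handled by the context manager
    return "", 204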

4. Dependencies Updated

Added DBUtils to requirements.txt for connection pooling support.
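
For reference, the new line in requirements.txt would look something like this (the version pin is an assumption):

DBUtils>=3.0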

Benefits

  1. Connection Reuse: Connections are pooled and reused, reducing overhead
  2. Automatic Cleanup: Context managers ensure connections are always properly released
  3. Exception Handling: Connections roll back on errors, preventing deadlocks
  4. Scalability: Pool prevents exhaustion even under heavy concurrent load
  5. Health Monitoring: Built-in health checks detect and replace dead connections

Testing the Fix

  1. Rebuild the Docker container:

    docker compose down
    docker compose build --no-cache
    docker compose up -d
    
  2. Monitor connection usage:

    docker compose exec db mariadb -u root -p -e "SHOW PROCESSLIST;" | wc -l
    
  3. Load test the fgscan page (a scripted sketch follows this list):

    • Log in as a quality user
    • Open fgscan page
    • Simulate auto-submit requests for 30+ minutes
    • Verify page remains responsive and data saves correctly
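
A minimal scripted version of step 3, using the requests library; the base URL, endpoints, form fields, and credentials are all assumptions to adapt to the real application:

import time
import requests

BASE = "http://localhost:5000"  # assumed address of the app

session = requests.Session()
# authenticate first; endpoint and field names are assumptions
session.post(f"{BASE}/login", data={"username": "calitate", "password": "..."})

start = time.time()
while time.time() - start < 35 * 60:  # run a little over 30 minutes
    resp = session.post(f"{BASE}/fgscan", data={"barcode": "TEST123"})
    print(resp.status_code)
    time.sleep(30)  # mimic the ~30-second auto-submit interval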

Verify MariaDB is configured with reasonable connection limits:

-- Check current settings
SHOW VARIABLES LIKE 'max_connections';
SHOW VARIABLES LIKE 'max_connect_errors';
SHOW VARIABLES LIKE 'connect_timeout';

Recommended values (set as MariaDB server options, for example via command: --max-connections=100 in docker-compose.yml):

  • max_connections: 100 (leaves room for the pool of 20 plus other services)
  • connect_timeout: 10s (server default)
  • wait_timeout: 28800s (8 hours, server default)

Migration Notes

  • Backward Compatibility: get_external_db_connection() in settings.py still works but returns pooled connections
  • No API Changes: handlers use the same connection/cursor interface; the pooling and context manager are transparent to callers
  • Gradual Rollout: Continue monitoring connection usage after deployment

Files Modified

  1. /srv/quality_app/py_app/app/db_pool.py - NEW: Connection pool manager
  2. /srv/quality_app/py_app/app/routes.py - Updated to use connection pool + context managers
  3. /srv/quality_app/py_app/app/settings.py - Updated permission checks to use context managers
  4. /srv/quality_app/py_app/app/__init__.py - Initialize pool on app startup
  5. /srv/quality_app/py_app/requirements.txt - Added DBUtils dependency

Monitoring Recommendations

  1. Monitor connection pool stats (instrumentation can be added later if needed):

    pool = get_db_pool()
    # _idle_cache is a private DBUtils attribute and may change between versions
    print(f"Idle connections in pool: {len(pool._idle_cache)}")
    
  2. Log slow queries in MariaDB for performance optimization

  3. Set up alerts for:

    • MySQL connection limit warnings
    • Long-running queries
    • Pool exhaustion events

Future Improvements

  1. Implement dynamic pool size scaling based on load
  2. Add connection pool metrics/monitoring endpoint
  3. Implement query-level timeouts for long-running operations
  4. Consider migration to SQLAlchemy ORM for better database abstraction