Respond to “Connection state might be too large” log messages

Logs that contain “connection state might be too large” messages indicate problems with synchronizing state information between nodes in a Firewall Cluster.

Problem description: The logs for a Firewall Cluster contain “error when serializing for state sync” messages, accompanied by “connection state might be too large” clarification entries. You might also experience intermittent or continuous problems with clustering and traffic flow. Rebooting all clustered nodes typically alleviates these problems for some time.

Reason: The Firewall keeps a record of all connections that are handled statefully to be able to track the connection. When the Firewall is clustered, this connection table must be synchronized between the nodes to allow connections to continue if a node goes down. When the state table grows excessively large, the Firewall engines can no longer effectively use it.

A misconfiguration usually causes this message. Typical configuration problems include:

  • Using the Oracle Protocol Agent on the actual database connections between the client and the server. The Oracle Protocol Agent is meant for cases where TCP port 1521 is used only for negotiating the port number for Oracle database connections. The port number for the actual connection is assigned dynamically. The Oracle Protocol Agent must not be used in any other cases.
  • Excessive idle timeouts defined in Access Rules. TCP connections are normally closed explicitly by the communicating parties, so they can be cleared from the state table based on the actual connection state. Non-TCP protocols do not establish connections, but the Firewall still handles the communications as virtual connections so that all Firewall features can be used on the traffic. Because the communicating parties have no closing mechanism, a virtual connection is cleared from the Firewall's connection records only after the communication has been idle (unused) for the duration of the defined timeout. If Access Rules define excessively long timeouts for such traffic between many different hosts, the connection state table can grow very large.
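
The effect of the idle timeout on table size can be estimated with a rough calculation. The sketch below is not taken from the product. It assumes that each non-TCP exchange is short, so its state entry stays in the table for roughly one idle timeout, which makes the steady-state entry count approximately the rate of new flows multiplied by the timeout. The function name and the traffic figures are illustrative assumptions.

```python
def estimated_state_entries(new_flows_per_second: float,
                            idle_timeout_seconds: float) -> float:
    """Rough steady-state count of virtual-connection entries.

    Assumes each flow is a short exchange, so its entry lives for
    about one idle timeout before it is cleared (hypothetical model).
    """
    return new_flows_per_second * idle_timeout_seconds


if __name__ == "__main__":
    # Illustrative values only: 200 new non-TCP flows per second.
    for timeout in (60, 300, 3600):
        entries = estimated_state_entries(200, timeout)
        print(f"{timeout:>5} s timeout -> ~{entries:,.0f} entries")
```

Under this simplified model the table size scales linearly with the timeout, so the same traffic needs 60 times as many entries with a one-hour timeout as with a one-minute timeout.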

Steps

  1. If you use the Oracle Protocol Agent, make sure that it is not applied incorrectly.
    If necessary, replace the default service that has a Protocol Agent attached with a custom service that matches the correct port without a Protocol Agent.
  2. Check the Access Rules to see if there are rules that override the default idle timeout value for non-TCP traffic.
    • Make sure that the override is not applied to any traffic that does not absolutely need a longer timeout. Make the rule as specific as possible.
    • Try reducing the timeout. Generally, the idle timeout should not be more than a few minutes.
    • In some cases, allowing both directions of the communication in separate rules might remove the need for long timeouts.
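
If the Access Rules are available in some machine-readable form, the check in step 2 could be partly automated. The following is a minimal sketch under stated assumptions: the `Rule` structure, its field names, and the 300-second threshold are hypothetical and do not correspond to any product export format or API. It only lists rules that override the idle timeout for non-TCP traffic with a value above the threshold, so that they can be reviewed manually.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    # Hypothetical representation of an Access Rule; not a product format.
    name: str
    protocol: str                    # for example "tcp", "udp", "icmp"
    idle_timeout_s: Optional[int]    # None means the default is not overridden


def rules_to_review(rules: List[Rule], max_idle_s: int = 300) -> List[str]:
    """Return names of non-TCP rules whose timeout override exceeds max_idle_s."""
    return [
        rule.name
        for rule in rules
        if rule.protocol.lower() != "tcp"
        and rule.idle_timeout_s is not None
        and rule.idle_timeout_s > max_idle_s
    ]


if __name__ == "__main__":
    example = [
        Rule("dns-out", "udp", None),        # uses the default timeout
        Rule("syslog-any", "udp", 86400),    # one-day override
        Rule("db-clients", "tcp", 7200),     # TCP, outside the scope of this check
    ]
    print(rules_to_review(example))
```

The threshold reflects the "few minutes" guideline above and should be adjusted to the environment. A flagged rule is not necessarily wrong; it is only a candidate for narrowing or for a shorter timeout.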