Naively, it seems like one or both of two general problems must be responsible for the population-sensitive lag:

1) There is a network bottleneck (either at the edge of SSG's network, or inside it) unable to handle the traffic from ~1k players, and/or

2) The servers are unable to respond to incoming traffic at the rate it arrives from ~1k players.

This is almost tautological.

While these problems may not be easy to solve, they should be completely trivial to monitor and track on SSG's side.

To check whether #1 is happening, you compare the traffic entering and leaving each network node, in both directions. A node that consistently takes in more than it passes along is dropping or queueing packets.
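To make that concrete, here is a rough Python sketch of the comparison. I have no idea what SSG's network actually looks like, so the node names, the counter fields, and the 1% threshold are all made up for illustration. The only real idea is "packets received vs. packets forwarded, per node, per direction".

```python
from dataclasses import dataclass


@dataclass
class NodeCounters:
    # All fields are hypothetical packet counts over the same time window.
    name: str
    inbound_in: int    # received from the client side
    inbound_out: int   # forwarded on toward the game servers
    outbound_in: int   # received from the server side
    outbound_out: int  # forwarded on toward the clients


def loss_fraction(received: int, forwarded: int) -> float:
    """Fraction of received packets that were not forwarded."""
    if received == 0:
        return 0.0
    return max(0.0, (received - forwarded) / received)


def find_bottlenecks(nodes, threshold=0.01):
    """Return (name, worst_loss) for nodes losing more than `threshold`
    of their traffic in either direction. The threshold is an assumption."""
    flagged = []
    for n in nodes:
        worst = max(
            loss_fraction(n.inbound_in, n.inbound_out),
            loss_fraction(n.outbound_in, n.outbound_out),
        )
        if worst > threshold:
            flagged.append((n.name, worst))
    return flagged


# Invented sample numbers, not real measurements.
nodes = [
    NodeCounters("edge-router", 100000, 99900, 250000, 249800),
    NodeCounters("core-switch", 99900, 91000, 249800, 249700),
]
print(find_bottlenecks(nodes))
```

With these invented numbers only "core-switch" gets flagged, because it forwards noticeably fewer client-to-server packets than it receives.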

To check whether #2 is happening, you look at the server latency (response time to incoming client requests) and whether it climbs as the player count goes up.
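Again just as a sketch, and assuming the servers log something like (players online, response time in ms) per request, which is my guess and not anything SSG has described: bucket the samples by population and look at the 95th percentile response time in each bucket. The bucket size and the sample values below are invented.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def p95_by_population(samples, bucket_size=250):
    """Group (players_online, latency_ms) samples into population buckets
    and return {bucket_start: 95th percentile latency in ms}."""
    buckets = defaultdict(list)
    for players, latency_ms in samples:
        bucket_start = (players // bucket_size) * bucket_size
        buckets[bucket_start].append(latency_ms)
    return {start: percentile(vals, 95) for start, vals in sorted(buckets.items())}


# Invented sample data: (players online, server response time in ms).
samples = [
    (200, 40), (230, 45),
    (600, 60), (640, 70),
    (1000, 400), (1040, 900),
]
print(p95_by_population(samples))
```

If the top bucket's number is wildly higher than the others, as it is in this made-up data, the servers are falling behind under load and no amount of client-side dxdiag output is going to show it.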

These types of server-side metrics (and a lot more) should be available at their fingertips, and yet they're asking for emailed dxdiag printouts and anecdotal video clips of people lagging out?