In an effort to pin down and document the cause of the lag we're suffering, I let an auto-traceroute program (called Ping Plotter) run a ping every 15 seconds from 9:37pm CDT yesterday until 1:13pm today. The 2 routers (internap-gw.ip4.tinet.net, 77.67.77.54 and border1.te8-1-bbnet2.bsn003.pnap.net, 63.251.128.107) just outside the Turbine domain gateway (turbine-7.border1.bsn003.pnap.net, 64.95.76.202) are showing an ongoing and consistent 30% packet loss. This loss frequently happens in chunks that return no packets at all (noted in Ping Plotter as ERR or *).
Here is an example from my data of some of these ping returns:
(Note: This first block of text lists the hosts, in order, that the traceroutes show. The 2 routers outside Turbine with the consistent and ongoing 30% packet loss are hops 10 and 11, which are the columns 5 and 4 from the right in the data below the host listing.)
Host Information
1, -------------- , (hidden for privacy reasons)
2, -------------- , (hidden for privacy reasons)
3,te-6-4-ur07.elmhurst.il.chicago.comcast.net,68.86.116.1
4,te-8-4-ur08.elmhurst.il.chicago.comcast.net,68.87.210.30
5,te-2-3-0-0-ar01.area4.il.chicago.comcast.net,68.87.231.61
6,pos-3-9-0-0-cr01.350ecermak.il.ibone.comcast.net,68.86.90.45
7,pos-1-3-0-0-pe01.350ecermak.il.ibone.comcast.net,68.86.86.158
8,xe-9-0-0.chi10.ip4.tinet.net,77.67.71.145
9,xe-2-0-0.bos11.ip4.tinet.net,89.149.182.170
10,internap-gw.ip4.tinet.net,77.67.77.54
11,border1.te8-1-bbnet2.bsn003.pnap.net,63.251.128.107
12,turbine-7.border1.bsn003.pnap.net,64.95.76.202
13, -------------- ,74.201.102.154
14, -------------- ,74.201.102.42
Some data samples showing ongoing loss (* indicates the router returned ZERO packets!):
"4/9/2012 10:42:23 PM",0,38,10,10,12,16,14,11,57,*,39,39,38,38
"4/9/2012 10:42:38 PM",0,30,13,10,16,15,15,11,39,41,50,*,40,38
"4/9/2012 10:42:53 PM",0,37,11,9,14,15,12,12,37,40,40,40,40,40
"4/9/2012 10:43:08 PM",1,30,9,10,14,26,13,12,42,*,40,38,40,39
"4/9/2012 10:43:23 PM",0,31,9,10,14,14,13,11,41,*,41,*,41,41
"4/9/2012 10:43:38 PM",0,36,8,9,11,15,12,12,44,*,40,40,*,39
"4/9/2012 10:43:53 PM",0,28,9,10,13,13,13,39,41,*,40,*,42,38
"4/9/2012 10:44:08 PM",1,33,11,13,16,11,13,28,41,*,41,41,*,40
What I think this shows is that despite enjoying a nice speedy 40ms ping, I'm experiencing intermittent freezing-type lag because packets are simply disappearing in chunks when this router decides it doesn't want to respond.
"4/9/2012 9:43:36 PM",1,30,9,10,14,12,24,10,46,*,38,39,40,40
"4/9/2012 9:43:51 PM",0,38,9,23,12,15,20,11,42,58,*,39,44,39
"4/9/2012 9:44:06 PM",0,23,9,10,15,18,14,12,37,39,39,38,41,39
"4/9/2012 9:44:21 PM",0,37,14,9,12,15,18,28,43,41,*,39,39,39
"4/9/2012 9:44:36 PM",1,39,9,9,25,11,12,22,37,39,40,38,41,43
"4/9/2012 9:44:51 PM",0,25,11,11,14,14,15,10,40,40,39,39,40,51
"4/9/2012 9:45:06 PM",0,33,9,26,12,11,12,12,40,40,40,38,39,40
"4/9/2012 9:45:21 PM",0,32,11,12,12,12,12,76,55,*,41,43,*,46
"4/9/2012 9:45:36 PM",0,35,9,10,13,15,30,40,45,*,41,43,*,40
"4/9/2012 9:45:51 PM",1,21,9,10,13,16,12,20,39,40,38,39,41,41
"4/9/2012 10:26:08 PM",0,32,*,10,13,15,18,13,46,*,40,*,41,42
"4/9/2012 10:26:23 PM",1,29,10,8,11,15,14,11,48,*,39,40,38,39
"4/9/2012 10:26:38 PM",0,25,9,9,20,11,12,12,44,43,*,38,39,40
"4/9/2012 10:26:53 PM",0,34,10,8,12,14,12,12,44,*,39,41,*,39
"4/9/2012 10:27:08 PM",1,28,12,9,14,14,13,12,49,*,41,*,39,40
"4/9/2012 10:27:23 PM",1,39,8,9,15,13,12,12,49,*,69,*,39,39
"4/9/2012 10:27:38 PM",0,*,15,9,12,13,12,10,37,39,40,39,39,40
"4/9/2012 10:27:53 PM",0,28,14,10,12,15,23,13,43,*,38,39,42,38
"4/9/2012 10:28:08 PM",0,39,8,11,13,25,12,13,42,*,39,40,41,42
"4/9/2012 10:28:23 PM",0,31,10,12,12,17,11,11,47,*,39,39,38,40
"4/9/2012 10:28:38 PM",0,32,9,10,12,12,12,12,42,*,40,38,39,40
"4/9/2012 10:28:53 PM",0,38,11,25,13,13,11,11,43,*,39,40,38,38
"4/9/2012 10:29:08 PM",0,20,9,9,11,13,12,11,49,*,39,41,41,39
"4/10/2012 12:16:11 AM",0,36,10,8,14,15,13,11,38,40,*,39,38,38
"4/10/2012 12:16:26 AM",1,29,10,8,12,14,11,12,39,41,*,42,41,39
"4/10/2012 12:16:41 AM",0,29,9,9,12,16,12,10,75,*,37,39,39,38
"4/10/2012 12:16:56 AM",0,23,9,11,11,18,12,10,37,40,38,38,40,38
"4/10/2012 12:17:11 AM",0,32,10,9,15,11,12,12,41,*,40,38,51,40
"4/10/2012 12:17:26 AM",0,27,20,9,11,13,12,12,37,110,39,*,39,39
"4/10/2012 12:17:41 AM",0,30,9,10,12,14,13,10,37,39,37,38,39,38
"4/10/2012 12:17:56 AM",0,27,10,9,14,14,13,12,37,40,*,41,*,40
"4/10/2012 12:18:11 AM",0,34,8,9,15,12,12,12,39,39,40,40,41,39
"4/10/2012 12:18:26 AM",1,41,*,9,13,13,12,12,38,39,40,38,39,40
"4/10/2012 12:18:41 AM",0,*,25,9,16,13,12,12,37,43,*,38,42,39
"4/10/2012 12:18:56 AM",0,23,9,10,15,11,12,42,*,44,*,38,40,40
"4/10/2012 12:19:11 AM",0,36,10,8,12,15,11,12,39,38,39,39,41,39
"4/10/2012 12:19:26 AM",0,38,10,9,15,14,12,12,37,43,*,37,38,39
"4/10/2012 2:57:00 AM",1,33,9,8,13,14,13,11,38,41,*,40,38,38
"4/10/2012 2:57:15 AM",1,22,11,9,13,14,12,12,37,39,39,38,39,40
"4/10/2012 2:57:30 AM",0,39,8,9,14,19,12,17,40,39,39,38,38,39
"4/10/2012 2:57:45 AM",1,36,10,11,12,14,11,12,39,38,41,*,38,39
"4/10/2012 2:58:00 AM",1,39,8,9,14,13,11,12,37,231,39,38,38,39
"4/10/2012 2:58:15 AM",1,22,10,10,11,14,11,11,38,40,*,39,38,38
"4/10/2012 2:58:30 AM",1,23,9,10,16,16,13,11,38,40,*,39,40,40
"4/10/2012 2:58:45 AM",1,31,10,10,13,12,14,13,77,42,*,39,41,40
"4/10/2012 2:59:00 AM",0,32,10,9,13,13,12,12,49,*,39,37,38,39
And so on for the whole ~16 hours I tracked (and it's still running). Some periods show much less loss than indicated above. Occasionally other routers are briefly implicated.
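For anyone who wants to turn rows like these into a per-hop loss percentage, here is a minimal sketch in Python. It assumes the layout seen in the samples above (a quoted timestamp followed by one round-trip time in ms per hop, with * or ERR meaning no reply); that layout is inferred from this post, not taken from Ping Plotter's documentation. The function name loss_per_hop and the four-row SAMPLE are just for illustration.

```python
import csv
import io

# Four rows copied from the samples in this post (stray spaces removed).
# Assumed layout: quoted timestamp, then one latency in ms per hop;
# "*" or "ERR" means the hop returned nothing for that sample.
SAMPLE = '''"4/9/2012 10:42:23 PM",0,38,10,10,12,16,14,11,57,*,39,39,38,38
"4/9/2012 10:42:38 PM",0,30,13,10,16,15,15,11,39,41,50,*,40,38
"4/9/2012 10:42:53 PM",0,37,11,9,14,15,12,12,37,40,40,40,40,40
"4/9/2012 10:43:08 PM",1,30,9,10,14,26,13,12,42,*,40,38,40,39
'''

LOST_MARKERS = ("*", "ERR")


def loss_per_hop(text):
    """Return a list of loss percentages, one entry per hop column."""
    lost = []
    total = 0
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        cells = [cell.strip() for cell in row[1:]]  # drop the timestamp
        if not lost:
            lost = [0] * len(cells)
        if len(cells) != len(lost):
            continue  # skip rows with a different hop count
        total += 1
        for i, cell in enumerate(cells):
            if cell in LOST_MARKERS:
                lost[i] += 1
    return [100.0 * n / total for n in lost] if total else []


for hop, pct in enumerate(loss_per_hop(SAMPLE), start=1):
    print(f"hop {hop:2d}: {pct:5.1f}% lost")
```

Four rows are far too few to say anything about the real loss rate; the point is only to show the counting. Running the same function over the full export would give the overall figure per hop.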
Now, I'll be the first one to admit that I don't really understand much about this stuff, but I do know that 30% lost packets overall means lots of lag because packets have to keep being resent over and over until they get through.
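As a rough illustration of the resend arithmetic, the sketch below assumes every send is lost independently with the same probability, and that the 30% figure applies to the game traffic itself. Both are simplifying assumptions (the loss described above comes in chunks, so it is probably not independent), and the function names are made up for this example.

```python
def expected_sends(loss_rate):
    """Average number of transmissions needed to deliver one packet,
    assuming each attempt is lost independently with `loss_rate`."""
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    return 1.0 / (1.0 - loss_rate)


def burst_probability(loss_rate, length):
    """Chance that `length` attempts in a row are all lost,
    under the same independence assumption."""
    return loss_rate ** length


print(round(expected_sends(0.30), 2))        # 1.43
print(round(burst_probability(0.30, 3), 3))  # 0.027
```

Under those assumptions, a 30% loss rate means each packet needs about 1.43 sends on average, and roughly 1 packet in 37 would need a fourth attempt, each retry adding at least one more round trip of delay.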
Sapience or whomever: If my full data set would be of use to Turbine, I'll be happy to send it, and/or post instructions so others can also run these traces to submit them to you if they're useful at all. Ping Plotter is shareware with a 30-day free trial.
Galaxiana