How To Read Traceroutes
Traceroute is a network diagnostic tool that displays, hop by hop, the route packets take across the internet. NANOG provides a very good PDF that you can use as a reference for reading and diagnosing traceroutes properly.
Types of Traceroutes
There are two general types of traceroutes:
Windows natively uses ICMP for traceroutes.
Linux natively uses UDP, but there is also an option to use ICMP (-I).
This can often explain discrepancies between Linux and Windows traceroutes, as a firewall may be dropping one or the other, which will affect the results.
Additionally, traceroutes can be done via TCP with software such as tracetcp.
How Traceroutes Work
Paths on the internet are typically asymmetrical, which means that packets will take one route to get to their destination and a different route back. This is why it’s helpful when troubleshooting to have traceroutes in both directions.
Because a traceroute only sends a few probes per hop, its sample size is small, and it is difficult to use to pin down an issue unless the issue is fairly severe (more on diagnosing issues with traceroute below). In many cases it is better to have an MTR, which is better equipped to diagnose smaller problems such as low amounts of traffic loss. MTR (My Traceroute) is effectively traceroute on crack: it probes every hop continuously and keeps running loss and latency statistics. Traceroutes are generally for latency detection, whereas MTRs are generally for loss detection.
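Traceroute works by sending probes (either UDP or ICMP) with a deliberately low TTL. The first probe has a TTL of 1, and the TTL is increased by one for each successive round of probes. Every router that forwards a packet decrements its TTL, and the router that decrements it to zero drops the packet and sends an ICMP Time Exceeded message back to the sender, which reveals that router’s address. This process continues until a probe reaches the destination, which returns a reply message.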
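The TTL mechanism can be illustrated with a toy simulation. This is only a sketch: no packets are sent, the path is a made-up list of hop names, and the function name simulate_traceroute is hypothetical. It models the rule that each router decrements the TTL and answers with an ICMP Time Exceeded message when the TTL reaches zero.

```python
from typing import List, Tuple


def simulate_traceroute(path: List[str], max_hops: int = 30) -> List[Tuple[int, str, str]]:
    """Model TTL-based path discovery over a hypothetical list of hops.

    `path` lists the routers in order, with the destination as the last entry.
    Returns (ttl, responding_node, response_type) for each probe sent.
    """
    results = []
    for ttl in range(1, max_hops + 1):
        remaining = ttl
        for index, node in enumerate(path):
            if index == len(path) - 1:
                # The probe reached the destination, which sends a reply.
                results.append((ttl, node, "reply"))
                return results
            remaining -= 1  # each router decrements the TTL before forwarding
            if remaining == 0:
                # TTL expired here: this router reports back and drops the probe.
                results.append((ttl, node, "time-exceeded"))
                break
    return results


if __name__ == "__main__":
    for ttl, node, kind in simulate_traceroute(["gateway", "core", "example-destination"]):
        print(ttl, node, kind)
```

Each line of real traceroute output corresponds to one TTL value in this loop; the real tool also times every probe, which the sketch leaves out.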
Routers like to deprioritize packets that are sent TO them, but not ones that are sent THROUGH them. A router’s job is to get your data where it’s going, not to answer requests addressed to the router itself, so it makes sense that the router will not treat requests sent directly to it as a priority. Because of this, busy routers will often drop packets sent directly to them (due to rate limiting, for example). This can cause traceroutes and MTRs to show dropped packets in the middle of a path, but this is not necessarily a concern unless the loss continues or gets worse all the way to the destination.
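That rule of thumb can be written down as a small check. The sketch below assumes you already have per-hop loss percentages (for example, copied from an MTR report); the function name and the 1% threshold are illustrative assumptions, not part of any tool.

```python
from typing import List, Optional


def first_persistent_loss_hop(loss_by_hop: List[float], threshold: float = 1.0) -> Optional[int]:
    """Return the 1-based hop where loss starts and then persists to the destination.

    Loss that shows up at a middle hop but is gone at a later hop is treated as
    rate limiting on that router and ignored. Returns None if the final hop is clean.
    """
    first = None
    for hop_number, loss in enumerate(loss_by_hop, start=1):
        if loss >= threshold:
            if first is None:
                first = hop_number
        else:
            # A later hop is clean, so any earlier loss did not carry through.
            first = None
    return first


if __name__ == "__main__":
    print(first_persistent_loss_hop([0, 40, 0, 0, 0]))     # loss only at hop 2: None
    print(first_persistent_loss_hop([0, 0, 10, 12, 11]))   # loss from hop 3 onward: 3
```

A result of None means the loss seen mid-path did not reach the destination, which is the pattern the paragraph above describes as usually harmless.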
The following is a route that was taken in November 2011 when a Time Warner Cable router in LA exploded (this was found out later). Time Warner fervently denied that there was any problem with this route:
traceroute to test.com (216.144.236.x), 64 hops max, 52 byte packets
1 interwebs (192.168.1.1) 110.301 ms 2.996 ms 10.825 ms
2 cpe-75-84-240-1.socal.res.rr.com (220.127.116.11) 43.838 ms 40.947 ms 17.151 ms
3 ge-2-0-28.slbhca1-swt2.socal.rr.com (18.104.22.168) 21.329 ms 13.559 ms 11.565 ms
4 tge0-8-0-6.lamdca1-cr01.socal.rr.com (22.214.171.124) 36.968 ms 38.727 ms 38.766 ms
5 agg28.lsanca4-cr01.socal.rr.com (126.96.36.199) 36.020 ms 32.257 ms 29.470 ms
6 ae-6-0.cr0.lax00.tbone.rr.com (188.8.131.52) 26.973 ms 19.694 ms 28.846 ms
7 ae-0-0.cr0.lax30.tbone.rr.com (184.108.40.206) 21.704 ms 14.704 ms 21.490 ms
8 ae-7-0.cr0.dfw10.tbone.rr.com (220.127.116.11) 63.136 ms 56.494 ms 55.780 ms
9 ae-0-0.cr0.hou30.tbone.rr.com (18.104.22.168) 58.752 ms 57.392 ms 57.414 ms
10 ae-1-0.cr0.atl20.tbone.rr.com (22.214.171.124) 81.521 ms 92.457 ms 72.535 ms
11 ae-0-0.pr0.atl20.tbone.rr.com (126.96.36.199) 85.192 ms
188.8.131.52 (184.108.40.206) 84.573 ms 84.226 ms
12 paix.10ge.atl.bboi.net (220.127.116.11) 81.625 ms 85.685 ms 95.950 ms
13 nsh-ten4-1-atl-ten3-2.bboi.net (18.104.22.168) 85.269 ms 81.772 ms 103.503 ms
14 dal-ten2-1-nsh-ten1-4.bboi.net (22.214.171.124) 99.060 ms 82.125 ms 82.933 ms
15 phx-ten1-1-dal-ten2-4.bboi.net (126.96.36.199) 89.102 ms 82.730 ms 82.560 ms
16 la-ten3-3-phx-ten2-1.bboi.net (188.8.131.52) 85.309 ms 84.211 ms 91.745 ms
17 184.108.40.206 (220.127.116.11) 82.650 ms 87.234 ms 80.980 ms
18 18.104.22.168 (22.214.171.124) 83.829 ms 81.223 ms 83.158 ms
19 test.com (216.144.236.x) 82.187 ms 85.163 ms 83.422 ms
This route goes from Los Angeles Time Warner (lax30.tbone.rr.com) to Dallas Time Warner (dfw10.tbone.rr.com) to Houston Time Warner (hou30.tbone.rr.com) to Atlanta Time Warner (atl20.tbone.rr.com) before the traffic is handed off to BBOI (one of QuadraNet’s uplinks) in Atlanta, which swiftly carries it back to Los Angeles. A route from Los Angeles -TO- Los Angeles should never first go to Texas and Atlanta before coming back to Los Angeles.
This is a route a customer complained about:
Tracing route to 96.44.189.x.static.quadranet.com [96.44.189.x] over a maximum of 30 hops:
1 248 ms 326 ms 262 ms 126.96.36.199
2 298 ms 125 ms 155 ms 188.8.131.52
3 563 ms 120 ms 80 ms 184.108.40.206
4 153 ms 148 ms 752 ms 220.127.116.11
5 * * * Request timed out.
6 622 ms 152 ms 252 ms te0-1-0-5.mpd22.jfk02.atlas.cogentco.com [18.104.22.168]
7 145 ms 77 ms 56 ms te0-3-0-3.mpd22.dca01.atlas.cogentco.com [22.214.171.124]
8 170 ms 359 ms 907 ms te0-3-0-2.mpd22.atl01.atlas.cogentco.com [126.96.36.199]
9 280 ms 338 ms 90 ms te0-2-0-1.mpd22.iah01.atlas.cogentco.com [188.8.131.52]
10 110 ms 113 ms 152 ms te0-0-0-1.ccr22.dfw01.atlas.cogentco.com [184.108.40.206]
11 143 ms 334 ms 114 ms te7-4.ccr02.dfw06.atlas.cogentco.com [220.127.116.11]
12 115 ms 175 ms 176 ms 18.104.22.168
13 73 ms 95 ms 370 ms 96.44.189.x.static.quadranet.com [96.44.189.x]
The problem with this route is in the first several hops (note the erratic, high latency in the first ~6 hops and that the latency carries through to the destination). Since the trouble starts before the traffic gets anywhere near our network, the issue probably has nothing to do with us.
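To make the erratic latency easier to spot, the per-hop spread (highest minus lowest round-trip time) can be computed from the tracert output. The following is a minimal sketch that assumes the Windows tracert line layout shown above (hop number, three timings or asterisks, then the host); the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hop number, three "<N ms" / "N ms" / "*" fields, then the host or a message.
HOP_LINE = re.compile(r"^\s*(\d+)\s+((?:(?:<?\d+ ms|\*)\s+){3})(.*)$")


def parse_tracert_line(line: str) -> Optional[Tuple[int, List[Optional[int]], str]]:
    """Parse one tracert hop line; return None for headers and other text."""
    match = HOP_LINE.match(line)
    if match is None:
        return None
    # findall yields the captured digits, or '' where the field was an asterisk.
    rtts = [int(v) if v else None for v in re.findall(r"<?(\d+) ms|\*", match.group(2))]
    return int(match.group(1)), rtts, match.group(3).strip()


def rtt_spread(rtts: List[Optional[int]]) -> Optional[int]:
    """Difference between the slowest and fastest answered probe, in ms."""
    samples = [r for r in rtts if r is not None]
    return max(samples) - min(samples) if samples else None


if __name__ == "__main__":
    hop, rtts, host = parse_tracert_line("3 563 ms 120 ms 80 ms 184.108.40.206")
    print(hop, host, rtt_spread(rtts))
```

For hop 3 in the route above the spread is 483 ms, which is far more variation than three probes to the same nearby router should show.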
Determining Information About Routers
The location information contained in reverse DNS records for routers can sometimes be misleading. Providers can “name” a router literally anything they like (dead.beef.cafe.level3.net could, for example, exist, or dead.beef.cafe.losangeles.level3.net could exist and not be located in Los Angeles). Usually the information is accurate, and there are various things in the name that you can look at to get an idea of what the router is, what function it serves, who owns it, and so on. The interface type of the port the packet arrived on can often be inferred from the name. Examples:
-ge/gi - GigE interface
-fa - Fast Ethernet interface
-te or -xe - Ten gig interface
-po or -ae - Port channel bundle
-tu or -ip - Tunnel
So, for instance, the router ge-3-0-0-53.gar1.Washington1.Level3.net is on a GigE port.
The provider/operator of the router is usually the most obvious thing that can be inferred from the name. For example, ae-31-51.ebr1.Chicago1.Level3.net is owned/operated by Level3. This information is important so that you can see when provider hand-offs occur (for instance, if the next hop changes to *.yahoo.com, you know that Level3 has handed the packets off to Yahoo).
The function of the router can also sometimes be inferred from the name. This usually takes the following form:
customer- (ALSO agr-, ar-, or gw-)
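The naming conventions above can be applied mechanically to a hostname. This is a rough sketch, not a reliable classifier: the function name describe_router is hypothetical, the prefix table only covers the interface prefixes listed above, and treating the last two DNS labels as the operator is an assumption that breaks for domains such as example.co.uk. As noted earlier, providers can name routers anything, so the result is only a hint.

```python
import ipaddress
import re
from typing import Dict, Optional

# Interface prefixes taken from the list above.
INTERFACE_PREFIXES = {
    "ge": "GigE", "gi": "GigE",
    "fa": "Fast Ethernet",
    "te": "Ten gig", "xe": "Ten gig",
    "po": "Port channel bundle", "ae": "Port channel bundle",
    "tu": "Tunnel", "ip": "Tunnel",
}


def describe_router(hostname: str) -> Dict[str, Optional[str]]:
    """Guess the interface type and operator domain from a router's reverse DNS name."""
    try:
        ipaddress.ip_address(hostname)
        # A bare IP address carries no naming information.
        return {"interface": None, "operator_domain": None}
    except ValueError:
        pass
    labels = hostname.lower().rstrip(".").split(".")
    # Leading letters of the first label, e.g. "ge" from "ge-3-0-0-53".
    prefix_match = re.match(r"[a-z]+", labels[0])
    prefix = prefix_match.group(0) if prefix_match else ""
    # Assumption: the operator is identified by the last two labels.
    operator_domain = ".".join(labels[-2:]) if len(labels) >= 2 else None
    return {"interface": INTERFACE_PREFIXES.get(prefix), "operator_domain": operator_domain}


if __name__ == "__main__":
    print(describe_router("ge-3-0-0-53.gar1.Washington1.Level3.net"))
    print(describe_router("ae-31-51.ebr1.Chicago1.Level3.net"))
```

Comparing operator_domain between consecutive hops is one way to spot the provider hand-offs mentioned above.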
Some increases in latency are normal and some indicate a problem. The following is a list of origin/destination locations and their -APPROXIMATE- latencies. This does not mean that all similar routes will have the corresponding latencies, but the list can be used as a general guide to determine whether or not there is a problem.
LA -> Las Vegas = ~6ms
LA -> Phoenix = ~10ms
LA -> Chicago = ~50ms
LA -> New York = ~70ms
LA -> London = ~150ms
LA -> Dallas = ~40ms
LA -> China = ~180-220ms
Miami -> Brazil = ~100-130ms
Miami -> Chile = ~150ms
Miami -> Argentina = ~170ms
Miami -> China = ~210-230ms
Dallas -> Miami = ~30ms
Dallas -> China = ~200-220ms
Dallas -> Phoenix = ~25ms
Dallas -> Atlanta = ~20ms
Dallas -> New York = ~45ms
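The list above can be turned into a quick sanity check. The sketch below copies the approximate figures from the list; the function name, the 1.5x tolerance, and the choice to use the same figure for both directions of a pair are assumptions made for illustration, not measured thresholds.

```python
from typing import Dict, Optional, Tuple

# (origin, destination) -> (low, high) approximate latency in ms, from the list above.
APPROX_LATENCY_MS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("LA", "Las Vegas"): (6, 6),
    ("LA", "Phoenix"): (10, 10),
    ("LA", "Chicago"): (50, 50),
    ("LA", "New York"): (70, 70),
    ("LA", "London"): (150, 150),
    ("LA", "Dallas"): (40, 40),
    ("LA", "China"): (180, 220),
    ("Miami", "Brazil"): (100, 130),
    ("Miami", "Chile"): (150, 150),
    ("Miami", "Argentina"): (170, 170),
    ("Miami", "China"): (210, 230),
    ("Dallas", "Miami"): (30, 30),
    ("Dallas", "China"): (200, 220),
    ("Dallas", "Phoenix"): (25, 25),
    ("Dallas", "Atlanta"): (20, 20),
    ("Dallas", "New York"): (45, 45),
}


def latency_looks_normal(origin: str, destination: str, measured_ms: float,
                         tolerance: float = 1.5) -> Optional[bool]:
    """Compare a measured latency with the approximate figure for the city pair.

    Returns None when the pair is not in the table. `tolerance` is an
    arbitrary multiplier on the upper figure, chosen only for illustration.
    """
    expected = (APPROX_LATENCY_MS.get((origin, destination))
                or APPROX_LATENCY_MS.get((destination, origin)))
    if expected is None:
        return None
    _, high = expected
    return measured_ms <= high * tolerance


if __name__ == "__main__":
    print(latency_looks_normal("LA", "New York", 75.0))
    print(latency_looks_normal("LA", "Dallas", 83.0))
```

A False result is only a prompt to look closer at the route; as the text says, the figures are approximate and not every similar route will match them.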