On October 4, 2021, Facebook went offline for six hours, leaving users trying to figure out what was going on. People could not use the well-known social media site during that time, and organizations that depended on Facebook’s services experienced serious interruptions. What led to this enormous disruption? A BGP (Border Gateway Protocol) problem. BGP is a core mechanism for routing data across the internet, and it is essential for maintaining connectivity in the digital age. In this blog post, we will discuss what BGP is, why it is significant, and how it affects our daily lives.
But what exactly is BGP?
Recent articles have used several apt metaphors to describe BGP. It has been compared to a variety of things, from air traffic controllers to an expanding map of the internet. It has even been called “the internet’s duct tape.” And they’re all right.
BGP is the protocol that routes data requests, telling them the best route to take to reach the server. For instance, when you go to Facebook or open the app to refresh your feed, BGP determines the path your data packets take between you and Facebook’s servers.
Cloudflare describes BGP as the postal service of the Internet: it determines the most efficient and fastest path for your requests to travel in order to reach the target server. BGP assesses all the routes available for your data and chooses the one it considers best.
Often, that means routing your data across several of the autonomous systems that make up the Internet as a whole. BGP keeps track of which systems connect to each other and sends your data along the best path between them so it can reach its intended destination.
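To make "chooses the one it considers best" a little more concrete, here is a minimal Python sketch of route selection. It assumes only two of the criteria real BGP routers apply, in this order: a higher local preference (a policy setting chosen by the network operator) wins first, and a shorter AS path breaks ties. The `Route` class, the `best_route` function, and the AS numbers (taken from the range reserved for documentation) are illustrative, not part of any real router software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str            # destination address block, e.g. "203.0.113.0/24"
    as_path: tuple         # autonomous systems the announcement passed through
    local_pref: int = 100  # operator policy setting; higher is preferred


def best_route(candidates):
    """Pick one route: highest local preference, then shortest AS path."""
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path)))


routes = [
    Route("203.0.113.0/24", as_path=(64496, 64497, 64500)),
    Route("203.0.113.0/24", as_path=(64498, 64500)),
    Route("203.0.113.0/24", as_path=(64499, 64501, 64502, 64500), local_pref=200),
]

chosen = best_route(routes)
print(chosen.as_path)
```

Note that the longest path wins here because the operator's policy prefers it. In this simplified model, as in real BGP, "best" is a matter of policy and path length rather than measured speed.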
Continuing the post office metaphor, every autonomous system on the Internet is like a branch of the post office. Even though your town may have thousands of mailboxes, every piece of mail still has to go through the post office before it can be delivered.
Examples of autonomous systems include Internet Service Providers (ISPs) such as Comcast, AT&T, and Verizon. In the metaphor, they are comparable to the postal services of different countries.
There are two types of BGP:
- External BGP (eBGP): The protocol used by the internet at large. In our post office metaphor, this is akin to international shipping.
- Internal BGP (iBGP): A variant of BGP that autonomous systems can choose to use to route data within their own networks. This is similar to the domestic mail service within an individual country.
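The distinction between the two comes down to whether both ends of a BGP session belong to the same autonomous system. A small sketch of that rule, using a hypothetical helper named `session_type` and AS numbers from the documentation range:

```python
def session_type(local_asn: int, peer_asn: int) -> str:
    """Classify a BGP session by comparing the two AS numbers."""
    # Same autonomous system on both ends: internal BGP.
    # Different autonomous systems: external BGP.
    return "iBGP" if local_asn == peer_asn else "eBGP"


print(session_type(64496, 64496))  # two routers inside one network
print(session_type(64496, 64497))  # a router talking to a neighboring network
```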
An autonomous system does not need to run iBGP in order to reach the wider Internet over eBGP.
Part of the way BGP works is by advertising viable routes for data. If BGP stops working, those routes can no longer be found and effectively disappear from the Internet, so the data has nowhere to go.
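The effect of a route disappearing can be sketched with a toy routing table. This is a simplified model, not how any particular router is implemented: the `RoutingTable` class and its method names are made up for illustration, the addresses come from blocks reserved for documentation, and lookups use the standard "most specific matching prefix wins" rule.

```python
import ipaddress


class RoutingTable:
    """Toy table mapping advertised prefixes to the AS that announced them."""

    def __init__(self):
        self._routes = {}  # ip_network -> name of the advertising AS

    def advertise(self, prefix, origin):
        self._routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix):
        self._routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the origin for the most specific matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self._routes if addr in net]
        if not matches:
            return None  # no route: the data has nowhere to go
        most_specific = max(matches, key=lambda net: net.prefixlen)
        return self._routes[most_specific]


table = RoutingTable()
table.advertise("192.0.2.0/24", "AS64496")
before = table.lookup("192.0.2.53")

table.withdraw("192.0.2.0/24")
after = table.lookup("192.0.2.53")

print(before, after)
```

The server at `192.0.2.53` never changed, but once the advertisement is gone the lookup has nothing to return.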
Disappearing routes were part of what happened at Facebook. Santosh Janardhan, Facebook’s VP of Infrastructure, explained in his blog post:
“One of the jobs performed by our smaller facilities is to respond to DNS queries. DNS is the address book of the internet, enabling the simple web names we type into browsers to be translated into specific server IP addresses. Those translation queries are answered by our authoritative name servers that occupy well known IP addresses themselves, which in turn are advertised to the rest of the internet via another protocol called the Border Gateway Protocol (BGP).”
Santosh Janardhan
In other words, the Internet’s Domain Name System (DNS) acts like a list of addresses, and BGP is the mail service that gets mail to those homes. Mail cannot be delivered if you have an address but no directions to the house.
“…DNS servers disable those BGP advertisements if they themselves can not speak to our data centers, since this is an indication of an unhealthy network connection. In the recent outage the entire backbone was removed from operation, making these locations declare themselves unhealthy and withdraw those BGP advertisements. The end result was that our DNS servers became unreachable even though they were still operational. This made it impossible for the rest of the internet to find our servers.”
Santosh Janardhan
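The chain of events Janardhan describes can be sketched in a few lines of Python. This is a loose model based only on the quotes above, not Facebook's actual tooling: the `DnsSite` class, its field names, the site names, and the address blocks (reserved for documentation) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DnsSite:
    """A hypothetical edge facility that answers DNS queries."""
    name: str
    prefix: str
    dns_running: bool = True  # the DNS software itself is up
    advertised: bool = True   # whether BGP currently announces the prefix

    def run_health_check(self, backbone_up: bool) -> None:
        # A site that cannot talk to the data centers declares itself
        # unhealthy and withdraws its BGP advertisement.
        self.advertised = backbone_up


def reachable_sites(sites):
    """Names of the sites the rest of the internet can still route to."""
    return [s.name for s in sites if s.advertised and s.dns_running]


sites = [
    DnsSite("site-a", "192.0.2.0/24"),
    DnsSite("site-b", "198.51.100.0/24"),
]

# The backbone goes down everywhere at once, so every site withdraws.
for site in sites:
    site.run_health_check(backbone_up=False)

print(all(s.dns_running for s in sites))  # are the servers still operational?
print(reachable_sites(sites))             # which ones can anyone find?
```

A safety check that makes sense for one unhealthy site removes every site at the same moment when the failure is shared, which is the situation the quote describes.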
How does BGP mess up the internet?
Multiple factors affect the path your data takes through the internet. One is cost, as some providers charge for access to their systems. The changing nature of the Internet itself is another. Autonomous systems and websites can be partially or completely removed from the Internet map. They can also change or add service providers; one example might be a college switching ISPs from Comcast to AT&T. BGP needs to be updated regularly to make sure its routing data is current and your request doesn’t expire before reaching its destination.
Autonomous systems almost always perform BGP updates without incident. But when updates go wrong, they can go very wrong. Clark explains in his article that BGP updates are designed to spread quickly from system to system, so a bug can have a ripple effect, as we saw with Facebook.
Can BGP be fixed?
In 2004, Turkish ISP TTNet accidentally advertised itself as the best destination for all internet traffic because of a bad BGP update, according to Cloudflare. That caused connection issues that persisted for a full day before the problem was fixed.
Incidents like this point to one of BGP’s weaknesses: the autonomous systems that make up the internet at large implicitly trust what BGP tells them is the best path for data. Although errors are rare, some have argued that BGP’s security needs to be strengthened.
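The difference between implicit trust and a basic check can be illustrated with a short sketch. Everything here is hypothetical and heavily simplified: the function names, the `AUTHORIZED_ORIGINS` mapping, and the "latest announcement wins" behavior are assumptions for the example, and real origin validation involves signed records and more nuanced rules.

```python
# Hypothetical record of which AS is allowed to originate which prefix.
AUTHORIZED_ORIGINS = {"198.51.100.0/24": 64496}


def accept_blindly(table, prefix, origin_asn):
    """Implicit trust: believe whatever announcement arrives."""
    table[prefix] = origin_asn
    return True


def accept_if_authorized(table, prefix, origin_asn, authorized=AUTHORIZED_ORIGINS):
    """Only accept an announcement whose origin matches the known owner."""
    if authorized.get(prefix) != origin_asn:
        return False
    table[prefix] = origin_asn
    return True


trusting, checking = {}, {}
for accept, table in ((accept_blindly, trusting), (accept_if_authorized, checking)):
    accept(table, "198.51.100.0/24", 64496)  # the legitimate owner announces
    accept(table, "198.51.100.0/24", 64511)  # a mistaken announcement follows

print(trusting)
print(checking)
```

The trusting table ends up pointing at the wrong network, while the checking table keeps the legitimate origin. The check only helps on networks that actually perform it.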
However, a change of that magnitude would necessitate a simultaneous upgrade of every autonomous system connected to the internet. That implies that it would be difficult, to put it mildly, to implement significant changes to the protocol.
BGP is one of several components that keep the internet running. Understanding how it is structured can help you recognize and make sense of future outages and other issues.