
Graphical Network Augmented Reality (GNAR): Mapping Internet Infrastructure with Augmented Reality

Tommy Sharkey

UC San Diego, USA

Isaac Nealey

UC San Diego, USA

Vidya Raghvendra

UC San Diego, USA

A node graph of Internet Service Provider network infrastructure connections is rendered with our prototype AR application.

Introduction

The current state of Internet Service Provider (ISP) hardware debugging requires a great deal of time and labor. Because failing hardware often fails silently, ISPs cannot always determine when a problem with a network link is caused by a hardware fault, so they often rely on their customers self-reporting problems. This means that when service goes down, there is a delay between when the problem occurs, when it is noticed, and when a report is made to the ISP. Compounding this delay, in order to even debug a problem a technician must drive out to the location where the problem occurred and then search for network infrastructure that may be damaged, walking the neighborhood in larger and larger search areas looking for visual cues of damage to physical infrastructure.

If any visual cues are found, the technician reports this information back to the ISP, which can either prompt the technician to repair the damage themselves or trigger requests for specialized equipment, if necessary. This entire process is time-consuming and involves a great deal of searching for the right cable to repair.

The commercialization of augmented reality (AR) technologies offers a potential avenue to streamline this process. Typically, augmented reality applications rely on Simultaneous Localization and Mapping (SLAM) to determine where the device is in space (also called localization). These maps allow for highly detailed and accurate localization at room and building scales. Unfortunately, ISP infrastructure is stretched across an enormously large space, so existing methods of loading and saving SLAM maps when starting an application are infeasible due to the size of the saved map files. An ISP would effectively have to store a map of the entire region it serves, creating hundreds of terabytes of data for a single city.

An alternative approach has been popularized by recent games like Pokémon Go and Wizards Unite, where GPS is used for coarse localization on handheld augmented reality devices. These systems are accurate to within several meters, which is prohibitive for many technical augmented reality applications. But because ISP infrastructure is stretched out at a city scale, an error of several meters is small enough to get someone into the vicinity before they search visually for the specific piece of network infrastructure.

Fiber optic splice cabinets.

Paint indicating underlying cables (or lack thereof).

Related Work

We are aware of several projects involving network visualization in AR, such as \cite{Buschel3dNode, deviceIdentification}, in addition to projects involving GIS data in AR/VR, such as Google Earth \cite{googleEarth}. Most networking-related augmented reality applications we read about built visualizations of the global topology and focused on debugging routing issues after outages or when installing new hardware. We wanted to focus on the "street view" of the internet, mapping the hardware around us that makes global internet topologies possible.

System

We introduce Graphical Network Augmented Reality (GNAR) as a technology probe that investigates whether GPS-based augmented reality is a feasible solution for debugging ISP physical infrastructure. We additionally consider the potential for crowdsourcing the mapping and debugging of ISP infrastructure to consumers: an individual with a phone or tablet can go outside, walk around, and map the network infrastructure around them, placing labels on that infrastructure so that later, in the event of a breakdown, other people can walk outside and quickly find the network infrastructure belonging to their ISP.

Because the system may be targeted toward consumers rather than a technical audience, one of our primary principles in creating the technology probe was to make it as simple as possible. We wanted a system that essentially just shows the user what direction to walk in and what to look at: follow an augmented reality line on the street and take pictures of anything potentially causing an outage, contextualizing the topology of network connections with physical-world infrastructure. By mapping the Internet's physical connections, a user could see that the fiber optic connection to their ISP goes through a construction site, that a car crash has happened next to a splice cabinet, or that flooding has occurred near a major network switch.

Mapping

To map the network infrastructure of a neighborhood, a user opens the application and sees a video see-through view \cite{zoomingSketchpad} of their world through their tablet or phone. Buttons in the corners of the screen allow them to enter an edit mode, in which they can place, connect, and label waypoints with provider information and other metadata.

To drop a waypoint, the user points their device's camera at the location they're trying to target. The application will find the floor and nearby walls and create planes in their locations. When the user clicks on the screen, their click location is ray cast onto the estimated planes of the environment and a 3D model of a large GPS waypoint is placed at the location where they pressed.

A series of text boxes and buttons appear, asking what ISPs are located at this waypoint and what sort of infrastructure the user is looking at (splice cabinet, fiber optic cable patch, network switch, etc.). Additionally, a list of other nearby waypoints appears. The user can click on that list to select connections. As they select and deselect connections, lines appear in the street showing the direction of the nearby connections. Users can walk in the direction of these nearby waypoints or look for other contextual cues (e.g. spray paint on the ground) to determine whether the infrastructure is physically connected. Once the user has determined which connections should be made, they can save the waypoint, uploading all of this information to a remote database along with a photo of the site.
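To make the saved record concrete, the Python sketch below shows one plausible shape for the data uploaded when a waypoint is saved; the field names and types are our illustrative assumptions, not the deployed schema.

\begin{verbatim}
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Waypoint:
    """Illustrative shape of a saved waypoint record; field names
    are assumptions, not the schema GNAR actually uses."""
    lat: float                      # GPS latitude (degrees)
    lon: float                      # GPS longitude (degrees)
    alt: float                      # GPS altitude (meters)
    isp: str                        # provider label, e.g. "Spectrum"
    kind: str                       # e.g. "splice cabinet", "network switch"
    connections: List[int] = field(default_factory=list)  # ids of linked waypoints
    photo_path: Optional[str] = None  # site photo uploaded alongside the record
\end{verbatim}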

"Walking the Line"

When looking for problems with the network infrastructure, a user opens the app and loads nearby waypoints. Only the GPS locations and metadata of nearby waypoints are loaded onto the device. GPS waypoint icons appear in the world around the user, with lines joining connected waypoints and colored lines indicating provider-specific connections. To debug their specific ISP, the user simply follows the colored line that represents their ISP, walking along it until they see something that might be causing problems in the network. In the industry this is known as "walking the line".

As they approach an existing waypoint, they see a large GPS waypoint model in their view. If they click on this model, the metadata for that waypoint appears alongside the photograph that was taken when it was first uploaded. Because of inaccuracies in GPS, the waypoint may not sit exactly on the physical infrastructure that was labeled; the photograph helps mitigate this offset. We postulated that a photograph provides enough contextual information for a user to figure out which infrastructure is being indicated: the user uses the GPS waypoint to get in the vicinity of the infrastructure and then looks at the photo to determine exactly where it is.

Once they have identified the infrastructure, they can compare it to the photo to see if anything has been damaged since the photo was taken (e.g. a crack or dent in a splice cabinet). If they see damage, the user can take a screenshot and save it to the database. A licensed technician (perhaps working for an ISP) could use the GPS location where the photo was taken, as well as the metadata attached to that infrastructure to determine what kind of equipment or service is needed at that location.

Additionally, while the user is walking between waypoints (presumably following buried fiber optic cable), they can look out for other possible disturbances like construction work, brush fires, or car accidents. If they come across any of these, just as before, they can take a screenshot, upload it to the database, and subsequent users will be able to see the GPS location at which it happened and determine how to respond.

Technical Details

GNAR was created using the Unity game engine and its ARFoundation package, which serves as a common interface to iOS's ARKit and Android's ARCore. While we only tested on iOS devices, we chose Unity specifically because it makes the switch between iOS and Android trivial (a few mouse clicks). These APIs (ARKit and ARCore) automatically take advantage of various device sensors and fuse them together to improve mapping and localization, including LiDAR sensors, camera image Jacobians, and data from the IMUs in phones and tablets.

As the application runs, it is constantly mapping the world and finding planes. These planes can be the floor, walls, or planar approximations of cylindrical objects like telephone poles. The purpose of constantly looking for planes is that when the user presses on the screen, that touch location can be converted into a real-world location. This is done by taking the pixel coordinate of the touch location on the device and converting it into the Unity engine's camera coordinate space. From there, it can be raycast to the nearest plane; the result of that raycast gives an X, Y, Z position in space.
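The geometric core of that step is a standard ray-plane intersection. The Python sketch below illustrates it; this is not the actual ARFoundation/Unity API, and the function and parameter names are ours.

\begin{verbatim}
import numpy as np

def raycast_to_plane(ray_origin, ray_dir, plane_point, plane_normal):
    """Intersect a camera ray (world space) with a detected plane.
    Returns the X, Y, Z hit position, or None when the ray is parallel
    to the plane or the plane lies behind the camera."""
    ray_dir = ray_dir / np.linalg.norm(ray_dir)
    denom = np.dot(plane_normal, ray_dir)
    if abs(denom) < 1e-6:       # ray (nearly) parallel to the plane
        return None
    t = np.dot(plane_normal, plane_point - ray_origin) / denom
    if t < 0:                   # intersection behind the camera
        return None
    return ray_origin + t * ray_dir
\end{verbatim}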

Because GPS does not use a Cartesian coordinate system, that X, Y, Z Cartesian coordinate must be converted into a GPS latitude, longitude, and altitude. This is done by taking the device's current GPS location together with the distance between the device and the placed waypoint, and conducting a reverse haversine calculation. This produces a circle of potential locations on which the point could lie. That circle is reduced to a single point by taking into account the device's bearing (compass direction). This same process, inverted, is used to determine where to place waypoints that have been loaded from the server.
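A minimal sketch of that conversion, assuming a spherical Earth, is shown below in Python; this is the standard destination-point form of the haversine relations, and the names are illustrative.

\begin{verbatim}
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius

def project_waypoint(lat_deg, lon_deg, distance_m, bearing_deg):
    """Given the device's GPS fix, the raycast distance to the placed
    waypoint, and the compass bearing, return the waypoint's (lat, lon)
    in degrees."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    theta = math.radians(bearing_deg)          # compass bearing
    delta = distance_m / EARTH_RADIUS_M        # angular distance

    lat2 = math.asin(math.sin(lat1) * math.cos(delta) +
                     math.cos(lat1) * math.sin(delta) * math.cos(theta))
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)
\end{verbatim}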

Because of the large scale of Internet service providers and the limited RAM of handheld devices, data is loaded and discarded dynamically. Instead of loading all of the network infrastructure in the database at once, only the network infrastructure in the local region is loaded, and of that information, only the GPS location and metadata (e.g. the ISP) are loaded. This prevents the system from overwhelming small handheld devices. When users approach a particular waypoint and click it, a new request is sent to load all of the information about that waypoint, including photos that have been taken of it. All of this data is stored in a SQL database running on Amazon Web Services (AWS).
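As a rough illustration of the region-limited loading, the Python sketch below queries only lightweight fields inside a small bounding box around the device; the table and column names are assumptions, not the deployed AWS schema.

\begin{verbatim}
import sqlite3  # local stand-in for the AWS-hosted SQL database

def load_nearby_waypoints(conn, lat, lon, radius_deg=0.005):
    """Fetch only GPS coordinates and lightweight metadata for waypoints
    near the device; photos (and any SLAM data) are requested later,
    only when the user clicks a specific waypoint."""
    cur = conn.execute(
        "SELECT id, lat, lon, isp, kind FROM waypoints "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?",
        (lat - radius_deg, lat + radius_deg,
         lon - radius_deg, lon + radius_deg))
    return cur.fetchall()
\end{verbatim}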

Lastly, we also implemented a system for dynamically saving and loading SLAM maps: as the user gets within a certain distance of a waypoint, their device begins downloading a SLAM map of that location. Once loaded, the user's view snaps into alignment with the SLAM map and a very precise (within a couple of centimeters) location of the waypoint is achieved. Each time a user enters or leaves the region around a waypoint, these maps are loaded or discarded. Part of the process of creating a new waypoint involves creating this map. This happens without explicit input from the user: the SLAM map begins building when they place the waypoint and continues to build as they enter information about the ISP and the type of infrastructure at the waypoint, and as they connect it to other waypoints. This gives the system several seconds to a minute to build a SLAM map before the user presses the save button, uploading the SLAM map alongside the photo and metadata to the SQL database.
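The distance-triggered loading could look roughly like the following Python sketch, where fetch_map and release_map are caller-supplied hooks standing in for the (unspecified) download and AR relocalization calls; this illustrates the logic rather than the code shipped in the probe.

\begin{verbatim}
import math

def haversine_m(a, b, radius_m=6371000.0):
    """Great-circle distance in meters between two (lat, lon) pairs (degrees)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2 +
         math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(h))

def update_slam_maps(device_gps, waypoints, loaded, fetch_map, release_map,
                     load_radius_m=30.0):
    """Load a waypoint's SLAM map when the device comes within range and
    discard it again once the device leaves that region."""
    for wp in waypoints:
        d = haversine_m(device_gps, (wp["lat"], wp["lon"]))
        if d <= load_radius_m and wp["id"] not in loaded:
            loaded[wp["id"]] = fetch_map(wp["id"])     # begin relocalization
        elif d > load_radius_m and wp["id"] in loaded:
            release_map(loaded.pop(wp["id"]))          # free device memory
\end{verbatim}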

Initially we expected to need this feature in order to get reproducible results finding this infrastructure. But after experimenting with the use of pictures instead of these SLAM maps, we found that the pictures were good enough to find the correct location, with the additional benefit of requiring vastly less data storage than creating an entire map around each waypoint. As a result, we stopped working on the SLAM map feature. However, we did test early stages of it, and consider it a feasible means of improving waypoint accuracy should that be desired in a future version.

Early Prototype

Evaluation

To evaluate the effectiveness of this system, we conducted both the mapping and discovery exercises. We mapped several blocks of physical internet infrastructure in the La Jolla/UTC area of San Diego, making notes of technical issues as we went. We then closed and reloaded the application and attempted to retrace our steps simply by following the connections entered into the application. When we reached a previously placed waypoint, we would select it and look at the associated image saved in the database to see if we could quickly identify the marked device without needing to move around. In general, this process worked very well. On our second pass, during the debugging exercise, we were able to find all of the physical infrastructure indicated during the mapping exercise, with reasonable improvisation. The rest of this section discusses the improvisations required as well as other technical performance notes.

In terms of following the augmented reality node graph shown on the device, we ran across two difficulties. The first is simply due to drift in the compass data on portable devices like smartphones and tablets. Looking down a street through the augmented reality view, a line that should be parallel to the sidewalk might appear to eventually cross the street. This did not prove to be a blocking issue, however. If the user continued to walk down the sidewalk following the line indicated in AR, the overlay would generally snap back into place as the mobile device updated its compass information. This is similar to a common occurrence when following GPS-guided directions in a car: the mapping application will indicate that a turn was taken when it was not, and it takes a second or two for the algorithm to update and properly indicate the road being traveled. In the worst case, the user can refresh the compass heading on their mobile device, at which point the overlay reorients itself and it becomes clear that the user should continue following the sidewalk rather than cross the street.

The second problem we came across was being confronted with a nonlinear street. Say a waypoint was placed nearby and the next waypoint is farther down a curved road. The line connecting the two waypoints is overlaid in the application as the crow flies, as opposed to following the road. This could lead a user through foliage if they simply traced the overlaid line, when one could reasonably assume the intended route is to follow the curve in the road. This was the largest point of improvisation required: by choosing to follow the road, the line connecting the adjacent nodes was lost from sight for several meters. Although we already knew where the next waypoint should be, a naive user might get confused. In future iterations, interpolating the overlay along known roadways could alleviate the issue.

In terms of GPS accuracy, we were surprised by how close the waypoints were when we reloaded the application. In most cases, the augmented reality waypoints appeared one or two meters from the infrastructure point. Clicking on the waypoint and looking at the picture, one could quickly deduce which nearby object was the marked internet infrastructure.

One to two meters is comparable to the width of a sidewalk, so even when waypoints appeared in the street, they were still close enough that the user did not need to step off the sidewalk. However, some of the infrastructure we labeled was in the middle of the street. This could present a safety risk for a user placing or following waypoints \cite{pokeDeath}, especially if they are working alone. As we were testing in a group, others could watch for vehicles while someone interacted with the application. If we were to crowdsource waypoints moving forward, the application would need built-in safety precautions, or be used by trained professionals only.

Discussion

With these problems in mind, this section discusses how to mitigate some of them and speculates on other, larger problems that may occur but that we did not experience.

Implications for Design

One of the first problems we ran into was a spray-painted marking in the street labeled "MCI" (Fig. \ref{fig:mci}). We had never heard of this company before and assumed it was another name for a company we already knew, either because its infrastructure is labeled with a different name or because of some historic merger. A quick search online told us that this infrastructure was owned by Verizon, a more recognizable company name, so we were able to label it accordingly. Additionally, on our walk, we noticed several pieces of infrastructure labeled with names that no longer exist, belong to companies that have been absorbed by others, or no longer make sense (e.g. TWC and Spectrum have merged into one company but have separately labeled infrastructure).

These are not problems with the application but rather with the user experience. If some individuals mapping a neighborhood label infrastructure with literal markings like "TWC" while others use "Spectrum", our system needs to accommodate these naming differences. There are several ways this could be handled. One is to auto-complete or auto-correct entered data: when a user begins to type "TWC" it is corrected to Spectrum, and the user is given some indication of why it was corrected so that they don't try to change it back. Alternatively, what we did was provide a dictionary in which any infrastructure labeled with "Comcast," "comcast," "TWC," "twc," "Time Warner Cable," "time warner cable," "Spectrum," or "spectrum" would be treated as the same company.
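A minimal sketch of such an alias dictionary in Python is shown below; here we collapse only the genuinely merged Time Warner Cable/TWC/Spectrum labels (plus the MCI/Verizon case above) to one canonical name, and the exact contents of the probe's dictionary may have differed.

\begin{verbatim}
# Lower-cased alias -> canonical provider name (illustrative contents).
ISP_ALIASES = {
    "twc": "Spectrum",
    "time warner cable": "Spectrum",
    "spectrum": "Spectrum",
    "mci": "Verizon",
    "verizon": "Verizon",
    "comcast": "Comcast",
}

def normalize_isp(label: str) -> str:
    """Map a user-entered or spray-painted label to a canonical ISP name,
    falling back to the cleaned-up input when the label is unknown."""
    cleaned = label.strip()
    return ISP_ALIASES.get(cleaned.lower(), cleaned)
\end{verbatim}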

Additionally, spray-painted marks in the street can be extremely difficult to read, particularly when abbreviations are used. For instance, many AT\&T markings are stylized and difficult to read unless you already know they say AT\&T. Other examples include the use of abbreviations or shorthand ("F/O" for fiber optic cabling); see Fig. \ref{fig:mci}.

Additionally, because we had no ISP partner to work with, we had no pipeline for submitting these images to an ISP. Complicating this further, different ISPs have various methods of submitting such reports, and they may or may not provide real-time feedback in response. One could find these bug-report submission forms online and create scripts that auto-generate reports and submit them to the ISPs. To some degree, ISP buy-in to the system will be required for real action to occur as a consequence of these submitted reports. This might take the form of a phone or video call in which the person on the line prompts the user walking around the neighborhood to look for things and gives them a special pipeline for submitting their report. The moment a picture is taken and uploaded, the person on the call could see it and initiate a service request based on that image, all within seconds of the photo being taken.

This also opens up possibilities for remote technical support to prompt users to take pictures of certain things, or for users to ask questions of technicians as they see things that may or may not be of interest. This would take a form closer to a remote debugging session \cite{tommysPaper}, where the technician can prompt the consumer to look for or ignore certain things, or can clarify information.

Most street markings are not clear to a user outside the industry.

Labeled Data

One positive consequence of photographing, mapping, and labeling Internet infrastructure is that users who generate this information are also creating a labeled dataset. Labeled datasets are extremely convenient for training neural networks because both the input and output information are known. Some of the problems faced by users when labeling, such as reading stylized spray paint or identifying which subsidiary belongs to which parent company, could easily be solved by a neural net. One consideration for a real-world deployment of this system is that as more and more waypoints are added, more and more pictures, maps, and labels are collected. A neural net could then be trained to automatically identify some of this infrastructure and label it, making the consumer's experience of mapping areas much more streamlined and easy.
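As a hypothetical illustration of how crowdsourced waypoints could be turned into a supervised training set, the Python sketch below writes (photo, label) rows to a CSV file; this export step is our assumption and not part of the probe.

\begin{verbatim}
import csv

def export_training_rows(waypoints, out_path="labeled_waypoints.csv"):
    """Dump (photo, ISP, infrastructure-type) rows from already-mapped
    waypoints so they can be used to train an image classifier."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["photo_path", "isp", "kind", "lat", "lon"])
        for wp in waypoints:
            writer.writerow([wp["photo_path"], wp["isp"], wp["kind"],
                             wp["lat"], wp["lon"]])
\end{verbatim}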

Potential Bad Actors

On the other hand, there is an enormous security risk that arises from crowdsourcing this information. Many public utilities like power plants or gas refineries are protected with heavy security measures because they serve large populations and can be targeted by bad actors who seek to cause damage or mayhem. Internet infrastructure, however, lies out in the open for anyone to see. At any point in time, someone can quite easily destroy or hamper that infrastructure, shutting down access to Internet services. While this may be trivial in the case of a home or a neighborhood, it can be vastly more important when hospitals are served, when public transit relies on the Internet for synchronization of train schedules, or when neighborhoods rely on the Internet for information during difficult times such as war or natural disaster.

The physical infrastructure of the Internet, however, does not often come under attack, and we posit that this is partially because it is "hiding in plain sight". The codification of infrastructure and its nondescript labeling make it fade into the background; people ignore it, seeing it as background noise they don't need to pay attention to. By creating an application that clearly shows where this infrastructure is and what regions it serves, we would inadvertently be creating a very useful tool for someone wishing to commit cybercrime or cyberterrorism.

For an actual system like this to be deployed, this problem needs to be taken seriously. It could be handled by only allowing customers temporary access to these databases: only when they notice a service outage would they be able to access information about where the infrastructure lies and what it connects to, and once their problem has been resolved, they lose access. In a similar vein, they might only be allowed access to the infrastructure that serves their own residence, making them less likely to want to damage it.

Alternatively, the system could be gamified in such a way that people are not aware they are mapping Internet infrastructure but are inadvertently doing so. This could leverage existing games like Pokémon Go: objectives the user is trying to reach could be located at infrastructure points, so that while the user is trying to capture a Pokémon, the phone is taking pictures of the network infrastructure and uploading them to the database, where they can be compared to see if any damage has been done. Additionally, as users move between gaming locations, they would also be walking the same paths as an ISP's network infrastructure.

Education

Before interacting with the physical internet infrastructure for this project, none of the authors knew what a splice cabinet was used for, and we would regularly step over markings in the street without giving them a glance. In a similar fashion, an application like ours could be used for education: a networking course could use it to teach the physical construction of the internet, or an ISP could use it to train new employees.

Teaching with AR is a whole topic in and of itself, and we will not go into much detail here. A version of our application intended for teaching would likely need to dynamically switch between global and local views of the waypoint node graph, with granular filtering for different ISPs, types of waypoints, etc. A networking student or ISP trainee could learn about the physical network and use it to complement their other coursework and training involving network analysis and discovery. Associating IP addresses with GPS waypoints could provide additional insight into \emph{traceroute} results, for example.

Conclusion

We have presented a technology probe examining how augmented reality on handheld devices can be used to map the physical network infrastructure of a region, and how that mapping can then be used to debug problems in an Internet service provider's network. While many problems arose in the process, they all appear to be relatively easily solved by simple design decisions. Localized SLAM maps could be used to improve the accuracy of this sort of augmented reality system, but in our evaluation we found that simply using GPS along with a photo of the target infrastructure was more than enough to uniquely identify it.

More work needs to be done to handle some of the edge conditions that we found, including curving roads, infrastructure in the middle of streets that must be inspected safely, and managing the various company names that are aliases or subsidiaries of one another. Lastly, ISP buy-in would open opportunities for real-time collaboration, instantaneous feedback, and service requests, but ultimately does not seem critical to the functioning of this application.

Finally, we consider that some of the largest risks of an application like this are inadvertent harm to consumers who use the system and fail to pay attention to their surroundings (e.g. traffic), and the potential for misuse or attacks on network infrastructure to deny service to neighborhoods, regions, or particular buildings and institutions.

Acknowledgements

A special thanks to Aaron Schulman for taking the time to take us on a mini field-trip, explain the problem to us in detail, and for just generally being an awesome professor.