Tuesday, February 11, 2014

12 Reasons To Be Learning Graph Theory

12 Reasons To Be Learning Graph Theory

Andrés Osinski

Courtesy of Matt Britt

Throughout our schooling we always encounter some topics that we feel very strongly about, either because of how fascinating they are or how tediously boring or difficult they may be. Graph Theory is one of those controversial topics CS students will always be opinionated in; you either love it and are fascinated by its utility and applications, or you're appalled by the uselessness of the topic in your career. And let's be honest: when was the last time you actually, honestly had to work with graphs in a serious manner?
Even if you're lucky enough to never need to apply Graph Theory for what you do (which is actually the case with the majority of CS graduates), it's in the interest of everyone who wants to spend 4-6 years of their life dedicated to the study of computing and come out of the experience as a well-rounded professional. Now, you ask, what's so specially important that I should mind spending the next six months trying to wrap my head around these concepts and complete some of the most difficult coursework available? Here's just a few of the topics where this knowledge applies:

  • PageRank: the algorithm behind Google. Indexes pages and content as graphs, with edges representing cross-references. There's probably several properties each node and edge has, but that's the basics.
  • Pathfinding: Finding a path from one place to the other. Nodes represent forks in a road. Edges represent roads. Each edge is weighted with its distances or approximate traveling time. If a road is unidirectional, the edge is directed. After this, a sort of shortest-path algorithm or heuristic is applied ( Dijkstra's,Bellman-Ford, Floyd, etc). This is used everywhere, from industrial control systems, GPS units, optimization, etc.
  • Optimal traffic distribution in a network: A bunch of data on a network (be it a computer network or anything that can be represented analogously) is given a source node and a destination node. Each node in the network represents a network device. Each edge represents a connection. Each edge has a weight, which determines the maximum amount of content it can carry (capacity). From there, you can apply the a series of flow algorithms to determine the optimal distribution of data you're sending between all the possible nodes it can traverse. Used for routing algorithms.
    It should be noted that that's the naive version of the problem. In real life this gets tremendously complicated, as you have to deal with several flows, each vying for the same capacity.
  • Compiler optimization: A CPU has a limited number of registers where it can store its data, which is the quickest way to access it. Given a certain number of variables used in a program, you need to find a way to allocate variables in registers so as to maximize the use of the registers and minimize access to memory. This is solved through graph coloring, assigning each node/variable a "color", such that two adjacent nodes (variables used at the same time in a program) do not have the same color. This is an NP-complete problem, so you need to know that no optimal solution can be found in a reasonable time for large problems.
  • Finding locations of distribution centers: say you need to supply a city with some service. You need to select certain places so that you can cover the whole city with an adequate supply of goods without using more centers than necessary by a certain margin. This relates to the Maximum Weight Independent Set problem, the Maximum Clique problem, vertex covering, and Maximum Independent Set problem.
  • Chemical interactions: Atoms and their bindings in molecules are clearly modeled as graphs. For more complex models it's insufficient to represent a molecule as a graph, but it is a significant part of the modeling component.
  • Scheduling: You have a conference center with a certain number of rooms, and different courses/presentations are assigned. Allocate all the presentations so that the schedules do not overlap, maximizing the usage of the conference rooms, and minimizing the duration of each conference by packing related courses as near as possible without overlap. This is actually an extremely difficult problem.
  • Optimum distributions paths: You need to distribute the mail to every house in a town, minimizing the number of trucks you use and making sure the trucks take the necessary routes and no more. This is a classing problem with Eulerian and Hamiltonian cycles; Traveling Salesman or Chinese Postman problem.
  • Project Management: A PERT graph is a type of directed graph used for project management. It's a fairly universal tool for determining the dependencies and times for the completion of a project. The Critical path algoritm is applied to determine the shortest possible time a project can be completed in while meeting all its dependencies.
  • Program analysis: Several program analysis techniques used for determining the types of variables, compiler warning and errors, null-dereferencing bugs and variable optimization techniques can be utilized by describing a program as a series of graphs. One examples is the control flow graph. Although from computability theory you should know that the Halting Problem means that you can't determine if every program can terminate, in practice using the Invariant Theorem you can see if most sorts of problems terminate. How do you reference the life cycle of variables, control structures, and data dependencies? You guessed it!

  • A simple call graph

  • Package management and dependency management: In any serious operating system, packages have dependencies on each other, such as Firefox depending on GTK, glibc, etc. Nodes: packages: Edges: dependencies.
  • Circuit board layout: The layout of circuits can be described as a graph where components are nodes and electrical lines are edges. Now, you can't draw arbitrary lines through a circuit board, and not all layouts are optimal. Ideally, you want a planar layout where you can draw the lines without them criss-crossing. This is the Graph Planarity problem. Metarecursively speaking, you also need to apply the planarity problem to draw graphs in in a sheet of paper in a way that can easily understoof by a human reader; a graphical representation of a graph is completely arbitrary and unrelated to the graph's actual structure.
And this is just the tip of the iceberg. Any mildly interesting computer problem can be described with graphs. Combine graph theory with linear programming and numerical methods and you've got Operations Research, one of the most interesting (and lucrative) branches of applied mathematics. Knowledge in graph theory is generally a deal-breaker when getting a job in software research or any sort of intersting position that involves solving difficult problems. It's essential for working in research postions at Google, IBM, and other technology companies. Knowledge in Operations Research and Combinatorial Opimization is essential for industrial processes of any kind, and it's helped to create and destroy entire companies (shipping and logistics companies, for example, depend on it completely to lower costs and operate efficiently). A lot of hard research in social interaction is modeled with graphs, etc.
This could go on and on, but suffice to say that these examples should convince you of just how relavant, current and important Graph Theory is for our professions, whether we work directly with it creating new technologies, or silently working for us in the background, day after day.

No comments:

Post a Comment