All Roads Lead to Jack

Sunday, 2008-04-20; 17:27:01

Twitter mapping software, source, known issues

One evening, while browsing Twitter, I noticed something funny: choose a twitterer; then, click the first Twitter profile image in that twitterer's "Following" block. If you continue to do this for each twitterer you encounter, it's extremely likely that you'll end up at Jack Dorsey's Twitter page. This worked for virtually every twitterer I follow, and it even seemed to work for most people on the public timeline.

While there are some obvious exceptions (twitterers who don't follow anybody and are just status updaters, like twitter_status), as well as circular references (which Jonathan Rentzsch pointed out), it's amazing how well this rule holds.

Well, maybe not if you consider who Jack Dorsey is. He's one of the employees of Twitter, Inc. and is likely the first one to ever have a Twitter account, given this interesting observation.

From that observation, though, sprang a fun problem to tackle: how do twitterers relate to one another? Who a twitterer follows is public knowledge, so wouldn't it be cool to figure out the shortest distance between any two given twitterers? It's the "six degrees of separation" question all over again, applied to a social network that makes the necessary information public. It's just begging for someone to come along and answer it.

There was just the right mix of interest, motivation, procrastination, and knowledge, so I decided to tackle the problem.

This is exactly why I love Twitter: by getting into the habit of writing down what you're thinking, others can give you pointers, useful links, advice, etc. Google is expansive and broad, but Twitter can provide specific knowledge that Google would be hard-pressed to give you.

What's great about this project is that the bar wasn't too high, so I could get meaningful results relatively quickly. (In my "main" "profession" as a student, this has recently become a problem for me, so instant gratification was a pretty important ingredient.) Twitter lets you get the list of people a twitterer follows incredibly easily and in a well-defined format. Cocoa provides frameworks that let me extract info from this XML format quickly and without much trouble. I remembered enough from my undergraduate CS classes to know how to tackle this kind of problem, and Twitter was able to provide me with some ready-made classes through Jonathan Wight, a friend on Twitter.

Chain Search Theory and Practice

And so I started coding up the app. The first issue to tackle was how to guarantee that the path the app finds between two twitterers is, in fact, the shortest path between them. This matters for more than curiosity: because the search expands exponentially, the app needs some way to prioritize, so that a search won't take days.

I remembered the priority queue from CS class. With a regular queue, you feed objects into the queue, and when you ask the queue for the next object, you receive objects in the order in which they were inserted. This is the "first in, first out" (FIFO) model, as opposed to the "first in, last out" model of a stack. Think of a stack as an airplane full of people: if you're the first on and you go to the back of the airplane (so as to optimize loading time), then you'll be the last off, because it'll be faster for the people in front to get off first. Unfortunately, airplanes don't work like this in practice; for some reason, airlines stupidly load people in the front rows first. This is probably because first class and business class are typically at the front of the plane.
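To make the queue/stack distinction concrete, here is a small Python sketch (Python is used only for illustration; the app itself is written in Cocoa). It uses the standard library's deque as a FIFO queue and a plain list as a stack:

```python
from collections import deque

def drain_queue(items):
    """Feed items into a FIFO queue, then take them all out again."""
    queue = deque()
    for item in items:
        queue.append(item)                               # enqueue at the back
    return [queue.popleft() for _ in range(len(queue))]  # dequeue from the front

def drain_stack(items):
    """Push items onto a stack, then pop them all off again."""
    stack = []
    for item in items:
        stack.append(item)                               # push on top
    return [stack.pop() for _ in range(len(stack))]      # pop from the top

passengers = ["first aboard", "second aboard", "third aboard"]
print(drain_queue(passengers))  # ['first aboard', 'second aboard', 'third aboard']
print(drain_stack(passengers))  # ['third aboard', 'second aboard', 'first aboard']
```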

A priority queue slightly modifies how a regular queue works. Each object you feed into it is given a priority, and, as you may have guessed, objects with higher priority come out of the queue first, even if they weren't the first objects into the queue.

The priority queue is perfectly suited to tackling the shortest path problem. One of our assignments in CS class was precisely this: to find the shortest distance between two cities given a map of destinations and distances between certain destinations. The priority, in this case, is the distance: since you want the shortest path between two cities, shorter paths should bubble up to the top of the priority queue.

To solve this problem, you create a priority queue of partial paths. Start by getting the list of the origin's neighbors. Create partial paths from the origin to each neighbor, and feed them into the priority queue, using the distance as the priority. Then take the first partial path from the queue (which will necessarily be the shortest partial path), extend it to its neighbors in the same way, and repeat. When you add a new city to a partial path, the new path's priority is the existing distance plus the distance to the new city. Because partial paths come off the queue in order of increasing distance, the first complete path between your two cities to come off the queue is guaranteed to be the shortest path.
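The partial-path procedure above is essentially Dijkstra's algorithm, also known as uniform-cost search. Here is a minimal Python sketch, assuming the map is a dictionary of one-way road distances. The city names and distances are made up. Python's heapq pops the smallest entry first, so the distance itself serves as the sort key here. The finished set, which skips cities whose shortest path has already been expanded, is standard Dijkstra bookkeeping rather than something the post describes at this point:

```python
import heapq

def shortest_route(road_map, origin, destination):
    """Uniform-cost search over partial paths.

    road_map maps each city to a dict of {neighbor: distance}.
    Returns (total_distance, path), or None if destination is unreachable.
    """
    queue = [(0, [origin])]      # entries are (distance so far, partial path)
    finished = set()             # cities whose shortest partial path was already expanded
    while queue:
        distance, path = heapq.heappop(queue)   # shortest partial path first
        city = path[-1]
        if city == destination:
            return distance, path               # first complete path off the queue is shortest
        if city in finished:
            continue
        finished.add(city)
        for neighbor, step in road_map.get(city, {}).items():
            if neighbor not in finished:
                # New priority = existing distance + distance to the new city.
                heapq.heappush(queue, (distance + step, path + [neighbor]))
    return None

# Hypothetical map: one-way roads with made-up distances.
road_map = {
    "A": {"B": 4, "C": 1},
    "C": {"B": 2, "D": 7},
    "B": {"D": 1},
}
print(shortest_route(road_map, "A", "D"))  # (4, ['A', 'C', 'B', 'D'])
```

With these toy distances, going straight from A to B (4) and then to D (1) loses to the detour through C, because A to C to B costs only 3 before the final step.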

Note that bigger numbers usually indicate higher priorities, whereas we want smaller distances to have higher priorities. In my app's source, I actually start from zero and subtract one each time I add a new node. Thus, the priority is the opposite (the "additive inverse") of the distance.
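Here is a sketch of a max-priority queue in Python, assuming the priority is the additive inverse of the chain's distance as described above. Because heapq is a min-heap, the sketch negates the priority once more on the way in, so the heap ends up keyed on distance. A running counter breaks ties in insertion order. The class name and the example chains are hypothetical:

```python
import heapq
import itertools

class MaxPriorityQueue:
    """Larger priorities come out first; equal priorities come out in FIFO order."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()   # insertion counter used as a tie-breaker

    def push(self, item, priority):
        # heapq is a min-heap, so store -priority to pop the largest priority first.
        heapq.heappush(self._heap, (-priority, next(self._order), item))

    def pop(self):
        _, _, item = heapq.heappop(self._heap)
        return item

    def __len__(self):
        return len(self._heap)

queue = MaxPriorityQueue()
for chain in (["simX", "rentzsch"], ["simX"], ["simX", "bbum", "rentzsch"]):
    # Priority is 0 for the lone start node and drops by one per node added,
    # so it is the additive inverse of the chain's distance.
    queue.push(chain, -(len(chain) - 1))
print(queue.pop())  # ['simX']
print(queue.pop())  # ['simX', 'rentzsch']
```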

This is exactly how my app works. In practice, the problem applied to Twitter is a bit simpler, because each step in the chain is the same distance in the virtual world (as opposed to potentially different distances between cities in the physical world). But to a priority queue, that's just a detail. My app simply adds one to the distance each time I add another node to the partial chain.

There are a couple of other wrenches that Twitter throws into the works. First, there's no guarantee that a path actually exists between any two given twitterers. Consider two twitterers who follow each other but no one else, and whom no one else follows. If you searched for a path from a twitterer outside their reciprocal circle to one of them, you'd end up searching through all of Twitter, or at least all of Twitter that's connected to your starting twitterer.

Unfortunately, there's no way around this problem: it's impossible to know in advance whether a given twitterer is isolated from some part of Twitter. If that's the case, my app runs until it has encountered every node connected to the starting node. With an estimated 880,000 people on Twitter, that'll take a long time. Given the exponential nature of Twitter, however, it's highly likely that there are no isolated communities. It only takes one twitterer following one member of an isolated group to make that community "connected" and let my app find the connection.

To be fair, not everyone who follows more than 100 people is a blowhard. I would put the bar closer to 500 or maybe 1000.

The other problem is Twitter "blowhards". I'm tempted to call them "spammers", but that's a bit unfair. Twitter blowhards are twitterers who follow so many people, sometimes into the thousands, that it's impossible for them to actually keep up with the tweets in their personal timeline. There's just no way. I follow only 58 people, and I'm already sometimes overwhelmed by the Twitter backlog.

Blowhards aren't a special case in theory, but they are in practice. When you fetch the list of people a twitterer follows by visiting "http://twitter.com/status/friends/%@.xml", where "%@" is the twitterer's Twitter name, Twitter returns only the first 100. This means I can't see some connections. For example, I follow Jonathan Rentzsch and he follows me as well, but you wouldn't know that from his friends XML file.

How does this affect my app? It simply means that my app can only find the shortest path based on the available data. Calculating a path from Rentzsch to me means going through Bill Bumgarner instead of using the direct connection. Again, however, given the highly interconnected and exponential nature of Twitter, this typically adds only one more degree of separation than is actually necessary.

I've already mentioned how easy it is to get the list of people a given twitterer follows: simply visit "http://twitter.com/status/friends/%@.xml" with the twitterer's Twitter name in place of "%@". NSXMLDocument, a standard class in Cocoa starting with Mac OS X 10.4 Tiger, also makes it easy to extract info from that XML file. I don't have to do any text scraping whatsoever. It's too bad, though, that you have to know a twitterer's name and password in order to get the list of people who follow them. This makes it impossible to create two-way graphs of Twitter. It's also a strange restriction, given that you can still figure out a twitterer's followers by visiting their Twitter pages.
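For comparison with NSXMLDocument, here is a Python sketch that extracts screen names with the standard library's ElementTree. It parses a local string rather than fetching the URL, and the element names (<users>, <user>, <screen_name>) are an assumption about the friends XML format, not something the post specifies:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the element names are assumed, not taken from the post.
SAMPLE_FRIENDS_XML = """<users type="array">
  <user><name>Jonathan Rentzsch</name><screen_name>rentzsch</screen_name></user>
  <user><name>Bill Bumgarner</name><screen_name>bbum</screen_name></user>
</users>"""

def friend_names(xml_text):
    """Return the screen names listed in a friends XML document, in order."""
    root = ET.fromstring(xml_text)
    # findall/findtext look only at direct children, so any nested elements
    # inside each <user> are ignored.
    return [user.findtext("screen_name") for user in root.findall("user")]

print(friend_names(SAMPLE_FRIENDS_XML))  # ['rentzsch', 'bbum']
```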

Optimization

Now you've got an app that finds the shortest path between two twitterers given the available information. Now what?

Well, these searches aren't trivial, and may take a while. So what can we do to speed things up?

There's one trivial optimization that can speed things up considerably. On Twitter, people are connected to so many other people that there are bound to be loops, where one twitterer follows another, who follows another, who follows the first twitterer. Any chain that contains a loop is guaranteed not to be the shortest chain between two twitterers. That isn't to say the priority queue will ever return a chain with a loop as its answer. In fact, using this method, you'll never get a chain with a loop as the shortest path, since the same chain without the loop would have come off the priority queue first.

But these chains with loops still gum up the system. A chain of distance 2 that contains a loop, such as me following @rentzsch, who follows me back, will be prioritized in the queue over a loop-free chain that is nevertheless longer, such as @buzz to @manton to @SenorDanimal to me.

Again, I want to emphasize that this doesn't mean the chain with the loop will end up being the winning chain. That same chain without the loop would have a shorter distance and thus a higher priority, so a chain between the two endpoints that doesn't have that loop would already have been found by that point. That the chain with a loop comes off the top of the queue simply means that there is no chain between the two endpoints that is shorter than the chain with the loop.

We don't like these chains with loops. Although they seem innocuous, since they're never "winning" chains, they still slow the app down, because it wastes time processing them at all. Preventing them is simple: stop them from getting into the queue in the first place. Keep a global list of nodes that have already been encountered, and don't add those nodes to the end of any more chains. (In practice, this means simply adding each Twitter name to an NSArray and then calling containsObject: for every node that is encountered.) This prevents any given twitterer from appearing in any given path more than once.
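Here is a minimal Python sketch of the chain search with that global "already encountered" check, assuming the follow data has already been loaded into a dictionary. The relationships below are made up. Every step costs one, so the heap key is simply the chain's distance. The sketch uses a Python set rather than a list: containsObject: on an NSArray scans the whole array, whereas a hash-based set (NSMutableSet in Cocoa) keeps each check fast. When nothing reachable is left, the search returns None, which is the isolated-twitterer case described earlier:

```python
import heapq
import itertools

def find_chain(following, start, end):
    """Shortest follow chain from start to end, or None if end is unreachable.

    following maps each twitterer to the list of twitterers they follow.
    """
    order = itertools.count()                # FIFO tie-breaker for equal distances
    queue = [(0, next(order), [start])]      # entries: (distance, insertion order, chain)
    encountered = {start}                    # global set of twitterers already on some chain
    while queue:
        _, _, chain = heapq.heappop(queue)
        if chain[-1] == end:
            return chain
        for friend in following.get(chain[-1], []):
            if friend in encountered:
                continue                     # a loop, or a longer route to a known node
            encountered.add(friend)
            # Distance of chain + [friend] is len(chain), since each step costs one.
            heapq.heappush(queue, (len(chain), next(order), chain + [friend]))
    return None                              # exhausted everything reachable from start

# Hypothetical follow data for illustration only.
following = {
    "simX": ["boredzo", "timburks", "rentzsch"],
    "timburks": ["boredzo", "mdmunoz"],
    "rentzsch": ["simX", "iTod"],
    "boredzo": ["rentzsch"],
    "iTod": ["gruber"],
}
print(find_chain(following, "simX", "gruber"))  # ['simX', 'rentzsch', 'iTod', 'gruber']
print(find_chain(following, "gruber", "simX"))  # None
```

In this toy graph, @simX --> @timburks --> @boredzo never enters the queue because @boredzo was already reached directly, and @rentzsch following @simX is skipped as a loop.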

But wait, you might say! This excludes more than just simple loops! If I encounter @boredzo from @simX in one path, a global list of encountered nodes will keep @boredzo out of every other path, even one that contains no loop at all! For example, if I encounter @simX --> @boredzo, then @simX --> @timburks --> @boredzo will be barred from entering the queue!

Ah, but this is good. @simX --> @boredzo is shorter than @simX --> @timburks --> @boredzo. And since @boredzo is the end twitterer of both partial chains, we know that the set of subsequent chains that will be spawned from both of these chains will be identical, except for this initial part. So why not go with the shorter path anyway, and not make the app search the same paths twice over?

Are there any other optimizations we can add? None that I can think of, at least given the limitations of a single computer. I can imagine giving further priority to certain paths based on their "proximity" to the end node. For example, if I wanted to find a path to @rentzsch, I know that he has a strong following in the Mac community. So I could prioritize partial paths that end up in this community, like @schwa, @cbarrett, and @bwalkin. They may not follow @rentzsch, but since they're in his community, they're more likely to than someone like @sandrift, who might be more connected with geologists.

This kind of data, however, requires a "pre-scan" of the Twitter community, and given how large Twitter is, this seems infeasible not only in terms of time but also in terms of resources on disk. This would probably be more appropriate if my app were a web app, where any chain searches would add to the collective "knowledge" of the web app, in contrast to discrete desktop apps that can't communicate past searches or following data with each other.

How about optimization not in terms of graph theory, but in terms of structuring the program to minimize general bottlenecks? One of the main issues in this case is simply that it takes time to download the friends XML files. How do I eliminate that as a bottleneck? Two ways: multi-threading and caching.

Instead of downloading the XML files just as I need them, which necessarily makes the app wait for the download, why not prefetch the XML files in the background beforehand so that they're ready as soon as the need arises? My app does indeed do this, using NSOperation and NSOperationQueue, classes that are new in Mac OS X 10.5 Leopard.

When the app encounters a twitterer node, it needs to download the list of people that twitterer follows in order to add new chains to the priority queue. However, it doesn't use these chains immediately: because we're using a priority queue, adding a node to a chain increases its length and decreases its priority, so there may be other, shorter chains that need to be attended to first.

But we already have a list of the people who that person follows, and eventually we'll need the friends XML files for those people too! So when a new chain is added to the queue, the friends XML file for that last node is queued up to be downloaded. The beauty of this is that since the priority queue still uses a FIFO method for chains of the same priority, the prefetch operations will be initiated in exactly the order they'll be needed! The prefetch operations are just offset to be earlier than when the XML files are actually needed, but they're still in the same order. So there'll be no bottlenecks where an XML file being downloaded is not the one currently needed. Most of the time the XML files will be fully downloaded by the time they're needed, but if not, we're guaranteed to be currently downloading the XML file that we need the most.
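The ordering argument can be checked with a small Python simulation using hypothetical names and made-up follow data. It records when each friends file would be queued for download (the moment its chain enters the priority queue) and when the search actually consumes it (the moment that chain comes off). With unit step costs and FIFO tie-breaking, the two orders come out identical:

```python
import heapq
import itertools
from collections import deque

def search_order(following, start):
    """Expand every twitterer reachable from start.

    Returns (prefetch_order, use_order): when each friends file is queued for
    download, and when the search actually needs it.
    """
    order = itertools.count()
    queue = [(0, next(order), [start])]
    encountered = {start}
    prefetch = deque([start])         # plain FIFO queue standing in for the download queue
    use_order = []
    while queue:
        _, _, chain = heapq.heappop(queue)
        use_order.append(chain[-1])   # the search needs this twitterer's friends file now
        for friend in following.get(chain[-1], []):
            if friend not in encountered:
                encountered.add(friend)
                heapq.heappush(queue, (len(chain), next(order), chain + [friend]))
                prefetch.append(friend)   # queue the download as soon as the chain is queued
    return list(prefetch), use_order

# Hypothetical follow data for illustration only.
following = {
    "simX": ["boredzo", "timburks", "rentzsch"],
    "timburks": ["boredzo", "mdmunoz"],
    "rentzsch": ["simX", "iTod"],
    "boredzo": ["rentzsch"],
    "iTod": ["gruber"],
}
prefetch_order, use_order = search_order(following, "simX")
print(prefetch_order == use_order)  # True
```

This relies on every step costing the same: a chain popped at distance k only pushes chains of distance k + 1, so pushes happen in non-decreasing order of distance, and FIFO ties then return chains in the order they were pushed.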

I'm actually using another queue for the XML file prefetching, but this one is a simple queue, not a priority queue, since everything will already be in the order we need. NSOperationQueue is also really handy because it can take care of dependencies and automatically distribute operations across multiple threads. In this case, dependencies aren't an issue, since, as demonstrated earlier, the XML files are already in the desired order. (There could be a problem with detecting whether a given XML file download has completed, but I'll address how I handle that in a bit.) All I had to do was subclass NSOperation and write the self-contained code that downloads the required XML files. The only communication these NSOperations have with other parts of the app is the initialization (where the name of the twitterer is passed to the NSOperation) and a notification that the NSOperation sends out if there's an error downloading the XML file.

As noted before, NSOperation and NSOperationQueue are Leopard-only, so my app is Leopard-only as well. I tried managing threads manually by spinning off a thread per XML download, but this created so many threads that it consistently crashed my Mac to the login screen. I also tried limiting the number of threads myself using a counter, but this posed problems of its own. (In fact, I implemented an activity window that monitors how many NSOperations are in the NSOperationQueue. If you're doing a search between two random twitterers from the public timeline, this number can easily reach 100,000, because the number of nodes encountered grows exponentially. No wonder spawning one thread per download causes significant problems.)

By default, NSOperationQueue seems to allow 70 or so NSOperations to run concurrently on my Mac, but in my experience that made other apps' user interfaces pretty sluggish while a search was running. NSOperationQueue lets you fine-tune this number, so my app currently limits the number of concurrent operations to 30, which keeps its CPU usage at about 50% (or 100% of one core on a dual-core Mac). This makes it feasible to run a search in the background.

One other benefit of NSOperationQueue is that it lets me cancel operations that are no longer needed. Consider the moment my app finally finds a chain between the two twitterers. Since XML prefetch operations are initiated one degree of separation before the files are actually needed, there will still be thousands of operations left in the queue, and they will keep downloading XML files as long as the app remains open. NSOperationQueue can discard any operations that haven't started yet, and if you write your code correctly, you can even have currently running operations stop without completing.
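NSOperationQueue's setMaxConcurrentOperationCount: and cancelAllOperations, together with NSOperation's isCancelled, have rough counterparts in Python's concurrent.futures. These are max_workers; Future.cancel(), which only succeeds for work that hasn't started; and a shared flag that running tasks poll. Here is a sketch with a fake download in place of the real one. The function names and timings are hypothetical:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_DOWNLOADS = 30        # the limit the post settles on
stop_requested = threading.Event()   # plays the role of NSOperation's cancelled flag

def fake_prefetch(name):
    """Hypothetical stand-in for downloading one friends file; no network access."""
    for _ in range(10):
        if stop_requested.is_set():
            return None              # a running operation gives up early
        time.sleep(0.01)             # pretend to download one chunk
    return name

def prefetch_then_cancel(names, search_time=0.05):
    """Queue one download per name, then cancel once the search has finished.

    Returns how many downloads were dropped before they ever started.
    """
    stop_requested.clear()
    executor = ThreadPoolExecutor(max_workers=MAX_CONCURRENT_DOWNLOADS)
    futures = [executor.submit(fake_prefetch, name) for name in names]
    time.sleep(search_time)          # meanwhile, the search finds its chain
    dropped = sum(f.cancel() for f in futures)  # only not-yet-started work can be cancelled
    stop_requested.set()             # ask the downloads already running to stop
    executor.shutdown(wait=True)
    return dropped
```

Cancelling the pending work before setting the flag keeps the idle workers from picking up more downloads in between.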

In short, NSOperationQueue was just too good to pass up. So, yep, if you want to use my app, you'll have to shell out the $129 for Leopard or get a new Mac. :P

Question for the reader: how many times will a given XML file be used per search?

I mentioned that caching was another method of optimization. Each XML prefetch operation simply downloads the XML file to disk, where it is (virtually) instantaneously accessible to my app. But that's not the only reason why these XML files are stored on disk. If you complete one chain search and then initiate another, my app doesn't have to redownload the XML files for each node all over again.

Of course, if a twitterer follows some new people, or decides to de-follow some, my app won't know this because it found an existing XML file on disk and decided it didn't need to fetch the file from the internet again. This problem can be mitigated by checking the modification date of the XML file and discarding it if it's more than a week or so old. My app currently doesn't do this, but it's a modification that I'm definitely going to implement soon. (Besides, there's a workaround: you can just delete the cache files and this will force my app to redownload the XML files.)
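Here is a sketch of that modification-date check in Python. It assumes one cache file per twitterer, named after their Twitter name (a hypothetical layout), and a one-week limit:

```python
import os
import time

MAX_CACHE_AGE = 7 * 24 * 60 * 60   # "a week or so", in seconds

def cached_friends_file(cache_dir, name, max_age=MAX_CACHE_AGE):
    """Return the path of a usable cached friends file, or None if it must be fetched.

    Files older than max_age are deleted so the caller downloads a fresh copy.
    The "<name>.xml" naming scheme is a hypothetical choice.
    """
    path = os.path.join(cache_dir, name + ".xml")
    try:
        age = time.time() - os.path.getmtime(path)
    except FileNotFoundError:
        return None                  # never downloaded
    if age > max_age:
        os.remove(path)              # stale: discard it and let the caller redownload
        return None
    return path
```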

Cache files for my app live in ~/Library/Application Support/Degrees of Tweetdom/ . I decided to house them here instead of in the "Caches" folder because they have a significant effect on how fast a search completes.

What's the disk usage of these cache files? Well, they can range anywhere from 4 KiB to 124 KiB or so each. I have a cache folder that currently contains just under 150,000 XML files and takes up 3.29 GiB on disk. Clearly this is not a trivial amount of space. But with disk space so plentiful and cheap, and with the cache files so easily removable, I find this is a small price to pay for an interesting, experimental app. (One other potential optimization that I'm considering is downloading the XML file and reformatting it so that it's simply a return-separated list of people that twitterer follows, and stripping out all the extraneous XML that's included. This has the potential to significantly cut down on disk space.)
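Here is a Python sketch of that space-saving idea, converting a friends file into a return-separated list of names. As in the other sketches, the <user>/<screen_name> element names are an assumption about the format, and the sample document and its fields are made up:

```python
import xml.etree.ElementTree as ET

# Made-up sample; the extra fields stand in for the "extraneous XML".
SAMPLE = """<users type="array">
  <user>
    <name>Jonathan Rentzsch</name>
    <screen_name>rentzsch</screen_name>
    <description>a long profile description</description>
    <profile_image_url>http://example.com/rentzsch.png</profile_image_url>
  </user>
  <user>
    <name>Bill Bumgarner</name>
    <screen_name>bbum</screen_name>
    <description>another long profile description</description>
    <profile_image_url>http://example.com/bbum.png</profile_image_url>
  </user>
</users>"""

def compact_friends(xml_text):
    """Reduce a friends XML document to screen names, one per line."""
    root = ET.fromstring(xml_text)
    names = (user.findtext("screen_name") for user in root.findall("user"))
    return "".join(name + "\n" for name in names if name)

def read_compact(text):
    """Turn the compact cache text back into a list of names."""
    return text.splitlines()

compact = compact_friends(SAMPLE)
print(repr(compact))  # 'rentzsch\nbbum\n'
```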

I mentioned earlier that I didn't need the dependencies feature of NSOperation. In this case, it'd be difficult to check whether a given operation had finished without an expensive search through all currently queued operations (which easily number in the tens of thousands). Instead, since each operation saves its XML file atomically ("atomically", not "automatically": it writes to a temporary file and, once it's finished writing, moves the file to its correct location), I can simply check whether that XML file exists yet using NSFileManager. If the file exists, it's ready to be used (guaranteed by the atomic write). If not, I tell the app to wait a second and then check again. If it doesn't find the XML file within 10 seconds, it times out, discards that chain, and moves on to the next chain from the priority queue.
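The atomic-write-then-poll handshake might look like this in Python. In Cocoa, writeToFile:atomically: with YES performs the temporary-file-and-rename step for you. This sketch does it by hand with tempfile.mkstemp and os.replace, which is atomic on POSIX systems when both paths are on the same filesystem. The one-second interval and ten-second timeout follow the post; the function names are hypothetical:

```python
import os
import tempfile
import time

def write_atomically(path, data):
    """Write bytes to a temp file in the same directory, then rename it into place.

    Readers see either no file at all or the complete file, never a partial one.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, path)   # the atomic "move into place" step
    except BaseException:
        os.remove(tmp_path)          # don't leave half-written temp files behind
        raise

def wait_for_file(path, timeout=10.0, interval=1.0):
    """Poll until path exists; return False once timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True              # complete, thanks to the atomic rename
        if time.monotonic() >= deadline:
            return False             # give up; the caller discards this chain
        time.sleep(interval)
```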

Maps

Well, if you've gotten this far, I'm sure I've whetted your appetite for something even cooler. Yes, folks, if you can search for a chain, then you can search for multiple chains, and from multiple chains you can make a map. So why not have my app automate all that, too, and build maps of Twitter users?

That's what I said. And my app does. Heehee, it's so much fun!

Again, though, I want to direct your thought impulses to the nature of the problem. How does one implement a map creation program like this?

Well, we've got the fundamental part down: we can find the shortest path between any given two twitterers with enough time and disk space (with the information available). Creating a map simply means that you're finding multiple chains between multiple pairs of twitterers and graphing the results on the same page.

That's easily done. Given a list of twitterers, simply create a list of all possible pairs among them, and then find the shortest chain for each pair. Then graph the results. Simple, no?

Well, sure. But there are all sorts of things that crop up: first off is the fact that we only have information on who people are following, not who is following someone. This means it's a directed graph; relationships are only one way: you can only follow someone else. (That you can be followed is simply a side-effect of them following you.)

The implication is that if you follow someone, that person doesn't necessarily follow you back. The relationship is not symmetric; the shortest path from @mdmunoz to @command_tab (@mdmunoz --> @bwalkin --> @command_tab) is not the same as the shortest path from @command_tab to @mdmunoz (@command_tab --> @timburks --> @mdmunoz). That's why it's a bit misleading to talk about the "shortest path between two twitterers".

The upshot is that if you're graphing the shortest paths between three twitterers, you actually need to perform 6 searches, not 3. Graphing four twitterers means performing 12, not 6. You're doing twice as many searches as you might think you need to do. For random twitterers, this is a significant amount of time.
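Counting searches in Python: ordered pairs (permutations) give n(n - 1) searches for n twitterers, versus the n(n - 1)/2 unordered pairs (combinations) that an undirected graph would need. The helper name is hypothetical:

```python
from itertools import combinations, permutations

def searches_needed(twitterers):
    """One search per ordered (start, end) pair, because following is one-way."""
    return list(permutations(twitterers, 2))

names = ["mdmunoz", "command_tab", "timburks"]
print(len(searches_needed(names)))        # 6
print(len(list(combinations(names, 2))))  # 3 unordered pairs
```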

Now think about how you would actually do the searches. Would you just perform them one after another? If so, you might pass right by a chain between two twitterers while looking for a chain between two others, only to re-follow the very same thousands of partial chains later to find the chain you passed up. For example, searching for a chain from @buzz to @boredzo will pass through @buzz to @command_tab on the way. It's much faster to specify a single start twitterer and an array of end twitterers, and capture all of those chains in one search. Then start a new search with a different start twitterer. This is how my app currently works.
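Here is a Python sketch of a single-start, multiple-end search. It assumes the same dictionary-of-follows representation as a simple chain search and uses made-up relationships. It keeps expanding after each end twitterer is reached and stops once every end has been found or nothing reachable remains:

```python
import heapq
import itertools

def find_chains(following, start, ends):
    """Shortest chains from one start twitterer to several end twitterers in one search.

    Returns {end: chain} for every end that was reached; unreachable ends are omitted.
    """
    remaining = set(ends) - {start}
    found = {}
    order = itertools.count()                 # FIFO tie-breaker for equal distances
    queue = [(0, next(order), [start])]
    encountered = {start}
    while queue and remaining:
        _, _, chain = heapq.heappop(queue)
        if chain[-1] in remaining:
            found[chain[-1]] = chain          # record it, but keep searching for the rest
            remaining.discard(chain[-1])
        for friend in following.get(chain[-1], []):
            if friend not in encountered:
                encountered.add(friend)
                heapq.heappush(queue, (len(chain), next(order), chain + [friend]))
    return found

# Hypothetical follow data for illustration only.
following = {
    "buzz": ["command_tab", "manton"],
    "command_tab": ["boredzo"],
    "manton": ["SenorDanimal"],
}
chains = find_chains(following, "buzz", ["boredzo", "command_tab", "SenorDanimal"])
print(chains["boredzo"])  # ['buzz', 'command_tab', 'boredzo']
```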

But there are further optimizations. What happens if one chain between two twitterers is completely contained within a chain between two other twitterers? @rentzsch --> @iTod, for example, is completely contained within the chain @boredzo --> @rentzsch --> @iTod --> @gruber. How do you detect that? For each chain, do you check whether any other twitterer pair also exists in that chain? It's possible that would cause the app to go slower than a dumb search. I don't know the answer to this question.

One technique I'm considering is to monitor chain searches for other twitterers in the search. When an end twitterer is encountered, add all twitterers in that chain to an array associated with that end twitterer. When that twitterer is again the end twitterer in another search, you can effectively "short-circuit" the search if you encounter any of the other twitterers that existed in that one chain.

For example, let's say I'm searching for a path from @boredzo --> @gruber, and I find @boredzo --> @rentzsch --> @iTod --> @gruber. I add @boredzo, @rentzsch, and @iTod to an array associated with @gruber. Now, if I'm searching for a chain from @mdmunoz to @gruber, I can stop if I find a chain from @mdmunoz to @boredzo or to @rentzsch or to @iTod, since I already know a chain from them to @gruber.

But does this preserve the property of shortest chains? If I encounter one of those intermediate nodes, I now have a valid chain, but do I have the shortest chain? Can I guarantee that using this method? (Any actual insights into these questions would be greatly appreciated.)

Now that I've made your brain weep considering those questions, consider this one: how do you even graph the results? Graphing a chain is easy: it's just nodes on a line. But for a map, how do you guarantee that nodes don't overlap one another on the graph? How do you minimize the crossing of arrows indicating followship? How do you minimize the space taken up by the graph? Do you start with a given twitterer in the graph and draw radially outward? Do you start with a given chain in the graph and draw those nodes in a line? How do you proceed to lay out the other nodes?

This problem in itself gave me headaches and was potentially a larger hurdle than I could handle. Luckily, some great minds over at AT&T have tackled it and produced an open source graph visualization package, graphviz. With graphviz, you simply describe the connections in a graph, and graphviz does all the layout for you. Awesome.

Even better, graphviz is licensed under the Common Public License, which allows you to use the code in both open source and proprietary projects, and only requires you to divulge source changes made to graphviz, not to anything it links to. It's a liberal license, compared to the draconian nature of the GPL which bars you from linking to GPL code in proprietary apps. (You still can use GPL code in proprietary apps provided you're simply launching the app and running it rather than linking to it in your code, but that significantly limits its usefulness. Still, GPL-licensed code is not all bad; it's been used in Mac OS X, Acquisition, and Airfoil, all of which are proprietary, commercial products.)

graphviz is pretty cool in that it offers a few types of graphs (hierarchical, radial, energy minimized, and circular), and it allows you to modify many attributes of the appearances of nodes, edges, or the graph. Unfortunately, it doesn't allow you to fine-tune the graph by moving the nodes around slightly. Nor does it allow you to modify the position of node labels, so you either have to use an annoying hack to define a sub-graph for each node (in which case the label appears unacceptably far away from the node), or you have to disable node names entirely. My app disables node names and uses the Twitter profile images instead.

The great thing about graphviz, though, is that you simply produce a text file (file extension .dot) that defines the relationships between various nodes, and then the graph gets rendered by graphviz. Then you export that picture to whatever format you want. This means that you can easily change the .dot file and re-render the graph to modify its appearance.
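The post doesn't show the .dot file that Degrees of Tweetdom writes, so the following Python sketch is only a guess at its general shape. It builds a directed graph with node labels suppressed (label="") and draws each node with its profile image using Graphviz's image attribute. The function name and image file names are hypothetical:

```python
def make_dot(edges, image_for):
    """Build Graphviz DOT text for a follow map.

    edges: iterable of (follower, followed) pairs.
    image_for: maps a Twitter name to its profile-image file name.
    """
    edges = list(edges)
    names = sorted({name for edge in edges for name in edge})
    lines = ["digraph twitterers {",
             '  node [shape=none, label=""];']      # no labels: the image is the node
    for name in names:
        lines.append(f'  "{name}" [image="{image_for[name]}"];')
    for follower, followed in edges:
        lines.append(f'  "{follower}" -> "{followed}";')  # arrow means "follows"
    lines.append("}")
    return "\n".join(lines) + "\n"

edges = [("boredzo", "rentzsch"), ("rentzsch", "iTod"), ("iTod", "gruber")]
images = {name: name + ".png" for name in ("boredzo", "rentzsch", "iTod", "gruber")}
print(make_dot(edges, images))
```

Saved as twitterer-map.dot, the output could be rendered from the command line with something like dot -Tpng twitterer-map.dot -o twitterer-map.png, though, as noted below, the command-line build may not render the custom images.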

So let's get to some maps! Maps of Twitter are so much fun, it's ridiculous. I'm practically like a little kid in a candy store waiting for these maps to be produced by my app.

Here's the first Twitter map ever made (that I know of, anyway). I created this by manually searching for the chains between each pair of seven twitterers, and then I manually copied the profile pictures and created the graph in OmniGraffle. It's decent, although it's kind of laid out haphazardly.

The seven nodes that are outlined in double-red lines were the main nodes, and the other nodes are intermediary nodes required to get between the main nodes. Note how with 7 main nodes, 18 intermediary nodes are brought into the mix. It's also interesting how the intermediary nodes stayed within the Twitter Mac community, since the main nodes are all also within the Twitter Mac community.

Let's see how graphviz lays out this same map:

First Twitter Map Ever, Rendered by graphviz

Here the lines are much smoother, and the total length of lines is minimized by strategically placing as many nodes close to each other as possible when appropriate. graphviz uses the concept of "rank", where nodes at the top of the graph are more important and connected and nodes near the bottom are less important and less connected in this graph. This graph represents graphviz's "hierarchical" graph style.

This next graph is extremely fascinating to me:

First Twitter Map Ever, rendered by graphviz, excluding simX

Can you figure out why?

The map above is the same map as before, but with a "simX" exclusion policy in place. This means that no paths found by my application are allowed to go through me. I anticipated that the change would radically alter the graph: the original seven nodes that I picked were all people whom I follow. Because of that, I thought that many of the shortest chains would go through me. In fact, for any main node that follows me, the shortest chain to any other main node would be at most two steps, via me.

But if you compare the previous two graphs, they're largely the same! The positions of the various nodes have changed, but the relationships largely haven't. Besides me, the only other twitterers that have been removed from this graph are @bbum and @SenorDanimal. There are no new nodes compared to the old graph, either. This illustrates just how interconnected the Mac community is on Twitter. In fact, you could potentially define an "interconnected" parameter based upon how much the relationships and intermediary nodes change if you remove a given node from a graph. If I graphed one twitterer from the Mac community, one from my mini-geology community, and a real life friend, removing me from the graph would probably significantly change the relationships and nodes on the graph (and the time taken to find the shortest paths).

Here's a graph of a different set of main nodes, not all in the Mac community:

Second Twitter Map Ever, rendered by graphviz

This one was pretty interesting because two people whom I follow, @ivanov, a real-life friend, and @anwnn, a friend with whom I interact on Twitter, both follow and are followed by @stilldavid. I had made a one-degree-of-separation friend out of a previously three-degrees-of-separation friend without even knowing it.

It's revelations like this that make this app so interesting to me. And it involves mathematics, computer programming that's within my reach, friends, the internets — my cup overfloweth!

Degrees of Tweetdom Binary, Source, Usage

Yeah, yeah, yeah, I know, you want your grubby little hands on this app. Fine. With one stipulation: I can haz ur grafs? Just link them to me (@simX) on Twitter or in the comments to this entry; I don't care if I don't know anybody on the graphs, I just want to see them!

You're also welcome to use and extend the app as you see fit, but again, I would like to hear about what you're doing with my app and see any improvements you've made.

So without further ado, u can haz linkz! But please, keep reading for a few things you need to know.

Degrees of Tweetdom 0.3 (now with mapping capability!)
Degrees of Tweetdom 0.3 source
pixelglow's Mac OS X GUI for graphviz, version 1.13 (v16)

Now for a word from our sponsors about how to use Degrees of Tweetdom.

Both the "Twitterer Chain Finder" and "Twitterer Mapper" windows will appear on startup. You can use the former to find the shortest path between two twitterers, or you can use the latter to create the files necessary to generate a map, but not both at once. The mapper uses the chain finder window. (This'll be changed in a subsequent version.)

In the mapper window, use commas to delimit the twitterers you want on the graph, and likewise the twitterers you want to exclude from the graph. Don't put spaces after the commas; use just a comma between Twitter names.

Don't worry about the capitalization of Twitter names; Degrees of Tweetdom will correct that for you automatically. However, it can't correct typos, so check the log in the Twitterer Chain Finder window (not the Twitterer Mapper window) to make sure you haven't made any. It'll give you a summary of the nodes from which it's starting, the nodes at which it's ending, and the nodes it's excluding.

Graphs will not be displayed in Degrees of Tweetdom. DoT creates a folder on your Desktop called "DoT Twitterer Map", inside of which will be all the profile images you'll need and a file named "twitterer-map.dot". This latter file describes the relationships between all the nodes. You'll need to open it in graphviz to get an image. Be aware that currently, DoT always writes to this same location, so if you want to save the .dot file that describes a graph's relationships, move it to a different location before creating a new map!

Please note, I highly recommend using the PPC-only version 1.13 (v16) of pixelglow's GUI for graphviz, because it's the latest version that seems to recognize custom images for nodes on the map. There's a package for the universal binary of GUI version 2.18, which includes tools to use from the command line, but they generate blank nodes where the custom images should be. I'm not sure why this is. Rosetta is your friend, however, and the older version still runs well and reasonably fast on my MacBook Core 2 Duo.

Also, pixelglow's GUI for graphviz can export to a wide variety of formats (and I suspect that the command-line tool can, too). png is a good format for the web, but if you want to fine-tune the graph, you can export to .ps, .svg, .pdf, and other vector formats. However, none of the vector formats seems to understand custom images, either, and Degrees of Tweetdom suppresses node labels, currently. I've found that OmniGraffle can natively open .dot files, so if you modify the .dot file that Degrees of Tweetdom creates to use labels (open the .dot file with a text editor), you can get the node labels to show up in OmniGraffle and you can re-create the graph manually yourself. There seems to be no way to get OmniGraffle to recognize custom images as well as understand node relationships and reproduce exactly the graph that graphviz does, unfortunately.

If anybody finds otherwise, please, please let me know.

I think that's it! I hope you enjoyed this run-through of my thought process as I was creating this program, as well as any maps that you create. Have fun mapping Twitter!
