ClickToFlash Killers

Friday, 2009-10-02; 02:38:49



One of the greatest features of ClickToFlash, in my opinion, is the ability to load H.264 files from YouTube in QuickTime, rather than having to load the Flash plugin. Just searching for ClickToFlash on Twitter reveals a number of users who absolutely love the feature. It’s amazing what this one feature can do to your CPU load, fan volume, and battery life, just by loading videos in a non-horribly-written plug-in. You might even say that ClickToFlash has significantly contributed to the fight against global warming!

If you happen to be an Adobe Flash plug-in developer and you’re reading this, seriously, WHAT THE FUCK? The Flash plug-in has been on the Mac for around a decade, and we’ve been promised better performance time and time again. Yet, still, the Mac Flash plug-in uses around 75% of the CPU under normal circumstances, compared to 10-25% of the CPU that the QuickTime plug-in typically uses and the good performance of the Windows Flash plug-in. The only logical conclusion is that your code SUCKS ASS.

The problem is, the code to load the YouTube H.264 files is entirely custom. What if we wanted to do the same thing for a different video site, like, say, Vimeo? We’d have to write completely new code to account for this.

Well, I want to tackle this problem sooner rather than later. Eliminating more bits of Flash from the web is cathartic. There has been sparse talk about how to implement this (see the ticket for the feature request), but no concrete plans. (@atnan has already figured out a general method for getting H.264 videos from Vimeo.)

I’ve been thinking about this problem for a while. But before figuring out an implementation proposal, we have to think about the goals of such a feature:

  1. Easily-supported code. This means that we need to have all site plug-ins go through the exact same mechanism. This includes the current custom YouTube support. We want only one mechanism to maintain. If there are bugs, we can fix them in one place, and all site plug-ins will reap the fix.

  2. User-extensible. We want users to be able to create their own site plug-ins and share them with the world, without having to learn Objective-C or go through the horrible UI that is git. Part of this means site plug-ins that are relatively easy to create, with little programming knowledge, if possible.

  3. Flexible. Video sites will do weird things. With YouTube, we already have to do HTML scraping for embedded videos. And we can’t rely on the flash variables that YouTube videos include, because sometimes they indicate the existence of an H.264 file that just isn’t there! So we have to test for file existence at a certain URL. In short, the mechanism needs to be able to support whatever video site users think of throwing at it. If we need to add a feature, we want to be able to do so without breaking existing site plug-ins.

  4. Secure. One possible example of an implementation is simply to have site plug-ins be executables which return a string for the URL of the video file. But think about how much of a security hole this is: the executable could happily delete your home folder! We don’t want that possibility with our site plug-ins. That can conflict with our goal of flexibility, but we want to achieve a happy medium between the two.

Today, on Twitter, I mentioned that I wanted to get this feature moving, and @boredzo came up with a preliminary implementation: define site plug-ins as a dictionary file. His proposal lacked some flexibility even for YouTube, but it was a good jumping-off point. I spent a few hours looking through ClickToFlash’s YouTube code and created a dictionary file format that I think can reasonably satisfy all of the above four goals.

“Site plug-ins” sounds boring, though, and since ClickToFlash already is a “plug-in”, having plug-ins for a plug-in is kind of weird. The name “extension” is already used by Firefox (let’s save a rant about Firefox for another day entirely). So I think we need a new name for these site plug-ins.

In the spirit of the cheeky copy on the ClickToFlash website, I’d like to dub this feature “Flash killers”, or “killers” for short. Yeah, yeah, I know, some of you will disapprove, but it’s one of the little things I do to keep me sane as a programmer.

Anyway, I’d like to walk you through three sample killers: two that replicate the existing functionality of ClickToFlash for YouTube, and one that would introduce H.264 support for Vimeo. Note that not a single line of Objective-C code has been written to support killers, at this moment in time. The whole reason for this post is to receive constructive criticism on the proposal before implementation.

For that reason, I ask that you read this entire post before commenting. I’ll explain the decisions behind everything.

As mentioned before, killers are simply .plist files, or a dictionary of objects and keys that define how ClickToFlash will construct the URL for a video file. Here’s the root view of a killer for YouTube:

“Killer Name” and “Video Result MIME Type” are pretty self-explanatory.
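As a rough sketch of what that root view might contain, here is the same idea expressed as a Python dictionary and serialized to a property list. Only the four top-level key names come from this post; the nested key name ("Page URL"), the pattern, and the values are assumptions made for illustration.

```python
import plistlib

# Hypothetical killer for standard YouTube pages. The four top-level key names
# come from the post; the nested key name and all values are assumptions.
youtube_killer = {
    "Killer Name": "YouTube",
    "Video Result MIME Type": "video/mp4",
    "Matching Regular Expression": {
        "Page URL": r"^(http://)?(www\.)?youtube(-nocookie)?\.com",
    },
    "Rejecting Regular Expression": {},
}

# Serialize to the XML property-list format, then read it back.
plist_bytes = plistlib.dumps(youtube_killer)
round_tripped = plistlib.loads(plist_bytes)
print(plist_bytes.decode("utf-8"))
```

Because the killer is plain data, reading it back gives an ordinary dictionary, with nothing executable inside.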

Here’s the same killer showing the contents of the “Matching Regular Expression” and “Rejecting Regular Expression” dictionaries:

There are three things that we use in the current ClickToFlash code to check for YouTube videos: the URL of the page that you’re on (the one you see in Safari’s address bar), the URL of the source flash file (the “.swf” file), and the flash variables themselves (which can contain the URL). For embedded YouTube videos, we need to support regex matching of each of these URLs. This killer does not support embedded YouTube videos; I’ll explain more about that later. This one just supports actual YouTube pages.

So for non-embedded YouTube, we just need to match “youtube.com” or “youtube-nocookie.com” at the start of the page URL with an optional “http://” and an optional “www.” (Please note, I am a regular expression rookie, so if you see any mistakes, please let me know.)

The “Rejecting Regular Expression” is the opposite — if the URL matches the regular expression in this dictionary, ClickToFlash won’t attempt to load the video file via this killer. Because standard YouTube pages will also match the regular expression for source URL and Flashvars, which we use to check for embedded YouTube videos, we don’t want two killers matching for the same video. (It actually probably doesn’t make that big of a difference if two killers match a given video, as the first killer to be loaded will just take precedence.)
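One way the matching and rejecting dictionaries could interact is sketched below. The pattern, the nested key names ("Page URL", "Source URL", "Flashvars"), and the rule that every matching pattern must hit are all assumptions; none of this is implemented in ClickToFlash yet.

```python
import re

# Assumed pattern: optional "http://", optional "www.", then either YouTube domain.
YOUTUBE = r"^(http://)?(www\.)?youtube(-nocookie)?\.com"

# Hypothetical killers: one for real YouTube pages, one for embedded players.
standard_killer = {
    "Matching Regular Expression": {"Page URL": YOUTUBE},
    "Rejecting Regular Expression": {},
}
embedded_killer = {
    "Matching Regular Expression": {"Source URL": YOUTUBE},
    "Rejecting Regular Expression": {"Page URL": YOUTUBE},
}

def killer_applies(killer, page_url, source_url, flashvars):
    """Return True if no rejecting pattern hits and every matching pattern does."""
    inputs = {"Page URL": page_url, "Source URL": source_url, "Flashvars": flashvars}
    for key, pattern in killer["Rejecting Regular Expression"].items():
        if re.search(pattern, inputs[key]):
            return False  # a rejecting pattern always wins
    return all(
        re.search(pattern, inputs[key])
        for key, pattern in killer["Matching Regular Expression"].items()
    )
```

With these two killers, a video on a youtube.com page is claimed only by the standard killer, while the same player embedded on another site is claimed only by the embedded one.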

Here we see support for the variants of YouTube. Remember, YouTube has standard H.264 and HD H.264 versions available. We want to support both, and add support for additional variants from other sites without having to create additional killers (which would cause problems, because the regular expression matching for the killers would be identical, at least in the case of YouTube).

Here, “Display Name” specifies what text will be displayed in the ClickToFlash view if the URL successfully loads. “Return URL” holds both the steps used to construct the URL and the return URL string itself (the format of which we’ll get to in a moment).

Here we can see how we construct the URL of the MP4 file for YouTube. ClickToFlash will support a limited number of step “types”, including flash-var-retrieve, regular-expression-match, and download-resource. Because of this format, we can add support for additional step types in the future if deemed necessary.

Each step can contain three objects: “Type”, “Victim”, and “Action”. “Type” indicates what ClickToFlash will do in that step, “Victim” indicates the string to perform the step on, and “Action” represents the specific form of the type of step. For flash variable retrieval, we don’t need an action, because this step simply retrieves the string for a given flash variable (specified in “Victim”). Other steps do require an action: for example, a regular expression match actually needs a regular expression.

In addition, each step always returns a string. If a certain step returns nil, then we’ll cancel out of the killer. The result of each step is stored in an array for later retrieval. So in these two steps, we retrieve the video_id and t flash vars.
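The two flash-var-retrieve steps could be run along these lines. This is a minimal sketch, assuming the Flashvars string is an ordinary "key=value&key=value" query string; the function name `run_steps` and the sample values are made up for illustration.

```python
from urllib.parse import parse_qs

def run_steps(steps, flashvars):
    """Run each step in order; return the list of results, or None on failure."""
    variables = parse_qs(flashvars)  # assumes "key=value&key=value" Flashvars
    results = []
    for step in steps:
        if step["Type"] == "flash-var-retrieve":
            values = variables.get(step["Victim"])
            value = values[0] if values else None
        else:
            raise ValueError("unsupported step type: " + step["Type"])
        if value is None:
            return None  # a step produced nothing, so cancel out of the killer
        results.append(value)
    return results

# The two steps described above: fetch the video_id and t flash vars.
youtube_steps = [
    {"Type": "flash-var-retrieve", "Victim": "video_id"},
    {"Type": "flash-var-retrieve", "Victim": "t"},
]
print(run_steps(youtube_steps, "video_id=abc123&t=tok456"))
```

Supporting a new step type would mean adding one more branch here, leaving existing killers untouched.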

In the “Return URL” object, we see how we refer to the results of the previous steps. “%3A” and “%4A” represent the results of the steps in the killer. Why start with “%3”? Because “%0” is reserved for the page URL, “%1” is reserved for the Flash source URL, and “%2” is reserved for the Flashvars string. The trailing character of the entity used for retrieving the results of a single step is there to let you retrieve results from a different variant, which we’ll see come in handy in a sec.

In this pic, you can see that the YouTube HD variant has no steps. How can that be?! Well, we already retrieved the required strings in the previous variant, so we just refer to them again in the return URL. “%3A” and “%4A” will return the same video_id and t flash vars as they did in the previous variant. If we had a third variant, we could refer to the steps of the previous variant using “%3B” and “%4B” instead.
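Entity expansion might be sketched like this. The numbering ("%0" to "%2" reserved, step results from "%3") and the variant letters follow the description above; the function name `expand`, the URL shape in the templates, and the sample values are assumptions.

```python
import re

# "%0", "%1", "%2" are reserved; "%3A" is step 1 of variant A, "%4A" is step 2.
ENTITY = re.compile(r"%(?:([0-2])(?!\d)|(\d+)([A-Z]))")

def expand(template, page_url, source_url, flashvars, results_by_variant):
    """Replace every entity in `template` with the string it stands for."""
    reserved = [page_url, source_url, flashvars]

    def substitute(match):
        if match.group(1) is not None:
            return reserved[int(match.group(1))]
        step_index = int(match.group(2)) - 3  # "%3" is the first step
        return results_by_variant[match.group(3)][step_index]

    return ENTITY.sub(substitute, template)

# Results of the first variant's steps, stored once and reused by the HD variant.
results = {"A": ["abc123", "tok456"]}
# Illustrative URL shape only; the real return URL is defined in the killer file.
standard = expand("http://www.youtube.com/get_video?fmt=18&video_id=%3A&t=%4A",
                  "", "", "", results)
hd = expand("http://www.youtube.com/get_video?fmt=22&video_id=%3A&t=%4A",
            "", "", "", results)
print(standard)
print(hd)
```

The HD variant runs no steps of its own: it only expands a different template against the results that variant A already stored.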

Why support this? Because it saves processing time, and it will save network time in the case that you need to download an external resource, and need to refer to the same resource in two different variants. You don’t want to download the same HTML file twice to scrape (as we might do for this killer), and you don’t want to spend the CPU time to scrape it twice, either.

Now let’s move on to embedded YouTube. Remember how I said we didn’t want proper YouTube page videos to match for two different killers? Here we use the “Rejecting Regular Expression” to that effect. It’ll match “youtube.com” or “youtube-nocookie.com”, but if those strings appear in the actual webpage URL (as they do for standard YouTube videos), then this killer will reject the match.

Here’s how we actually construct the URL. Remember when I said we had to do HTML scraping? Here’s where we do it. Our first step is a regular-expression-match on the source URL of the Flash file, to extract the video_id parameter. Then we actually have to download the HTML source of the corresponding YouTube page, which occurs in the second step. In the third step, we do a regular expression match against the HTML source of the previous step, searching for the flash arguments. The last step searches the results of the previous regular expression result to match the “t” flash var, which is the last item we need to construct the URL.
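The four steps can be sketched as plain code. Everything specific here is an assumption made for illustration: the "/v/" shape of the embed URL, the watch-page URL, the "swfArgs" snippet in the HTML, and the function name. The download is passed in as a `fetch` callable so the sketch runs without touching the network.

```python
import re

def find_embedded_video(source_url, fetch):
    """Return (video_id, t) for an embedded player, or None if any step fails.

    `fetch` is any callable that takes a URL and returns that page's HTML.
    """
    # Step 1: regular-expression-match on the Flash source URL.
    id_match = re.search(r"(?<=/v/)[^&?]+", source_url)
    if id_match is None:
        return None
    video_id = id_match.group(0)

    # Step 2: download-resource, the corresponding YouTube page.
    html = fetch("http://www.youtube.com/watch?v=" + video_id)

    # Step 3: regular-expression-match for the flash arguments in the HTML.
    args_match = re.search(r"(?<=swfArgs = \{)[^}]*(?=\})", html)
    if args_match is None:
        return None

    # Step 4: regular-expression-match for the "t" value inside those arguments.
    t_match = re.search(r'(?<="t": ")[^"]+(?=")', args_match.group(0))
    if t_match is None:
        return None
    return video_id, t_match.group(0)

def sample_fetch(url):
    # Stand-in for a real download; returns a made-up page fragment.
    return '<script>var swfArgs = {"video_id": "abc123", "t": "tok456"};</script>'

print(find_embedded_video("http://www.youtube.com/v/abc123&hl=en", sample_fetch))
```

Each early `return None` corresponds to a step returning nil, which cancels the killer.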

Note that a victim does not need to be a single entity (that is, just the result of one previous step). You can construct strings that include the previous step result as a subset, if necessary.

Also, note the regular expression matching here. (?<= ) is a “look-behind” assertion, which requires that a certain other regular expression match immediately before the string. Similarly, (?= ) is a “look-ahead” assertion, which requires a match immediately after the string.
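For a concrete feel, here is a tiny look-around example in Python's `re` syntax, where the look-behind is written `(?<=...)` and the look-ahead `(?=...)`. The Flashvars string is made up.

```python
import re

flashvars = "video_id=abc123&t=tok456&fmt_map=18"

# (?<=&t=) requires "&t=" immediately before the match;
# (?=&) requires "&" immediately after it. Neither is part of the result.
match = re.search(r"(?<=&t=)[^&]+(?=&)", flashvars)
token = match.group(0)
print(token)
```

Only the value itself is captured, which is what makes look-arounds handy for pulling one field out of a larger string.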

How does the HD variant for embedded YouTube videos work? Easy-peasy! We’ve already retrieved the “video_id” and “t” flash vars from the previous variant! We don’t need to do any work here. Just construct the return URL using the results of the previous variant (which in this case uses “fmt=22” to get the HD version rather than “fmt=18”)!

There’s one more thing you need to be aware of: in ClickToFlash’s current code, we need to actually check for the existence of a file at the constructed URL in order to make sure that one exists. YouTube sometimes lies, and gives you “video_id” and “t” flash vars even though there’s no H.264 or HD version. But this can be handled in ClickToFlash code — ClickToFlash will just check for the existence of a file at the return URL for you!
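That existence check could look roughly like the following. It is a sketch, assuming an HTTP HEAD request is enough to tell whether the file is there; the function name and the injectable `opener` parameter are inventions for the example, not ClickToFlash's actual approach.

```python
import urllib.error
import urllib.request

def video_exists(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if a HEAD request for `url` succeeds with a 2xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # HTTPError and URLError are OSError subclasses, as are timeouts.
        return False
```

A killer's variants would then be tried in turn, and any variant whose return URL fails this check would simply be skipped.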

I hope you agree, this step-by-step method for constructing URLs is sufficiently powerful to allow for flexibility for the various websites out there. Because the number of steps isn’t limited, if you need to download two external resource files, that’s absolutely supported! If you need to do a vast number of regular expression matches and not just one, that’s supported! And given the flexibility of the “Type” parameter, we can support future step types, like CSS selectors, XPath queries, and possibly even AppleScripts!

How does this live up to our original four goals?

  1. Easily maintainable code. A defined set of steps and matching regular expressions allows us to write a single Objective-C class that handles every killer.

  2. User-extensibility is pretty evident. Also, I believe the step-by-step mechanism makes it easier for novices to understand how a killer is set up, and attempt to make one of their own if they so please.

  3. Flexibility is also pretty evident: this mechanism has been designed from the ground up to support all the quirks of YouTube, and allows for future-proofing by adding additional step types, so I’m confident that it can handle the majority of the video sites out there.

  4. Security: with this proposal, all that a malicious killer can do is fetch flash variables, download resources, and match regular expressions on pieces of text. I think that pretty much avoids any potential security holes that such a plug-in mechanism may have. The only exception would be AppleScripts, since they could delete files if they so wanted, but we can discuss whether we want to add support for those or not.

What are the deficiencies in this proposal? I can see a number of them.

  1. It’s verbose. I did this on purpose to aid “readability” of the killers, but I can see how it might be annoying to set up.

  2. Kind of related to the previous deficiency, the mechanism for referring to results of previous steps is kind of awkward. What if someone had a large number of steps? Or what if they wanted to use a victim such as “http://asdf.com/?%4A=something&%3A=something-else”, and not have the “%4A” match the result? Would escaping the entity work reliably?

  3. Localizability. How do we localize the keys in the dictionary for other languages, so that non-English speakers can still create killers and get them to run under any system language, including English? I’m not sure how easy that will be with this mechanism.
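On the escaping question in the second deficiency, one possible convention is to treat a doubled percent sign as a literal one. This is only a sketch of that idea; the "%%" rule and the function name are assumptions, not part of the proposal.

```python
import re

# Assumed convention: "%%" is a literal percent sign; "%3A"-style entities
# (a number from 3 up plus a variant letter) are step results.
TOKEN = re.compile(r"%%|%(\d+)([A-Z])")

def expand_with_escapes(template, results_by_variant):
    """Expand step entities, leaving "%%"-escaped text alone."""
    def substitute(match):
        if match.group(0) == "%%":
            return "%"
        step_index = int(match.group(1)) - 3  # "%3" is the first step
        return results_by_variant[match.group(2)][step_index]
    return TOKEN.sub(substitute, template)

victim = expand_with_escapes(
    "http://asdf.com/?%%4A=something&%3A=something-else",
    {"A": ["abc123", "tok456"]},
)
print(victim)
```

Here the escaped "%%4A" survives as the literal text "%4A", while "%3A" is still replaced by the first step's result.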

I’m not sure of the answers to these questions, and maybe they’re things that we don’t really need to worry about. In any case, if you’d like to play around with my Dictionary files, you can access them here:

I’ve even added one that should support Vimeo’s H.264 files!

And now that you’ve read through the whole thing, comment away! I welcome any and all discussion, since this is probably the single biggest feature I’ve attempted to undertake in ClickToFlash, and I want to get it right the first time.

