Archive for May, 2010
It seems that WebM captured the interest of many that were looking for a potential escape from the HTML5 video deadlock (h264 or theora); Google managed to create a potentially suitable alternative, that seems to be relatively risk-free in terms of patents, while providing reasonable overall quality (especially for the HD video range; at lower bitrate and resolution still needs substantial work). This reignited the flash/html5 war, spurred by Apple and its ban of everything Flash or Flash-like. I have read lots of comments ranging from the “flash will be with us forever” to “flash is already dead”. The dichotomy presented is false, for several reasons; I will try to highlight a few of the comments I received, along with my thoughts:
Flash player is on 98% of the internet-connected devices. While this may be true (I doubt it, as most smartphones are actually internet-connected, and Flash does not work there, or is such a sad joke that it should be left for when the kids are asleep), it does not work on the majority of new internet devices, that are either pads, ARM-based devices or systems where Flash has not a good enough performance. Google recently mentioned that they activate 100000 Android devices per day, Apple is more or less in the same range, and there is still no Flash there (t will be – but I still suspect that the experience will not be that good to make it usable). The Flash player is still slow and complex, and unless the technology changes radically, I suspect that it will be still quite slow (unless for single-window, tailored small games as Kongregate demonstrated recently).
Video is important, and H264 is the best thing that we have. H264 is a very good standard, and properly implemented (as an example, within the x264 encoder) is really among the state of the art. But – a small and unscientific comparison I performed yesterday seem to indicate that actually WebM is fast and simple, as previous claim by On2 indicated. The simplicity of the transforms and the linear structure should lend well to an embedded implementation, that is actually just a specialized coding on a collection of onboard DSPs.
The reality is that most content providers are requiring a form of DRM – be it effective or not – because this way they have a legal way to ask damages, using the DMCA or similar legislation that were introduced around the world. Yes, most current DRM schemes are not designed to be effective, but to provide a legal instrument. That’s why some video vendors insist that HTML5 is not suited for video distribution: because it does not have a protection form, that (even if easily avoidable) provides a tool to ask for damages and external authority control. I am quite confident that sooner or later, someone will introduce a DRM option to provide optional content protection; at that point, Flash or Silverlight will provide no substantial technical or business advantage, and will be on the contrary disadvantages, especially in up-and-coming platforms.
[I have updated the post below, with some considerations from the past similar initiative by Sun (now Oracle) of the Open Media Commons, that designed a codec based on enhancement of H261+]
There is a substantial interest on the recently released WebM codec and specification from Google, the result of open sourcing the VP8 SDK from the recently acquired company On2. It clearly sparked the interest of many the idea of having a reasonably good, open and freely redistributable codec for which the patents are freely licensed in a way that is compatible with open source and free software licensing; at the same time, the results of initial analysis by Dark Shikari (real name Jason Garrett-Glaser, the author of the extraordinary x264 encoder) and others seem to indicate that the patent problem is actually not solved at all.
I believe that some of the comments (especially in Dark Shikari’s post) are a little bit off mark, and should be taken in the correct context; for this reason, I will first of all provide a little background.
What is WebM? WebM is the result of the open sourcing of the VP8 encoding system, previously owned by On2 technologies. It is composed of three parts: the bitstream specification, the reference encoder (that takes an uncompressed video sequence and creates a compressed bitstream) and the reference decoder (that takes the compressed bitstream and generates a decoded video stream). I stress the word “reference” because this is only one of the possible implementations; exactly like the H264 standard, there are many encoders and decoders, optimized for different things (or of different quality). An example of a “perfect” encoder may be a program that generates all possible bit sequences, decodes them (discarding the non-conforming ones), compares with the uncompressed original video and retains the one with the highest quality (measured using PSNR or some other metric). It would make for a very slow encoder, but would find the perfect sequence; that is, the smallest one that has the highest PSNR; on the other hand, decoders differ in precision, speed and memory consumption, making it difficult to decide which is the “best”.
Before delving into the specific of the post, let’s mention two other little things: to the contrary of what is widely reported, Microsoft VC1 never was “patent free”. It was free for use with Windows, because Microsoft was assumed to have all the patents necessary for its implementation; but as VC1 was basically an MPEG4 derivative, most of the patents there applies to VC1 as well, plus some other that were found later for the enhanced filtering that is part of the specification (and that were also part of the MPEG4 Advanced profile and H264). It was never “patent free”, it was licensed without a fee. (that is a substantially different thing – and one of the reasons for its disappearance. MS pays the MPEG-LA for every use of it, and receives nothing back, making it a money-loss proposition…)
Another important aspect is the prior patent search: it is clear (and will be evident a few lines down) that On2 made a patent search to avoid specific implementation details; the point is that noone will be able to see this pre-screening,to avoid additional damages. In fact, one of the most brain damaged things of the current software patent situation is the fact that if a company performs a patent search and finds a potential infringing patent it may incur in additional damages for willful infringement (called “treble damages”). So, the actual approach is to perform the same analysis, try to work around any potential infringing patent, and for those “close enough” cases that cannot be avoided try to steer away as much as possible. So, calling Google out for releasing the study on possible patent infringement is something that has no sense at all: they will never release it to the public.
So, go on with the analysis.
Dark Shikari makes several considerations, some related to the implementation itself, and many related to its “patent status”. For example: “VP8’s intra prediction is basically ripped off wholesale from H.264″, without mentioning that the intra prediction mode is actually pre-dating H264; actually, it was part of Nokia MVC proposal and H263++ extensions published in 2000, and the specific WebM implementation is different from the one mentioned in the “essential patents” of H264 as specified by the MPEG-LA.
If you go through the post, you will find lots of curious mentions of “sub-optimal” choices:
- “i8×8, from H.264 High Profile, is not present”
- “planar prediction mode has been replaced with TM_PRED”
- “VP8 supports a total of 3 reference frames: the previous frame, the “alt ref” frame, and the golden frame”
- “VP8 reference frames: up to 3; H.264 reference frames: up to 16″
- “VP8 partition types: 16×16, 16×8, 8×16, 8×8, 4×4; H.264 partition types: 16×16, 16×8, 8×16, flexible subpartitions (each 8×8 can be 8×8, 8×4, 4×8, or 4×4)”
- “VP8 chroma MV derivation: each 4×4 chroma block uses the average of colocated luma MVs; H.264 chroma MV derivation: chroma uses luma MVs directly”
- “VP8 interpolation filter: qpel, 6-tap luma, mixed 4/6-tap chroma; H.264 interpolation filter: qpel, 6-tap luma (staged filter), bilinear chroma”
- “H.264 has but VP8 doesn’t: B-frames, weighted prediction”
- “H.264 has a significantly better and more flexible referencing structure”
- “having as high as 6 taps on chroma [for VP8] is, IMO, completely unnecessary and wasteful [personal note: because smaller taps are all patented ]“
- “the 8×8 transform is omitted entirely”
- “H.264 uses an extremely simplified “DCT” which is so un-DCT-like that it often referred to as the HCT (H.264 Cosine Transform) instead. This simplified transform results in roughly 1% worse compression, but greatly simplifies the transform itself, which can be implemented entirely with adds, subtracts, and right shifts by 1. VC-1 uses a more accurate version that relies on a few small multiplies (numbers like 17, 22, 10, etc). VP8 uses an extremely, needlessly accurate version that uses very large multiplies (20091 and 35468)”
- “The third difference is that the Hadamard hierarchical transform is applied for some inter blocks, not merely i16×16″
- “unlike H.264, the hierarchical transform is luma-only and not applied to chroma “
- “For quantization, the core process is basically the same among all MPEG-like video formats, and VP8 is no exception (personal note: quantization methods are mostrly from MPEG1&2, where most patents are already expired – see the list of expired ones in MPEG-LA list)”
- “[VP8 uses] … a scheme much less flexible than H.264’s custom quantization matrices, it allows for adjusting the quantizer of luma DC, luma AC, chroma DC, and so forth, separately”
- “The killer mistake that VP8 has made here is not making macroblock-level quantization a core feature of VP8. Algorithms that take advantage of macroblock-level quantization are known as “adaptive quantization” and are absolutely critical to competitive visual quality” (personal note: it is basically impossible to implement adaptive quantization without infringing, especially for patents issued after 2000)
- “even the relatively suboptimal MPEG-style delta quantizer system would be a better option. Furthermore, only 4 segment maps are allowed, for a maximum of 4 quantizers per frame (both are patented: delta quantization is part of MPEG4, and unlimited segment maps are covered)”
- “VP8 uses an arithmetic coder somewhat similar to H.264’s, but with a few critical differences. First, it omits the range/probability table in favor of a multiplication. Second, it is entirely non-adaptive: unlike H.264’s, which adapts after every bit decoded, probability values are constant over the course of the frame” (probability tables are patented in all video coding implementations, not only MPEG-specific ones, as adapting probability tables)
- “VP8 is a bit odd… it chooses an arithmetic coding context based on the neighboring MVs, then decides which of the predicted motion vectors to use, or whether to code a delta instead” (because straight delta coding is part of MPEG4)
- “The compression of the resulting delta is similar to H.264, except for the coding of very large deltas, which is slightly better (similar to FFV1’s Golomb-like arithmetic codes”
- “Intra prediction mode coding is done using arithmetic coding contexts based on the modes of the neighboring blocks. This is probably a good bit better than the hackneyed method that H.264 uses, which always struck me as being poorly designed”
- residual coding is different from both CABAC and CAVLC
- “VP8’s loop filter is vaguely similar to H.264’s, but with a few differences. First, it has two modes (which can be chosen by the encoder): a fast mode and a normal mode. The fast mode is somewhat simpler than H.264’s, while the normal mode is somewhat more complex. Secondly, when filtering between macroblocks, VP8’s filter has wider range than the in-macroblock filter — H.264 did this, but only for intra edges”
- “VP8’s filter omits most of the adaptive strength mechanics inherent in H.264’s filter. Its only adaptation is that it skips filtering on p16×16 blocks with no coefficients”
What we can obtain from this (very thorough – thanks, Jason!) analysis is the fact that from my point of view it is clear that On2 was actually aware of patents, and tried very hard to avoid them. It is also clear that this is in no way an assurance that there are no situation of patent infringements, only that it seems that due diligence was performed. Also, WebM is not comparable to H264 in terms of technical sophistication (it is more in line with MPEG4/VC1) but this is clearly done to avoid recent patents; some of the patents on older specification are already expired (for example, all France Telecom patents on H264 are expired), and in this sense Dark Shiraki claims that the specification is not as good as H264 is perfectly correct. It is also true that x264 beats the hell on current VP8 encoders (and basically every other encoder in the market); despite this, in a previous assessment Dark Shiraki performed a comparison of anime (cartoon) encoding and found that VP7 was better than Apple’s own H264 encode – not really that bad.
The point is that reference encoders are designed to be a building block, and improvement (in respect of possible patents in the area) are certainly possible; maybe not reaching the level of x264 top quality (I suspect the psychovisual adaptive schema that allowed such a big gain in x264 are patented and non-reproducible) but it should be a worthy competitor. All in all, I suspect that MPEGLA rattling will remain only noise for a long, long time.
Update: many people mentioned in blog posts and comments that Sun Microsystem (now Oracle) in the past tried a very similar effort, namely to re-start development of a video codec based on past and expired patents, and start from there avoiding active patents to improve its competitiveness. They used the Open Media Commons IPR methodology to avoid patents and to assess patent troubles, and in particular they developed an handy chart that provides a timeline of patents and their actual status (image based on original chart obtained here):
As it can be observed, most of the techniques encountered in the OMV analysis are still valid for VP8 (the advanced deblocking filter of course is present in VP8, but with a different implementation). It also provides additional support to the idea that On2 developers were aware of patents in the area, and came out with novel ideas to work around existing patents, much like Sun with its OMV initiative. In the same post, the OMV block structure graph includes several “Sun IPR” parts, that are included in the OMV specification (the latest version available here (pdf) – the site is not updated anymore) and that maybe may be re-used, with Oracle explicit permission, in WebM. And to answer people asking for “indemnification” from Google, I would like to point my readers to a presentation of OMV and in particular to slide 10: “While we are encouraged by our findings so far, the investigation continues and Sun and OMC cannot make any representations regarding encumbrances or the validity or invalidity of any patent claims or other intellectual property rights claims a third party may assert in connection with any OMC project or work product.” This should put to rest the idea that Sun was indemnifying people using OMV, exactly like I am not expecting such indemnification from Google (or any other industry player, by the way).
Marten Mickos (of MySQL fame) once said that “people spend time to save money, some spend money to save time“. This consideration is at the basis of one of the most important parameter for most OSS companies that use the open core or freemium model, that is the conversion rate (the percentage of people that pays for enterprise or additional functionalities, versus the total amount of users). With most OSS companies reaching less than 0.1%, and only very few capable of reaching 1%, one of the obvious goals of CEOs of said open source companies is to find a way to “convert” more users to paying for services, or to increase the monetization rate.
My goal today is to show that such effort can have only a very limited success, and may be even dangerous for the overall acceptance of the software project itself.
Let’s start with an obvious concept: everyone has a resource at his/her disposal, namely time. This resource does have some interesting properties:
- it is universal (everyone has time)
- it is inflexible (there are 24 hours in a day, and anything you can do will not change it)
- efficiency (work done in the unit of time) does have a lower bound of zero, and an higher bound that depends on many factors; efficiency can vary by one order of magnitude or more.
Another important parameter is the price per hour for having something done. At this point, there is a common mistake, that is assuming that there is a fixed hourly rate, or at least a lower bound on hourly rate. This is clearly wrong, because the price per hour is the simple ratio between what someone is paying you to do the work and the amount of time required for that action; so if no-one pays you, that ratio is zero. So, let’s imagine someone working for a web company, and one of the activities requires a database. Our intrepid administrator will start learning something about MySQL, will work diligently and install it (ok, nowadays it’s nearly point-and-click. Imagine it done a few years ago, with compiles and all that stuff).
This system administrator will never pay for MySQL enterprise, or whatever, because its pay is fixed, and there is no allocated budget for him to divert money to external entities. So, whatever is done by MySQL to monetize the enterprise version, there will be simply no way to obtain money from the people that is investing time, unless you sabotage the open source edition so that you are forced to pay for the enterprise one. But what will happen then? People will be forced to look at alternatives, because in any case time is the only resource available to them.
This basic concept is valid even when companies do have budget available. Consider the fact that the average percentage of revenues invested in ICT (information and communication technology) by companies is on average around 5%, with some sectors investing slightly less (4%) up to high-tech companies investing up to 7%. This percentage is nearly fixed, valid for small to large companies and across countries and sectors; this means that the commercial OSS company is competing for small slices of budget, and its capability to win is related mainly to the perceived advantages of going “enterprise” versus investing personnel time.
Does it means that trying to increase conversion rate is useless? Not exactly. The point is that you cannot address those users that have no budget available, as those will never be able to pay for your enhanced offering; you have two different possible channels: those that are using your product and may have the potential to pay, or address the group of non-users with the same demographics. So, the reality is that mining current users is potentially counterproductive, and it is more sensible to focus on two interlocking efforts:
- increase the number of adopters, and
- make sure that people knows about the commercial offering.
This can be performed “virally”, that is by creating an incentive for people to share the knowledge of your project with others, which is very fast, efficient and low-cost; however, this approach does have the disadvantage that sharing will happen within a single group of peers. In fact, viral sharing happens within only homologous group, and this means that it is less effective for reaching those users that are outside the same group – for example, the non-users that we are pointing at. This means that purely viral efforts are not capable of reaching your target – you need to complement it with more traditional marketing efforts.
Next: resource and development sharing, or how to choose your license depending on your expectations of external participation.