I tried translating everyone's favorite SIFT patent

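MPEG is standardizing a feature descriptor for specific object recognition (CDVS). When I checked its mailing list for the first time in a while, there was a thread about working around the SIFT patent.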
Compact Descriptors for Visual Search | MPEG
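The exchange went roughly like this: "SIFT uses DoG, so an LoG-based approach should be fine, right? LoG has been around forever!" → "The SIFT patent claims the process of detecting extrema (in scale space), so couldn't an LoG-based method still run into that?" That reminded me I had never actually read the SIFT patent carefully, so I translated it. Sure enough, the extrema-detection process appears to be claim 1.
Claims 1 to 4 cover the detector; claims 1 and 5 to 9 cover the descriptor. Claims 10 onward merely restate the same claims in terms of an apparatus and a computer, so I have omitted them.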
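For context on why that reply has force: the DoG is, up to a factor, an approximation of the scale-normalized LoG. Lowe (2004) derives this from the diffusion equation that the Gaussian $G$ satisfies, with $k$ the ratio between adjacent scales:

$$
\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G
\quad\Rightarrow\quad
\sigma \nabla^2 G \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}
\quad\Rightarrow\quad
G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G
$$

When the scales are spaced by a fixed ratio $k$, the factor $(k - 1)$ is the same at every scale, so DoG and scale-normalized LoG stacks have nearly the same extrema.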

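Claim 1
A method of identifying scale-invariant features in an image defined by a plurality of pixels, comprising:
locating extrema of pixel amplitude in a plurality of difference images produced from said image (as follows):
(identifying local maximum or minimum pixels by comparing the amplitude of each pixel in the (difference) image under consideration with the amplitudes of pixels in an area around that pixel;
identifying possible maximum or minimum pixels by comparing the amplitudes of those local maximum or minimum pixels with the amplitudes of pixels in the image immediately preceding the image under consideration;
identifying actual maximum or minimum pixels by comparing the amplitudes of those possible maximum or minimum pixels with the amplitudes of pixels in the image immediately following the image under consideration)
and further producing a plurality of component subregion descriptors for each subregion of the pixel region around said pixel extrema in the plurality of difference images produced from said image.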
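The three comparisons in claim 1 can be sketched in Python with NumPy as below. This is a minimal illustration, not the patent's or Lowe's implementation: the function name `find_extrema`, the window half-width `radius` (the claim only says "an area"), and the use of strict inequalities are assumptions. `dogs` is a stack of difference images ordered by scale.

```python
import numpy as np


def find_extrema(dogs, radius=1):
    """Sketch of the three-stage comparison in claim 1.

    dogs:   array of shape (S, H, W), difference images ordered by scale.
    radius: half-width of the comparison window (assumption; radius=1 gives
            a 3x3 window in each image).
    Returns a list of (s, y, x, kind) with kind "max" or "min".
    """
    dogs = np.asarray(dogs, dtype=float)
    size = 2 * radius + 1
    centre = radius * size + radius  # index of the pixel itself in the flattened window
    S, H, W = dogs.shape
    found = []
    for s in range(1, S - 1):  # first and last images lack a predecessor or successor
        prev, cur, nxt = dogs[s - 1], dogs[s], dogs[s + 1]
        for y in range(radius, H - radius):
            for x in range(radius, W - radius):
                win = (slice(y - radius, y + radius + 1), slice(x - radius, x + radius + 1))
                v = cur[y, x]
                # Stage 1: local max/min against neighbours in the image under consideration.
                others = np.delete(cur[win].ravel(), centre)
                is_max, is_min = v > others.max(), v < others.min()
                if not (is_max or is_min):
                    continue
                # Stage 2: "possible" extrema after comparing with the predecessor image.
                is_max = is_max and v > prev[win].max()
                is_min = is_min and v < prev[win].min()
                if not (is_max or is_min):
                    continue
                # Stage 3: "actual" extrema after comparing with the successor image.
                is_max = is_max and v > nxt[win].max()
                is_min = is_min and v < nxt[win].min()
                if is_max or is_min:
                    found.append((s, y, x, "max" if is_max else "min"))
    return found
```

The early `continue` statements mirror the claim's ordering: a pixel only reaches the predecessor and successor comparisons if it survived the previous stage.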
Claim 2
The method according to claim 1, further comprising producing said difference images.
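Claim 3
The method according to claim 2, wherein producing a difference image comprises blurring an initial image to produce a blurred image, and subtracting the blurred image from the initial image to produce the difference image.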
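Claim 4
The method according to claim 3, wherein producing the difference images comprises successively blurring and subtracting as described in claim 3, where the initial image used in each successive blurring step includes the blurred image produced by the preceding blurring step.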
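Claims 3 and 4 describe a chain of blur-then-subtract steps. Below is a NumPy-only sketch, assuming a Gaussian blur with the same `sigma` at every step. The claims do not specify the blur or its parameters; with a fixed `sigma` the effective scale after n steps grows as sigma·√n rather than geometrically. The helper names `gaussian_kernel`, `blur`, and `difference_images` are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def blur(img, sigma):
    """Separable Gaussian blur; edge padding keeps the output the same shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    # Filter rows, then columns; 'valid' mode strips the padding again.
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, tmp)


def difference_images(image, sigma=1.6, count=4):
    """Claims 3-4: blur, subtract blurred from initial, reuse the blurred image."""
    initial = np.asarray(image, dtype=float)
    dogs = []
    for _ in range(count):
        blurred = blur(initial, sigma)
        dogs.append(initial - blurred)  # claim 3: initial image minus blurred image
        initial = blurred               # claim 4: next initial image is this blurred one
    return np.stack(dogs)
```

Following claim 3 literally, each difference image is initial minus blurred, the negative of the usual G(kσ) − G(σ) convention. This swaps maxima and minima but leaves the set of extremum pixels unchanged.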
Claim 5
The method according to claim 1, further comprising producing a gradient vector for each pixel in each difference image.
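Claim 5 only requires a gradient vector at every pixel. One common choice, assumed here, is central differences via `numpy.gradient`, stored as a magnitude plus an orientation in [0, 2π). The helper name `gradient_vectors` is hypothetical.

```python
import numpy as np


def gradient_vectors(dog):
    """Per-pixel gradient of one difference image as (magnitude, orientation)."""
    # np.gradient returns derivatives along axis 0 (y) and axis 1 (x).
    gy, gx = np.gradient(np.asarray(dog, dtype=float))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % (2 * np.pi)  # map to [0, 2*pi)
    return magnitude, orientation
```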
Claim 6
The method according to claim 5, further comprising associating vector orientations with the respective actual maximum and minimum pixels in each difference image.
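Claim 6 leaves open how an orientation is attached to each extremum. One possibility, borrowed from the orientation histogram in Lowe's paper but simplified here (no Gaussian weighting, no peak interpolation, only the single strongest peak), is to take the peak of a magnitude-weighted histogram of gradient orientations around the extremum. The name `dominant_orientation` and the defaults `radius=4` and `bins=36` are illustrative assumptions.

```python
import numpy as np


def dominant_orientation(magnitude, orientation, y, x, radius=4, bins=36):
    """Return the centre angle (radians) of the strongest orientation bin near (y, x).

    magnitude, orientation: per-pixel gradient data, orientation in [0, 2*pi).
    The window is clipped at the image border.
    """
    rows = slice(max(0, y - radius), y + radius + 1)
    cols = slice(max(0, x - radius), x + radius + 1)
    hist, edges = np.histogram(
        orientation[rows, cols],
        bins=bins,
        range=(0.0, 2 * np.pi),
        weights=magnitude[rows, cols],  # each pixel votes with its gradient magnitude
    )
    peak = int(np.argmax(hist))
    return 0.5 * (edges[peak] + edges[peak + 1])
```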
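Claim 7
The method according to claim 6, wherein producing the plurality of component subregion descriptors comprises producing a descriptor for each subregion based on the gradient vectors of the pixels within that subregion.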
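Claim 8
The method according to claim 7, wherein producing each subregion descriptor comprises counting the number of pixel vectors in the subregion whose orientations fall within a predefined range of orientations.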
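Claim 9
The method according to claim 7, wherein producing the plurality of subregion descriptors comprises associating a plurality of orientation ranges with each descriptor and, for each subregion, counting the number of pixel vectors whose orientations fall within each of those ranges.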
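Claims 7 to 9 amount to a set of orientation histograms, one per subregion. The sketch below follows the claim wording literally (plain counts of pixel vectors per orientation range). It assumes a 4×4 grid of 4×4-pixel subregions and 8 equal orientation ranges, the sizes used in Lowe's paper. Lowe's published descriptor additionally weights each vote by gradient magnitude and a Gaussian window, rotates the region to the keypoint orientation, and interpolates between bins; all of that is omitted here. `subregion_descriptor` is a hypothetical name.

```python
import numpy as np


def subregion_descriptor(orientation, y, x, grid=4, cell=4, bins=8):
    """Count pixel vectors per orientation range in each subregion around (y, x).

    orientation: per-pixel gradient orientations in [0, 2*pi).
    Returns an int array of shape (grid, grid, bins).
    """
    half = grid * cell // 2
    H, W = orientation.shape
    if y - half < 0 or x - half < 0 or y + half > H or x + half > W:
        raise ValueError("region around (y, x) extends past the image border")
    region = orientation[y - half:y + half, x - half:x + half]
    # Map each angle to one of `bins` equal orientation ranges.
    idx = np.minimum((region / (2 * np.pi) * bins).astype(int), bins - 1)
    desc = np.zeros((grid, grid, bins), dtype=int)
    for i in range(grid):
        for j in range(grid):
            sub = idx[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            # Claim 9: for this subregion, count pixel vectors in each orientation range.
            desc[i, j] = np.bincount(sub.ravel(), minlength=bins)
    return desc
```

Flattening the result with `desc.ravel()` gives the familiar 4 × 4 × 8 = 128-element vector.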

Original text ↓
US6711293B1 - Method and apparatus for identifying scale invariant features in an image and use of same for locating an object in an image - Google Patents

A method of identifying scale invariant features in an image defined by a plurality of pixels, the method comprising:
locating pixel amplitude extrema in a plurality of difference images produced from said image by:
comparing the amplitude of each pixel in an image under consideration with the amplitudes of pixels in an area about said each pixel in said image under consideration to identify local maximal and minimal amplitude pixels;
comparing the amplitudes of said local maximal and minimal amplitude pixels with the amplitudes of pixels in a predecessor image to the image under consideration to identify possible maximal and minimal amplitude pixels and
comparing the amplitudes of said possible maximal and minimal amplitude pixels with the amplitudes of pixels in a successor image to the image under consideration to identify actual maximal and minimal amplitude pixels; and
producing a plurality of component subregion descriptors for each subregion of a pixel region about said pixel amplitude extrema in said plurality of difference images produced from said image.

The method claimed in claim 1 further comprising producing said difference images.

The method claimed in claim 2 wherein producing a difference image comprises blurring an initial image to produce a blurred image and subtracting said blurred image from said initial image to produce a difference image.

The method claimed in claim 3 wherein producing said difference images comprises successively blurring and subtracting as recited in claim 3 where said initial image used in a successive blurring function includes a blurred image produced in a predecessor blurring function.

The method claimed in claim 1 further comprising producing a pixel gradient vector for each pixel in each difference image.

The method claimed in claim 5 further comprising associating vector orientations with respective actual maximal and minimal amplitude pixels associated with each difference image.

The method claimed in claim 6 wherein producing a plurality of component subregion descriptors comprises producing subregion descriptors for each respective subregion in response to pixel gradient vectors of pixels within said each respective subregion.

The method claimed in claim 7 wherein producing each of said subregion descriptors comprises determining the number of pixel vectors at orientations within a predefined range of orientations in said subregion.

The method claimed in claim 7 wherein producing a plurality of subregion descriptors comprises associating with each of said descriptors a plurality of orientation ranges and determining the number of pixel vectors at orientations within respective orientation ranges, for each subregion.