AV1: Difference between revisions

Source: Wikipedia, the free encyclopedia.
*On 4 January 2022, Intel officially launched [[Alder Lake (microprocessor)|Alder Lake]] 12th Gen mobile CPUs and non-K series desktop CPUs with AV1 fixed-function hardware decoding.<ref>{{Cite web|url=https://www.intel.com/content/www/us/en/newsroom/news/ces-2022-mobile-12th-gen-core.html|title = CES: Intel Engineers Fastest Mobile Processor Ever with 12th Gen}}</ref>
*On 17 February 2022, Intel officially announced that Arctic Sound-M has the industry's first hardware-based AV1 encoder inside a GPU.<ref>{{Cite web|url=https://www.intel.com/content/www/us/en/newsroom/news/intel-technology-roadmaps-milestones.html|title = Intel Technology Roadmaps and Milestones}}</ref>
*On 30 March 2022, Intel officially announced the Intel Arc Alchemist family with AV1 fixed-function hardware decoding and encoding.<ref>https://www.intel.com/content/www/us/en/newsroom/opinion/intel-discrete-mobile-graphics-family-arrives.html</ref><ref>https://www.intel.com/content/www/us/en/products/docs/arc-discrete-graphics/creator.html</ref><ref>https://twitter.com/IntelGraphics/status/1509186521953415179</ref>


== Patent claims ==

Revision as of 16:09, 30 March 2022

AOMedia Video 1
Internet media type: video/AV1, video/webm; codecs="av01.*"
Developed by: Alliance for Open Media
Initial release: 28 March 2018
Latest release: 1.0.0 Errata 1,[1] 8 January 2019
Type of format: Video coding format
Extended from: VP9
Extended to: AVIF
Standard: AOM AV1
Open format?: Yes
Free format?: See § Patent claims
Website: aomedia.org/av1-features/

AOMedia Video 1 (AV1) is an open, royalty-free video coding format initially designed for video transmissions over the Internet. It was developed as a successor to VP9 by the Alliance for Open Media (AOMedia),[2] a consortium founded in 2015 that includes semiconductor firms, video on demand providers, video content producers, software development companies and web browser vendors. The AV1 bitstream specification includes a reference video codec.[1] In 2018, Facebook conducted testing that approximated real-world conditions, and the AV1 reference encoder achieved 34%, 46.2% and 50.3% higher data compression than libvpx-vp9, x264 High profile, and x264 Main profile, respectively.

Like VP9, but unlike H.264/AVC and HEVC, AV1 has a royalty-free licensing model that does not hinder adoption in open-source projects.[3][4][5][6][2][7]

AVIF is an image file format that uses AV1 compression algorithms.

History

The Alliance's motivations for creating AV1 included the high cost and uncertainty involved with the patent licensing of HEVC, part of whose patents are licensed through MPEG-LA. When the HEVC standard was finished, two patent pools had been formed, with a third pool on the horizon. In addition, various patent holders were refusing to license patents via either pool, increasing uncertainty about HEVC's licensing. According to Microsoft's Ian LeGrow, an open-source, royalty-free technology was seen as the easiest way to eliminate this uncertainty around licensing.[8]

The negative effect of patent licensing on free and open-source software was a further motivation for a royalty-free format.

Many of the components of the AV1 project were sourced from previous research efforts by Alliance members. Individual contributors had started experimental technology platforms years before: Xiph's/Mozilla's Daala published code in 2010, Google's experimental VP9 evolution project VP10 was announced on 12 September 2014,[12] and Cisco's Thor was published on 11 August 2015. Building on the code base of VP9, AV1 incorporates additional techniques, several of which were developed in these experimental formats.[13]

The first version 0.1.0 of the AV1 reference codec was published on 7 April 2016. Although a soft feature freeze came into effect at the end of October 2017, development continued on several significant features. One of these, the bitstream format, was projected to be frozen in January 2018 but was delayed due to unresolved critical bugs as well as further changes to transformations, syntax, the prediction of motion vectors, and the completion of legal analysis.[citation needed] The Alliance announced the release of the AV1 bitstream specification on 28 March 2018, along with a reference, software-based encoder and decoder.[14] On 25 June 2018, a validated version 1.0.0 of the specification was released.[15] On 8 January 2019, a validated version 1.0.0 with Errata 1 of the specification was released.

Martin Smole from AOM member Bitmovin said that the computational efficiency of the reference encoder was the greatest remaining challenge after the bitstream format freeze had been completed.[16] While the format was still being worked on, the encoder was not targeted for production use, and speed optimizations were not prioritized. Consequently, the early version of AV1 was orders of magnitude slower than existing HEVC encoders, and much of the development effort was subsequently shifted towards maturing the reference encoder. In March 2019, it was reported that the speed of the reference encoder had improved greatly and was within the same order of magnitude as that of encoders for other common formats.[17]

On 21 January 2021, the MIME type of AV1 was defined as video/AV1. Use of this MIME type is restricted to Real-time Transport Protocol (RTP) purposes.[18]

In April 2021, Roku removed the YouTube TV app from the Roku streaming platform after a contract expired. It was later reported that Roku streaming devices do not use processors that support the AV1 codec. In December 2021, YouTube and Roku agreed to a multiyear deal to keep both the YouTube TV app and the YouTube app on the Roku streaming platform. Roku had argued that using processors that support the royalty-free AV1 codec in its streaming devices would increase costs to consumers.[19][20]

Purpose

AV1 aims to be a video format for the web that is both state of the art and royalty free.[2] According to Matt Frost, head of strategy and partnerships in Google's Chrome Media team, "The mission of the Alliance for Open Media remains the same as the WebM project."[21]

A recurring concern in standards development, not least of royalty-free multimedia formats, is the danger of accidentally infringing on patents that their creators and users did not know about. This concern has been raised regarding AV1,[22] and previously VP8,[23] VP9,[24] Theora[25] and IVC.[26] The problem is not unique to royalty-free formats, but it uniquely threatens their status as royalty-free.

Patent licensing, by format:
- AV1, VP9, Theora: royalty-free from known patent holders
- HEVC, AVC: royalty-bearing from known patent holders
- GIF, MP3, MPEG-1, MPEG-2, MPEG-4 Part 2: patents of known holders have expired
For all of these formats, licensing by unknown patent holders is impossible to ascertain until the format is old enough that any patents would have expired (at least 20 years in WTO countries).

To fulfill the goal of being royalty free, the development process requires that no feature can be adopted before it has been confirmed independently by two separate parties to not infringe on patents of competing companies. In cases where an alternative to a patent-protected technique is not available, owners of relevant patents have been invited to join the Alliance (even if they were already members of another patent pool). For example, Alliance members Apple, Cisco, Google, and Microsoft are also licensors in MPEG-LA's patent pool for H.264.[22] As an additional protection for the royalty-free status of AV1, the Alliance has a legal defense fund to aid smaller Alliance members or AV1 licensees in the event they are sued for alleged patent infringement.[22][5][27]

Under patent rules adopted from the World Wide Web Consortium (W3C), technology contributors license their AV1-connected patents to anyone, anywhere, anytime based on reciprocity (i.e. as long as the user does not engage in patent litigation).[28] As a defensive condition, anyone engaging in patent litigation loses the right to the patents of all patent holders.[citation needed][29]

This treatment of intellectual property rights (IPR), and its absolute priority during development, is contrary to extant MPEG formats like AVC and HEVC, which were developed under an IPR uninvolvement policy by their standardization organisations, as stipulated in the ITU-T's definition of an open standard. However, MPEG's chairman has argued that this practice has to change,[30] and it is changing:[citation needed] EVC is also set to have a royalty-free subset,[31][32] and will have switchable features in its bitstream to defend against future IPR threats.[citation needed]

The creation of royalty-free web standards has been a long-stated pursuit for the industry. In 2007, the proposal for HTML5 video specified Theora as mandatory to implement. The reason was that public content should be encoded in freely implementable formats, if only as a "baseline format", and that changing such a baseline format later would be hard because of network effects.[33]

The Alliance for Open Media is a continuation of Google's efforts with the WebM project, which renewed the royalty-free competition after Theora had been surpassed by AVC. For companies such as Mozilla that distribute free software, AVC can be difficult to support, as a per-copy royalty is unsustainable given the lack of a revenue stream to support these payments in free software (see HEVC § Provision for costless software).

The performance goals include "a step up from VP9 and HEVC" in efficiency for a low increase in complexity. Cisco makes videoconferencing equipment, and their Thor contributions aim at "reasonable compression at only moderate complexity".[35]

Feature-wise, AV1 is specifically designed for real-time applications (especially WebRTC) and higher resolutions (wider color gamuts, higher frame rates, UHD) than typical usage scenarios of the current generation (H.264) of video formats, where it is expected to achieve its biggest efficiency gains. It is therefore planned to support the color space from ITU-R Recommendation BT.2020 and up to 12 bits of precision per color component.[36] AV1 is primarily intended for lossy encoding, although lossless compression is supported as well.[37]

Technology

AV1 is a traditional block-based frequency transform format featuring new techniques. Based on Google's VP9,[38] AV1 incorporates additional techniques that mainly give encoders more coding options to enable better adaptation to different types of input.

Processing stages of an AV1 encoder with relevant technologies associated with each stage.
libaom
Developer(s): Alliance for Open Media
Stable release: 3.3.0[39] (15 February 2022)
Written in: C and assembly
License: BSD 2-Clause License (free software)
Website: aomedia.googlesource.com/aom

The Alliance published a reference implementation written in C and assembly language (libaom) as free software under the terms of the BSD 2-Clause License.[40] Development happens in public and is open for contributions, regardless of AOM membership.

The development process was such that coding tools were added to the reference code base as experiments, controlled by flags that enable or disable them at build time, for review by other group members as well as specialized teams that helped with and ensured hardware friendliness and compliance with intellectual property rights (TAPAS). When the feature gained some support in the community, the experiment was enabled by default, and ultimately had its flag removed when all of the reviews were passed.[41] Experiment names were lowercased in the configure script and uppercased in conditional compilation flags.[citation needed]

To better and more reliably support HDR and color spaces, corresponding metadata can now be integrated into the video bitstream instead of being signaled in the container.

Partitioning

10 ways for subpartitioning coding units – into squares (recursively), rectangles, or mixtures thereof ("T-shaped").

Frame content is separated into adjacent same-sized blocks referred to as superblocks. Similar to the concept of a macroblock, superblocks are square-shaped and can be either 128×128 or 64×64 pixels in size. Superblocks can be divided into smaller blocks according to different partitioning patterns. The four-way split pattern is the only pattern whose partitions can be recursively subdivided. This allows superblocks to be divided into partitions as small as 4×4 pixels.

Diagram of the AV1 superblock partitioning. It shows how 128×128 superblocks can be split all the way down to 4×4 blocks. As special cases, 128×128 and 8×8 blocks can't use 1:4 and 4:1 splits, and 8×8 blocks can't use "T"-shaped splits.

"T-shaped" partitioning patterns are introduced, a feature developed for VP10, as well as horizontal or vertical splits into four stripes of 4:1 and 1:4 aspect ratio. The available partitioning patterns vary according to the block size, both 128×128 and 8×8 blocks can't use 4:1 and 1:4 splits. Moreover, 8×8 blocks can't use "T" shaped splits.

Two separate predictions can now be used on spatially different parts of a block using a smooth, oblique transition line (wedge-partitioned prediction).[citation needed] This enables more accurate separation of objects without the traditional staircase lines along the boundaries of square blocks.

More encoder parallelism is possible thanks to configurable prediction dependency between tile rows (ext_tile).[42]

Prediction

AV1 performs internal processing in higher precision (10 or 12 bits per sample), which leads to quality improvement by reducing rounding errors.

Predictions can be combined in more advanced ways (than a uniform average) in a block (compound prediction), including smooth and sharp transition gradients in different directions (wedge-partitioned prediction) as well as implicit masks that are based on the difference between the two predictors. This allows the combination of either two inter predictions or an inter and an intra prediction to be used in the same block.[43][citation needed]
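
A minimal sketch of the wedge idea, assuming a made-up sigmoid mask (AV1 signals masks from a predefined codebook rather than computing them this way; angle_deg and softness are invented parameters):

```python
import numpy as np

def wedge_blend(pred_a: np.ndarray, pred_b: np.ndarray,
                angle_deg: float = 30.0, softness: float = 2.0) -> np.ndarray:
    """Blend two predictions across a smooth, oblique transition line."""
    h, w = pred_a.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Signed distance of each pixel to a line through the block center.
    nx, ny = np.cos(np.radians(angle_deg)), np.sin(np.radians(angle_deg))
    dist = (xs - (w - 1) / 2) * nx + (ys - (h - 1) / 2) * ny
    weight = 1.0 / (1.0 + np.exp(-dist / softness))   # smooth 0..1 mask
    return weight * pred_a + (1.0 - weight) * pred_b
```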

A frame can reference 6 instead of 3 of the 8 available frame buffers for temporal (inter) prediction while providing more flexibility on bi-prediction[44] (ext_refs[citation needed]).

Warped motion as seen from the front of a train.

The Warped Motion (warped_motion) and Global Motion (global_motion) tools aim to reduce redundant information in motion vectors by recognizing patterns arising from camera motion.[42] They implement ideas that were attempted in preceding formats such as MPEG-4 ASP, albeit with a novel approach that works in three dimensions. There can be a set of warping parameters for a whole frame offered in the bitstream, or blocks can use a set of implicit local parameters that get computed based on surrounding blocks.
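
A toy model of warped motion compensation under simplifying assumptions (one global affine model, nearest-neighbor sampling; the real codec interpolates at sub-pixel positions and can derive local parameters from neighboring blocks):

```python
import numpy as np

def warp_affine(ref: np.ndarray, a: float, b: float, c: float, d: float,
                tx: float, ty: float) -> np.ndarray:
    """Predict a frame by sampling `ref` at affine-transformed coordinates."""
    h, w = ref.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(a * xs + b * ys + tx), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(c * xs + d * ys + ty), 0, h - 1).astype(int)
    return ref[src_y, src_x]   # nearest-neighbor lookup; codecs filter subpel

# Example: a slight zoom plus a small pan, as a camera might produce.
frame = np.random.default_rng(0).integers(0, 256, (64, 64))
pred = warp_affine(frame, a=1.02, b=0.0, c=0.0, d=1.02, tx=-1.0, ty=0.5)
```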

Switch frames (S-frames) are a new inter-frame type that can be predicted using already-decoded reference frames from a higher-resolution version of the same video. In adaptive bitrate streaming, this allows switching to a lower resolution without sending a full keyframe at the beginning of a video segment.[45]

Intra prediction

Intra prediction consists of predicting the pixels of a given block using only information available in the current frame. Most often, intra predictions are built from the neighboring pixels above and to the left of the predicted block. The DC predictor builds a prediction by averaging the pixels above and to the left of the block.
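
For illustration, a minimal DC predictor might look as follows (a sketch; boundary and availability handling is omitted):

```python
import numpy as np

def dc_predict(above: np.ndarray, left: np.ndarray, size: int) -> np.ndarray:
    """Fill a size x size block with the mean of its above/left neighbors."""
    dc = round((above.sum() + left.sum()) / (len(above) + len(left)))
    return np.full((size, size), dc)

# Example: predict a 4x4 block from its decoded neighbors.
pred = dc_predict(np.array([100, 102, 101, 99]),
                  np.array([98, 97, 99, 100]), size=4)
```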

Directional predictors extrapolate these neighboring pixels according to a specified angle. In AV1, 8 main directional modes can be chosen. These modes start at an angle of 45 degrees and increase in steps of 22.5 degrees up to 203 degrees. Furthermore, for each directional mode, six offsets of 3 degrees can be signaled for bigger blocks, three above the main angle and three below it, resulting in a total of 56 angles (ext_intra).
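
A few lines suffice to enumerate the resulting angle inventory (a sketch; the specification works with integer angle parameters rather than floats):

```python
# 8 base directions from 45 degrees in 22.5-degree steps, each refinable
# by signed offsets of -3..+3 times 3 degrees (offset 0 is the base
# angle itself): 8 * 7 = 56 angles in total.
base_angles = [45 + 22.5 * i for i in range(8)]
all_angles = [base + 3 * delta for base in base_angles
              for delta in range(-3, 4)]
assert len(all_angles) == 56
```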

The "TrueMotion" predictor was replaced with a

Paeth predictor which looks at the difference from the known pixel in the above-left corner to the pixel directly above and directly left of the new one and then chooses the one that lies in direction of the smaller gradient as predictor. A palette predictor is available for blocks with up to 8 dominant colors, such as some computer screen content. Correlations between the luminosity and the color information can now be exploited with a predictor for chroma blocks that is based on samples from the luma plane (cfl).[42] In order to reduce visible boundaries along borders of inter-predicted blocks, a technique called overlapped block motion compensation (OBMC) can be used. This involves extending a block's size so that it overlaps with neighboring blocks by 2 to 32 pixels, and blending the overlapping parts together.[46]
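
The selection rule is the classic Paeth heuristic known from PNG filtering; the sketch below also lets the above-left pixel itself win the selection, as the classic rule does:

```python
def paeth_predict(left: int, above: int, above_left: int) -> int:
    """Predict a pixel from its three decoded neighbors."""
    estimate = left + above - above_left        # planar estimate of the pixel
    # Choose the neighbor closest to the estimate, i.e. the one lying in
    # the direction of the smaller gradient; ties prefer left, then above.
    return min((left, above, above_left), key=lambda p: abs(estimate - p))
```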

Data transformation

To transform the error remaining after prediction to the frequency domain, AV1 encoders can use square, 2:1/1:2, and 4:1/1:4 rectangular DCTs (rect_tx),[44] as well as an asymmetric DST[47][48][49] for blocks where the top and/or left edge is expected to have lower error thanks to prediction from nearby pixels, or choose to do no transform (identity transform).

It can combine two one-dimensional transforms in order to use different transforms for the horizontal and the vertical dimension (ext_tx).[42][44]
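
A sketch of such a separable transform, with SciPy's floating-point DCT standing in for the codec's integer transforms (scipy.fft.dct and the identity option here are illustrative stand-ins):

```python
import numpy as np
from scipy.fft import dct  # floating-point DCT as a stand-in

def identity_tx(x, axis, norm):
    return x  # the "no transform" option mentioned above

def separable_transform(residual, row_tx=dct, col_tx=dct):
    """Apply one 1-D transform along rows and a possibly different one
    along columns, as with AV1's ext_tx combinations."""
    out = row_tx(residual, axis=1, norm="ortho")
    return col_tx(out, axis=0, norm="ortho")

block = np.arange(16, dtype=float).reshape(4, 4)       # toy residual
coeffs = separable_transform(block, row_tx=dct, col_tx=identity_tx)
```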

Quantization

AV1 has new optimized quantization matrices (aom_qm).[50] The eight sets of quantization parameters that can be selected and signaled for each frame now have individual parameters for the two chroma planes and can use spatial prediction. On every new superblock, the quantization parameters can be adjusted by signaling an offset.
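
In outline, weighted scalar quantization works as sketched below (toy values; AV1's actual aom_qm weights and step sizes are taken from spec-defined tables):

```python
import numpy as np

def quantize(coeffs, step, qm):
    """Scalar quantization with a per-coefficient weighting matrix."""
    return np.rint(coeffs / (step * qm)).astype(int)

def dequantize(levels, step, qm):
    return levels * (step * qm)

# Per-superblock adjustment: a frame-level step plus a signaled offset.
frame_step, sb_offset = 16.0, 2.0
qm = np.ones((4, 4))                      # flat weighting for the toy example
levels = quantize(np.full((4, 4), 100.0), frame_step + sb_offset, qm)
```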

Filters

In-loop filtering combines Thor's constrained low-pass filter and Daala's directional deringing filter into the Constrained Directional Enhancement Filter, cdef. This is an edge-directed conditional replacement filter that smooths blocks roughly along the direction of the dominant edge to eliminate ringing artifacts.[citation needed]

There is also the loop restoration filter (loop_restoration) based on the Wiener filter and self-guided restoration filters to remove blur artifacts due to block processing.[42]

Film grain synthesis (film_grain) improves coding of noisy signals using a parametric video coding approach. Due to the randomness inherent to film grain noise, this signal component is traditionally either very expensive to code or prone to being damaged or lost, possibly leaving serious coding artifacts as residue. This tool circumvents these problems using analysis and synthesis, replacing parts of the signal with a visually similar synthetic texture, based solely on subjective visual impression instead of objective similarity. It removes the grain component from the signal, analyzes its non-random characteristics, and instead transmits only descriptive parameters to the decoder, which adds back a synthetic, pseudorandom noise signal shaped like the original component. It is the visual equivalent of the Perceptual Noise Substitution technique used in AC-3, AAC, Vorbis, and Opus audio codecs.
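
The analysis/synthesis split can be caricatured in a few lines (a toy model with a single strength parameter; AV1's tool actually fits an autoregressive grain model with scaling functions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def analyze(frame, denoise):
    """Encoder side: strip the grain, keep only descriptive parameters."""
    clean = denoise(frame)
    grain = frame - clean
    return clean, {"strength": float(grain.std())}   # parameters, not pixels

def synthesize(clean, params, seed=1234):
    """Decoder side: add back pseudorandom noise shaped by the parameters."""
    rng = np.random.default_rng(seed)
    return clean + rng.normal(0.0, params["strength"], clean.shape)

noisy = np.random.default_rng(7).normal(128, 4, (64, 64))
clean, params = analyze(noisy, denoise=lambda f: gaussian_filter(f, sigma=2))
reconstructed = synthesize(clean, params)
```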

Entropy coding

Daala's entropy coder (daala_ec[citation needed]), a non-binary arithmetic coder, was selected; arithmetic coding compresses more efficiently than Huffman code (but is not as simple and fast as Huffman code). AV1 also gained the ability to adapt the symbol probabilities in the arithmetic coder per coded symbol instead of per frame (ec_adapt).[42]
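
The per-symbol adaptation can be sketched as a cumulative-distribution update nudged toward each coded symbol (the 15-bit scale mirrors AV1's coder; the adaptation rate here is illustrative):

```python
PROB_TOTAL = 1 << 15   # CDFs are kept at a fixed 15-bit scale

def update_cdf(cdf: list[int], symbol: int, rate: int = 5) -> None:
    """After coding `symbol`, shift probability mass toward it.
    cdf[i] holds the scaled P(s <= i); cdf[-1] == PROB_TOTAL stays fixed."""
    for i in range(len(cdf) - 1):
        if i >= symbol:
            cdf[i] += (PROB_TOTAL - cdf[i]) >> rate   # raise P(s <= i)
        else:
            cdf[i] -= cdf[i] >> rate                  # lower it

cdf = [8192, 16384, 24576, 32768]   # uniform 4-symbol CDF at 15-bit scale
update_cdf(cdf, symbol=2)           # probability mass shifts toward symbol 2
```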

Scalable video coding