Fine-tuning (deep learning)

In deep learning, fine-tuning is an approach to transfer learning in which the parameters of a pre-trained model are trained on new data.^[1] Fine-tuning can be done on the entire neural network, or on only a subset of its layers, in which case the layers that are not being fine-tuned are "frozen" (not updated during the backpropagation step).^[2] A model may also be augmented with "adapters" that consist of far fewer parameters than the original model, and fine-tuned in a parameter-efficient way by tuning the weights of the adapters and leaving the rest of the model's weights frozen.^[3]

For some architectures, such as convolutional neural networks, it is common to keep the earlier layers (those closest to the input layer) frozen because they capture lower-level features, while later layers often discern high-level features that can be more related to the task that the model is trained on.^[2]^[4]
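As a minimal sketch of freezing, consider a toy two-layer model with one scalar weight per layer. The model, the data, and the names `fine_tune` and `frozen` are illustrative assumptions, not part of any particular library; the point is only that gradients are computed for every parameter while the update step skips the frozen ones.

```python
# Toy model: y = w2 * relu(w1 * x). "w1" plays the early layer, "w2" the late layer.
def forward(params, x):
    hidden = max(0.0, params["w1"] * x)
    return params["w2"] * hidden

def gradients(params, x, target):
    """Gradients of the squared error 0.5 * (y - target)**2 for one sample."""
    pre = params["w1"] * x
    hidden = max(0.0, pre)
    error = params["w2"] * hidden - target
    return {
        "w1": error * params["w2"] * (x if pre > 0 else 0.0),
        "w2": error * hidden,
    }

def fine_tune(params, data, frozen, lr=0.05, epochs=200):
    """Plain SGD that leaves every parameter named in `frozen` untouched."""
    params = dict(params)  # work on a copy of the pre-trained weights
    for _ in range(epochs):
        for x, target in data:
            grads = gradients(params, x, target)
            for name in params:
                if name not in frozen:
                    params[name] -= lr * grads[name]
    return params

def mean_loss(params, data):
    return sum(0.5 * (forward(params, x) - t) ** 2 for x, t in data) / len(data)

pretrained = {"w1": 1.5, "w2": 0.2}              # hypothetical pre-trained weights
new_task = [(1.0, 3.0), (2.0, 6.0), (0.5, 1.5)]  # new data: target = 3 * x

tuned = fine_tune(pretrained, new_task, frozen={"w1"})
print(tuned, mean_loss(pretrained, new_task), mean_loss(tuned, new_task))
```

Here only `w2` moves: the frozen early weight keeps its pre-trained value, and the later weight alone adapts to the new data.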

Models that are pre-trained on large and general corpora are usually fine-tuned by reusing the model's parameters as a starting point and adding a task-specific layer trained from scratch.^[5] Fine-tuning the full model is common as well and often yields better results, but it is more computationally expensive.^[6]

Fine-tuning is typically accomplished with

Sparrow.^[8]^[9]

Robustness

Fine-tuning can degrade a model's robustness to distribution shifts.^[10]^[11] One mitigation is to linearly interpolate a fine-tuned model's weights with the weights of the original model, which can greatly increase out-of-distribution performance while largely retaining the in-distribution performance of the fine-tuned model.^[12]
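The interpolation described above can be sketched as follows, assuming each model is represented as a dictionary that maps a parameter name to a flat list of floats. The function name `interpolate_weights`, the mixing coefficient `alpha`, and the example values are hypothetical; real checkpoints would hold tensors rather than lists.

```python
def interpolate_weights(original, fine_tuned, alpha):
    """Return (1 - alpha) * original + alpha * fine_tuned, parameter by parameter."""
    if original.keys() != fine_tuned.keys():
        raise ValueError("both models must have the same parameter names")
    mixed = {}
    for name, base in original.items():
        tuned = fine_tuned[name]
        if len(base) != len(tuned):
            raise ValueError(f"size mismatch for {name!r}")
        mixed[name] = [(1.0 - alpha) * b + alpha * t for b, t in zip(base, tuned)]
    return mixed

# Made-up weights for illustration only.
original = {"layer.weight": [0.0, 1.0, 2.0], "layer.bias": [0.5]}
fine_tuned = {"layer.weight": [1.0, 3.0, 2.0], "layer.bias": [-0.5]}

halfway = interpolate_weights(original, fine_tuned, alpha=0.5)
print(halfway)  # {'layer.weight': [0.5, 2.0, 2.0], 'layer.bias': [0.0]}
```

Setting `alpha` to 0 recovers the original model and setting it to 1 recovers the fine-tuned model; values in between trade in-distribution accuracy against robustness, and the best value would have to be chosen empirically.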

Variants

Low-rank adaptation

Low-rank adaptation (LoRA) is an adapter-based technique for efficiently fine-tuning models. The basic idea is to design a low-rank matrix that is then added to the original matrix.^[13]
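In symbols, the idea can be written as below. The notation is an assumption chosen for illustration and does not come from the text above: $W_0$ is a frozen pre-trained weight matrix, $B$ and $A$ are the trainable factors, and $r$ is the chosen rank.

```latex
W = W_0 + \Delta W, \qquad \Delta W = B A, \qquad
W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Because $\Delta W$ is the product of a $d \times r$ and an $r \times k$ matrix, its rank is at most $r$, and only $r(d + k)$ values are trained instead of $dk$.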

An "adapter" in this context is a collection of low-rank matrices which, when added to a base model, produce a fine-tuned model. LoRA allows performance that approaches that of full-model fine-tuning with lower space requirements. A language model with billions of parameters may be LoRA fine-tuned with only a few million trainable parameters.
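A minimal sketch of both points, adding a low-rank adapter to a base matrix and counting the parameters involved, might look like this. Matrices are plain lists of rows, and the helper names, the `scale` argument, and the sizes 4096 and 8 are illustrative assumptions rather than values taken from any specific model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_adapter(base, lora_b, lora_a, scale=1.0):
    """Return base + scale * (lora_b @ lora_a) without modifying base."""
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]

def trainable_parameters(d, k, rank):
    """Parameter counts for a full d x k update versus a rank-`rank` adapter."""
    return {"full": d * k, "lora": rank * (d + k)}

base = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]          # 3 x 3 base weights (hypothetical)
lora_b = [[1.0], [0.0], [2.0]]    # 3 x 1 factor
lora_a = [[0.5, 0.0, 1.0]]        # 1 x 3 factor, so the update has rank 1

merged = merge_adapter(base, lora_b, lora_a)
print(merged)  # [[1.5, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 3.0]]

counts = trainable_parameters(d=4096, k=4096, rank=8)
print(counts)  # {'full': 16777216, 'lora': 65536}
```

For a single 4096 by 4096 matrix at rank 8, the adapter holds 256 times fewer values than a full update, which is why only the small factors need to be stored and shared alongside the unchanged base model.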

LoRA-based fine-tuning has become popular in the Stable Diffusion community.^[14] Support for LoRA was integrated into the Diffusers library from Hugging Face.^[15] Support for LoRA and similar techniques is also available for a wide range of other models through Hugging Face's Parameter-Efficient Fine-Tuning (PEFT) package.^[16]

Applications

Natural language processing

Fine-tuning is common in natural language processing (NLP). GPT foundation models can be fine-tuned on data for specific downstream NLP tasks (tasks that use a pre-trained model) to improve performance over the unmodified pre-trained model.^[6]

Commercial models

Commercially offered large language models can sometimes be fine-tuned if the provider offers a fine-tuning API. As of June 19, 2023, language model fine-tuning APIs are offered by OpenAI and Microsoft Azure's Azure OpenAI Service for a subset of their models, as well as by Google Cloud Platform for some of their PaLM models, and by others.^[17]^[18]^[19] Not all commercial models currently support fine-tuning.

Open-source models

Other companies, such as Meta (Llama family), Alibaba (Qwen family), and Mistral AI (Mixtral), have published open-source models in various sizes on GitHub, which can be fine-tuned. Because a company can choose where such a model is hosted, it retains control over its data.

References

[1] ISBN 978-1-5443-6137-6. Archived from the original on January 10, 2023. Retrieved January 10, 2023.

[2] "CS231n Convolutional Neural Networks for Visual Recognition". cs231n.github.io. Retrieved 9 March 2023.

[3] Liu, Haokun; Tam, Derek; Muqeeth, Mohammed; Mohta, Jay; Huang, Tenghao; Bansal, Mohit; Raffel, Colin A (2022). Koyejo, S.; Mohamed, S.; Agarwal, A.; Belgrave, D.; Cho, K.; Oh, A. (eds.). Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning (PDF). Advances in Neural Information Processing Systems. Vol. 35. Curran Associates, Inc. pp. 1950–1965.

[4] arXiv:1311.2901.

[5] arXiv:2002.06305.

[6] arXiv:2112.08718.

[7] arXiv:2010.07835.

[8] "Introducing ChatGPT". openai.com. Retrieved 9 March 2023.

[9] arXiv:2209.14375.

[10] arXiv:2103.00020 [cs.CV].

[11] arXiv:2202.10054.

[12] arXiv:2109.01903 [cs.CV].

[13] arXiv:2106.09685.

[14] Ryu, Simo (February 13, 2023). "Using Low-rank adaptation to quickly fine-tune diffusion models". GitHub. Retrieved June 19, 2023.

[15] Cuenca, Pedro; Paul, Sayak (January 26, 2023). "Using LoRA for Efficient Stable Diffusion Fine-Tuning". Hugging Face. Retrieved June 19, 2023.

[16] "Parameter-Efficient Fine-Tuning using 🤗 PEFT". huggingface.co. Retrieved 2023-06-20.

[17] "Fine-tuning". OpenAI. Retrieved 2023-06-19.

[18] "Learn how to customize a model for your application". Microsoft. Retrieved 2023-06-19.

[19] "Tune text foundation models". Retrieved 2023-06-19.


