LAION
Type | Non-profit |
---|---|
Industry | Artificial intelligence |
Founder | |
Website | laion |
LAION (acronym for Large-scale Artificial Intelligence Open Network) is a German non-profit which makes open-source artificial intelligence models and datasets.
In February 2023, LAION was named as a non-party in the Getty Images lawsuit against Stability AI.[4] In April 2023, LAION was directly sued by a German photographer who wanted to have his images removed from the training set.[5]
On April 15, 2023, LAION and contributors publicly released OpenAssistant, an open-source AI assistant chatbot.
Image datasets
LAION has publicly released a number of large datasets of image-caption pairs which have been widely used by AI researchers. The data is derived from the Common Crawl, a dataset of scraped web pages. The developers searched the crawled HTML for <img> tags and treated their alt attributes as captions. They used CLIP to identify and discard images whose content did not appear to match their captions.[6] LAION does not host the content of scraped images themselves; rather, the dataset contains URLs pointing to images, which researchers must download themselves.[7]
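The extraction and filtering steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not LAION's actual pipeline: the names `ImgAltExtractor`, `extract_pairs` and `keep_pair` are hypothetical, the image and text embeddings are assumed to have been computed elsewhere by a CLIP model, and the similarity threshold of 0.3 is an illustrative default rather than a value taken from this article.

```python
from html.parser import HTMLParser
from math import sqrt


class ImgAltExtractor(HTMLParser):
    """Collect (image URL, alt text) pairs from <img> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        # A valueless attribute (e.g. <img alt>) is reported as None.
        alt = (attr.get("alt") or "").strip()
        # Keep only images that have both a URL and a non-empty caption.
        if src and alt:
            self.pairs.append((src, alt))


def extract_pairs(html):
    """Return the list of (URL, caption) candidates found in one HTML page."""
    parser = ImgAltExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keep_pair(image_embedding, text_embedding, threshold=0.3):
    """Keep a pair only if its image and caption embeddings are similar enough.

    The embeddings are assumed to come from a CLIP model; the threshold
    is an illustrative assumption.
    """
    return cosine_similarity(image_embedding, text_embedding) >= threshold
```

Note that the sketch only ever stores a URL and a caption for each candidate, which mirrors the way the released datasets reference images by URL instead of hosting them.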
The first such dataset, LAION-400M, was released in August 2021 and consisted of 400 million image-caption pairs. The pairs were extracted from a random subset of webpages scraped by Common Crawl between 2014 and 2021.
A successor of more than 5 billion pairs, LAION-5B, was released in March 2022.[10] As of its release, it was the largest freely available dataset of image-caption pairs in existence.[6] Its creation was funded by Doodlebot, Hugging Face and Stability AI, the AI company behind the funding of the Stable Diffusion text-to-image model, which was trained on it.[11]
Criticism
Several studies have shown that LAION-5B contains problematic image-text pairs involving rape, pornography, malign stereotypes, racist and ethnic slurs, and other extremely problematic content.[12][13]
An investigation by Bayerischer Rundfunk showed that LAION's datasets, hosted on Hugging Face, contain large amounts of private and sensitive data.[14]
In December 2023, the
OpenAssistant
Developer(s) | LAION and contributors |
---|---|
Initial release | 15 April 2023 |
Type | |
License | Apache License 2.0 |
Website | open-assistant |
OpenAssistant is an open-source, chat-based artificial intelligence (AI) assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to carry those tasks out. The project is developed by a group of volunteers in collaboration with LAION. One of its development goals is free access to large language models that can be run locally on consumer hardware.[16][17] The project is backed by a worldwide crowdsourcing effort involving over 13,500 volunteers who have created 600,000 human-generated data points.[17][18]
References
- ^ "About". LAION.ai. Retrieved 26 September 2022.
- ^ Edwards, Benj (15 September 2022). "Have AI image generators assimilated your art? New tool lets you check". Ars Technica.
- ^ Newman, Marissa; Cantrill, Aggi (24 April 2023). "The Future of AI Relies on a High School Teacher's Free Database". Bloomberg News. Retrieved 24 April 2023.
- ^ "Getty Images (US), Inc. v. Stability AI, Inc., 1:23-cv-00135". CourtListener. Retrieved 2023-02-08.
- ^ "A Photographer Tried to Get His Photos Removed from an AI Dataset. He Got an Invoice Instead". Vice. Retrieved 2023-05-04.
- ^ a b c Alford, Anthony (17 May 2022). "LAION Releases Five Billion Image-Text Pair Dataset LAION-5B". InfoQ.
- ^ Edwards, Benj (21 September 2022). "Artist finds private medical record photos in popular AI training data set". Ars Technica.
- ^ Schuhmann, Christoph (8 August 2021). "LAION-400-Million Open Dataset". LAION blog. Retrieved 26 September 2022.
- ^ arXiv:2205.11487 [cs.CV].
- ^ Beaumont, Romain (3 March 2022). "LAION-5B: A New Era of Open Large-Scale Multi-Modal Datasets". LAION blog.
- ^ Wiggers, Kyle (12 August 2022). "This startup is setting a DALL-E 2-like AI free, consequences be damned". TechCrunch.
- )
- ^ arXiv:2311.03449.
- ^ Brunner, Katharina; Harlan, Elisa. "We Are All Raw Material for AI". Bayerischer Rundfunk.
- ^ Cole, Samantha (20 December 2023). "Largest Dataset Powering AI Images Removed After Discovery of Child Sexual Abuse Material". 404 Media. Retrieved 22 December 2023.
- ^ Open-Assistant, LAION AI, 2023-03-09, retrieved 2023-03-09
- ^ arXiv:2304.07327 [cs.CL].
- ^ "Open Assistant: Explore the Possibilities of Open and Collaborative Chatbot Development". KDnuggets. Retrieved 2023-05-05.