The technology behind generative AI has made a big splash in the content-addicted world of the internet. Its ability to churn out hundreds of synthetic images and texts at industrial speed and scale has fascinated people, many of whom have heralded the end of this or that job, particularly in creative fields such as writing, acting, and visual art. We are being told that machines are a perfect replacement for human creativity, at a fraction of the time and expense required to create ‘human art’. Art is now ‘content’, and must be produced and assembled at a rapid pace to feed the ever-hungry eyeballs glued to screens all over the world.
You can’t ‘steal’ to read, but they can steal for profit
The tech barons of our time approach the digital commons with a libertarian capitalist mindset, seeking out unregulated terrain that can be exploited for profit without regulatory oversight or democratic scrutiny. Companies like Midjourney and OpenAI are facing flak and legal challenges for using creative works to train their generative models, including LLMs (large language models), without the consent of the original artists or any compensation to them. Artists and others are speaking up against what they perceive as blatant theft of their work. In a recent interview, OpenAI’s CTO was asked “What data was used to train Sora”, the company’s video-generation tool. She answered that it was “publicly available and licensed data”, but when asked to specify whether this included videos from YouTube (and other social media), she said she was “not sure”. Reacting to this, tech critic Paris Marx tweeted: “Does the CTO not know what data was used to train Sora, or does she know that if she admitted it there would be an uproar?”, adding that he supports a “total ban on generative AI”. In this podcast episode, Marx and Ed Zitron bring a healthy dose of scepticism to the idea of AI as the ‘next big thing’.
There are many such lawsuits against companies offering AI products, based on allegations of intellectual property theft and copyright violation. It is interesting to see how copyright protection seems to work only one way: it is enforced vigorously when publishing or entertainment corporations protect their ‘property’ (books, films, videogames, etc.) from online piracy, but not so much when the ‘theft’ is carried out by tech companies backed by billions of dollars. Aaron Swartz was a programmer and internet freedom advocate who co-founded Reddit and was a voice for the ‘open access movement’, which demanded free access to information. In his 2008 ‘Guerilla Open Access Manifesto’, he wrote: “The world’s entire scientific and cultural heritage, published over centuries in books and journals, is increasingly being digitized and locked up by a handful of private corporations.”, and “With enough of us, around the world, we’ll not just send a strong message opposing the privatization of knowledge — we’ll make it a thing of the past.” In 2011, U.S. authorities came down heavily on him for downloading “several million articles” from the database of the academic subscription service JSTOR. He faced up to 35 years in prison and a $1 million fine. He died by suicide in 2013 at the age of 26.
Similarly, Elsevier won its case against Sci-Hub in 2017, and the cofounders of the torrent site The Pirate Bay went to prison in 2014. Telegram faced five copyright cases in India in 2023, all for circulating reading and educational material. But now that media houses, authors, and artists are objecting to AI companies’ use of their work, some are spinning this as unfounded opposition to the emergence of a new technology. The lawsuit against Nvidia and Databricks alleges that the defendants used Bibliotik, a “shadow library” that illegally disseminates copyrighted books, to train their LLMs. It notes that “These shadow libraries have long been of interest to the AI-training community because they host and distribute vast quantities of unlicensed copyrighted material. For that reason, these shadow libraries also violate the U.S. Copyright Act.”
This article by Gary Marcus and Reid Southen explores the “copyright minefield” thrown up by experiments with the visual generative AI tools Midjourney and DALL-E 3, which show that these models can regurgitate the material they were trained on with only very minor changes.
The implications of copyright infringement vary vastly depending on the balance of power involved: on one side are those who are locked out of knowledge by predatory pricing and paywalls, and on the other, those who seek to resist the devaluation of their work and their identity as artists.
Because of these issues, there are growing calls for AI companies to be transparent about the datasets used to train their LLMs, and to work out fair mechanisms for seeking permission from, and compensating, the creators of the works they use. It is easy to see why they might be unwilling to do this: the costs, time, and other complications involved would cut into their bottom line and cost them ‘efficiency’ and profits. Will the legal repercussions for these companies be the same as for those who took direct action to loosen the stranglehold of privatization on collective knowledge and culture? Is that even possible within the broader system of capitalism in which we all live?
Artists resist theft of their work, devaluation of art
The motto behind promoting generative AI images as ‘art’, and ‘prompt writers’ as ‘artists’, seems to be “convenience over creativity”. The companies that profit from this try to present a narrative of the ‘democratization’ of art (just as buying gadgets and devices is considered the ‘democratization’ of tech), but is this not a devaluation of the creative process? It seeks to short-circuit the human element that goes into art: imagining a finished piece and putting in the technical and personal work needed to reach the point where one can execute it. Generative AI creates the misconception that art is an ‘on-demand’ product. The synthetic images or text created by these models lack intentionality and personality, which I believe are essential aspects of any work of art.
The ‘democratization’ of art and creativity should not mean hollowing out the essential meaning of creating art. Devoid of its social and individual roots, the ‘art’ served up to us by generative AI technologies is just a synthetic collection of pixels, mashed together from a large corpus of inputs. It is merely the copying and reassembly of previously ingested material. Recent updates to Midjourney take away even the burden of writing text-based prompts: one can now simply have the tool ‘describe’ an image one feeds into it, giving users a prompt from which to generate still more images. Has the process of artistic creation been boiled down to making copies of copies?
Several artists and writers have strongly objected to being reduced to a mere ‘tag’ for the unique art styles they have spent years practising and perfecting. All a (paying) Midjourney user has to do is include an artist’s name in their prompt with “in the style of…”, and the tool will produce an artificial replication of that artist’s style, which can be varied infinitely. This is not only discouraging at a personal level but also devalues the achievement of artistic proficiency and skill.
The artist is forced to hand over the fruits of their labour without even being asked. The actions of these generative AI companies are generating a discourse that deems it acceptable for artistic work to be appropriated in the name of a so-called ‘revolutionary’ technology. This exacerbates economic precarity and negative social attitudes towards the pursuit of the arts, failing to acknowledge artists’ role in society and culture and reducing them to hapless ‘suppliers’ of raw material for the corporate machine to churn into profit. To add to this, Midjourney is also apparently blocking people who call out these practices on Twitter. The least the artistic community deserves is to be able to voice its dissatisfaction and criticisms directly to these companies. Do watch the documentary ‘AI vs Artists- The Biggest Heist in History’, in which artists and scientists talk about different aspects of generative AI, including what artists are doing to protect their work against forced appropriation.
Arjun Banerjee is a writer and political commentator. He is a postgraduate in English literature from the University of Delhi. He writes about current events and culture.