YouTube

From RB Wiki
Revision as of 09:36, 2 February 2020 by Lê Nguyên Hoang

YouTube is a video-sharing platform. It is argued by HoangElmhamdi19FR to be one of today's biggest opinion-makers, and the YouTube algorithm is argued to be one of today's most sophisticated algorithms. This arguably makes YouTube one of the most crucial applications and test cases for AI ethics.

Key numbers

According to Alexa, YouTube is the second most visited website, behind only Google.com. Yet according to Merge, there are actually more views on YouTube than searches on Google (and this has been the case since 2016). Every minute, in 2019, there were 4.5M YouTube views, as opposed to 3.8M Google searches.

Apart from direct messaging (~250M emails + Messenger + WhatsApp + texts), this makes YouTube the biggest per-minute activity, in contention with Giphy (4.8M). Behind come Snapchat (2.4M snaps), Tinder (1.4M swipes), Twitch (1M views) and Facebook (1M logins). Netflix, Instagram and Twitter follow, with fewer than 1M views, scrolls and tweets respectively.

In 2017, YouTube exceeded 1 billion hours of watch-time per day. It self-reported having 2 billion users, with 70% of views on phones YouTubePress.

Most crucially, YouTube's chief product officer revealed that 70% of views on YouTube result from recommendations of the YouTube algorithm cnet18. This algorithm is critical to YouTube (this is Lê's impression after numerous private conversations with different YouTube employees).
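To get a feel for the scale, here is a back-of-the-envelope calculation combining the two figures quoted above. It is only a rough sketch: it assumes that the 70% share reported in cnet18 also applies to the 2019 per-minute view count, which the sources do not state, and the variable names are purely illustrative.

```python
# Figures quoted in the text; combining them across years is an assumption.
VIEWS_PER_MINUTE = 4_500_000      # 2019 estimate of YouTube views per minute
RECOMMENDED_SHARE_PERCENT = 70    # share of views attributed to recommendations (cnet18)
MINUTES_PER_DAY = 24 * 60

# Integer arithmetic avoids floating-point rounding noise.
recommended_per_minute = VIEWS_PER_MINUTE * RECOMMENDED_SHARE_PERCENT // 100
recommended_per_day = recommended_per_minute * MINUTES_PER_DAY

print(f"{recommended_per_minute:,} recommended views per minute")
print(f"{recommended_per_day:,} recommended views per day")
```

Under these assumptions, recommendations would account for roughly 3.15 million views per minute, or about 4.5 billion views per day.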

It is hard to grasp the extent of the influence of the YouTube recommender algorithm if one is not familiar with cognitive biases. In particular, we humans are subject to the familiarity bias, which makes us prefer, and judge as more likely to be true, things and ideas that are familiar to us Veritasium16. The impact of this bias is huge, especially under repeated exposure. In fact, Google research suggests that the impacts of repeated exposure are still increasing months after the first exposure HOT15. This makes familiarity bias hard to study.

Boosted by the recommender algorithm, YouTube is remarkably successful at making users' sessions last an average of 60 minutes theStreet18. This ubiquity of YouTube means that business is occurring on YouTube, with 61% of US companies on the platform PRNewswire16.

YouTube's recommender algorithm builds upon and is intimately connected to numerous other spectacular services provided by other YouTube algorithms. After all, there are more than 500 hours of new video uploaded per minute theStreet18, which have to be analyzed for violence, hate speech wired18 or paedophilia wiredUK19. Note that YouTube has an incentive to do so, as many advertisers have left YouTube after complaining about being associated with controversial videos.

The volume of videos to be processed is huge. But the quality of the processing is also crucial. Advertisers and content creators both put strong pressure on YouTube to moderate videos adequately. But this is a hard task. YouTube is probably deploying the most sophisticated state-of-the-art deep-learning-based algorithms to do so, especially image and sound processing, speech recognition, natural language processing (over 1 billion videos are automatically captioned YouTubeBlog17), feature learning, user representation, and so on.

It is also a hard task to delegate to humans, because humans are biased too. Moreover, video moderation is traumatic: numerous cases of PTSD have been reported among moderators exposed to horrible videos, such as murder videos or child abuse TheVerge19 BBC18.

LexFridman20 interviews Cristos Goodrow, VP of Engineering at Google and head of Search and Discovery at YouTube.

Algorithms

YouTube is unfortunately very private about its algorithms and data. But there are a few publications.

The recommender algorithm definitely relies on deep neural networks CAS16 to perform video and user embedding. This allows it to perform numerous tasks ZHWCN+19.
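As a minimal sketch of what recommending from embeddings can look like, the toy example below scores videos by the dot product between a user vector and each video vector, then keeps the best ones. This is an assumption made for illustration, not a description of YouTube's actual system: the vectors are hand-written rather than learned by a neural network, and the names `dot`, `top_k`, `video_vecs` and `user_vec` are hypothetical.

```python
# Toy illustration only: hand-written 3-dimensional embeddings (invented values).
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def top_k(user_vec, video_vecs, k=2):
    """Return the k video ids whose embeddings score highest against the user."""
    scored = sorted(
        video_vecs.items(),
        key=lambda item: dot(user_vec, item[1]),
        reverse=True,
    )
    return [video_id for video_id, _ in scored[:k]]

video_vecs = {
    "math_lecture": [0.9, 0.1, 0.0],
    "cooking_show": [0.0, 0.8, 0.2],
    "physics_demo": [0.7, 0.0, 0.3],
}
user_vec = [1.0, 0.0, 0.2]  # a user whose history leans toward science content

recommended = top_k(user_vec, video_vecs)
print(recommended)
```

Once users and videos live in the same vector space, the same representations can be reused for other tasks, which is presumably part of why embeddings are attractive at this scale.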

Recently, IJWNA+19 proposed a reinforcement learning algorithm for video recommendation, which they actually ran on YouTube with notable impacts, including in the long run. This strongly suggests that YouTube is, or will soon be, deploying reinforcement learning algorithms with planning capabilities (probably at the scale of weeks! HOT15).
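To illustrate why planning over a longer horizon can change which video gets recommended, here is a toy sketch. It is not the algorithm of IJWNA+19: it runs plain value iteration on a fully known, made-up two-state model of a viewer, whereas a reinforcement learning system would have to learn from interaction data. The states, actions, rewards and the function name `plan` are all hypothetical.

```python
# Hypothetical two-state toy model of a viewer; all numbers are invented.
# TRANSITIONS[state][action] = (immediate_reward, next_state)
TRANSITIONS = {
    "curious": {"clickbait": (1.0, "bored"), "quality": (0.7, "curious")},
    "bored": {"clickbait": (0.2, "bored"), "quality": (0.1, "curious")},
}

def plan(gamma, sweeps=500):
    """Run value iteration, then return the greedy action for each state.

    gamma is the discount factor: 0.0 ignores the future entirely,
    values close to 1.0 weigh long-term engagement heavily.
    """
    values = {state: 0.0 for state in TRANSITIONS}
    for _ in range(sweeps):
        # Synchronous update: the new values are computed from the old ones.
        values = {
            state: max(r + gamma * values[nxt] for r, nxt in actions.values())
            for state, actions in TRANSITIONS.items()
        }
    return {
        state: max(
            actions,
            key=lambda a: actions[a][0] + gamma * values[actions[a][1]],
        )
        for state, actions in TRANSITIONS.items()
    }

myopic = plan(gamma=0.0)
far_sighted = plan(gamma=0.9)
print("myopic:", myopic)
print("far-sighted:", far_sighted)
```

In this toy model, the myopic planner always picks the clickbait video because it pays more immediately, while the far-sighted planner picks the quality video because it keeps the viewer in the more rewarding state. Whether the long-term objective is a desirable one is, of course, a separate question.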

But what do these algorithms maximize? Unfortunately, little is shared publicly by YouTube. YouTube managers, engineers and researchers all told Lê in private conversation that they maximized "user engagement", which was very distinct from good old watch-time. It seems likely that this measure includes things like the "appropriateness" of the video, as YouTube announced its intention to combat radicalization and conspiracy theories TheGuardian19. It seems that YouTube, Twitter and Facebook have all been investing much more in ethical concerns SmarterEveryDay19a SmarterEveryDay19b SmarterEveryDay19c, probably in large part because it has become in their interest to do so.

However, there is doubt that YouTube is doing enough. What is very clear is that its opacity hinders any effort to better judge the impacts of supposed changes to YouTube's algorithms, as well as to contribute to making these changes robustly beneficial. YouTube CEO Susan Wojcicki claimed in 60minutes19 that, because of such changes, American users consumed 70% less content on controversial topics in 2019 (!). But to the best of Lê's knowledge, this figure has not been confirmed by any independent researcher.

Lê thinks that YouTube cannot possibly be doing enough, simply because the challenges are so huge that the help of thousands of top researchers is probably necessary to "do enough". In fact, arguably, the ethics of such systems won't be really trustworthy as long as no social choice mechanism is developed to determine, say, the limit of what counts as hate speech.

Effects

Building upon Allgaier16, where he used Tor to better understand how YouTube recommends videos, one after the other, Allgaier19 showed evidence that, while YouTube presents correct recommendations when users search for "climate change", it suggests climate denial videos for searches such as "geoengineering" or "climate manipulation". Disturbingly, this results in roughly as many views for videos defending the climate change consensus as for videos denying it, among the top 200 videos on such topics (16.94M for each side). Similar techniques are proposed by Algotransparency.

YouTube use has also been connected to radicalization. By analyzing YouTube comments, ROWAM20 showed a strong signal of users moving from the "Alt-lite" to the "Intellectual Dark Web", and then from the "Intellectual Dark Web" to the "Alt-right". This is evidence of the radicalization pipeline suggested by, say, Tufekci18 or PutraIbrahim17. MKNS17 suggest that targeted massive persuasion was very effective. See online polarization for more on this debated topic.

Concerns have also been raised, typically in HoangElmhamdi19FR, about biases (only 7% of the views of the YouTube channel Science4All come from women; similar channels have similar statistics, including when hosted by a woman!), anger virality BergerMilkman12 GRMO13 CGPGrey15 S4A17FR and mental health LSSRM+16. Arguably, misinformation about vaccines promoted by YouTube has already caused (hundreds of) thousands of deaths SongGruzd18 DPFAC+18 WHO19.

Unfortunately, such effects are mostly hard to study, partly because YouTube's data is private, but also because the time-scale of such effects is of the order of weeks, months, and even probably years HOT15. But a study by Facebook KGH14 suggests that repeated exposure to some positive or negative emotion effectively changes users' emotions within merely a week. Ethical concerns have prevented studies on a longer term, but it seems likely that even stronger effects could be observed RB3.

In any case, even if YouTube were not that detrimental today, it is worth pointing out that, by being beneficial, YouTube could do an enormous amount of good. Making YouTube beneficial is not just about fixing problems; it's about making the world a much better place. YouTube could promote social justice causes more effectively, raise awareness of climate change, encourage environmentally friendly habits, discourage polluting consumption, teach mathematics and critical thinking, diagnose and accompany mental depression and loneliness ESMUC+18, provide quality medical advice and promote kindness over hatred.

However, there are numerous potential pitfalls (most of which Lê has surely overlooked). Better understanding the YouTube ecosystem and what is going (or can go) wrong is definitely an important research challenge. In particular, we should not aim to make YouTube mostly or roughly beneficial. We should try to make it robustly beneficial, that is, robust to the diversity of users' moral preferences (see social choice), to the inevitable distributional shift caused by any change of the algorithm, to adversarial attacks by malicious users and to unanticipated side effects. This is not easy. It needs to be researched.

Can/Should YouTube be more open?

There is clearly some (economic) interest in keeping code and data private, as well as safety and privacy reasons for doing so. But such incentives are arguably overstated. On the contrary, it may be in YouTube's best interests to open up more of its code and data. Below are a few arguments why, taken from HoangElmhamdi19FR.

The heart of all the following arguments lies in the fact that social media are natural monopolies because of the network effect. As a result, unlike in other businesses, the main threat to companies like Facebook and YouTube is probably not the competition. Rather, it is some major backlash caused by safety breaches or ethical scandals.

In particular, as a result, the secrecy of the software and data of such companies does not seem to be the critical part of the business. Sure, this software must perform very well. But competitors will always have a hard time, even if they gain access to this software. After all, setting up the huge YouTube infrastructure is simply out of reach for most businesses, while acquiring the market share that YouTube has requires a lot more than just software and data.

On the other hand, (intelligently) sharing software and data could allow YouTube to gain ever more of its users' trust. This is critical. While YouTube's algorithm is not heavily criticized by mainstream media today, a lack of effort to show that YouTube is doing its best to make the algorithm robustly beneficial could turn into a major scandal. YouTube could anticipate such backlashes by showing that it is collaborating with diverse independent research groups to make the YouTube algorithm beneficial.

Another reason why more open software and data could be fruitful is that it would help attract talent. Today's young engineers and researchers are more and more preoccupied by ethical concerns, which has led to recruitment difficulties for Facebook PoliticalWire19. What's more, many young scholars want to publish their work, especially if it is research work.

Opening software and data could allow for collaborations that bring new insights. YouTube's data surely contain formidable social science insights, which could allow scientists to much better understand the society we live in. Open source code is also well known to allow for numerous patches that greatly increase the security of software HoepmanJacobs07.

Finally, there seem to be people at YouTube who are really motivated to make sure their work is beneficial to mankind. CEO Wojcicki seems to argue she is. Lê had numerous private discussions with YouTube employees that suggest they are. Happier and more motivated employees arguably often produce great work. Insisting on the motivation to do good could thus be a great way to make YouTube employees happy and productive.

It is noteworthy that these thoughts also suggest ways for both YouTube insiders and outsiders to increase YouTube's incentives to be more open, which seems very much desirable for AI ethics. Typically, we probably should demand a lot more from YouTube, and be unsatisfied with supposedly beneficial secret work done within the company. We may want to insist on the fact that the secrecy of YouTube's algorithms does not seem critical at all to YouTube's financial health, while a bad reputation could hinder YouTube adoption, recruitment and collaborations. We should make YouTube's current and future employees care about AI ethics, and tell them that it's okay to leave the company if they are not sufficiently convinced that their contributions are sufficiently ethical. And we probably should promote the economic value of openness.

Can the YouTube algorithm go AGI?

Part of the AI ethics community is particularly concerned with very powerful AIs, which are often referred to as artificial general intelligences (AGIs). Unfortunately, experts seem to disagree on the very meaning of AGI. Some experts argue that AGI is even meaningless.

A more interesting question is to determine whether YouTube can perform better-than-human planning to optimize its goal. While YouTube is nowhere near effective long-term capability, it is worth pointing out that the baseline comparison (human-level video recommendation) is extremely low. This is mostly because the speed of video recommendation necessary to satisfy billions of users is way beyond human-level. But also, it involves a huge amount of number crunching, as a user's next most relevant video to recommend depends on their watch history and on the enormous set of videos on the platform. By doing so, in a sense, the YouTube algorithm may have a better overview of what's happening on YouTube than any human, especially given that much of what's going on on YouTube is locked into private data. In a sense, YouTube is thus superhuman at many of its core tasks.

However, for effective long-term capability, the YouTube algorithm might need to better understand aspects of human psychology, sociology and economics that it is nowhere near understanding right now. Early research aims to go in this direction IJWNA+19. But could the YouTube algorithm eventually learn all of this?

It is hard to make predictions, especially about the future. And especially about the future of AI. But recent advances in natural language processing (like GPT2 RWAAC+18 Computerphile19a Computerphile19b) suggest that today's machine learning models might soon be able to crunch huge amounts of data to infer very sophisticated world models. Given the huge economic and moral incentives to improve YouTube's algorithms, there may be a chance that these algorithms become the first with superhuman long-term capability.

The trouble with human-level planning capabilities is that any side effect will likely be amplified on unprecedented scales. Worse, such effects may be irreversible, as a huge financial crisis can be, but on still bigger scales. One may even fear that such side effects could threaten the well-being of mankind, perhaps even its survival if a global war gets triggered.