Paths to explaining personalized recommendation algorithms
Alibaba Cloud’s exhibition station at the China International Software Expo in Nanjing, Jiangsu Province. Photo: CFP
With the promulgation of the E-Commerce Law of the People's Republic of China, the Personal Information Protection Law (PIPL), and the recently issued Internet Information Service Algorithm Recommendation Management Regulations, a system of legal norms that facilitates algorithm governance has gradually taken form in China. Among the various types of algorithms, those used in Personalized Recommender Systems (PRS) are the most closely tied to people's daily life, and have therefore triggered the most debate.
As more network platforms based on Web 2.0 technology emerge, the technological threshold for ordinary users to produce and spread information has lowered significantly. As a result, the total volume of information in society has grown exponentially. By contrast, manual curation or information distribution based on fixed rules (such as hits, timelines, and keywords) can only handle "a drop in the ocean." Although the quality of filtered information can be guaranteed, most incremental information is shelved simply because it cannot reach users. In the information market, the 80/20 rule means that a growing supply side does not by itself bring more information to users.
Strengths and weaknesses
PRS algorithms emerged to cope with this exact problem. They allow network service providers to present users with the information content most likely to be relevant to their needs, by drawing on user preferences. This method of information distribution boosts the efficiency of matching suitable information with search demand and increases the total volume of information exposed per unit of time, thereby creating conditions that activate more niche markets. Undoubtedly, when niche markets are activated, total market capacity also increases greatly. Consequently, more jobs are created, and the variety of products and services in the market improves as well.
For example, vloggers and We-Media outlets all need PRS to precisely locate their target audiences and turn a profit. In addition, from the perspective of information receivers, PRS helps users acquire valuable information efficiently by saving them time and energy when searching online. Based on users' existing preferences, a good PRS can even push the boundaries of relevant information within proper limits, so as to explore or generate new information demand.
In an ideal scenario, a fully functional PRS algorithm helps form a sound pattern that works for all three parties: information providers, receivers, and platforms. In reality, however, this is not always the case: innate risks exist within PRS and cannot be overlooked.
Algorithm providers have the dominant right to make the rules. In practice, due to algorithm imbalance caused by inadequate technical skills or the prioritization of self-interest, some providers may harm relevant parties or even the public interest. A typical case is the "information cocoon," an effect that results from providers' continuous recommendation of the same type of information to users in order to gain stronger user stickiness.
As passive information receivers, users may be unable to make meaningful choices. To protect users from flaws or defects inherent in algorithms, legislators have endowed users with the right to explanation. By making algorithms' operating mechanisms more transparent, legislators aim to give users the ability to reflect on algorithms' effects based on their own will. This can help correct the negative effects of PRS on users.
Optimizing PRS algorithms
The key to breaking the black box of algorithms and building trust in them is to understand the basic principles and operating mechanisms behind them. The underlying logic of PRS algorithms can be summed up in three steps: collecting information (setting up a tagging system); calibrating user portraits, information, channels, and so forth (selecting the right crowd by mapping relationships between different tags); and determining what information to provide (continuing to optimize the algorithm based on feedback).
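The three steps above can be sketched as a minimal toy loop. All item names, tags, and scoring choices below are invented for illustration; real systems use far richer features and models.

```python
# A toy sketch of the three-step PRS loop: (1) tag items and build a user
# portrait, (2) map user tags to item tags and rank, (3) adjust on feedback.
# All tags and items are invented for illustration.
from collections import Counter

# Step 1: collect information -- a tagging system for items.
ITEM_TAGS = {
    "running_shoes": {"sports", "footwear", "budget"},
    "yoga_mat": {"sports", "fitness"},
    "novel": {"books", "leisure"},
}

def build_user_portrait(viewed_items):
    """Step 1 (user side): infer a tag-based portrait from behavior."""
    portrait = Counter()
    for item in viewed_items:
        portrait.update(ITEM_TAGS.get(item, set()))
    return portrait

def recommend(portrait, top_k=2):
    """Step 2: score each item by how strongly its tags match the portrait."""
    scores = {
        item: sum(portrait[t] for t in tags)
        for item, tags in ITEM_TAGS.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def apply_feedback(portrait, disliked_item):
    """Step 3: optimize on feedback -- down-weight tags of disliked items."""
    for tag in ITEM_TAGS.get(disliked_item, set()):
        portrait[tag] -= 1
    return portrait

portrait = build_user_portrait(["running_shoes", "yoga_mat"])
print(recommend(portrait))  # items sharing the most tags rank first
```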
First, PRS targets user crowds rather than individuals. The goal of a PRS is to capitalize on the supply-demand pairing process, driving conversions such as purchases, higher video completion rates, and click-through rates. Only when there are enough actual conversions can information providers earn profits. Therefore, when setting up a PRS algorithm, it is necessary to make dynamic adjustments between precision and recall. Pursuing precision alone would reduce market reach; nor should the algorithm sacrifice precision to increase reach.
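This tension can be illustrated with the standard precision and recall measures from information retrieval; the user IDs below are invented, not figures from the article.

```python
# Generic precision/recall illustration (invented data).
def precision_recall(recommended, relevant):
    """Precision: share of recommendations that actually matched.
    Recall: share of all interested users actually reached."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Narrow targeting: every recommendation hits, but most of the market is missed.
print(precision_recall(["u1"], ["u1", "u2", "u3", "u4"]))  # (1.0, 0.25)
# Broad targeting: the whole market is reached, but precision drops.
print(precision_recall(["u1", "u2", "u3", "u4", "u5", "u6"],
                       ["u1", "u2", "u3", "u4"]))  # precision ~0.67, recall 1.0
```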
Second, tags come from a range of sources. A tagging system is the soul of a PRS algorithm and the primary guide for distributing information. Essentially, a user portrait is a tagging system created by induction and deduction from users' identity information, social attributes, behavioral habits, preferences, geographical locations, and other data dimensions. Each tag represents a lens through which to understand users. To create a mapping relationship between information and users, a PRS's tagging system needs to draw a portrait of both users and the information itself, eventually forming a multi-dimensional, multifaceted tagging system.
Take the PRS algorithms of e-commerce platforms for example: the most common tags fall into a product dimension (such as sub-category, brand, price range, and features of target users), a store dimension (including a store's orientation, physical location, and type), and a channel dimension (including search, promotions, live-streams, and recommendations). In addition, external environmental features that affect PRS results may also be taken into account, including changes in weather, festivals, and big sporting events.
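A hypothetical tag record for a single listing, following the dimensions named above, might look like this; every value is invented for illustration.

```python
# Hypothetical multi-dimensional tag record for one e-commerce listing.
# All field names and values are invented to mirror the dimensions above.
listing = {
    "product": {
        "sub_category": "trail running shoes",
        "brand": "ExampleBrand",
        "price_range": "mid",
        "target_users": ["outdoor enthusiasts"],
    },
    "store": {
        "orientation": "sportswear",
        "location": "Hangzhou",
        "store_type": "flagship",
    },
    "channel": ["search", "recommendation"],
    "external": {"season": "winter", "event": "year-end sale"},
}

print(sorted(listing))  # the four tag dimensions of this record
```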
Third, PRS results come from mapping links among tags. The mapping rules are the final key to determining what information gets recommended. These rules may be based on the algorithm provider's knowledge of society, the market, and social rules, such as the correlation whereby customers who buy product A are probably also interested in product B.
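The "buyers of A are probably interested in B" rule is, in its simplest form, a conditional probability estimated from co-purchase data. A minimal sketch, with invented baskets:

```python
# Toy association rule: estimate P(b in basket | a in basket) from
# co-purchase counts. The order data is invented for illustration.
orders = [
    {"phone", "phone_case"},
    {"phone", "phone_case", "charger"},
    {"phone", "charger"},
    {"notebook"},
]

def confidence(a, b, orders):
    """Share of baskets containing `a` that also contain `b`."""
    with_a = [basket for basket in orders if a in basket]
    if not with_a:
        return 0.0
    return sum(b in basket for basket in with_a) / len(with_a)

print(confidence("phone", "phone_case", orders))  # 2 of 3 phone baskets, ~0.67
```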
Based on specific demands, information publishers can also make their own choices. For example, by using marketing tools, an advertiser can select target users in accordance with product features or a marketing campaign, improving marketing precision. Of course, mapping rules are usually much more complicated in real life. Behind each recommended result lies a permutation and combination of tags with various weights. Algorithm engineers need to take various indicators into account, including accuracy, coverage, variety, novelty, and timeliness, as they dynamically adjust the tags' weights and mapping rules according to actual outcomes.
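The idea of tags carrying adjustable weights can be sketched in a few lines; the tag names and weight values below are invented, and real systems tune such weights continuously against measured outcomes.

```python
# Hypothetical weighted tag matching: each tag shared between a user and an
# item contributes according to a tunable weight (all values invented).
TAG_WEIGHTS = {"sports": 0.6, "budget": 0.3, "luxury": 0.1}

def weighted_score(user_tags, item_tags, weights=TAG_WEIGHTS):
    """Sum the weights of tags the user and item have in common."""
    return sum(weights.get(tag, 0.0) for tag in user_tags & item_tags)

score = weighted_score({"sports", "budget"}, {"sports", "footwear"})
print(score)  # only the shared "sports" tag contributes: 0.6
```

Raising or lowering a single entry in `TAG_WEIGHTS` changes the ranking of all items that carry that tag, which is how engineers steer outcomes without rewriting the mapping logic.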
According to Article 24 of the PIPL, personalized information recommendation is categorized as automated decision-making. Yet, unlike algorithms used in fields such as credit investigation, which may directly affect users' property rights and interests, PRS algorithms have the same intended use in most scenarios, namely, to distribute information to users. This means the rights and obligations between users and other subjects do not directly change.
Therefore, the Internet Information Service Algorithm Recommendation Management Regulations put forward a management principle of algorithm classification. Specifically, when setting up pathways to explain PRS algorithms, it is essential to take into full account the operating mechanism, usage scenarios, and the potential effects on users, algorithm service providers, and other subjects involved.
After all, algorithm explanation is only a tool. It helps users better understand the operational mechanism of algorithms, empowering them to give effective feedback on an algorithm's effects. Ultimately, trust can be built among multiple parties for win-win results. This is the ultimate goal of the rights to explanation and refusal of algorithms, as stipulated by legislators. Specifically, we need to focus on the following three dimensions.
The first dimension is data. If we compare an algorithm to a sophisticated machine, then data is the machine's source of momentum. Therefore, disclosing the types of data that PRS algorithms rely on, especially an introduction to the use of personal information, will help users understand the operational logic of an algorithm from the bottom up. Establishing a causal (or correlative) relationship between the results of the information feed and users' authorized personal information can also address users' concerns about the security of their personal information and curb the abuse of personal information by algorithm providers.
The second dimension is decision-making. Users obtain reasonable expectations about the source of recommendation results mainly through disclosure of the tagging system within a reasonable range and through explanation of the mapping relationship between personalized recommendation results and tags. It should be clarified, however, that the tagging system is the core that supports the operation of PRS algorithms and contains the trade secrets of algorithm service providers. If it were fully exposed to other market players or the public, this would not only directly damage providers' commercial interests, but could also lead to malicious exploitation of algorithms and disrupt normal business order. Therefore, when disclosing information about the decision-making dimension, algorithm service providers should be allowed to provide desensitized information, concentrating on describing the basic operating logic of their algorithms.
The third dimension is the outcome. Algorithm service providers can explain the feedback effect to users in a simple way, enabling users to understand the operational mechanism of PRS algorithms at different stages, i.e., before, during, and after a recommendation. For example, users may tag some content as "not relevant" and require the algorithm to avoid similar products. The effect can be made more specific, provided that trade secrets remain safe, such as "don't recommend products at similar prices/of similar types/of similar brands."
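Such a "not relevant" loop can be sketched as blocking the tags of a rejected item from future results; items, tags, and the blocking rule below are all invented for illustration.

```python
# Sketch of a "not relevant" feedback loop: rejecting one item blocks its
# tags, filtering similar items from later feeds (all data invented).
ITEM_TAGS = {
    "watch_a": {"watches", "luxury"},
    "watch_b": {"watches", "budget"},
    "headphones": {"audio", "budget"},
}

def mark_not_relevant(item, blocked_tags):
    """Add the rejected item's tags to the blocked set."""
    return blocked_tags | ITEM_TAGS.get(item, set())

def filter_feed(candidates, blocked_tags):
    """Drop candidates that share any tag with the blocked set."""
    return [c for c in candidates
            if not (ITEM_TAGS.get(c, set()) & blocked_tags)]

blocked = mark_not_relevant("watch_a", set())
print(filter_feed(["watch_b", "headphones"], blocked))  # ['headphones']
```

A finer-grained version would block only one tag dimension (price, type, or brand), mirroring the more specific opt-outs the article describes.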
Liu Ming is a senior expert of the Research Center for Policies, Laws and Regulations at the Alibaba Group.
Edited by WENG RONG