Big data analysis has yet to mature
The more useful tags a piece of information contains, the more valuable it becomes, scholars say.
With the advent of the big data era, the validity of network data for research has become the subject of much academic interest. Compared with traditional methodologies, such as surveys, data collection in the new era presents a common challenge in the social and natural sciences: researchers are debating the best practices for cleaning and analyzing massive amounts of information to extract the truly valuable content and provide clear policy guidance.
Data source
A high-quality data source is a prerequisite for authentic and reliable big data analysis, said Yu Guoming, a professor from the School of Journalism at Renmin University of China.
“Judging from technical advances at home and abroad, there are two main sources that are relatively authoritative and reliable: One is the government, which has access to multiple data sources. The other is large-scale corporations, which can gather data in a certain field, such as digital mobility, online shopping, social media, search engines and input software,” Yu said.
Yu said one important feature of the big data era is that it makes comprehensive data analysis possible. However, big data analysis carried out by a single government department or a single corporation is rather limited, so its conclusions are bound to be fragmented and simplistic, he added.
Technically, big data is not the private property of government or corporations. Instead, it is closely linked to individual rights and privacy, so it should belong to all of society, Yu continued.
Yu said that, unlike the data collected by government agencies and large Internet enterprises, some so-called “big data” is derived from simple data mining and retrieval and can hardly live up to its name.
Li Yuxiao, director of the Institute of Internet Governance and Law at Beijing University of Posts and Telecommunications, said big data analysis is a sort of data product. “The measurement of its value lies in whether it meets the demand of specific customers,” Li said.
In an attempt to seize the opportunities created by the big data era, many institutions and individuals have been actively developing related software and products, which enhances the capacity of society to handle big data, Li said.
Data classification
A number of private organizations have applied analytical software to mine data from search engines, but the “big data” generated by these programs is usually incomplete and incapable of representing the overall situation, said Xie Yungeng, deputy director of the Institute of Arts and Humanities at Shanghai Jiao Tong University.
Though many institutes have been devoted to conducting big data analysis, solid research achievements have yet to be made. In Li’s opinion, big data analysis has yet to mature.
Yu said big data is gathered from different data groups, so associative analysis is vital to reflecting an object or incident comprehensively, and such analysis requires tags.
“Some tags are natural. For example, it’s easy to identify tags such as age, gender, and occupation through personal files on social media. However, other tags need online behavior analysis. For example, by examining the social features of their online communication, users can be categorized as early birds, late sleepers, fashion consumers, conservative consumers, high-income and low-income workers,” Yu said, adding that the more useful tags a piece of information contains, the more valuable it becomes.
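The distinction Yu draws between natural tags and tags inferred from behavior can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the `activity_pattern` key, the hour ranges and the 50 percent threshold are all hypothetical and do not come from Yu or from any system described in this article.

```python
from collections import Counter
from datetime import datetime

# Hypothetical hour ranges; the article does not define any thresholds.
EARLY_HOURS = range(5, 9)   # 05:00 to 08:59
LATE_HOURS = range(0, 4)    # 00:00 to 03:59


def natural_tags(profile):
    """Tags read directly from a profile: age, gender, occupation."""
    return {k: profile[k] for k in ("age", "gender", "occupation") if k in profile}


def behavior_tag(timestamps, min_share=0.5):
    """Infer a tag from the hours at which a user is active online.

    min_share is an assumed cutoff: the share of activity that must fall
    in a time window before the corresponding tag is assigned.
    """
    if not timestamps:
        return None
    hours = Counter(t.hour for t in timestamps)
    total = sum(hours.values())
    early = sum(hours[h] for h in EARLY_HOURS) / total
    late = sum(hours[h] for h in LATE_HOURS) / total
    if early >= min_share and early > late:
        return "early bird"
    if late >= min_share and late > early:
        return "late sleeper"
    return None


def tag_user(profile, timestamps):
    """Combine natural tags with a behavior-derived tag, if one applies."""
    tags = natural_tags(profile)
    tag = behavior_tag(timestamps)
    if tag is not None:
        tags["activity_pattern"] = tag
    return tags


profile = {"age": 30, "gender": "F", "occupation": "teacher", "nickname": "x"}
activity = [
    datetime(2024, 1, 1, 6, 0),
    datetime(2024, 1, 2, 7, 30),
    datetime(2024, 1, 3, 14, 0),
]
print(tag_user(profile, activity))
```

In this sketch, each added tag gives associative analysis one more dimension along which records from different data groups could be compared, which is the sense in which more useful tags make a piece of information more valuable.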
Sampling survey
When the results of a traditional sampling survey are released, researchers must disclose whether the survey followed the “21 rules,” which cover items such as data source, survey methods and sponsors, to guarantee that the report is not misleading. In the future, big data analysis also needs to establish a similar set of criteria that can attest to the quality of the data, Yu said.
In the meantime, though surveys are often small in scale and fragmented, their core value lies in accurate structural analysis and reliable statistics, Yu said.
Li said that although the use of big data is increasing, laws and regulations concerning the social responsibility of enterprises and the social norms of individuals have yet to be promulgated. Related laws, regulations and academic norms on data sources, analysis and the calibration of conclusions should also be introduced to ensure that big data plays a greater role in national governance, Li added.
Zhang Junrong is a reporter from Chinese Social Sciences Today.