Provenance is concerned with the causal dependencies between data and the processes that bring it into a specific state. It can be used to determine whether information can be trusted, to integrate it with other data sources, to support accountability, and to give credit to the originator. In essence, the notable uses of provenance are as follows:
- Reliability and quality -> given a derived dataset, we can cite its lineage and thereby assess its credibility;
- Justification and audit -> give a historical account of when and how data has been produced;
- Re-usability, reproducibility and repeatability -> a provenance record not only shows how data has been produced, it also provides all the information necessary to reproduce the results;
- Ownership, security, credit and copyright -> identify who the information belongs to.
If provenance data is available, processing becomes transparent. By examining the provenance of data, it becomes possible to judge its quality and, from that, its trustworthiness. Building trust is therefore one of the main benefits of applying provenance. Besides building trust, provenance is used to track dependencies in order to explain how a result was obtained. Dependency tracking is the process of maintaining dependency sets for derived conclusions. To make provenance itself more trustworthy, it should be stored separately from the resource it describes.
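The idea of maintaining dependency sets can be illustrated with a minimal sketch. The class name `DependencyTracker`, its methods, and the file names are hypothetical and chosen only for illustration. The sketch records which sources each derived item was produced from, and walks those records to recover the full lineage of an item.

```python
from collections import defaultdict


class DependencyTracker:
    """Hypothetical helper that keeps a dependency set per derived item."""

    def __init__(self):
        # Maps a derived item to the set of items it was directly derived from.
        self._direct = defaultdict(set)

    def record(self, derived, *sources):
        """Record that `derived` was produced from `sources`."""
        self._direct[derived].update(sources)

    def lineage(self, item):
        """Return every item that `item` depends on, directly or indirectly."""
        seen = set()
        stack = [item]
        while stack:
            current = stack.pop()
            for source in self._direct.get(current, ()):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen


tracker = DependencyTracker()
tracker.record("cleaned.csv", "raw.csv")
tracker.record("report.pdf", "cleaned.csv", "population.csv")

print(sorted(tracker.lineage("report.pdf")))
# ['cleaned.csv', 'population.csv', 'raw.csv']
```

Keeping the tracker as a separate object from the data it describes mirrors the point that provenance should be stored apart from its origin resource.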
Today, people interact with one another and are confronted with information from multiple, unfamiliar sources. Trust is therefore needed in order to reach a decision. Trust emerges when two entities interact with each other. For example, when one article links to another, a degree of trust is implied. Trust comprises three concepts: expectancy, belief, and willingness to be vulnerable. Trust is also transitive, meaning that it can be transferred from one entity to another.
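Transitivity can be made concrete with a small sketch. It assumes that trust is expressed as a number between 0 and 1 and that trust along a chain is the product of the direct trust values. Both are simplifying assumptions made here for illustration, not a model taken from the cited literature. The function name `chained_trust` and the entities are hypothetical.

```python
from math import prod


def chained_trust(direct_trust, path):
    """Derive trust along `path` by multiplying the direct trust of each link.

    Assumption: trust values lie in [0, 1] and weaken multiplicatively
    with every hop, so indirect trust never exceeds the weakest link.
    """
    return prod(direct_trust[(a, b)] for a, b in zip(path, path[1:]))


# Hypothetical direct trust values: (trustor, trustee) -> trust in [0, 1].
direct_trust = {
    ("alice", "bob"): 0.9,
    ("bob", "carol"): 0.8,
}

value = chained_trust(direct_trust, ["alice", "bob", "carol"])
print(round(value, 2))  # 0.72
```

Under this assumption, Alice's derived trust in Carol is lower than either direct value, which reflects the intuition that trust weakens as it is passed along.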
Trust is closely related to reputation, in the sense that reputation can determine our trust in something. Reputation can be defined as a collective measure of trustworthiness: what is generally believed about a person's character or standing. By that definition, reputation is a general, community-wide notion. Trust, on the other hand, is a subjective measure that depends on who the evaluator (the trustor) is. The following statements illustrate the difference between trust and reputation:
(1) “I trust you because of your good reputation.”
(2) “I trust you despite your bad reputation.”
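The two statements can be sketched in code under a deliberately simple assumption: reputation is the average of the community's ratings, while trust is the evaluator's own judgment, which falls back on reputation only when the evaluator has no personal experience. The function names, the rating scale, and the values are hypothetical.

```python
from statistics import mean


def reputation(community_ratings):
    """Collective measure: the average of all ratings given to a target."""
    return mean(community_ratings)


def trust(community_ratings, own_experience=None):
    """Subjective measure: personal experience wins if the evaluator has any."""
    if own_experience is not None:
        return own_experience
    return reputation(community_ratings)


# Hypothetical ratings in [0, 1] given to the same target by other people.
good_ratings = [0.9, 0.8, 0.85]
bad_ratings = [0.2, 0.3, 0.25]

# (1) No personal experience: trust follows the good reputation.
trust_from_reputation = trust(good_ratings)

# (2) Positive personal experience: trust is high despite the bad reputation.
trust_despite_reputation = trust(bad_ratings, own_experience=0.9)

print(round(reputation(bad_ratings), 2), trust_despite_reputation)  # 0.25 0.9
```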
Another important aspect of provenance is accountability, which supports the idea of trust in provenance. Information is highly accountable if its use is transparent, so that it is possible to determine whether a given use is appropriate under a given set of rules. Information accountability can be achieved when we make better use of the information we have collected and retain the information that is necessary to hold data users responsible for policy compliance. It is therefore important to make acts of information usage more transparent, so that individuals and institutions who misuse information can be held accountable for their acts. Although transparency and accountability make information more visible, we still need to define a set of rules that limit the harmful use of personal information.
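As a minimal sketch of this idea, the following code checks a usage log against a set of rules. The rule table `ALLOWED_PURPOSES`, the `UsageEvent` fields, and all values are hypothetical. The sketch assumes that a rule simply lists the purposes permitted for each category of data; real policies are considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """One recorded act of information usage (hypothetical schema)."""
    user: str
    data_category: str
    purpose: str


# Hypothetical rule set: data category -> purposes for which use is appropriate.
ALLOWED_PURPOSES = {
    "medical_record": {"treatment", "billing"},
    "email_address": {"account_notification"},
}


def audit(log):
    """Return the events whose purpose is not permitted for their data category.

    Categories without a rule are treated as having no permitted purposes.
    """
    return [
        event
        for event in log
        if event.purpose not in ALLOWED_PURPOSES.get(event.data_category, set())
    ]


log = [
    UsageEvent("dr_lee", "medical_record", "treatment"),
    UsageEvent("ad_team", "medical_record", "marketing"),
]

violations = audit(log)
print([event.user for event in violations])  # ['ad_team']
```

Because every act of usage is recorded together with its purpose, misuse can be detected after the fact and attributed to the responsible user.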
Some authors have proposed a concept of accountability that maintains provenance information and computes believability from it. Believability depends heavily on the origin of the data (data lineage) and on the processes that brought the data into existence. In other words, the quality of the data and information provided is crucial to building users' trust and, ultimately, their satisfaction. Attributes that contribute to data quality include accuracy, timeliness, precision, reliability, currency, completeness, relevancy, accessibility, and interpretability. Figure 1 depicts a data quality framework that is widely used in research. Like trust, believability can be transferred along the transformation chain of a data value.
Figure 1. Data Quality Framework
Weitzner, D. J., Abelson, H., Berners-Lee, T., Feigenbaum, J., Hendler, J., & Sussman, G. J. (2008). Information accountability. Communications of the ACM, 51(6), 82-87.
Prat, N., & Madnick, S. (2008, January). Measuring data believability: A provenance approach. In Hawaii International Conference on System Sciences, Proceedings of the 41st Annual (pp. 393-393). IEEE.
Aldeco Perez, R., & Moreau, L. (2008). Provenance-based auditing of private data use.
Huang, J., & Fox, M. S. (2006, August). An ontology of trust: formal semantics and transitivity. In Proceedings of the 8th international conference on Electronic commerce: The new e-commerce: innovations for conquering current barriers, obstacles and limitations to conducting successful business on the internet (pp. 259-270). ACM.
McGuinness, D. L., Zeng, H., Da Silva, P. P., Ding, L., Narayanan, D., & Bhaowal, M. (2006). Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study. MTW, 190.
Szomszor, M., & Moreau, L. (2003). Recording and reasoning over data provenance in web and grid services. In On the move to meaningful Internet systems 2003: CoopIS, DOA, and ODBASE (pp. 603-620). Springer Berlin Heidelberg.
Jøsang, A., Ismail, R., & Boyd, C. (2007). A survey of trust and reputation systems for online service provision. Decision support systems, 43(2), 618-644.
Wang, R. Y., & Strong, D. M. (1996). Beyond accuracy: What data quality means to data consumers. Journal of management information systems, 5-33.