MASK: Mining
Associations with
Secrecy Konstraints
Database Systems Lab Indian Institute of Science | ||||||
| ||||||
| ||||||
About MASK
| ||||||
Data mining services require accurate input data for their results to be meaningful, but privacy concerns may influence users to provide spurious information. We have investigated, with respect to mining association rules, whether users can be encouraged to provide correct information by ensuring that the mining process cannot, with any reasonable degree of certainty, violate their privacy. Specifically, we have developed a scheme called MASK, based on probabilistic distortion of user data, that can simultaneously provide a high degree of privacy to the user and retain a high level of accuracy in the mining results. The performance of the scheme has been validated against representative real and synthetic datasets. A special feature of MASK is that the distortion process can be implemented at the user machine itself without having to trust any intermediate third party. In followup work, we have found that mining the distorted database can be orders of magnitude more time-consuming as compared to mining the original database. To address this problem, we have developed an algorithm called EMASK which by (a) generalizing the distortion process to perform symbol-specific distortion, (b) appropriately chooosing the distortion parameters, and (c) applying a variety of optimizations in the reconstruction process, provides runtime efficiencies that are well within an order of magnitude of undistorted mining. In our latest effort, we present a generalized matrix-theoretic model of random perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining. Specifically, we demonstrate that (a) the prior techniques differ only in their settings for the model parameters, and (b) through appropriate choice of parameter settings, we can derive new perturbation techniques that provide highly accurate mining results even under strict privacy guarantees. We also propose a novel perturbation mechanism wherein the model parameters are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at a very marginal cost in accuracy. Our experimental results indicate that our mechanisms incur substantially lower identity and support errors as compared to the prior techniques. | ||||||
Publications | ||||||
A Framework for High-Accuracy Privacy-Preserving Mining
S. Agrawal and J. Haritsa Proc. of 21st IEEE Intl. Conf. on Data Engineering (ICDE), Tokyo, Japan, April 2005 On Addressing Efficiency Concerns in Privacy-Preserving Mining S. Agrawal, V. Krishnan and J. Haritsa Proc. of 9th Intl. Conf. on Database Systems for Advanced Applications (DASFAA), Jeju Island, Korea, March 2004, pgs. 113-124 Maintaining Data Privacy in Association Rule Mining S. Rizvi and J. Haritsa Proc. of 28th Intl. Conf. on Very Large Data Bases (VLDB), Hong Kong, China, August 2002, pgs. 682-693 | ||||||
Reports and Theses | ||||||
A Framework for High-Accuracy Privacy-Preserving Mining
S. Agrawal and J. Haritsa Technical Report TR-2004-02, DSL/SERC Providing Efficiency in Privacy-Preserving Mining S. Agrawal, V. Krishnan and J. Haritsa Technical Report TR-2003-03, DSL/SERC High Performance Algorithms for Privacy-Preserving Mining Shipra Agrawal Master's Thesis, July 2004 | ||||||
Download MASK | ||||||
Code for the EMASK (and MASK) privacy-preserving association rule mining algorithms. | ||||||
People Involved | ||||||
| ||||||
Contact | ||||||
Email: haritsa [AT] dsl [dot] serc [dot] iisc [dot] ernet [dot] in | ||||||
Acknowledgments | ||||||
This work is supported in part by a Swarnajayanti Fellowship from the Dept. of Science & Technology, Govt. of India. |