MASK: Mining Associations with Secrecy Konstraints

Database Systems Lab
Indian Institute of Science



[About] [Publications] [Download] [People] [Contact] [Acknowledgements]

About MASK

Data mining services require accurate input data for their results to be meaningful, but privacy concerns may influence users to provide spurious information. We have investigated, with respect to mining association rules, whether users can be encouraged to provide correct information by ensuring that the mining process cannot, with any reasonable degree of certainty, violate their privacy. Specifically, we have developed a scheme called MASK, based on probabilistic distortion of user data, that can simultaneously provide a high degree of privacy to the user and retain a high level of accuracy in the mining results. The performance of the scheme has been validated against representative real and synthetic datasets. A special feature of MASK is that the distortion process can be implemented at the user machine itself without having to trust any intermediate third party.

In followup work, we have found that mining the distorted database can be orders of magnitude more time-consuming as compared to mining the original database. To address this problem, we have developed an algorithm called EMASK which by (a) generalizing the distortion process to perform symbol-specific distortion, (b) appropriately chooosing the distortion parameters, and (c) applying a variety of optimizations in the reconstruction process, provides runtime efficiencies that are well within an order of magnitude of undistorted mining.

In our latest effort, we present a generalized matrix-theoretic model of random perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining. Specifically, we demonstrate that (a) the prior techniques differ only in their settings for the model parameters, and (b) through appropriate choice of parameter settings, we can derive new perturbation techniques that provide highly accurate mining results even under strict privacy guarantees. We also propose a novel perturbation mechanism wherein the model parameters are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at a very marginal cost in accuracy. Our experimental results indicate that our mechanisms incur substantially lower identity and support errors as compared to the prior techniques.


Publications

A Framework for High-Accuracy Privacy-Preserving Mining
S. Agrawal and J. Haritsa
Proc. of 21st IEEE Intl. Conf. on Data Engineering (ICDE), Tokyo, Japan, April 2005

On Addressing Efficiency Concerns in Privacy-Preserving Mining
S. Agrawal, V. Krishnan and J. Haritsa
Proc. of 9th Intl. Conf. on Database Systems for Advanced Applications (DASFAA), Jeju Island, Korea, March 2004, pgs. 113-124

Maintaining Data Privacy in Association Rule Mining
S. Rizvi and J. Haritsa
Proc. of 28th Intl. Conf. on Very Large Data Bases (VLDB), Hong Kong, China, August 2002, pgs. 682-693

Reports and Theses

A Framework for High-Accuracy Privacy-Preserving Mining
S. Agrawal and J. Haritsa
Technical Report TR-2004-02, DSL/SERC

Providing Efficiency in Privacy-Preserving Mining
S. Agrawal, V. Krishnan and J. Haritsa
Technical Report TR-2003-03, DSL/SERC

High Performance Algorithms for Privacy-Preserving Mining
Shipra Agrawal
Master's Thesis, July 2004

Download MASK

Code for the EMASK (and MASK) privacy-preserving association rule mining algorithms.

People Involved

  • Jayant Haritsa
  • Shipra Agrawal
      Alumni:
  • Vijay Krishnan
  • Shariq Rizvi

Contact

Email: haritsa [AT] dsl [dot] serc [dot] iisc [dot] ernet [dot] in

Acknowledgments

This work is supported in part by a Swarnajayanti Fellowship from the Dept. of Science & Technology, Govt. of India.