MIRA: Multilingual Information Processing on Relational Architecture
Database Systems Lab
Indian Institute of Science


About MIRA

In today's global village, it is critical that key information tools, such as web search engines, e-Commerce portals and e-Governance, work seamlessly across multiple natural languages. We propose a new multilingual architecture, Multilingual Information processing on Relational Architecture (MIRA), that effectively and efficiently supports multilingual processing in the primary storage mechanism for such deployments: relational database systems. The proposed architecture is based on standard components and is hence amenable to easy implementation in any type of query processing or information retrieval system.

In this project, we first analysed a set of popular commercial database management systems to profile their performance on multilingual queries. While the systems exhibited skewed performance across languages, our proposed simple compression technique, called Cuniform, made their performance nearly language-neutral. Feature-wise, we proposed two new linguistic matching operators, PhonEQUAL and SemEQUAL, that extend and complement the standard lexicographic matching of database systems with phonetic and semantic matching capabilities. Our outside-the-server implementation of the proposed operators on a host of commercial database systems showed that while the default performance of the operators is unacceptable, they can be sped up, with proper tuning of schema and access structures, to a level acceptable for on-line user interaction. Further, such an implementation can be easily deployed on existing commercial database systems with their current SQL:1999 capabilities.
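As a rough illustration only (not the implementation described in the publications below), phonetic matching across scripts can be sketched as an approximate comparison of phoneme strings. The sketch assumes that each value has already been converted to a phoneme string and that two values match when their normalized edit distance falls within a threshold. The lookup table `PHONEMES`, the function names `edit_distance` and `phon_equal`, the hand-written transcriptions and the threshold value are all hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical, hand-written phoneme strings for illustration only.
# A real system would rely on per-language text-to-phoneme converters.
PHONEMES = {
    "Nehru": "nehru",
    "नेहरू": "nehruu",
    "நேரு": "neeru",
    "Gandhi": "gandi",
}


def phon_equal(x: str, y: str, threshold: float = 0.3) -> bool:
    """Return True if x and y are phonetically close (assumed matching rule)."""
    px, py = PHONEMES[x], PHONEMES[y]
    longest = max(len(px), len(py))
    if longest == 0:
        return True
    # Normalize by the longer string so the threshold is length-independent.
    return edit_distance(px, py) / longest <= threshold


if __name__ == "__main__":
    print(phon_equal("Nehru", "नेहरू"))   # same name in English and Hindi script
    print(phon_equal("Nehru", "Gandhi"))  # different names
```

In this sketch, a lower threshold makes matching stricter and a higher one admits more false positives, which is one reason the matching semantics are inherently fuzzy.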

Subsequently, we formulated an enhanced query algebra, incorporating the new multilingual operators, for an inside-the-server implementation of the multilingual features. We specified the cost models and selectivity estimations needed to make such an implementation truly native to the relational database management system. We implemented the operators natively on the PostgreSQL open-source database management system and demonstrated the efficiency and the optimization opportunities that such an implementation affords.

In the future, we hope to pursue a host of research issues opened up by the inherently fuzzy nature of the alternative matching semantics.

Acknowledgements: This work is supported in part by a Swarnajayanthi Fellowship from the Government of India.
We thank Prof. Pushpak Bhattacharyya of IIT-Bombay and Prof. Shalini Urs of University of Mysore for providing us with linguistic resources and data sets.


Publications

  • SemEQUAL: Multilingual Semantic Matching in Relational Systems
    A. Kumaran and J. R. Haritsa
    Proceedings of the 10th International Conference on Database Systems for Advanced Applications (DASFAA 2005), Beijing, China, April 2005. Published in the Lecture Notes in Computer Science Series by Springer-Verlag, LNCS Vol. 3453, April 2005.

  • On Semantic Matching of Multilingual Attributes in Relational Systems (poster paper)
    A. Kumaran and J. R. Haritsa
    Proceedings of the 13th Conference on Information and Knowledge Management (CIKM 2004), Washington, DC, USA, November 2004. Published by ACM, November 2004.

  • MIRA: Multilingual Information processing on Relational Architecture
    A. Kumaran
    An expanded version of the EDBT-PhD Workshop Paper, invited to be a part of the Current Trends in Database Technology volume, edited by Wolfgang Lindner et al. Published in the Lecture Notes in Computer Science Series by Springer-Verlag, LNCS Vol. 3268, 2004.

  • LexEQUAL: Multilexical Matching Operator in SQL (demo)
    A. Kumaran and J. R. Haritsa
    Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2004), Paris, France, June 2004. Published by ACM, June 2004.

  • MIRA: Multilingual Information processing on Relational Architecture
    A. Kumaran
    Proceedings of the joint ICDE - EDBT 2004 PhD Workshop, Heraklion-Crete, Greece, March 2004. Published by Crete University Press, March 2004.

  • LexEQUAL: Supporting Multiscript Matching in Database Systems
    A. Kumaran and J. R. Haritsa
    Proceedings of the 9th International Conference on Extending Database Technology (EDBT 2004), Heraklion-Crete, Greece, March 2004. Published in the Lecture Notes in Computer Science Series by Springer-Verlag, LNCS Vol. 2992, March 2004.

  • Supporting Multilexical Queries in SQL (poster paper)
    A. Kumaran and J. R. Haritsa
    Proceedings of the 20th IEEE International Conference on Data Engineering (ICDE 2004), Boston, Massachusetts, USA, March 2004. Published by IEEE Computer Society, March 2004.

  • On the Costs of Multilingualism in Database Systems
    A. Kumaran and J. R. Haritsa
    Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), Berlin, Germany, September 2003. Published by Morgan Kaufmann, September 2003.

  • On Database Support for Multilingual Environments
    A. Kumaran and J. R. Haritsa
    Proceedings of the 13th IEEE Research Issues in Data Engineering (RIDE 2003) Workshop on Multilingual Information Management (a part of ICDE 2003), Bangalore/Hyderabad, India, March 2003. Published by IEEE Computer Society, March 2003.

People

  • Jayant Haritsa
  • Rupesh Bajaj

Alumni:

  • A. Kumaran
  • Pavan Kumar Chowdary

Contact

haritsa [AT] dsl [dot] serc [dot] iisc [dot] ernet [dot] in