CODD Metadata Processor



Database Systems Lab
Indian Institute of Science




About CODD
Welcome to the CODD Metadata Processor software developed at the Database Systems Lab, Indian Institute of Science.

CODD is an easy-to-use graphical tool for the automated creation, verification, retention, scaling and porting of database metadata configurations. It is written entirely in Java and runs on a suite of industrial-strength database engines: DB2, Oracle, SQL Server, SQL-MX and PostgreSQL are currently supported.

The effective design and testing of database engines and applications is predicated on the ability to easily construct alternative scenarios with regard to the database contents. A limiting factor, however, is that the time and/or space overheads incurred in creating and maintaining these databases may render it infeasible to model the desired scenarios. CODD attempts to alleviate these difficulties through the construction of “dataless databases”. Specifically, it implements a unified visual interface through which databases with the desired metadata characteristics can be efficiently simulated without persistently generating and/or storing their contents.

The following metadata processing modes are available in CODD:
  • Metadata Construct Mode
    In this mode, the user can create metadata statistics from scratch without requiring the existence of any corresponding prior database instance. Apart from a form-based interface to enter the metadata values, a graphical interface is also provided for visually specifying the data distributions of the column values in the relational tables.

    Specifically, the editable metadata comprises statistics on the following entities: (a) relational tables (row cardinality, row length, number of disk blocks, etc.); (b) attribute columns (column width, number of distinct values, value distribution histograms, etc.); (c) attribute indexes (number of leaf blocks, clustering factor, etc.); and (d) system parameters (sort memory size, CPU utilization, etc.).

    The metadata entered by the user is automatically validated for legality (correct type, valid range) and consistency (compatibility with other metadata values).

    Note: The Construct Mode is not currently available for SQL Server since it stores column distribution statistics in a proprietary internal format.

  • Metadata Retention Mode (aka Data Drop Mode)
    This mode applies to pre-existing databases and allows the user to fully reclaim the storage currently occupied by the database instance without triggering any changes in the associated metadata.

  • Inter-Engine Metadata Transfer Mode
    This mode aids in the automated transfer of metadata across different relational engines (e.g., from DB2 to Oracle), thereby facilitating the comparative study of alternative database offerings. Since the engines are not entirely compatible in their configurations, only a best-effort transfer is provided, with the remaining information to be explicitly entered by the user.

  • Metadata Scaling
    A common activity in database engine testing exercises is to assess the behavior of the system on scaled versions of the original database. To cater to this need, standard industry decision-support benchmarks such as TPC-H and TPC-DS are available in a variety of scale factors. These benchmarks scale the cardinalities of the user relations essentially linearly to provide a database of the desired size.

    CODD supports the space-based scaling described above. In addition, it incorporates a novel time-based scaling feature. Specifically, given a query workload and a scale factor α, the relational cardinalities are scaled such that the optimizer's estimated total cost of executing the workload on the new database is scaled by α. As a secondary objective, CODD also attempts, as far as possible, to have the execution time of each individual query in the workload scaled by α.
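
As a rough illustration of two ideas from the list above, the following Java sketch validates a set of table-level statistics for legality and consistency, and then applies space-based (linear) scaling to them. It is not taken from the CODD code base: the class `DatalessTableSketch`, the `TableStats` fields, the 8192-byte block size, and the particular consistency rule (the blocks must be able to hold the rows) are all assumptions made for the example, and real engines may use different rules.

```java
import java.util.ArrayList;
import java.util.List;

public class DatalessTableSketch {
    // Hypothetical block size, used only by the consistency check below.
    static final int BLOCK_SIZE_BYTES = 8192;

    /** Hypothetical table-level statistics; the field names are illustrative, not CODD's. */
    static final class TableStats {
        final long rowCardinality;
        final int rowLengthBytes;
        final long diskBlocks;

        TableStats(long rowCardinality, int rowLengthBytes, long diskBlocks) {
            this.rowCardinality = rowCardinality;
            this.rowLengthBytes = rowLengthBytes;
            this.diskBlocks = diskBlocks;
        }
    }

    /** Returns the problems found; an empty list means both checks passed. */
    static List<String> validate(TableStats s) {
        List<String> problems = new ArrayList<>();
        // Legality: each value must lie in a valid range on its own.
        if (s.rowCardinality < 0) problems.add("row cardinality must be non-negative");
        if (s.rowLengthBytes <= 0) problems.add("row length must be positive");
        if (s.diskBlocks < 0) problems.add("number of disk blocks must be non-negative");
        // Consistency (assumed rule): the stated blocks must be able to hold the stated rows.
        if (problems.isEmpty()) {
            long bytesNeeded = s.rowCardinality * s.rowLengthBytes;
            long bytesAvailable = s.diskBlocks * BLOCK_SIZE_BYTES;
            if (bytesNeeded > bytesAvailable) {
                problems.add("disk blocks are too few for the stated rows");
            }
        }
        return problems;
    }

    /** Space-based scaling: cardinality and block count grow linearly, row length is unchanged. */
    static TableStats scale(TableStats s, double factor) {
        if (factor <= 0) throw new IllegalArgumentException("scale factor must be positive");
        long rows = Math.round(s.rowCardinality * factor);
        long blocks = (long) Math.ceil(s.diskBlocks * factor);
        return new TableStats(rows, s.rowLengthBytes, blocks);
    }

    public static void main(String[] args) {
        TableStats base = new TableStats(1_000_000L, 100, 12_500L); // made-up values
        TableStats scaled = scale(base, 10.0);
        System.out.println(validate(base));   // prints []
        System.out.println(scaled.rowCardinality + " rows, " + scaled.diskBlocks + " blocks");
        // prints 10000000 rows, 125000 blocks
    }
}
```

No rows are ever generated here; only the statistics change, which is the sense in which the scaled database remains dataless. Time-based scaling is not covered by this sketch, since it depends on the cost estimates of the target engine's optimizer.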

The name of the CODD tool stems from the English word cod, whose archaic meanings include “empty shell” and “fake”, both of which are appropriate to our dataless context. It also coincidentally happens to be the name of Edgar Codd, the father of relational databases.


Download
Note: The Indian Institute of Science is the copyright owner of the CODD software, as per this Copyright Certificate issued by the Govt. of India on September 10, 2013. Downloading the CODD software automatically implies that you accept and acknowledge this copyright ownership by the Indian Institute of Science.


Publications

CODD: COnstructing Dataless Databases
R. Trivedi, I. Nilavalagan and J. Haritsa
Proc. of 5th Intl. Workshop on Testing Database Systems (DBTest), Scottsdale, USA, May 2012

CODD: A Dataless Approach to Big Data Testing (demo)
Ashoke S. and J. Haritsa
Proc. of 41st Intl. Conf. on Very Large Data Bases (VLDB), Hawaii, USA, September 2015
published as
PVLDB Journal vol. 8, no. 12, pgs. 2008-2011, August 2015


Contact
Email: codd [AT] dsl [dot] cds [dot] iisc [dot] ac [dot] in

Primary CODD Contributors (in chronological order of participation)

  • Jayant Haritsa (Project Lead)
  • Rakshit Trivedi (PA)
  • Nilavalagan I (ME, CSA, IISc)
  • Deepali Nemade (ME, CSA, IISc)
  • Ankur Gupta (ME, CSA, IISc)
  • Ashoke S (ME, CSA, IISc)
  • Anupam Sanghi (ME, CSA, IISc)
  • Raghav Sood (ME, CSA, IISc)