README
-------

The code was developed on an Ubuntu 18.04 machine with Java 10 and PostgreSQL v9.6.


Installation Instructions (with Eclipse):
-----------------------------------------
1. Extract the "codd-data-gen" folder from the downloaded zip.
2. In Eclipse, choose File->Import->Projects from Folder or Archive.
3. Select the codd-data-gen folder.
4. Add the resources folder (inside the codd-data-gen folder) to the build path.
5. Add the lib folder (inside the codd-data-gen folder) to the libraries path if it is not added automatically.

Note: If an exception concerning Z3 is raised, you may have to install Z3 separately. Install the Z3 library configured for Java (https://github.com/Z3Prover/z3) and ensure the library is in the path.

PostgreSQL Setup:
-----------------
1. Create the desired database in PostgreSQL v9.6.
2. Open the postgresql.conf file in the database location and add the following lines:
	enable_bitmapscan=off
	enable_indexscan=off
	enable_nestloop=off
	enable_sort=off


Running Instructions:
---------------------
1. Open the codd-data-gen/resources/cdgclient/postgres.properties file and update the connection string, username, password and dbname to establish the database connection.
2. The input to PiGen is given as query plans in JSON format. To run the queries and obtain the plans: (a) add the input queries as SQL files in the codd-data-gen/resources/cdgclient/postgres/sqlqueries folder; (b) list these queries in the codd-data-gen/resources/cdgclient/postgres/sqlqueries/index file. Three sample queries (8a_1.sql, 8b_1.sql, 8c_1.sql) are present in the folder, along with a sample index file.
3. The query plans in JSON format are generated automatically in the codd-data-gen/resources/cdgclient/postgres/ea folder. List the plans to be given as input in the codd-data-gen/resources/cdgclient/postgres/ea/index file. Three sample plans (8a_1.json, 8b_1.json, 8c_1.json) are already present in the folder, along with a sample index file.
4. Execute codd-data-gen/src/in/ac/iisc/cds/dsl/cdg/main/Main.java.
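
When a folder holds many queries or plans, the index file can be generated instead of typed by hand. The sketch below lists the files with a given extension in a folder and writes their names to a file called index in the same folder. It is not part of PiGen, and the class name IndexFileSketch is hypothetical. It ASSUMES the index file holds one file name per line; this README does not state the format, so compare the result with the sample index file before relying on it.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of PiGen.
public class IndexFileSketch {
    // Keeps the names ending with the given extension, sorted alphabetically.
    static List<String> indexEntries(Collection<String> fileNames, String extension) {
        List<String> entries = new ArrayList<>();
        for (String name : fileNames) {
            if (name.endsWith(extension)) {
                entries.add(name);
            }
        }
        Collections.sort(entries);
        return entries;
    }

    // Writes <dir>/index with one file name per line (ASSUMED format).
    static Path writeIndex(Path dir, String extension) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        Path index = dir.resolve("index");
        Files.write(index, indexEntries(names, extension));
        return index;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: folder (e.g. the sqlqueries folder), args[1]: extension (e.g. ".sql")
        System.out.println("Wrote " + writeIndex(Paths.get(args[0]), args[1]));
    }
}
```

Pass the folder and the extension as arguments, for example the sqlqueries folder with ".sql" or the ea folder with ".json". Any existing index file in that folder is overwritten.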


Query Assumptions:
------------------
1. No Foreign Key column may contain a null value.
2. All joins must be PK-FK joins.
3. Filters and projections must be on non-key columns only.
4. Queries must be of the form: Select Distinct <...> from <tablenames> where <filter/join-predicates>.
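
As a quick pre-check before adding a query to the workload, the surface shape required by assumption 4 can be tested with a regular expression. The sketch below is not part of PiGen, and the class name QueryShapeCheck is hypothetical. It only checks that the text reads Select Distinct ... from ... where ...; it does not verify assumptions 1 to 3, and it does not reject extra clauses (such as GROUP BY) that appear after the where keyword.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of PiGen.
public class QueryShapeCheck {
    // SELECT DISTINCT <...> FROM <...> WHERE <...>, with an optional trailing semicolon.
    // Case-insensitive; DOTALL lets the query span several lines.
    private static final Pattern SHAPE = Pattern.compile(
            "^\\s*select\\s+distinct\\s+.+?\\s+from\\s+.+?\\s+where\\s+.+?;?\\s*$",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // True if the query text has the Select Distinct ... from ... where ... shape.
    static boolean hasExpectedShape(String sql) {
        return SHAPE.matcher(sql).matches();
    }

    public static void main(String[] args) {
        String query = "SELECT DISTINCT t.a FROM t, u WHERE t.id = u.tid AND u.b > 5;";
        System.out.println(hasExpectedShape(query));
    }
}
```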


Output:
-------
The output of PiGen is stored in the "/home/dsladmin/Documents/codd-data-gen/output" folder. The output of each sub-workload is stored in a separate folder. Each folder contains the following:
1. <tablename>_schema.sql: The anonymized schema DDL file.
2. anonymizedTableMap.txt: The mapping from original table names to anonymized table names.
3. anonymizedColumnMap.txt: The mapping from original column names to anonymized column names.
4. anonymized Queries folder: Contains the anonymized versions of the sub-workload queries.
5. summary_<tablename>: The table summary file in binary format. It is available only if the user opts for it at run time. (Writing this file in text format is a pending update.)
6. <tablename>.csv: The entire table data. It is available only if the user opts for it at run time.
Note: The table produced in the output is in denormalized format. Further, PiGen produces anonymized outputs, where each data value is mapped from its respective domain to a continuous numerical domain. The table names and column names are also changed. The queries and schema in the output are rewritten to be consistent with these modifications.
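
To illustrate what mapping values to a numerical domain can look like, the sketch below assigns each distinct value of a column a consecutive integer in sorted order. This is only one possible scheme under the assumption of an order-preserving, dense mapping; the mapping PiGen actually uses is not described in this README and may differ. The class name DomainMappingSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration, not PiGen's actual mapping.
public class DomainMappingSketch {
    // Maps each distinct non-null value to 0, 1, 2, ... in sorted order,
    // so that comparisons between values are preserved by the codes.
    static <T extends Comparable<T>> Map<T, Integer> buildMapping(List<T> columnValues) {
        Map<T, Integer> mapping = new LinkedHashMap<>();
        for (T value : new TreeSet<>(columnValues)) {
            mapping.put(value, mapping.size());
        }
        return mapping;
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("pear", "apple", "pear", "fig");
        System.out.println(buildMapping(column));
    }
}
```

Because the codes follow the sort order of the original values, a range filter on the original column corresponds to a range filter on the mapped column.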