sample_midterm

E0 261 Sample Questions (MidTerm)

Instructions:

YOU SHOULD NOT DISCUSS THE QUESTIONS OR ANSWERS WITH ANYBODY ELSE UNDER ANY CIRCUMSTANCES, NOR EXCHANGE ANY MATERIAL. Clarifications, if necessary, should be sought directly from the instructor or the TAs.
Pick up one question paper, one stapled answer booklet, and a few loose sheets. The loose sheets should be used for your rough work and need not be returned. Your ``fair'' answers should be in the stapled answer booklet with only your name and SRNo on the front page. Write the answer to each question on a different page.
Spend the first half-hour reading all the questions carefully and fully. If a question is not clear, ask the instructor or the TAs for clarifications.
Write to-the-point and clearly explained answers. For questions that ask you to say YES or NO to a suggestion, first clearly write whether your answer is YES or NO, then give the explanation for your answer.
Since sufficient time is provided, it is expected that you will carefully check your answers (i.e. no ``silly mistakes'').

QUESTIONS:

Consider the following SQL query:
select * from EMP where EMP.salary 40000 AND EMP.age 30
Assume that there is a clustered B+ tree index () on EMP.salary and an unclustered B+ tree index () on EMP.age. The employee salaries range from 20001 till 100000 and the ages range from 21 till 60. The statistics are as follows:
NCARD = 100000 tuples, TCARD = 1000 pages, P = 0.8,
ICARD() = 1000 values, NINDX() = 500 pages, ICARD() = 40 values, NINDX() = 200 pages.
Assume that we are concerned only with the IO cost and that the buffer pool allocated to this query can hold 1200 pages.
1. Calculate the tuple cardinality estimated by System R for the query's result. (2 marks)
2. What access paths would be considered by the optimizer for the relation EMP? Which path is chosen? Show the cost calculations. (4 marks)
The Dean of IISc asks you to estimate the cardinality of the query
select distinct (S.name, S.hostel) from STUDENT S
where the STUDENT relation has the schema [SRNo, name, hostel, gpa] and individual -tree indexes are available on the SRNo and name attributes, respectively.
Assume that you use, for this estimation, the strategy outlined in Figure 1 of the Sampling paper discussed in class. Within this framework, describe your: (a) partitioning approach, (b) random sampling strategy, and (c) computation technique for the RandomSample() function.
[Hint: Consider the possibility of result partitions with fractional sizes.] (6 marks)
In the Linear Hashing paper, it is claimed that the the address computation algorithm A2 is extremely efficient since the while loop is on average executed at most twice. Do you agree? If yes, prove the claim. If no, indicate your assessment of the average number of iterations. (For simplicity, assume that the file construction parameters, and , are both set to 1.) (4 marks)
Consider the plan diagram shown in the following figure, annotated with the corner costs for each of the optimality regions. For this diagram, is it possible to identify the minimum $\lambda$ at which the reduced plan diagram (using Cost-Bounding approach) is guaranteed to feature only a single plan? If yes, identify (with justification) this minimum value, and also mention the associated plan. If not, explain why it is infeasible.

[Note: Assume plan cost monotonicity (PCM) is strictly adhered to by all plan cost functions.] (4 marks)
You are given the task of designing a range-encoded bitmap index on an integer attribute Weight that records the weight of a package to be mailed. The minimum possible weight is 101 grams and the maximum allowable weight is 1100 grams. Disk space is very scarce and you have only enough to store 40 bitmaps overall.
1. What is the minimum number of components required in the index? Give a set of base values that satisfy the space constraints. (3 marks)
2. What is the minimum number of components required to produce a time-optimal index that satisfies the space constraints? Give a set of base values for this index and calculate the expected time consumption incurred by the index. (3 marks)
Consider the ARIES recovery algorithm on the execution log shown below. The master record points to the checkpoint at LSN 10. Assume that the dirty page table written in the checkpoint record (at LSN 20) at the beginning is empty.

LSN LOG

00 update: T1 writes P1

10 begin-checkpoint

20 update: T1 writes P2

30 end-checkpoint Transaction Table: T1,

Dirty Page Table:

40 update: T2 writes P3

50 update: T3 writes P4

60 T1 commit

70 update T3 writes P5

80 T1 end

90 update T2 writes P3

100 T2 abort

CRASH, RESTART
1. At the end of the analysis phase, what are the contents of the transaction table and the dirty page table? Show the lastLSN field for the active transactions and the recLSN field for the dirty pages. (2 marks)
2. Show the first two CLRs written during the UNDO phase. Show the undoNextLSNs. (2 marks)
3. Due to some glitch in the power supply, the system keeps rebooting repeatedly and the recovery algorithm runs on each restart. Specify the maximum number of log records that may be cumulatively written during recovery over all the restarts. (2 marks)