Introduction
Index only scans is an access method when information is retrieved
from the table using only information in the index tree without having
to do an additional seek to read the actual row.
It's a well known concept that is implemented in mysql,
Oracle, Pg 9.2 has patches.
So I've decided to see for myself how big impact can be.
Goal
We have a big table 'CGM' with g, m and disabled
columns. Users table with id and name.
We want to find all users who are member of a group X:
SELECT u.* FROM CGM JOIN Users u ON u.id = CGM.m
WHERE CGM.g = X
Sometimes we add 'disabled = 0' condition to the query.
I wrote a script in perl that performes profiling.
First results
First results looked like this:
Rate g g,disabled g,m g,m,disabled
g 13307/s -- -1% -3% -3%
g,disabled 13474/s 1% -- -2% -2%
g,m 13714/s 3% 2% -- -0%
g,m,disabled 13714/s 3% 2% -0% --
Not impressive, but it's clear that there is no I/O at all and script
is testing CPU intensive task.
Impressive results
When I/O comes to scene situation is completly different. To achive this
I've lowered InnoDB buffer pool size to its minimum (1M)
Rate g g,disabled g,m,disabled g,m
g 5531/s -- -0% -57% -58%
g,disabled 5547/s 0% -- -57% -57%
g,m,disabled 12799/s 131% 131% -- -2%
g,m 13033/s 136% 135% 2% --
In above test we don't check disabled column and two indexes are suitable. Next
test adds condition on disabled column.
Rate g g,m g,disabled g,m,disabled
g 5531/s -- -0% -5% -57%
g,m 5547/s 0% -- -4% -57%
g,disabled 5802/s 5% 5% -- -55%
g,m,disabled 12800/s 131% 131% 121% --
Memory considerations
To cover with indexes all possible access patterns on CGM table we will need
two indexes: (g, m, disabled) and (m, g, disabled). The latter required when
query starts from users table and fetches groups. This means that you need
additional space.
Let's asume we use 4 bytes long integers for all three columns in CGM. Row
size is 12 bytes. The following situations are possible:
From the list, one can assume that covering index needs 1.8 times more
memory than minimal indexing. However, mysql don't need to access table,
so 36MB turns into 24MB. It's 20% more than minimal.
Conclusions
As you could see covering indexes improve performance when data doesn't
fit into memory. Otherwise benefit is close to zero. Longer indexes need
more memory, but this additional footprint can be lowered.
Recomendations would be. Don't bother as long as data fits into memory.
Create only required indexes at this point, smaller indexes would give
you more room to grow until you hit memory limit. If you decided to use
covering idexes then cover all possible access patterns or you will
need additional space in buffer pool for table data.
That's it.