mysqlcamp - The Password is c4mp

 

Session_Bitwise

Page history last edited by Sheeri Kritzer 2 yrs ago

Lots of boolean flags or small sets rendering your indexes useless? This workshop is for you!

 

Bring your laptops! This will be a work session. We'll work on finding the best way to deal with optimizing a search across many fields that are booleans/small sets. The data we'll use contains approximately 45 booleans/small sets.

 

We will:

 

1) Create an algorithm for building search queries for a non-trivial number of small sets or boolean flags and get some rough time estimates for simple and complex queries.

 

2) Translate those flags to numbers, given the # of possible values for each field (2 for boolean flags, <10 or so for small sets of data).

 

3) Create an algorithm for building new search queries, using bitwise operators to build complex queries.

 

If necessary, we'll split up into groups.

 


For example, let's say you have 10 boolean flags, each equally searched upon singularly or in group (let's say they're personality traits in a dating database, if you need a real example):

 

1) result would be "SELECT id FROM tbl WHERE one=0 AND two=1 AND three=0. . .". Actually, it'd be more like "the code that would produce this" -- could be pseudocode, could be perl, could be PHP, whatever is lightweight enough on a system that can be presentable (if Java meets that for you, go for it! Python, Ruby, whatever).

 

2) Translation: each flag gets turned into a 10-bit number. There would have to be some kind of explanation of which position is which field -- order doesn't

 

3) result would be "SELECT id FROM tbl WHERE overall=32" (and some more complex bitwise code, of course....if I had the answers, there'd be no need for this workshop!) I'll see if I can post an example before the conference. And of course, the algorithm -- in easy-to-read code or pseudocode.

 

Questions? Comments? Suggestions?

 

I have used the above technique (years ago for a different reason) and it seemed to work but it gets difficult as the number of values gets lawrge and it won't work for OR ( beautiful=1 OR rich=1 ...)

 

I have two other techniques which I can share, these have the advantage that they can also search data in other tables so to stay with your example if there were an `interest` table with values like 'hunting', 'fishing', 'opera', 'dancing', 'sports', 'shopping', ... and a `personInterest` holding the personID, interestID you'd be able to include this in your search:

 

- use sub queries, one for each condition - this works but will not scale (so I guess that means it doesn't work)

 

- use a temporary table; insert the results of each (former) sub query into it and join on that table. This works surprisingly well even for very large results sets and can even be used to score results.

 

We should post a simple schema and we can try these techniques side by side.

 

Data:

raw data

 

uid

DONE 5 booleans (2 values each, 0 or 1)

 

eyeColor = one of "0,1,20,30,40,50,60,70" 8 values

hairColor = one of "0,1,5,20,30,40,50,60,70,80,90,95,99" 13 values

status = one of "new, active, hold, delete, modified" 5 values

type = one of "admin, free, limited, paid, preview, org" 6 values

 

CREATE TABLE `rawData` (

`id` int(10) unsigned NOT NULL,

`i1` tinyint(1) NOT NULL,

`i2` tinyint(1) NOT NULL,

`i3` tinyint(1) NOT NULL,

`i4` tinyint(1) NOT NULL,

`i5` tinyint(1) NOT NULL,

`eyeColor` tinyint(3) unsigned NOT NULL,

`hairColor` tinyint(3) unsigned NOT NULL,

`status` varchar(8) default NULL,

`type` varchar(7) default NULL,

PRIMARY KEY (`id`)

);

 

cast to bit value: bin()+0

 

load data local infile '/home/sheeri/bitData.tab' into table rawData;

 

when converting booleans to the bit string, shift by a multiple of 2, because each value comprises 2 bits.

 

select i1,i2,i3,i4,i5,eyeColor,

(1<bin((1<case when eyeColor=0 then 1<<10

when eyeColor=1 then 1<<11

when eyeColor=20 then 1<<12

when eyeColor=30 then 1<<13

when eyeColor=40 then 1<<14

when eyeColor=50 then 1<<15

when eyeColor=60 then 1<<16

when eyeColor=70 then 1<<17 END as num,

bin(case when eyeColor=0 then 1<<10

when eyeColor=1 then 1<<11

when eyeColor=20 then 1<<12

when eyeColor=30 then 1<<13

when eyeColor=40 then 1<<14

when eyeColor=50 then 1<<15

when eyeColor=60 then 1<<16

when eyeColor=70 then 1<<17 END)+0 as bitString

from rawData limit 5;

 

 

awfief at gmail dot com

Comments (0)

You don't have permission to comment on this page.