Я, как потенциальный пользователь, люблю когда мне позволяют заморачиваться с деталями и не ограничивают в свободе выбора. Можете сделать опрос среди программистов, которые готовы использовать ваш продукт. Получите статистическое мнение по этому поводу. Уверен, найдутся и те, кому кроме OGG ничего не нужно и их не волнует поддержка со стороны некоторых браузеров.
Можно и без миррора. По дефолту, пул хранения создается вообще без избыточности, но тогда вы сможете только детектить ошибки, но не исправлять их. Можно сделать пул с избыточностью из двух разделов одного девайса.
верохний комментарий на youtube: «I have a test system running Opensolaris 2008.11 with 8 disks in a raidz2 pool. To test this, I unplugged the power on two of the disks at once(Unlike Sun's marketing dept., we can't justify the destruction of a disk here). The reads paused for about 5 seconds and then resumed without any intervention.
It appears that this bug was fixed between April and November 2008.
»
Договор нужен для юридической протекции сделки. Стрёмно работать с кем-то, когда не знаешь человека достаточное количество времени и даже не можешь гарантировать исполнение прав и обязательств по сделке.
В вашей задаче, вам не нужен ни MD5, ни автоинкрементный INT, как советовали выше.
Используйте в качестве первичного ключа бинарное поле и вписывайте в него 16-байтные UUID4-последовательности, а пользователям можете отдавать их в виде 32-символьной HEX-строки.
Про накладные расходы. По сравнению с int64, каким должен быть автоинкрементный счётчик, размер поля окажется больше всего на 8 байт (в 2 раза), что для вашего случая — совсем не проблема.
Убедитесь лишь, что ваши индексы всегда умещается целиком в физической памяти, а не хранятся частично или полностью в свопе на диске. Природа UUID такова, что он генерирует равномерно распределенные значения, а значит, поиск по такому индексу будет подвержен эффекту «случайного поиска» (random lookup). И если ваш индекс хотя бы частично хранится на диске, то это может привести к многочисленным random seeks и очень медленной выборке.
Чтобы иметь возможность ссылаться на этот тест впоследствии, продублирую статью из кеша.
— Murmur Hash: very fast and collision resistant?
* Nov 1, 2008
* Post a comment
Being interested in everything related to hashes then I was very pleased to recently come across MurmurHash 2.0 which is said to be faster and more collision resistant than all other publically available 32bit hashing algorithms.
The author, Austin Appleby, also makes an impressive claim:
«Murmur is close enough to random to be suitable for virtually all
keysets, though it does have the interesting property that it produces
0 collisions if the keyset is all 4-byte integers from 0 to 2^32-1 — this is a side effect of it doing mixes on the full 4-byte hash state
using only reversible operations.»
This claim is so impressive that I decided to test it. And here is the source code of the test program:
//-----------------------------------------------------------------------------
// MurmurHash2, by Austin Appleby
// Note — This code makes a few assumptions about how your machine behaves — // 1. We can read a 4-byte value from any address without crashing
// 2. sizeof(int) == 4
// And it has a few limitations — // 1. It will not work incrementally.
// 2. It will not produce the same results on little-endian and big-endian
// machines.
unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
{
// 'm' and 'r' are mixing constants generated offline.
// They're not really 'magic', they just happen to work well.
const unsigned int m = 0x5bd1e995;
const int r = 24;
// Initialize the hash to a 'random' value
unsigned int h = seed ^ len;
// Mix 4 bytes at a time into the hash
const unsigned char * data = (const unsigned char *)key;
while(len >= 4)
{
unsigned int k = *(unsigned int *)data;
k *= m;
k ^= k >> r;
k *= m;
h *= m;
h ^= k;
data += 4;
len -= 4;
}
// Handle the last few bytes of the input array
switch(len)
{
case 3: h ^= data[2] << 16;
case 2: h ^= data[1] << 8;
case 1: h ^= data[0];
h *= m;
};
// Do a few final mixes of the hash to ensure the last few
// bytes are well-incorporated.
unsigned char bit_masks[8] = {1,2,4,8,16,32,64,128};
unsigned int byte_stats[BYTE_STATS_MAX];
int main(void)
{
union {
char as_u08[24];
char as_s08[24];
unsigned int as_u32[6];
} key;
unsigned int hash32;
unsigned int i,j,k,l,m,n;
unsigned int byte_index;
unsigned int bit_index;
unsigned int bit_mask;
unsigned int collisions_so_far;
double dcollisions_so_far;
double di;
char * bits = malloc(TWO_POW_32_DIV_8);
collisions_so_far = 0;
for ( i = 0; i < BYTE_STATS_MAX; ++ i )
{
byte_stats[i] = 0;
}
for ( i = 0; i < (TWO_POW_32-1); ++ i )
//for ( i = 0; i < (500000001); ++ i )
{
key.as_u32[0] = i;
key.as_u32[1] = i;
key.as_u32[2] = i;
key.as_u32[3] = i;
hash32 = MurmurHash2 ( &key.as_s08[0], 4, (0xdeadbeef * 16) );
if ( hash32 < BYTE_STATS_MAX )
{
byte_stats[hash32] ++;
}
byte_index = hash32;
bit_index = hash32;
byte_index >>= 3;
bit_index &= 7;
bit_mask = bit_masks[bit_index];
if(((i % 10000000) == 0)
&& (i != 0))
{
di = i;
dcollisions_so_far = collisions_so_far;
printf("%10u collisions_so_far or %4.1f%% of %10u ", collisions_so_far, (dcollisions_so_far/di*100), i);
/* draw graph ;-) */
k = (dcollisions_so_far/di*100);
for ( j = 0; j < (k/2); ++ j )
{
printf(" ");
}
printf("*\n");
}
if((bits[byte_index] & bit_mask ) != 0)
{
collisions_so_far ++;
} else {
bits[byte_index] |= bit_mask;
}
}
for ( i = 0; i < BYTE_STATS_MAX; ++ i )
{
if ( byte_stats[i] == 0 )
{
printf("-");
}
else
{
if ( byte_stats[i] >= 10 )
{
printf("(%d)", byte_stats[i]);
}
else
{
printf("%d", byte_stats[i]);
}
}
}
}
And here's the output:
# gcc test-collisions-for-2pow32-032bit-keys.c -o test-collisions-for-2pow32-032bit-keys && ./test-collisions-for-2pow32-032bit-keys
0 collisions_so_far or 0.0% of 10000000 *
0 collisions_so_far or 0.0% of 20000000 *
0 collisions_so_far or 0.0% of 30000000 *
0 collisions_so_far or 0.0% of 40000000 *
0 collisions_so_far or 0.0% of 50000000 *
0 collisions_so_far or 0.0% of 60000000 *
0 collisions_so_far or 0.0% of 70000000 *
0 collisions_so_far or 0.0% of 80000000 *
0 collisions_so_far or 0.0% of 90000000 *
0 collisions_so_far or 0.0% of 100000000 *
…
0 collisions_so_far or 0.0% of 4000000000 *
0 collisions_so_far or 0.0% of 4010000000 *
0 collisions_so_far or 0.0% of 4020000000 *
0 collisions_so_far or 0.0% of 4030000000 *
0 collisions_so_far or 0.0% of 4040000000 *
0 collisions_so_far or 0.0% of 4050000000 *
0 collisions_so_far or 0.0% of 4060000000 *
0 collisions_so_far or 0.0% of 4070000000 *
0 collisions_so_far or 0.0% of 4080000000 *
0 collisions_so_far or 0.0% of 4090000000 *
0 collisions_so_far or 0.0% of 4100000000 *
0 collisions_so_far or 0.0% of 4110000000 *
0 collisions_so_far or 0.0% of 4120000000 *
0 collisions_so_far or 0.0% of 4130000000 *
0 collisions_so_far or 0.0% of 4140000000 *
0 collisions_so_far or 0.0% of 4150000000 *
0 collisions_so_far or 0.0% of 4160000000 *
0 collisions_so_far or 0.0% of 4170000000 *
0 collisions_so_far or 0.0% of 4180000000 *
0 collisions_so_far or 0.0% of 4190000000 *
0 collisions_so_far or 0.0% of 4200000000 *
0 collisions_so_far or 0.0% of 4210000000 *
0 collisions_so_far or 0.0% of 4220000000 *
0 collisions_so_far or 0.0% of 4230000000 *
0 collisions_so_far or 0.0% of 4240000000 *
0 collisions_so_far or 0.0% of 4250000000 *
0 collisions_so_far or 0.0% of 4260000000 *
0 collisions_so_far or 0.0% of 4270000000 *
0 collisions_so_far or 0.0% of 4280000000 *
0 collisions_so_far or 0.0% of 4290000000 *
1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
So the test program hashes 2^32 unique 32 bit input strings while detecting collisions and outputs results every 10 million iterations. We also capture a sample of 32bit hash value frequencies in the form of the first 4k of hash values and display these frequencies when the 2^32 loop is finished. If a hash value frequency is 0 (because it was never returned) then it is displayed as a dash '-', otherwise the frequency of the hash value is output. As expected, the output above shows 0 collisions during the 2^32 iterations and all 1s for the hash value frequencies.
Very nice I thought. So the cool thing about this hash algorithm is that it works on 32 bits of the input string to hash at a time. This is also one of the factors which makes it so fast. Most hashing algorithms only work on 8 bits of the input string at a time. A minority work on 16 bits of the input string at a time. Based on these initially very positive results I modified by test program to use bigger input strings. However, because I'm a very lazy developer I just duplicated the first 32 bits of the input string several times. Here is the diff showing how this new test program differs from the one above:
So now our input string to hash is 128 bits instead of 32 bits. The initial 32 bits of the input string to hash has been duplicated 4 times. And here is the output of the second test program:
# gcc test-collisions-for-2pow32-128bit-keys.c -o test-collisions-for-2pow32-128bit-keys && ./test-collisions-for-2pow32-128bit-keys
746316 collisions_so_far or 7.5% of 10000000 *
2758812 collisions_so_far or 13.8% of 20000000 *
5913827 collisions_so_far or 19.7% of 30000000 *
9966267 collisions_so_far or 24.9% of 40000000 *
14697559 collisions_so_far or 29.4% of 50000000 *
20078662 collisions_so_far or 33.5% of 60000000 *
25950773 collisions_so_far or 37.1% of 70000000 *
32254750 collisions_so_far or 40.3% of 80000000 *
38906168 collisions_so_far or 43.2% of 90000000 *
45898801 collisions_so_far or 45.9% of 100000000 *
…
3898620227 collisions_so_far or 97.5% of 4000000000 *
3908620227 collisions_so_far or 97.5% of 4010000000 *
3918620227 collisions_so_far or 97.5% of 4020000000 *
3928620227 collisions_so_far or 97.5% of 4030000000 *
3938620227 collisions_so_far or 97.5% of 4040000000 *
3948620227 collisions_so_far or 97.5% of 4050000000 *
3958620227 collisions_so_far or 97.5% of 4060000000 *
3968620227 collisions_so_far or 97.5% of 4070000000 *
3978620227 collisions_so_far or 97.5% of 4080000000 *
3988620227 collisions_so_far or 97.5% of 4090000000 *
3998620227 collisions_so_far or 97.5% of 4100000000 *
4008620227 collisions_so_far or 97.5% of 4110000000 *
4018620227 collisions_so_far or 97.5% of 4120000000 *
4028620227 collisions_so_far or 97.5% of 4130000000 *
4038620227 collisions_so_far or 97.6% of 4140000000 *
4048620227 collisions_so_far or 97.6% of 4150000000 *
4058620227 collisions_so_far or 97.6% of 4160000000 *
4068620227 collisions_so_far or 97.6% of 4170000000 *
4078620227 collisions_so_far or 97.6% of 4180000000 *
4088620227 collisions_so_far or 97.6% of 4190000000 *
4098620227 collisions_so_far or 97.6% of 4200000000 *
4108620227 collisions_so_far or 97.6% of 4210000000 *
4118620227 collisions_so_far or 97.6% of 4220000000 *
4128620227 collisions_so_far or 97.6% of 4230000000 *
4138620227 collisions_so_far or 97.6% of 4240000000 *
4148620227 collisions_so_far or 97.6% of 4250000000 *
4158620227 collisions_so_far or 97.6% of 4260000000 *
4168620227 collisions_so_far or 97.6% of 4270000000 *
4178620227 collisions_so_far or 97.6% of 4280000000 *
4188620227 collisions_so_far or 97.6% of 4290000000 *
--------------------(24)----------(40)----------------------------(20)-----------------------------------------------------------------------------------------------------------------------(88)--(32)---------4-----------------------------------------------------------------------------------------------8------8---------------------------------------(24)---------(52)-----------------------------(32)------------4---------------------------------------------------------------------------------------------------------(64)--------------------(180)--------------------------------------(36)----------------------4------(32)-----------------------------(12)------------------------------(26)--------(32)----------------------------------------8-----------------------------(16)----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------(32)-----------------------------(48)-----------------------------4--------------------------------(16)------8------------(36)----------------(148)--(64)-----------------------------8---------(68)-------------------8---------(40)-------8----------------------------------(52)-----(24)---------(96)---(44)--------------------------------------------(48)--------------------(32)---------(16)-------------------------------------------------8---------(16)---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------(16)--------------------------------------------------------------(40)-(12)--------------(112)----(16)-----------------(36)------------------------------------------------------------------------------------------------------------------------------------------------------------------------------(28)-------------(68)-----(16)----(12)--------(68)----------8----------------------------------(76)--------------------------------------------(56)-----------------(88)-(80)---------------------------------------8--------------------------------------------------------------------------------------------------------------------------------------------------(36)-------------------(36)-------------------------------------------------------------------------------(24)-----(32)-----------------------------------------------------4-------------------(12)----------------------------------------------------------------------------------------(40)-------------------(20)-------------------4-----------------(140)-------------------(56)---------8-----------------------------(16)-(124)--------(24)--------(32)-------------------------------------------------8-------------------(40)-----------------------------------------------------------------------------------------(68)---------------------------------------------------------------------------------------------------(32)------------------------------------------------------------------------(96)-----------------------------------------------------------(88)-------------------8--(24)-----------------------------2-----------------------------(32)---------(60)------4--(28)------4--------------------------------------------------------------------------------------------------------------------------------------(128)--------------------------(112)----------------------------------------(24)---------------------------------------8-----------------------------(88)-------------------8-------------------------------------------------(32)-------------------(28)---------(24)--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------(84)-------------------------------------------------(16)---------(76)-------------------(16)--------------------------------------------------------------------------------(12)-(40)-----------------(16)------------------------------(68)---------------------------------------------------------------(40)------------------------------
To my surprise the second program shows a 97.6% collision rate. The hash value frequency also reflects that many hash values in the first 4k range were never returned as a result. This result is soooo bad that I'm wondering if I made a mistake in the test programs somewhere. Maybe somebody else can re-write the short test and confirm the results?
А у книжного алгоритма были указаны какие-то сводные характеристики? Вот у того же Murmur:
«This version passes Jenkin's frog.c torture-test up to 2^25 keypairs (beats Hsieh's 2^17; previous Murmur failed after 2^9), distribution is still excellent (only 3 collisions on the SCOWL english-words.95 list vs. Hsieh's 41), and still runs faster than Hsieh and Jenkins at 1300 mb/sec vs. their ~900 mb/sec.»
только что заметил, что ссылка больше не действительна. Она была найдена через Википедию — en.wikipedia.org/wiki/MurmurHash#cite_note-14
Там был отличный тестовый набор с результатами. К сожалению и в кэше гугла не сохранилось…