MySQL and NoSQL: help me choose the right one

You should read the following and learn a little about the advantages of a well-designed InnoDB table and how best to use clustered indexes - only available with InnoDB!

http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html

http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/

Then design your system along the lines of the following simplified example:

Example schema (simplified)

The important features are that the tables use the InnoDB engine, and that the primary key for the threads table is no longer a single auto_incrementing key but a composite clustered key based on the combination of forum_id and thread_id, e.g.

threads - primary key (forum_id, thread_id)

forum_id    thread_id
========    =========
1                   1
1                   2
1                   3
1                 ...
1             2058300  
2                   1
2                   2
2                   3
2                  ...
2              2352141
...
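The composite key idea can be tried out without a MySQL server. The sketch below is a stand-in under stated assumptions: it uses Python's built-in sqlite3 module, where a WITHOUT ROWID table is stored in primary-key order, which is only roughly analogous to an InnoDB clustered index. It shows that thread_id restarts at 1 in each forum and that only the (forum_id, thread_id) pair has to be unique.

```python
import sqlite3

# SQLite stand-in for the InnoDB threads table (assumption: WITHOUT ROWID
# storage in primary-key order approximates a clustered index).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE threads (
        forum_id  INTEGER NOT NULL,
        thread_id INTEGER NOT NULL,
        PRIMARY KEY (forum_id, thread_id)
    ) WITHOUT ROWID
    """
)

# thread_id restarts at 1 in every forum; only the pair must be unique
conn.executemany(
    "INSERT INTO threads (forum_id, thread_id) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)],
)

# Re-using an existing (forum_id, thread_id) pair is rejected
try:
    conn.execute("INSERT INTO threads (forum_id, thread_id) VALUES (1, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```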

Each forum row includes a counter called next_thread_id (unsigned int) which is maintained by a trigger and incremented every time a thread is added to that forum. This also means we can store 4 billion threads per forum, rather than 4 billion threads in total as we could if we used a single auto_increment primary key for thread_id.

forum_id    title   next_thread_id
========    =====   ==============
1          forum 1        2058300
2          forum 2        2352141
3          forum 3        2482805
4          forum 4        3740957
...
64        forum 64       3243097
65        forum 65      15000000 -- ooh a big one
66        forum 66       5038900
67        forum 67       4449764
...
247      forum 247            0 -- still loading data for half the forums !
248      forum 248            0
249      forum 249            0
250      forum 250            0
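To put numbers on the 4 billion figure: int unsigned tops out at 2^32 - 1 and smallint unsigned (the forum_id type) at 2^16 - 1. A back-of-the-envelope sketch, assuming the 250 forums used in this example and ignoring every other limit (disk space, row size, etc.):

```python
# Largest values of the MySQL column types used in the schema
INT_UNSIGNED_MAX = 2**32 - 1       # int unsigned: 4,294,967,295
SMALLINT_UNSIGNED_MAX = 2**16 - 1  # smallint unsigned: 65,535 (forum_id)


def thread_capacity(num_forums: int, per_forum_ids: bool) -> int:
    """Upper bound on thread ids available under each keying strategy."""
    if num_forums > SMALLINT_UNSIGNED_MAX:
        raise ValueError("forum_id is a smallint unsigned")
    if per_forum_ids:
        # (forum_id, thread_id): every forum gets its own 1..INT_UNSIGNED_MAX range
        return num_forums * INT_UNSIGNED_MAX
    # a single auto_increment thread_id shared by the whole table
    return INT_UNSIGNED_MAX


print(f"{thread_capacity(250, per_forum_ids=False):,}")  # 4,294,967,295
print(f"{thread_capacity(250, per_forum_ids=True):,}")   # 1,073,741,823,750
```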

The downside of using a composite key is that you can no longer select a thread by a single key value:

select * from threads where thread_id = y;

You have to do:

select * from threads where forum_id = x and thread_id = y;

However, your application code should know which forum the user is browsing, so it's not exactly difficult to implement - store the currently viewed forum_id in a session variable, a hidden form field, etc.

Here's the simplified schema:
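A minimal sketch of the lookup side, again using sqlite3 as a stand-in for MySQL. The `session` dict and the `get_thread` helper are hypothetical; they stand in for whatever session or request mechanism your application already has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE threads (forum_id INTEGER NOT NULL, thread_id INTEGER NOT NULL,"
    " title TEXT NOT NULL, PRIMARY KEY (forum_id, thread_id)) WITHOUT ROWID"
)
conn.executemany(
    "INSERT INTO threads VALUES (?, ?, ?)",
    [(1, 1, "first thread in forum 1"), (2, 1, "first thread in forum 2")],
)

# thread_id alone is ambiguous: every forum has a thread 1
ambiguous = conn.execute(
    "SELECT COUNT(*) FROM threads WHERE thread_id = 1"
).fetchone()[0]

# Hypothetical session store: the app remembers which forum is being browsed
session = {"forum_id": 2}


def get_thread(conn, forum_id, thread_id):
    """Look up one thread by the full composite key (forum_id, thread_id)."""
    return conn.execute(
        "SELECT title FROM threads WHERE forum_id = ? AND thread_id = ?",
        (forum_id, thread_id),
    ).fetchone()


row = get_thread(conn, session["forum_id"], 1)
print(ambiguous, row[0])  # 2 first thread in forum 2
```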

drop table if exists forums;
create table forums
(
forum_id smallint unsigned not null auto_increment primary key,
title varchar(255) unique not null,
next_thread_id int unsigned not null default 0 -- count of threads in each forum
)engine=innodb;


drop table if exists threads;
create table threads
(
forum_id smallint unsigned not null,
thread_id int unsigned not null default 0,
reply_count int unsigned not null default 0,
hash char(32) not null,
created_date datetime not null,
primary key (forum_id, thread_id, reply_count) -- composite clustered index
)engine=innodb;

delimiter #

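create trigger threads_before_ins_trig before insert on threads
for each row
begin
declare v_id int unsigned default 0;

  -- lock this forum's row so concurrent inserts cannot read the same counter value
  select next_thread_id + 1 into v_id from forums where forum_id = new.forum_id for update;
  -- the per-forum counter becomes the new thread's id
  set new.thread_id = v_id;
  -- store the new high-water mark back on the forum row
  update forums set next_thread_id = v_id where forum_id = new.forum_id;
end#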

delimiter ;
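SQLite cannot assign `new.thread_id` from a BEFORE INSERT trigger the way the MySQL trigger above does, so this sketch swaps in a different technique: it allocates the per-forum id in application code inside a single transaction. It increments the counter first and then reads it back, so the write lock is taken before the value is read. The `insert_thread` name is hypothetical, and the table types are simplified to SQLite's.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE forums (
        forum_id INTEGER PRIMARY KEY,
        title TEXT UNIQUE NOT NULL,
        next_thread_id INTEGER NOT NULL DEFAULT 0  -- last thread_id issued
    );
    CREATE TABLE threads (
        forum_id INTEGER NOT NULL,
        thread_id INTEGER NOT NULL,
        reply_count INTEGER NOT NULL DEFAULT 0,
        hash TEXT NOT NULL,
        created_date TEXT NOT NULL,
        PRIMARY KEY (forum_id, thread_id, reply_count)
    ) WITHOUT ROWID;
    """
)
conn.executemany("INSERT INTO forums (forum_id, title) VALUES (?, ?)",
                 [(1, "forum 1"), (2, "forum 2")])


def insert_thread(conn, forum_id, hash_):
    """Allocate the next per-forum thread_id and insert the thread atomically."""
    with conn:  # commits on success, rolls back on exception
        # Increment first: the UPDATE takes the write lock before we read
        cur = conn.execute(
            "UPDATE forums SET next_thread_id = next_thread_id + 1 WHERE forum_id = ?",
            (forum_id,),
        )
        if cur.rowcount != 1:
            raise ValueError(f"unknown forum_id {forum_id}")
        (thread_id,) = conn.execute(
            "SELECT next_thread_id FROM forums WHERE forum_id = ?", (forum_id,)
        ).fetchone()
        conn.execute(
            "INSERT INTO threads (forum_id, thread_id, hash, created_date)"
            " VALUES (?, ?, ?, ?)",
            (forum_id, thread_id, hash_, datetime.now(timezone.utc).isoformat()),
        )
    return thread_id


ids = [insert_thread(conn, f, "x" * 32) for f in (1, 1, 2, 1, 2)]
print(ids)  # [1, 2, 1, 3, 2]
```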

You may have noticed that I've included reply_count as part of the primary key, which is a bit strange since the (forum_id, thread_id) composite is unique by itself. This is just an index optimisation that saves some I/O when queries that use reply_count are executed. Please refer to the 2 links above for further info on this.
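One way to check that a query like the ones below is served by a range scan of the composite primary key is to ask the database for its plan. This sketch uses SQLite's EXPLAIN QUERY PLAN on a WITHOUT ROWID table as a rough analogue of running EXPLAIN in MySQL; the plan wording varies between SQLite versions and says nothing about InnoDB's actual I/O, so treat it as an illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE threads (
        forum_id    INTEGER NOT NULL,
        thread_id   INTEGER NOT NULL,
        reply_count INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (forum_id, thread_id, reply_count)
    ) WITHOUT ROWID
    """
)

query = (
    "SELECT forum_id, thread_id FROM threads"
    " WHERE forum_id = ? AND reply_count > ?"
    " ORDER BY thread_id DESC LIMIT 32"
)

# Each EXPLAIN QUERY PLAN row has 4 columns; the last one is the readable detail
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (65, 64))]
for line in plan:
    print(line)  # e.g. a SEARCH ... USING PRIMARY KEY (forum_id=?) step
```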

Example queries

I'm still loading data into my example tables, and so far I have approx. 500 million rows loaded (half as many as your system). When the load process is complete I expect to have approx.:

250 forums * 5 million threads = 1,250,000,000 (1.25 billion rows)

I've deliberately made some of the forums contain more than 5 million threads; for example, forum 65 has 15 million threads:

forum_id    title   next_thread_id
========    =====   ==============
65        forum 65      15000000 -- ooh a big one

Query runtimes

select sum(next_thread_id) from forums;

sum(next_thread_id)
===================
539,155,433 (500 million threads so far and still growing...)

Under InnoDB, summing the next_thread_id counters to get the total thread count is much faster than the usual:

select count(*) from threads;

How many threads does forum 65 have:

select next_thread_id from forums where forum_id = 65;

next_thread_id
==============
15,000,000 (15 million)

Again, this is faster than the usual:

select count(*) from threads where forum_id = 65;
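Why the counter query is cheap: it reads one row per forum, while count(*) has to visit every thread row or index entry. The small sqlite3 sketch below (sample data made up for illustration) shows that the two agree as long as the counter is kept in step with inserts. Because next_thread_id is a high-water mark, the two would drift apart if threads were ever deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE forums (forum_id INTEGER PRIMARY KEY,
                         next_thread_id INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE threads (forum_id INTEGER NOT NULL, thread_id INTEGER NOT NULL,
                          PRIMARY KEY (forum_id, thread_id)) WITHOUT ROWID;
    INSERT INTO forums (forum_id) VALUES (1), (2), (65);
    """
)

# Sample data: number of threads to create per forum
wanted = {1: 3, 2: 5, 65: 15}
with conn:
    for forum_id, n in wanted.items():
        for thread_id in range(1, n + 1):
            conn.execute("INSERT INTO threads VALUES (?, ?)", (forum_id, thread_id))
        # keep the denormalised counter in step with the rows actually inserted
        conn.execute("UPDATE forums SET next_thread_id = ? WHERE forum_id = ?",
                     (n, forum_id))

# Cheap: one row per forum is read
total_from_counters = conn.execute("SELECT SUM(next_thread_id) FROM forums").fetchone()[0]
# Expensive at scale: every thread row (or index entry) is visited
total_from_rows = conn.execute("SELECT COUNT(*) FROM threads").fetchone()[0]
forum_65 = conn.execute("SELECT next_thread_id FROM forums WHERE forum_id = 65").fetchone()[0]
print(total_from_counters, total_from_rows, forum_65)  # 23 23 15
```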

OK, now we know we have about 500 million threads so far and forum 65 has 15 million threads - let's see how the schema performs :)

select forum_id, thread_id from threads where forum_id = 65 and reply_count > 64 order by thread_id desc limit 32;

runtime = 0.022 secs

select forum_id, thread_id from threads where forum_id = 65 and reply_count > 1 order by thread_id desc limit 10000, 100;

runtime = 0.027 secs

Looks pretty good to me: that's a single table with 500+ million rows (and growing), with a query that covers 15 million rows in 0.02 seconds (while under load!)
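The two queries above can be reproduced at toy scale to check that they return what is intended. This sketch uses sqlite3 and synthetic data (2 forums x 1,000 threads, not the author's dataset), so its timings mean nothing; it only verifies the SQL results against a plain-Python computation. MySQL's `limit 10000, 100` is written here as `LIMIT ? OFFSET ?`, with the offset scaled down to fit the sample. The `latest_threads` and `expected` helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE threads (forum_id INTEGER NOT NULL, thread_id INTEGER NOT NULL,"
    " reply_count INTEGER NOT NULL DEFAULT 0,"
    " PRIMARY KEY (forum_id, thread_id, reply_count)) WITHOUT ROWID"
)
# Synthetic sample data: reply_count cycles through 0..99
rows = [(f, t, t % 100) for f in (64, 65) for t in range(1, 1001)]
conn.executemany("INSERT INTO threads VALUES (?, ?, ?)", rows)


def latest_threads(conn, forum_id, min_replies, limit, offset=0):
    """Newest threads in one forum with more than min_replies replies."""
    return conn.execute(
        "SELECT forum_id, thread_id FROM threads"
        " WHERE forum_id = ? AND reply_count > ?"
        " ORDER BY thread_id DESC LIMIT ? OFFSET ?",
        (forum_id, min_replies, limit, offset),
    ).fetchall()


def expected(forum_id, min_replies, limit, offset=0):
    """The same answer computed in plain Python, to check the SQL."""
    hits = sorted((r for r in rows if r[0] == forum_id and r[2] > min_replies),
                  key=lambda r: r[1], reverse=True)
    return [(r[0], r[1]) for r in hits[offset:offset + limit]]


first_page = latest_threads(conn, 65, 64, 32)
deep_page = latest_threads(conn, 65, 1, 10, offset=500)
print(first_page[:3])  # [(65, 999), (65, 998), (65, 997)]
```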

Further optimisations

These would include:

partitioning by range
sharding
throwing money and hardware at it

etc...
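Because every query already carries forum_id, it is a natural key for range partitioning or sharding: all threads of a forum stay together. A hypothetical routing function, assuming four shards that each hold a contiguous range of forum ids (the shard names and boundaries are made up for illustration):

```python
from bisect import bisect_left

# Hypothetical shard map: inclusive upper forum_id bound for each shard
SHARD_UPPER_BOUNDS = [64, 128, 192, 250]
SHARD_NAMES = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for_forum(forum_id: int) -> str:
    """Route a forum to the shard holding its id range."""
    if not 1 <= forum_id <= SHARD_UPPER_BOUNDS[-1]:
        raise ValueError(f"forum_id {forum_id} outside configured ranges")
    # index of the first upper bound that is >= forum_id
    return SHARD_NAMES[bisect_left(SHARD_UPPER_BOUNDS, forum_id)]


print(shard_for_forum(65))  # db-shard-1
```

Keeping a whole forum on one shard means the composite-key queries shown above never have to touch more than one server.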

Hope you found this answer helpful :)