Performance implications of allowing an alias to be used in the HAVING clause

This answer is narrowly focused on just that particular query, using the sample data loaded below. It does address some other queries as well, such as the count(distinct ...) mentioned by others.

Using the alias in the HAVING appears to either match, slightly outperform, or considerably outperform the alternative of repeating the aggregate expression (depending on the query).

This uses a pre-existing table of about 5 million rows, created quickly via this answer of mine, which takes 3 to 5 minutes.

Resulting structure:

CREATE TABLE `ratings` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `thing` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5046214 DEFAULT CHARSET=utf8;

That table is reproduced here, but using InnoDB instead. This creates the expected InnoDB gap anomaly in the id values, caused by range reservation during the inserts. Worth noting, but it makes no difference here. The table holds 4.7 million rows.
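
The gap can be seen by comparing the row count with the span of id values. This is only a minimal sketch, meant to be run while the table is still named ratings; the column aliases (total_rows, missing_ids and so on) are illustrative names, not part of the original setup.

```sql
-- Compare how many rows exist with how many id values the range spans.
-- missing_ids > 0 means the AUTO_INCREMENT sequence has gaps.
SELECT COUNT(*)                         AS total_rows,
       MIN(id)                          AS min_id,
       MAX(id)                          AS max_id,
       MAX(id) - MIN(id) + 1 - COUNT(*) AS missing_ids
FROM ratings;
```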

Modify the table to get near Tim's assumed schema.

rename table ratings to students; -- not exactly instantaneous (a COPY)
alter table students add column camId int; -- get it near Tim's schema
-- don't add the `camId` index yet

The following will take a while. Run it again and again in chunks, or else your connection may time out. The timeout would come from updating 5 million rows with no LIMIT clause in the update statement. Note that we do have a LIMIT clause.

So we are doing it in half-million-row iterations. Each run sets the column to a random number between 1 and 20.

update students set camId=floor(rand()*20+1) where camId is null limit 500000; -- well that took a while (no surprise)

Keep running the above until no camId is null.

I ran it about 10 times (the whole thing takes 7 to 10 minutes).
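
One way to tell when to stop is to check how many rows the last chunk touched and how many are still unset. This is a hedged sketch: the aliases rows_updated and remaining are made up for illustration, and ROW_COUNT() has to be read in the same session immediately after the UPDATE (some GUI clients issue their own statements in between).

```sql
-- One chunk: fill up to 500000 of the rows that are still NULL.
UPDATE students
   SET camId = FLOOR(RAND() * 20 + 1)
 WHERE camId IS NULL
 LIMIT 500000;

-- Rows changed by the UPDATE just above; 0 means nothing was left to fill.
SELECT ROW_COUNT() AS rows_updated;

-- Independent check: how many rows still need a camId.
SELECT COUNT(*) AS remaining
FROM students
WHERE camId IS NULL;
```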

select camId,count(*) from students
group by camId order by 1 ;

1   235641
2   236060
3   236249
4   235736
5   236333
6   235540
7   235870
8   236815
9   235950
10  235594
11  236504
12  236483
13  235656
14  236264
15  236050
16  236176
17  236097
18  235239
19  235556
20  234779

select count(*) from students;
-- 4.7 Million rows

Create a useful index (after the inserts, of course).

create index `ix_stu_cam` on students(camId); -- takes 45 seconds

ANALYZE TABLE students; -- update the stats: http://dev.mysql.com/doc/refman/5.7/en/analyze-table.html
-- the above is fine, takes 1 second
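
To confirm that the index exists and that the statistics were refreshed, the index metadata can be inspected. This is just a quick check, not something the timings depend on; the Cardinality column is an estimate, so the exact value may differ between runs.

```sql
-- Lists the PRIMARY key and ix_stu_cam along with their estimated cardinality.
SHOW INDEX FROM students;
```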

Create the campus table.

create table campus
(   camID int auto_increment primary key,
    camName varchar(100) not null
);
insert campus(camName) values
('one'),('2'),('3'),('4'),('5'),
('6'),('7'),('8'),('9'),('ten'),
('etc'),('etc'),('etc'),('etc'),('etc'),
('etc'),('etc'),('etc'),('etc'),('twenty');
-- ok 20 of them
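
Before timing anything, it may be worth checking that every student row will find a campus in the join. A small sketch, assuming the default AUTO_INCREMENT settings so that the 20 campus rows received camID 1 through 20; the alias orphans is illustrative.

```sql
-- Count students whose camId has no matching campus row.
-- A result of 0 means the inner join will not drop any students.
SELECT COUNT(*) AS orphans
FROM students s
LEFT JOIN campus c
    ON c.camID = s.camId
WHERE c.camID IS NULL;
```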

Run the two queries:

SELECT students.camID, campus.camName, COUNT(students.id) as studentCount 
FROM students 
JOIN campus 
    ON campus.camID = students.camID 
GROUP BY students.camID, campus.camName 
HAVING COUNT(students.id) > 3 
ORDER BY studentCount; 
-- run it many many times, back to back, 5.50 seconds, 20 rows of output

and

SELECT students.camID, campus.camName, COUNT(students.id) as studentCount 
FROM students 
JOIN campus 
    ON campus.camID = students.camID 
GROUP BY students.camID, campus.camName 
HAVING studentCount > 3 
ORDER BY studentCount; 
-- run it many many times, back to back, 5.50 seconds, 20 rows of output

So the times are identical. I ran each one a dozen times.
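
Most clients print the elapsed time themselves, which is presumably how the figures above were read. If yours does not, one rough way to time a query from SQL is sketched below. The variable @t0 and the alias elapsed_seconds are illustrative, and the measurement includes sending the result set back to the client, so treat it as approximate.

```sql
-- Record the start time with microsecond precision.
SET @t0 = NOW(6);

SELECT students.camID, campus.camName, COUNT(students.id) AS studentCount
FROM students
JOIN campus
    ON campus.camID = students.camID
GROUP BY students.camID, campus.camName
HAVING studentCount > 3
ORDER BY studentCount;

-- NOW(6) is evaluated when this statement starts, that is, after the query above has returned.
SELECT TIMESTAMPDIFF(MICROSECOND, @t0, NOW(6)) / 1000000 AS elapsed_seconds;
```

Swapping the HAVING line for HAVING COUNT(students.id) > 3 and repeating the run several times gives the comparison for the other variant.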

The EXPLAIN output is the same for both:

+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+
| id | select_type | table    | type | possible_keys | key        | key_len | ref                  | rows   | Extra                           |
+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+
|  1 | SIMPLE      | campus   | ALL  | PRIMARY       | NULL       | NULL    | NULL                 |     20 | Using temporary; Using filesort |
|  1 | SIMPLE      | students | ref  | ix_stu_cam    | ix_stu_cam | 5       | bigtest.campus.camID | 123766 | Using index                     |
+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+
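
A plan like the one above is obtained by prefixing each query with EXPLAIN, as sketched here for the two COUNT variants. The row estimates depend on the current table statistics, so they may not match the table above exactly.

```sql
-- Plan for the variant that repeats the aggregate in HAVING.
EXPLAIN
SELECT students.camID, campus.camName, COUNT(students.id) AS studentCount
FROM students
JOIN campus
    ON campus.camID = students.camID
GROUP BY students.camID, campus.camName
HAVING COUNT(students.id) > 3
ORDER BY studentCount;

-- Plan for the variant that uses the alias in HAVING.
EXPLAIN
SELECT students.camID, campus.camName, COUNT(students.id) AS studentCount
FROM students
JOIN campus
    ON campus.camID = students.camID
GROUP BY students.camID, campus.camName
HAVING studentCount > 3
ORDER BY studentCount;
```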

Using the AVG() function, I am getting about a 12% increase in performance with the alias in the HAVING (with identical EXPLAIN output) from the following two queries.

SELECT students.camID, campus.camName, avg(students.id) as studentAvg 
FROM students 
JOIN campus 
    ON campus.camID = students.camID 
GROUP BY students.camID, campus.camName 
HAVING avg(students.id) > 2200000 
ORDER BY students.camID; 
-- avg time 7.5

and

SELECT students.camID, campus.camName, avg(students.id) as studentAvg 
FROM students 
JOIN campus 
    ON campus.camID = students.camID 
GROUP BY students.camID, campus.camName 
HAVING studentAvg > 2200000
ORDER BY students.camID;
-- avg time 6.5

And lastly, the DISTINCT:

SELECT students.camID, count(distinct students.id) as studentDistinct 
FROM students 
JOIN campus 
    ON campus.camID = students.camID 
GROUP BY students.camID 
HAVING count(distinct students.id) > 1000000 
ORDER BY students.camID; -- 10.6   10.84   12.1   11.49   10.1   9.97   10.27   11.53   9.84 9.98
-- 9.9

SELECT students.camID, count(distinct students.id) as studentDistinct
FROM students
JOIN campus
    ON campus.camID = students.camID
GROUP BY students.camID
HAVING studentDistinct > 1000000
ORDER BY students.camID; -- 6.81   6.55   6.75   6.31   7.11   6.36   6.55
-- 6.45

The alias in the HAVING consistently runs about 35% faster, with the same EXPLAIN output, shown below. So identical EXPLAIN output has now twice failed to predict identical performance; treat EXPLAIN as a general clue only.

+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+
| id | select_type | table    | type  | possible_keys | key        | key_len | ref                  | rows   | Extra                                        |
+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | campus   | index | PRIMARY       | PRIMARY    | 4       | NULL                 |     20 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | students | ref   | ix_stu_cam    | ix_stu_cam | 5       | bigtest.campus.camID | 123766 | Using index                                  |
+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+
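
The percentages quoted above can be rechecked from the summary timings given in the query comments (9.9 versus 6.45 seconds for DISTINCT, 7.5 versus 6.5 seconds for AVG). This sketch assumes "faster" means the relative reduction in elapsed time; the aliases are illustrative.

```sql
-- Relative reduction in elapsed time when the alias is used in HAVING.
SELECT ROUND((9.9 - 6.45) / 9.9 * 100, 1) AS distinct_pct_less_time,
       ROUND((7.5 - 6.5) / 7.5 * 100, 1)  AS avg_pct_less_time;
```

On that definition the DISTINCT pair comes out at about 35% and the AVG pair at about 13%. The AVG figure is computed from rounded averages, which is close to the roughly 12% quoted earlier.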

The optimizer appears to favor the alias in the HAVING at the moment, especially for the DISTINCT.