Wednesday, March 21, 2012

Problem with a query

Hi, I have a problem with SQL Server 2000:

this query, who works on the table "Flussi_Rivendite" which contains more or less 700.000 rows, doesn't work speedly:

SQL = "SELECT SUM(Quantità) AS Quantità " _

& "FROM Flussi_Rivendite WHERE " _

& "DataFlusso BETWEEN 20070101 AND 20071031 AND " _

& "ID_AnagraficaRivendita IN " _

& "(SELECT AnagraficaRivendite.ID_AnagraficaRivendita FROM AnagraficaRivendite WHERE " _

& "AnagraficaRivendite.ID_Agente=" & dTable_Agenti.Rows(x)("ID_Agente") & ")"

The problem is in last part of query:

& "ID_AnagraficaRivendita IN " _

& "(SELECT AnagraficaRivendite.ID_AnagraficaRivendita FROM AnagraficaRivendite WHERE " _

& "AnagraficaRivendite.ID_Agente=" & dTable_Agenti.Rows(x)("ID_Agente") & ")"

ID_Agente is a foreign key of another table ("AnagraficaRivendite" which contains 60.000 rows)

I have seen I can improve "waiting time" of this query if I add "ID_Agente" field at table Flussi_Rivendite, so I can transform initial query in other query simpler:

SQL = "SELECT SUM(Quantità) AS Quantità " _

& "FROM Flussi_Rivendite WHERE " _

& "DataFlusso BETWEEN 20070101 AND 20071031 AND " _

& "ID_Agente=" & dTable_Agenti.Rows(x)("ID_Agente")

I would ask you if it is correct to add column ID_Agente on the table Flussi_Rivendite. Until today I have always avoided to do this, I have always filtered my table by query like: "....IN (SELECT FIELDS FROM TABLE WHERE ecc...)".

But now, these tables have many many records and I have found only this solution to improve the query (like I have just said, adding external key "ID_Agente" directly on table "Flussi_Rivendite". Sorry for my english, I hope someone can suggest something me ;)

Hi Maurodii

I like the table names -- some beautiful language!

Did you try:

"SELECT SUM(Flussi_Rivendite .Quantità) AS Quantità " _& "FROM Flussi_Rivendite, AnagraficaRivendite WHERE " _& " Flussi_Rivendite.DataFlusso BETWEEN 20070101 AND 20071031 AND " _

& " Flussi_Rivendite.ID_AnagraficaRivendita = AnagraficaRivendite. ID_AnagraficaRivendita " _

& "AnagraficaRivendite.ID_Agente=" & dTable_Agenti.Rows(x)("ID_Agente") & ")"

Hope this helps and Good Luck!

Fouwaaz

|||

IN can be a performance killer. Look into using EXISTS instead of IN.

|||

Hello, I have tried this but the performance is the same, unfortunately :(

SQL = "SELECT SUM(Flussi_Rivendite.Quantita) AS Quantita " _

& "FROM Flussi_Rivendite, AnagraficaRivendite WHERE " _

& "Flussi_Rivendite.DataFlusso BETWEEN 20070101 AND 20071031 AND " _

& "Flussi_Rivendite.ID_AnagraficaRivendita = AnagraficaRivendite.ID_AnagraficaRivendita AND " _

& "AnagraficaRivendite.ID_Agente=" & dTable_Agenti.Rows(x)("ID_Agente")

Dear ndinakar, may you post me how you suggest me to do with the clause EXISTS? I have tried it, but I didn't make it :(

I remember you that my target is don't preserve column "ID_Agente" in table Flussi_Rivendite.

Thank you

|||

Try running the query directly in Query Analyzer or Management Studio and see how long its taking. Also I would recommend using >= and <= instead of using BETWEEN.

SELECTSUM(Flussi_Rivendite.Quantita)AS Quantita

FROM Flussi_Rivendite, AnagraficaRivendite

WHERE Flussi_Rivendite.DataFlusso>='20070101'AND Flussi_Rivendite.DataFlusso<='20071031'

AND Flussi_Rivendite.ID_AnagraficaRivendita= AnagraficaRivendite.ID_AnagraficaRivendita

AND AnagraficaRivendite.ID_Agente=<somevalue>

|||

I have changed clause Between like you suggested me but performance doesn't improve. Moreover, I have tried to run both query in Query Analyzer, I post here results:

Query N.1

SELECT SUM(Flussi_Rivendite.Quantità) AS Quantità
FROM Flussi_Rivendite, AnagraficaRivendite WHERE
Flussi_Rivendite.DataFlusso>=20070901 AND Flussi_Rivendite.DataFlusso<=20070931 AND
Flussi_Rivendite.ID_AnagraficaRivendita = AnagraficaRivendite.ID_AnagraficaRivendita AND
AnagraficaRivendite.ID_Agente=1

Flussi_Rivendite: Costs: 67% Anagrafica_Rivendite: Costs: 300%

Query N.2

SELECT SUM(Quantità) AS Quantità
FROM Flussi_Rivendite WHERE
DataFlusso BETWEEN 20070901 AND 20070931 AND
ID_Agente=1

Flussi_Rivendite: Costs: 0% Anagrafica_Rivendite: Costs: 0%

In my opinion when I directly filter ID_Agente in the same table (Query n.2) is best solution for me, also if database's structure will be a little more complicated. But it is too faster then Query n.1

|||

Do you have any indexes on DataFlusso column? or on AnagraficaRivendite.ID_Agente column? Having proper indexes is important for faster data retrieval otherwise SQL Server has to scan your entire table to get to the rows and then perform the computation.

|||

IMPORTANT IMPROVEMENT!!!!

I have built a new index like you suggested me in table Flussi_Rivendite: DataFlusso, ID_AnagragraficaRivendite, ID_Agente.

Now waiting time is pull down until 6/7 seconds!!!! It is good, also if with my alternative procedure waiting time it was 3/4 seconds. Thank you again, next monday I will come back in the office and I'll try it better (now I am working from my home by Remote Desktop and it's not easy...).

Have a good week end!!

ps: If you some indication to built/edit better this new index please tell me ;)

|||

Build this index on AnagraficaRivendite : ID_Agente, ID_AnagraficaRivendita

|||

Motley:

Build this index on AnagraficaRivendite : ID_Agente, ID_AnagraficaRivendita

I've already created.

Please note that if I change range of DataFlusso (for es. 20060101 and 20071031) waiting time increases again, but I think it's normal because so I'm asking for more records. I think I need a newer and faster processor... ;) or not?

|||

I think this is best solution if I don't add a column ID_Agente to table Flussi_Rivendite:

SELECT SUM(dbo.Flussi_Rivendite.Quantità) AS Quantità
FROM dbo.Flussi_Rivendite INNER JOIN
dbo.AnagraficaRivendite ON dbo.Flussi_Rivendite.ID_AnagraficaRivendita = dbo.AnagraficaRivendite.ID_AnagraficaRivendita
WHERE (dbo.Flussi_Rivendite.DataFlusso >= 20070101) AND (dbo.Flussi_Rivendite.DataFlusso <= 20070131) AND
(dbo.AnagraficaRivendite.ID_Agente = 2)

ByeSmile

No comments:

Post a Comment