
Get only the columns you need

The more columns you include in the result, the more data has to be read from storage, transmitted over the network from the database, and loaded into the memory of the application server. It is therefore important to request only the columns you actually need to work with. For example, if you need 6 columns from a table that has 30 columns in total and you fetch all of them instead of naming those 6, roughly 80% of the data transmitted over the network and loaded into memory (depending on individual column sizes) will never be used and only consumes memory and network resources. Running such a query repeatedly, for many users or many line items, makes the situation even worse and may even crash the server. With JOINs the risk is higher still, because additional columns from the joined tables are added to the result. Therefore, always specify the columns to be retrieved.
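As an illustration only (the source does not prescribe a language or schema, so the table and column names below and the use of plain JDBC are assumptions), a query that names exactly the columns it needs might look like this:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SelectOnlyNeededColumns {
    // Hypothetical "orders" table with ~30 columns, of which only 6 are needed here.
    static void printOrders(Connection conn, long customerId) throws Exception {
        // Avoid "SELECT *": it ships every column over the network and into memory.
        String sql = "SELECT order_id, customer_id, status, total_amount, currency, created_at "
                   + "FROM orders WHERE customer_id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Only the requested columns are fetched -- and only they are used.
                    System.out.println(rs.getLong("order_id") + " " + rs.getString("status")
                            + " " + rs.getBigDecimal("total_amount"));
                }
            }
        }
    }
}
```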

Join criteria on all key fields

Joining tables can significantly speed up gathering the data needed for a calculation, but only if the database can use an index to look up that data. It is therefore strongly recommended to specify all key fields, or at least the leading key fields (left to right), in the join criteria. Remember that the order of key fields in an index matters: searching only by firstname in a phonebook indexed on [surname, firstname] causes a full table scan that will significantly prolong the operation.
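A minimal sketch of a join that supplies the full composite key, again with hypothetical table and column names (the actual schema is not given in the source):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JoinOnFullKey {
    // Assumed schema: item_details has the composite primary key (order_id, item_no),
    // and order_items carries the same pair as a foreign key.
    static void printItemDetails(Connection conn, long customerId) throws Exception {
        String sql =
            "SELECT oi.order_id, oi.item_no, oi.quantity, d.description "
          + "FROM order_items oi "
          + "JOIN item_details d "
          + "  ON d.order_id = oi.order_id "   // first key field of the index
          + " AND d.item_no  = oi.item_no "    // second key field -> the index can be used fully
          + "WHERE oi.customer_id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%d/%d x%d %s%n",
                        rs.getLong("order_id"), rs.getInt("item_no"),
                        rs.getInt("quantity"), rs.getString("description"));
                }
            }
        }
    }
}
```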

Sorting

If you need to sort a large data set, sort on an indexed field; otherwise the operation will take a very long time.
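A short sketch, assuming a hypothetical orders table: index the column used for sorting so the database can return rows in index order instead of sorting the whole table. (`CREATE INDEX IF NOT EXISTS` is supported by PostgreSQL and H2, among others; adjust the syntax for your database.)

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class SortOnIndexedColumn {
    static void newestOrdersFirst(Connection conn) throws Exception {
        try (Statement st = conn.createStatement()) {
            // Index on the sort column; without it, ORDER BY on a big table forces a full sort.
            st.execute("CREATE INDEX IF NOT EXISTS idx_orders_created_at ON orders (created_at)");
        }
        String sql = "SELECT order_id, status, created_at FROM orders ORDER BY created_at DESC";
        try (PreparedStatement ps = conn.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getTimestamp("created_at") + " " + rs.getLong("order_id"));
            }
        }
    }
}
```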

Fast processing of streamed rows

Fetching rows from a table blocks the database connection for as long as the result set is open. Therefore the processing of each row should be as fast as possible.
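To illustrate the point (hypothetical schema, JDBC only as an example), keep the per-row work inside the loop limited to extracting values; slow operations such as remote calls should not run while the result set, and with it the connection, is still open:

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FastPerRowProcessing {
    static BigDecimal totalOpenAmount(Connection conn) throws Exception {
        BigDecimal total = BigDecimal.ZERO;
        try (PreparedStatement ps = conn.prepareStatement(
                 "SELECT total_amount FROM orders WHERE status = 'OPEN'");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // Cheap per-row work: read the value and accumulate.
                total = total.add(rs.getBigDecimal("total_amount"));
                // Expensive work (e.g. calling an external service per row) does NOT belong here.
            }
        }
        return total; // heavy follow-up processing can happen now, with the connection released
    }
}
```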

Avoid loading large numbers of rows into memory

Transactional tables:

Analytical tables: Collecting a large number of rows into memory using collect { it } and processing them later may lead to an out-of-memory exception. This may feel counter-intuitive, since it seems to go directly against the previous point about processing rows as fast as possible: what could be faster than loading all the rows into memory, closing the stream and processing everything later? Right? Wrong. Sometimes. It can work nicely if you read just a few rows from a small table that will never grow. It works fine on QA, where the table has only a few rows, and then crashes on PROD, where the same table has millions of rows.

Also, the more you load into memory, the slower everything gets (the Java Virtual Machine's garbage collector must work harder and harder to find free memory for you). So what is the solution? Process rows in batches. Load only a fixed number of rows into memory (experiment with different settings to find a suitable number), process them, throw them away (clear the list/map you read them into) and move on to the next batch.
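A minimal sketch of this batching pattern, assuming a hypothetical orders table and plain JDBC (the batch size of 1,000 is only a starting point to experiment with):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class BatchedRowProcessing {
    private static final int BATCH_SIZE = 1_000; // experiment to find a suitable number

    static void processInBatches(Connection conn) throws Exception {
        List<long[]> batch = new ArrayList<>(BATCH_SIZE);
        try (PreparedStatement ps = conn.prepareStatement(
                 "SELECT order_id, customer_id FROM orders")) {
            ps.setFetchSize(BATCH_SIZE); // hint to the driver to stream rows instead of buffering them all
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    batch.add(new long[] { rs.getLong("order_id"), rs.getLong("customer_id") });
                    if (batch.size() == BATCH_SIZE) {
                        processBatch(batch);
                        batch.clear(); // throw the processed rows away before reading the next batch
                    }
                }
            }
        }
        if (!batch.isEmpty()) {
            processBatch(batch); // leftover rows from the last, partial batch
        }
    }

    private static void processBatch(List<long[]> rows) {
        // placeholder for the real per-batch work
        System.out.println("processed " + rows.size() + " rows");
    }
}
```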