Impala out of memory in performance cluster

This forum is for the Hadoop ecosystem (HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Sqoop2, Avro, Solr, HCatalog, Impala, Oozie, ZooKeeper) and Hadoop distributions such as Cloudera and Hortonworks.

Impala out of memory in performance cluster

Postby forum_admin » Mon Dec 18, 2017 1:52 am

I am the only user of Impala on the performance cluster. After running a few queries, I run into the following error:
[Cloudera][ImpalaODBC] (110) Error while executing a query in Impala: [HY000] : Memory limit exceeded: The memory limit is set too low to initialize the spilling operator. The minimum required memory to spill this operator is 272.00 MB.

The following link suggests that increasing the memory allocated to Impala to the same, higher level across all nodes will solve the problem:
http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/impala-memory-limit-exceed/m-p/50220
We increased the memory allocated to Impala to 2 GB across all nodes.
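For reference, here is a minimal sketch of how the same limit can also be applied per session from Python with the impyla client (SET MEM_LIMIT is a standard Impala query option; the host name and table are placeholders, not our real environment):

    from impala.dbapi import connect

    # Connect to an Impala coordinator (placeholder host; 21050 is the default port)
    conn = connect(host='impala-coordinator.example.com', port=21050)
    cur = conn.cursor()

    # Cap this session's per-query memory, mirroring the 2 GB cluster-wide setting
    cur.execute("SET MEM_LIMIT=2g")

    # Run a query under the new limit (hypothetical table name)
    cur.execute("SELECT count(*) FROM my_db.my_table")
    print(cur.fetchall())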



Re: Impala out of memory in performance cluster

Postby forum_admin » Mon Dec 18, 2017 1:55 am

Just increase the mem_limit parameter if you have enough memory. You can also estimate the required memory limit with the formula below.

Cluster total memory requirement
  = size of the smaller table * selectivity factor from the predicate * projection factor * compression ratio
  = 100 TB * 10% * 5/200 * 3
  = 0.75 TB
  = 750 GB

selectivity factor from the predicate = 10% (the fraction of rows the user will select); in our case it may vary from 1% to 70%
projection factor = 5/200 (5 columns projected out of 200 total columns); it could be as many as 100 columns out of 200
compression ratio = 3 (the Snappy compression ratio); this value is accurate in our case
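
To make the arithmetic above easy to rerun with different assumptions, here is a small Python sketch of the same estimate (the inputs are the values from our case, not universal constants):

    # Back-of-the-envelope cluster memory estimate, per the formula above
    smaller_table_tb = 100      # size of the smaller table in the join, in TB
    selectivity = 0.10          # fraction of rows the predicate selects (1%-70% in our case)
    projection = 5 / 200        # columns projected / total columns
    compression = 3             # Snappy compression ratio

    total_tb = smaller_table_tb * selectivity * projection * compression
    print(f"Cluster total memory requirement: {total_tb} TB ({total_tb * 1000:.0f} GB)")
    # Output: Cluster total memory requirement: 0.75 TB (750 GB)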


Re: Impala out of memory in performance cluster

Postby forum_admin » Mon Dec 18, 2017 7:58 pm

Memory - 128 GB or more recommended, ideally 256 GB or more. If the intermediate results during query processing on a particular node exceed the amount of memory available to Impala on that node, the query is cancelled. Note that because the work is parallelized, and intermediate results for aggregate queries are typically smaller than the original data, Impala can query and join tables that are much larger than the memory available on an individual node.
Storage - DataNodes with 12 or more disks each. I/O speeds are often the limiting factor for disk performance with Impala. Ensure that you have sufficient disk space to store the data Impala will be querying.


Re: Impala out of memory in performance cluster

Postby forum_admin » Mon Dec 18, 2017 7:59 pm

Most of the time, "Memory Limit Exceeded" errors are caused by missing table/column stats, which lead the planner to pick the wrong join distribution. Once stats are gathered, you may find queries take significantly less memory to execute.

Many workloads may run perfectly fine with significantly less than the recommended memory once stats are collected.
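
As a hedged illustration, gathering stats from Python with the impyla client might look like this (COMPUTE STATS is the standard Impala statement; the host and table names are placeholders):

    from impala.dbapi import connect

    conn = connect(host='impala-coordinator.example.com', port=21050)  # placeholder host
    cur = conn.cursor()

    # Collect table and column stats so the planner can pick the right join distribution
    cur.execute("COMPUTE STATS my_db.my_table")

    # Verify that stats are now populated (#Rows should no longer be -1)
    cur.execute("SHOW TABLE STATS my_db.my_table")
    for row in cur.fetchall():
        print(row)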


