We recently noticed a discrepancy when indexing catalog_category_product on MariaDB compared to MySQL 5.7. The indexer would take around 13 minutes on MariaDB, compared to around 3 seconds on MySQL. To add more confusion to the mix, MariaDB was running on a powerful staging server, and MySQL on a lowly MacBook Pro development machine.
After much head-scratching and career path questioning, the issue appeared to be related to the following statement in Magento\Catalog\Model\Indexer\Category\Product\Action\Full.php:
$this->connection->query(
    $this->connection->insertFromSelect(
        $resultSelect,
        $this->tableMaintainer->getMainTmpTable((int)$store->getId()),
        $columns,
        AdapterInterface::INSERT_ON_DUPLICATE
    )
);
This is executed in the reindexCategoriesBySelect method, which creates a temporary table to work with when regenerating the index for a particular store. It turns out that MariaDB's temporary table performance is woefully bad when large amounts of data are being inserted. This appears to be related to aria_used_for_temp_tables being set to ON, a value which can only be changed by recompiling MariaDB. See here and here.
The fix, without switching database engines or recompiling MariaDB, is to lower Magento's batchRowsCount so that the database isn't dealing with as many temporary table inserts at a time. Magento provides a config setting for this value on each of its indexers that use temporary tables, so adjust it for whichever indexer is performing slowly. In our case, changing the value from the default of 100000 to 500 brought the indexer time down from 13 minutes to 6 seconds. The following, added to di.xml, was the panacea for us:
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="urn:magento:framework:ObjectManager/etc/config.xsd">
    <type name="Magento\Catalog\Model\Indexer\Category\Product\Action\Full">
        <arguments>
            <argument name="batchRowsCount" xsi:type="number">500</argument>
        </arguments>
    </type>
</config>
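To get a feel for what lowering batchRowsCount trades off, here is a rough sketch of the arithmetic. It assumes the indexer splits its work into chunks of at most batchRowsCount rows, so a smaller value means more insert statements, each touching far fewer temporary-table rows. The countBatches helper and the 250000 row count are hypothetical and only there for illustration; neither is part of Magento.

```php
<?php
// Hypothetical helper, not part of Magento: how many batched inserts are
// needed to process $totalRows rows, $batchRowsCount rows at a time.
function countBatches(int $totalRows, int $batchRowsCount): int
{
    if ($batchRowsCount < 1) {
        throw new InvalidArgumentException('batchRowsCount must be at least 1');
    }
    // Integer ceiling division: a partial final batch still counts as one.
    return intdiv($totalRows + $batchRowsCount - 1, $batchRowsCount);
}

$totalRows = 250000; // assumed row count, for illustration only

foreach ([100000, 500] as $batchRowsCount) {
    printf(
        "batchRowsCount=%d: %d statements, at most %d rows each\n",
        $batchRowsCount,
        countBatches($totalRows, $batchRowsCount),
        min($batchRowsCount, $totalRows)
    );
}
```

More statements sounds worse on paper, but in our case each one was cheap enough that the total time dropped dramatically, which is why the right value has to be found by measuring.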
Magento's documentation on the use of this value can be found here, and states:
“You can experiment to determine the ideal batch size. In general, halving the batch size can decrease the indexer execution time”.
Yup, experiment indeed with this value, YMMV! Happy indexing!