If you're administering HBase, chances are you've had to think about how to manage region splits. There are two ways of doing it:
- Automated splits. This is the default strategy: a region is split once it grows past a certain size. The size is configurable (hbase.hregion.max.filesize) and defaults to 10 GB.
- Manual splits. You manually split regions as you see fit.
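The size-based rule behind automated splitting can be modeled in a few lines. This is a simplified sketch, not HBase's actual split policy: the class name SizeBasedSplitCheck and the region sizes are made up for illustration, and real HBase compares individual store sizes, with a default policy that differs between versions. It shows the 10 GB default expressed in bytes and the "grown past the limit" comparison.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of size-based automatic splitting.
// Real HBase split policies vary by version and look at store sizes.
class SizeBasedSplitCheck {
    // 10 GB expressed in bytes: 10 * 1024^3 = 10737418240.
    static final long DEFAULT_MAX_FILESIZE = 10L * 1024 * 1024 * 1024;

    // A region becomes a split candidate only once it is strictly larger
    // than the configured maximum.
    static boolean shouldSplit(long regionSizeBytes, long maxFileSizeBytes) {
        return regionSizeBytes > maxFileSizeBytes;
    }

    public static void main(String[] args) {
        // Made-up region sizes, for illustration only.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("region-a", 4L * 1024 * 1024 * 1024);
        sizes.put("region-b", 12L * 1024 * 1024 * 1024);
        for (Map.Entry<String, Long> e : sizes.entrySet()) {
            boolean split = shouldSplit(e.getValue(), DEFAULT_MAX_FILESIZE);
            System.out.println(e.getKey() + " -> " + (split ? "split" : "keep"));
        }
    }
}
```

With these sample sizes, region-a stays as it is and region-b is flagged for splitting.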
Luckily, there's a solution that combines the predictability of managing splits manually with the convenience of automation, so you don't have to worry about it. And it's dead simple: I wrote a script that does essentially the same thing as the automated strategy, but runs only when you schedule it, for example as a cron job.
You can check out the source here: https://github.com/matvey14/hbase-utils
The Java code is just a few lines: https://github.com/matvey14/hbase-utils/blob/master/src/matvey14/hbase/tools/HBaseRegionSplitter.java
Sample Usage: ./run_region_splitter.sh -s 10 -r
This goes through all of your HBase regions and splits any region that is bigger than 10 GB (the value passed to "-s"). The "-r" argument tells it to actually perform the splits. Without "-r", the script defaults to dry-run mode: it goes through each region and shows you what would happen, but doesn't split anything.
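To make the dry-run behavior concrete, here is a minimal sketch of a single splitter pass. It is not the code from the hbase-utils repository: the class RegionSplitterSketch, its run method, and the in-memory region sizes are hypothetical, and the actual split is delegated to a caller-supplied callback instead of a real HBase client call. The argument handling mirrors the "-s" and "-r" flags described above; falling back to 10 GB when "-s" is missing is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical sketch of a cron-friendly splitter pass (not the hbase-utils code).
// Region sizes and the split action come from the caller, not a live cluster.
class RegionSplitterSketch {
    static final long BYTES_PER_GB = 1024L * 1024 * 1024;

    // Collects every region larger than maxSizeGb. In dry-run mode it only
    // reports them; otherwise it hands each one to splitAction.
    static List<String> run(Map<String, Long> regionSizes, long maxSizeGb,
                            boolean dryRun, Consumer<String> splitAction) {
        long threshold = maxSizeGb * BYTES_PER_GB;
        List<String> oversized = new ArrayList<>();
        for (Map.Entry<String, Long> e : regionSizes.entrySet()) {
            if (e.getValue() > threshold) {
                oversized.add(e.getKey());
                if (dryRun) {
                    System.out.println("[dry-run] would split " + e.getKey());
                } else {
                    splitAction.accept(e.getKey());
                }
            }
        }
        return oversized;
    }

    public static void main(String[] args) {
        long maxSizeGb = 10;   // assumed fallback when "-s" is not given
        boolean dryRun = true; // dry-run unless "-r" is passed
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-s") && i + 1 < args.length) {
                maxSizeGb = Long.parseLong(args[++i]);
            } else if (args[i].equals("-r")) {
                dryRun = false;
            }
        }
        // Made-up region sizes, for illustration only.
        Map<String, Long> sizes = new TreeMap<>();
        sizes.put("region-a", 3 * BYTES_PER_GB);
        sizes.put("region-b", 15 * BYTES_PER_GB);
        run(sizes, maxSizeGb, dryRun, name -> System.out.println("splitting " + name));
    }
}
```

Keeping dry-run as the default means a mistyped cron entry only produces a report; splitting requires the explicit "-r" opt-in.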