1. QoS
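Every HBase request carries a request level, i.e. a priority (priorityLevel). The RPC layer has a separate handler thread pool for each level, and each request is placed in the pool that matches its priority. The sizes of the two pools are set by hbase.regionserver.handler.count and hbase.regionserver.metahandler.count respectively.
In a regionserver, a request with priority <= 10 is treated as a normal request and goes to the IPC Server handler queue. A request with priority > 10 is treated as a high-priority request and goes to the PRI IPC Server handler queue. A request is eligible for the priority queue if it has either of the following characteristics: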
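- The method the request invokes is annotated with @QosPriority, and the annotation's priority value is greater than 10. In HRegionServer, for example, the following methods have a higher priority: openRegion, closeRegion, flushRegion, splitRegion, compactRegion, getProtocolSignature, getRegionInfo, unlockRow, etc.
- The request operates on a catalog region, i.e. on the .META. or -ROOT- table.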
The priority values are computed by org.apache.hadoop.hbase.regionserver.HRegionServer.QosFunction.
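To make the routing rule concrete, here is a minimal, self-contained Java sketch. It is not HBase's code: the QosPriority annotation, RegionServerRpcs, Call and QosDispatcher are hypothetical stand-ins, the catalog check is a simple prefix match on the region name, and the value 100 for catalog requests is an assumption. Only the threshold rule (priority > 10 goes to the priority queue) comes from the text above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for HBase's @QosPriority annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface QosPriority {
  int priority() default 0;
}

// Hypothetical RPC target; method names mirror the article's examples.
class RegionServerRpcs {
  @QosPriority(priority = 100)
  public void openRegion(String region) {}

  public void get(String region) {}
}

// A request: the method being called plus the region it targets.
class Call {
  final String methodName;
  final String regionName;

  Call(String methodName, String regionName) {
    this.methodName = methodName;
    this.regionName = regionName;
  }
}

class QosDispatcher {
  static final int QOS_THRESHOLD = 10; // priority > 10 means high priority
  static final int HIGH_QOS = 100;     // assumed value for catalog requests
  static final int NORMAL_QOS = 0;

  final BlockingQueue<Call> normalQueue = new LinkedBlockingQueue<>();
  final BlockingQueue<Call> priorityQueue = new LinkedBlockingQueue<>();

  // Plays the role the article attributes to QosFunction.
  int priorityOf(Call call) {
    for (Method m : RegionServerRpcs.class.getMethods()) {
      if (m.getName().equals(call.methodName)) {
        QosPriority qos = m.getAnnotation(QosPriority.class);
        if (qos != null) {
          return qos.priority(); // annotated methods use their declared priority
        }
      }
    }
    // Requests that touch a catalog region are promoted.
    if (call.regionName.startsWith(".META.") || call.regionName.startsWith("-ROOT-")) {
      return HIGH_QOS;
    }
    return NORMAL_QOS;
  }

  // Enqueue onto the queue drained by the matching handler pool.
  void dispatch(Call call) {
    if (priorityOf(call) > QOS_THRESHOLD) {
      priorityQueue.add(call);
    } else {
      normalQueue.add(call);
    }
  }
}
```

Each queue would be drained by its own handler pool, sized by hbase.regionserver.handler.count and hbase.regionserver.metahandler.count respectively, so that a flood of ordinary requests presumably cannot starve openRegion calls or .META. lookups.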
2. ZooKeeperWatcher
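ZooKeeperWatcher is the only class in HBase that implements ZooKeeper's Watcher interface. Through it HBase controls the state of all its znodes in ZooKeeper: creation, deletion, updates, event callbacks, and so on. HMaster, HRegionServer and the client each use a single instance of it to connect to the ZooKeeper cluster.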
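The article does not show ZooKeeperWatcher's code. As a hedged illustration of the "one watcher per process" idea, the sketch below assumes a single watcher object that receives every znode event and forwards it to registered listeners. Each listener filters for the path it cares about, roughly the role the trackers in the next section play. SingleWatcher, NodeListener, NodeEventType and PathTracker are hypothetical names, and the ZooKeeper client itself is left out so the example runs without dependencies.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical event types, modeled loosely on ZooKeeper's node events.
enum NodeEventType { CREATED, DELETED, DATA_CHANGED, CHILDREN_CHANGED }

// Hypothetical listener interface; trackers would subscribe through it.
interface NodeListener {
  void onEvent(NodeEventType type, String path);
}

// One watcher per process: receives every event and fans it out.
class SingleWatcher {
  // Copy-on-write so listeners can register while events are being delivered.
  private final List<NodeListener> listeners = new CopyOnWriteArrayList<>();

  void registerListener(NodeListener listener) {
    listeners.add(listener);
  }

  // In a real deployment the ZooKeeper client thread would call this.
  void process(NodeEventType type, String path) {
    for (NodeListener listener : listeners) {
      listener.onEvent(type, path);
    }
  }
}

// A listener that only reacts to events on a single znode path.
class PathTracker implements NodeListener {
  private final String path;
  private int eventCount = 0;

  PathTracker(String path) {
    this.path = path;
  }

  @Override
  public void onEvent(NodeEventType type, String eventPath) {
    if (path.equals(eventPath)) {
      eventCount++; // a real tracker would refresh its cached znode data here
    }
  }

  int eventCount() {
    return eventCount;
  }
}
```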
3. XxxTracker
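HBase contains many Tracker classes, each with a different responsibility.
- ClusterStatusTracker corresponds to /hbase/shutdown; HMaster uses it to record cluster status information, for example the time the cluster came up.
- DrainingServerTracker corresponds to /hbase/draining; HMaster uses it to record the regionservers that must not be assigned any new regions.
- MetaNodeTracker is invoked by CatalogTracker.
- CatalogTracker monitors the availability of the .META. and -ROOT- tables and manages RootRegionTracker and MetaNodeTracker; it records the state of the regions that host .META. and -ROOT-. Concretely, the ZooKeeper node /hbase/root-region-server records the location of -ROOT-, the location of .META. is recorded in the -ROOT- table, and the information about user tables is stored in the .META. table.
- RegionServerTracker corresponds to /hbase/rs and maintains the list of live regionservers.
- RootRegionTracker corresponds to /hbase/root-region-server.
- ZooKeeperNodeTracker is an abstract class for trackers that follow a single ZooKeeper node.
In addition, the regions listed under /hbase/unassigned are those still waiting to be assigned.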
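The three-level lookup described for CatalogTracker (ZooKeeper, then -ROOT-, then .META.) can be sketched with in-memory maps. This is a simplified model under stated assumptions: CatalogLookup and its fields are hypothetical, .META. is keyed by "table,startKey" strings in a TreeMap, and a lookup picks the last region whose start key is not greater than the row. Real .META. row keys also carry a region id.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of ZooKeeper -> -ROOT- -> .META. -> user region.
class CatalogLookup {
  String rootRegionServer; // stand-in for the /hbase/root-region-server znode
  String metaRegionServer; // stand-in for the .META. location stored in -ROOT-
  // .META. contents: "table,startKey" -> server hosting that user region.
  final TreeMap<String, String> metaTable = new TreeMap<>();

  // Returns the server hosting the region that contains rowKey, or null.
  String locate(String table, String rowKey) {
    if (rootRegionServer == null) {
      return null; // step 1 failed: ZooKeeper does not know where -ROOT- is
    }
    if (metaRegionServer == null) {
      return null; // step 2 failed: -ROOT- has no .META. location
    }
    // Step 3: last .META. entry whose key sorts at or before "table,row".
    Map.Entry<String, String> entry = metaTable.floorEntry(table + "," + rowKey);
    if (entry == null || !entry.getKey().startsWith(table + ",")) {
      return null; // no region of this table covers the row
    }
    return entry.getValue();
  }
}
```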
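4. Inconsistent HBase table state
Inconsistent table state in HBase usually means that the metadata in the .META. table does not match the data actually stored on HDFS. It can have many causes; most often a regionserver dies abruptly in the middle of a region split, or a failed operation makes HBase roll back. The command hbase hbck reports whether the cluster is consistent and which data is inconsistent, and hbase hbck -repair repairs the inconsistencies.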
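A greatly simplified picture of what such a check compares: the regions recorded in .META. versus the region directories found on HDFS. The sketch is hypothetical (ConsistencyReport is not part of HBase, and hbck checks considerably more, such as where each region is actually deployed); it only shows the two kinds of mismatch the paragraph above describes.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified consistency check between .META. and HDFS.
class ConsistencyReport {
  final Set<String> inMetaNotOnHdfs = new TreeSet<>(); // metadata without data
  final Set<String> onHdfsNotInMeta = new TreeSet<>(); // data without metadata

  boolean isConsistent() {
    return inMetaNotOnHdfs.isEmpty() && onHdfsNotInMeta.isEmpty();
  }

  // Compares region names from .META. with region directories on HDFS.
  static ConsistencyReport compare(Set<String> metaRegions, Set<String> hdfsRegions) {
    ConsistencyReport report = new ConsistencyReport();
    for (String region : metaRegions) {
      if (!hdfsRegions.contains(region)) {
        report.inMetaNotOnHdfs.add(region);
      }
    }
    for (String region : hdfsRegions) {
      if (!metaRegions.contains(region)) {
        report.onHdfsNotInMeta.add(region);
      }
    }
    return report;
  }
}
```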
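5. The .META. table cannot be split
It is tempting to think that, since the -ROOT- table keeps track of .META., the .META. table could be split into two or more regions like any other table. In fact it cannot: checkSplit explicitly filters this case out. See the JIRA "Disable META splitting in 0.20".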
public byte[] checkSplit() {
  // Can't split META
  if (getRegionInfo().isMetaRegion()) {
    if (shouldForceSplit()) {
      LOG.warn("Cannot split meta regions in HBase 0.20 and above");
    }
    return null;
  }
  if (!splitPolicy.shouldSplit()) {
    return null;
  }
  byte[] ret = splitPolicy.getSplitPoint();
  if (ret != null) {
    try {
      checkRow(ret, "calculated split");
    } catch (IOException e) {
      LOG.error("Ignoring invalid split", e);
      return null;
    }
  }
  return ret;
}
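The control flow of checkSplit can be reproduced in isolation. In this sketch RegionSketch, SplitPolicy and SizeBasedPolicy are hypothetical, and the size-based rule (split when the region grows past a limit, at a precomputed middle key) is an assumed policy chosen only for illustration; the excerpt itself just calls splitPolicy.shouldSplit() and getSplitPoint(). Row keys are Strings instead of byte[], and the checkRow validation is omitted.

```java
// Hypothetical split-policy interface mirroring the two calls in checkSplit().
interface SplitPolicy {
  boolean shouldSplit();
  String getSplitPoint();
}

// Assumed policy: split once the region is larger than maxSize, at midKey.
class SizeBasedPolicy implements SplitPolicy {
  private final long regionSize;
  private final long maxSize;
  private final String midKey;

  SizeBasedPolicy(long regionSize, long maxSize, String midKey) {
    this.regionSize = regionSize;
    this.maxSize = maxSize;
    this.midKey = midKey;
  }

  @Override
  public boolean shouldSplit() {
    return regionSize > maxSize;
  }

  @Override
  public String getSplitPoint() {
    return midKey;
  }
}

class RegionSketch {
  private final boolean metaRegion;
  private final SplitPolicy splitPolicy;

  RegionSketch(boolean metaRegion, SplitPolicy splitPolicy) {
    this.metaRegion = metaRegion;
    this.splitPolicy = splitPolicy;
  }

  // Same order of checks as checkSplit(): catalog guard first, then policy.
  // Returns the split row, or null when the region must not split.
  String checkSplit() {
    if (metaRegion) {
      return null; // .META. is never split, however large it grows
    }
    if (!splitPolicy.shouldSplit()) {
      return null;
    }
    return splitPolicy.getSplitPoint();
  }
}
```

Putting the catalog guard before the policy means no split policy, however it is configured, can ever split .META.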
6. DrainingServer
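Regionservers on the draining list are not assigned new regions. Even if you move a region onto such a node, it is automatically reassigned to another, randomly chosen node. For details see this JIRA: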
Support to drain RS nodes through ZK
/**
 * @param state
 * @param serverToExclude Server to exclude (we know its bad). Pass null if
 * all servers are thought to be assignable.
 * @param forceNewPlan If true, then if an existing plan exists, a new plan
 * will be generated.
 * @return Plan for passed <code>state</code> (If none currently, it creates one or
 * if no servers to assign, it returns null).
 */
RegionPlan getRegionPlan(final RegionState state,
    final ServerName serverToExclude, final boolean forceNewPlan) {
  // Pickup existing plan or make a new one
  final String encodedName = state.getRegion().getEncodedName();
  final List<ServerName> servers = this.serverManager.getOnlineServersList();
  // List of draining servers
  final List<ServerName> drainingServers = this.serverManager.getDrainingServersList();
  if (serverToExclude != null) servers.remove(serverToExclude);
  // Loop through the draining server list and remove them from the server
  // list.
  if (!drainingServers.isEmpty()) {
    // Remove every draining server from the online server list
    for (final ServerName server: drainingServers) {
      LOG.debug("Removing draining server: " + server +
          " from eligible server pool.");
      servers.remove(server);
    }
  }
  // Remove the deadNotExpired servers from the server list.
  removeDeadNotExpiredServers(servers);
  if (servers.isEmpty()) return null;
  RegionPlan randomPlan = null;
  boolean newPlan = false;
  RegionPlan existingPlan = null;
  synchronized (this.regionPlans) {
    existingPlan = this.regionPlans.get(encodedName);
    if (existingPlan != null && existingPlan.getDestination() != null) {
      LOG.debug("Found an existing plan for " +
          state.getRegion().getRegionNameAsString() +
          " destination server is " + existingPlan.getDestination().toString());
    }
    // If the planned destination is a draining server, pick a random
    // destination server instead
    if (forceNewPlan
        || existingPlan == null
        || existingPlan.getDestination() == null
        || drainingServers.contains(existingPlan.getDestination())) {
      newPlan = true;
      randomPlan = new RegionPlan(state.getRegion(), null, balancer
          .randomAssignment(servers));
      this.regionPlans.put(encodedName, randomPlan);
    }
  }
  if (newPlan) {
    LOG.debug("No previous transition plan was found (or we are ignoring " +
        "an existing plan) for " + state.getRegion().getRegionNameAsString() +
        " so generated a random one; " + randomPlan + "; " +
        serverManager.countOfRegionServers() +
        " (online=" + serverManager.getOnlineServers().size() +
        ", available=" + servers.size() + ") available servers");
    return randomPlan;
  }
  LOG.debug("Using pre-existing plan for region " +
      state.getRegion().getRegionNameAsString() + "; plan=" + existingPlan);
  return existingPlan;
}
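Stripped of logging and HBase types, the destination choice in getRegionPlan reduces to a few set operations. The sketch below is a hypothetical model (PlanChooser and its String server names are not HBase API). It keeps the rules visible in the excerpt: drop the excluded server and every draining server from the candidates, give up if none remain, and replace any existing plan that is forced, missing, or points at a draining server with a random choice. The dead-but-not-expired filtering is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical model of getRegionPlan's destination choice.
class PlanChooser {
  private final Map<String, String> regionPlans = new HashMap<>(); // region -> destination
  private final Random random;

  PlanChooser(Random random) {
    this.random = random;
  }

  // Records a pending plan, e.g. one created by a manual move.
  void setPlan(String region, String destination) {
    synchronized (regionPlans) {
      regionPlans.put(region, destination);
    }
  }

  // Returns the destination server, or null if no server is eligible.
  String choose(String region, List<String> online, Set<String> draining,
      String serverToExclude, boolean forceNewPlan) {
    List<String> servers = new ArrayList<>(online); // never mutate the caller's list
    if (serverToExclude != null) {
      servers.remove(serverToExclude);
    }
    servers.removeAll(draining); // draining servers never receive new regions
    if (servers.isEmpty()) {
      return null;
    }
    synchronized (regionPlans) {
      String existing = regionPlans.get(region);
      if (forceNewPlan || existing == null || draining.contains(existing)) {
        String destination = servers.get(random.nextInt(servers.size()));
        regionPlans.put(region, destination);
        return destination;
      }
      return existing; // keep a valid pre-existing plan
    }
  }
}
```

For example, a region whose pending plan targets a draining server ends up on one of the remaining online servers, while a plan targeting a normal server is kept unchanged.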
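7. How openRegion works
Opening a region (openRegion) essentially means initializing an HRegion. The code below does the actual initialization: it runs the coprocessor pre-open hook, opens all stores in parallel, replays any recovered edits, cleans up leftovers from interrupted splits and merges, and returns the next sequence id.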
private long initializeRegionInternals(final CancelableProgressable reporter,
    MonitoredTask status) throws IOException, UnsupportedEncodingException {
  if (coprocessorHost != null) {
    status.setStatus("Running coprocessor pre-open hook");
    coprocessorHost.preOpen();
  }
  // Write HRI to a file in case we need to recover .META.
  status.setStatus("Writing region info on filesystem");
  checkRegioninfoOnFilesystem();
  // Remove temporary data left over from old regions
  status.setStatus("Cleaning up temporary data from old regions");
  cleanupTmpDir();
  // Load in all the HStores.
  // Get minimum of the maxSeqId across all the store.
  //
  // Context: During replay we want to ensure that we do not lose any data. So, we
  // have to be conservative in how we replay logs. For each store, we calculate
  // the maxSeqId up to which the store was flushed. But, since different stores
  // could have a different maxSeqId, we choose the
  // minimum across all the stores.
  // This could potentially result in duplication of data for stores that are ahead
  // of others. ColumnTrackers in the ScanQueryMatchers do the de-duplication, so we
  // do not have to worry.
  // TODO: If there is a store that was never flushed in a long time, we could replay
  // a lot of data. Currently, this is not a problem because we flush all the stores at
  // the same time. If we move to per-cf flushing, we might want to revisit this and send
  // in a vector of maxSeqIds instead of sending in a single number, which has to be the
  // min across all the max.
  long minSeqId = -1;
  long maxSeqId = -1;
  // initialized to -1 so that we pick up MemstoreTS from column families
  long maxMemstoreTS = -1;
  if (this.htableDescriptor != null &&
      !htableDescriptor.getFamilies().isEmpty()) {
    // initialize the thread pool for opening stores in parallel.
    ThreadPoolExecutor storeOpenerThreadPool =
        getStoreOpenAndCloseThreadPool(
            "StoreOpenerThread-" + this.regionInfo.getRegionNameAsString());
    CompletionService<Store> completionService =
        new ExecutorCompletionService<Store>(storeOpenerThreadPool);
    // initialize each store in parallel
    for (final HColumnDescriptor family : htableDescriptor.getFamilies()) {
      status.setStatus("Instantiating store for column family " + family);
      completionService.submit(new Callable<Store>() {
        public Store call() throws IOException {
          return instantiateHStore(tableDir, family);
        }
      });
    }
    try {
      for (int i = 0; i < htableDescriptor.getFamilies().size(); i++) {
        Future<Store> future = completionService.take();
        Store store = future.get();
        this.stores.put(store.getColumnFamilyName().getBytes(), store);
        long storeSeqId = store.getMaxSequenceId();
        if (minSeqId == -1 || storeSeqId < minSeqId) {
          minSeqId = storeSeqId;
        }
        if (maxSeqId == -1 || storeSeqId > maxSeqId) {
          maxSeqId = storeSeqId;
        }
        long maxStoreMemstoreTS = store.getMaxMemstoreTS();
        if (maxStoreMemstoreTS > maxMemstoreTS) {
          maxMemstoreTS = maxStoreMemstoreTS;
        }
      }
    } catch (InterruptedException e) {
      throw new IOException(e);
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    } finally {
      storeOpenerThreadPool.shutdownNow();
    }
  }
  mvcc.initialize(maxMemstoreTS + 1);
  // Recover any edits if available.
  maxSeqId = Math.max(maxSeqId, replayRecoveredEditsIfAny(
      this.regiondir, minSeqId, reporter, status));
  status.setStatus("Cleaning up detritus from prior splits");
  // Get rid of any splits or merges that were lost in-progress. Clean out
  // these directories here on open. We may be opening a region that was
  // being split but we crashed in the middle of it all.
  SplitTransaction.cleanupAnySplitDetritus(this);
  FSUtils.deleteDirectory(this.fs, new Path(regiondir, MERGEDIR));
  this.writestate.setReadOnly(this.htableDescriptor.isReadOnly());
  this.writestate.flushRequested = false;
  this.writestate.compacting = 0;
  // Initialize split policy
  this.splitPolicy = RegionSplitPolicy.create(this, conf);
  this.lastFlushTime = EnvironmentEdgeManager.currentTimeMillis();
  // Use maximum of log sequenceid or that which was found in stores
  // (particularly if no recovered edits, seqid will be -1).
  long nextSeqid = maxSeqId + 1;
  LOG.info("Onlined " + this.toString() + "; next sequenceid=" + nextSeqid);
  // A region can be reopened if failed a split; reset flags
  this.closing.set(false);
  this.closed.set(false);
  if (coprocessorHost != null) {
    status.setStatus("Running coprocessor post-open hooks");
    coprocessorHost.postOpen();
  }
  status.markComplete("Region opened successfully");
  return nextSeqid;
}
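The sequence-id bookkeeping in this method can be isolated into a runnable sketch. StoreInfo and RegionOpenSketch are hypothetical; each opener simply returns a store's maximum flushed sequence id and memstore timestamp. As in the excerpt, stores are opened in parallel through an ExecutorCompletionService, replay would start from the minimum of the per-store maxima (so no store misses an edit), and the next sequence id is one past the larger of the stores' maximum and the highest id replayed from recovered edits.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical summary of what an opened store reports.
class StoreInfo {
  final String family;
  final long maxSeqId;      // highest sequence id already flushed to this store
  final long maxMemstoreTS; // highest memstore timestamp seen in this store

  StoreInfo(String family, long maxSeqId, long maxMemstoreTS) {
    this.family = family;
    this.maxSeqId = maxSeqId;
    this.maxMemstoreTS = maxMemstoreTS;
  }
}

// Hypothetical model of the bookkeeping in initializeRegionInternals.
class RegionOpenSketch {
  long minSeqId = -1;
  long maxSeqId = -1;
  long maxMemstoreTS = -1;

  // Opens all stores in parallel and folds their ids as each one completes.
  void openStores(List<Callable<StoreInfo>> openers) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, openers.size()));
    CompletionService<StoreInfo> completionService =
        new ExecutorCompletionService<>(pool);
    try {
      for (Callable<StoreInfo> opener : openers) {
        completionService.submit(opener);
      }
      for (int i = 0; i < openers.size(); i++) {
        // take() yields futures in completion order, not submission order
        StoreInfo store = completionService.take().get();
        if (minSeqId == -1 || store.maxSeqId < minSeqId) {
          minSeqId = store.maxSeqId;
        }
        if (maxSeqId == -1 || store.maxSeqId > maxSeqId) {
          maxSeqId = store.maxSeqId;
        }
        if (store.maxMemstoreTS > maxMemstoreTS) {
          maxMemstoreTS = store.maxMemstoreTS;
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      // The excerpt wraps failures in IOException; unchecked here for brevity.
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdownNow();
    }
  }

  // The next id must exceed both the stores' maximum and anything replayed.
  long nextSeqId(long maxReplayedSeqId) {
    return Math.max(maxSeqId, maxReplayedSeqId) + 1;
  }
}
```

For example, stores reporting 120, 95 and 110 give minSeqId = 95 and maxSeqId = 120; with no recovered edits (replayed id -1) the region comes online with next sequence id 121.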