MySQL EXPLAIN Notes
mysql> explain select * from t1;
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |   20 | NULL  |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
EXPLAIN works with SELECT, DELETE, INSERT, REPLACE, and UPDATE statements.

id
: Query number; each SELECT in the statement gets its own id.

select_type
: The query type:
- SIMPLE: a simple query containing no subqueries or UNION
- PRIMARY: the outermost SELECT of a complex query
- UNION: the second and subsequent SELECTs in a UNION
- UNION RESULT: the SELECT reading from the UNION temporary table
- SUBQUERY: the first SELECT in a subquery
- DERIVED: a subquery in the FROM clause
- UNCACHEABLE SUBQUERY: a subquery that cannot be cached and must be re-evaluated every time; very expensive
- UNCACHEABLE UNION: a query inside a UNION that cannot be cached

table
: The table the current plan row refers to. With subqueries or UNIONs it can take these forms: <unionM,N>, <derivedN>, <subqueryN>

type
: The access type, i.e. how MySQL locates the matching rows in the table. From best to worst:
- system: the table has only one row
- const: at most one matching row; the fastest type
- eq_ref: all parts of a PRIMARY KEY or UNIQUE index are used by the join; returns at most one matching row
- ref: a non-unique index is used; may match multiple rows
- fulltext: a FULLTEXT index is used
- ref_or_null: like ref, but MySQL runs an extra lookup for rows containing NULL
- index_merge: the index merge optimization is used
- unique_subquery: like eq_ref, but for an IN subquery
- index_subquery: like unique_subquery, but the subquery uses a non-unique index
- range: a range scan over an index
- index: like ALL, except only the index is scanned
- ALL: a full table scan

possible_keys
: The indexes MySQL could choose from to find the rows.

key
: The index actually used.

key_len
: The length of the index actually used.

ref
: The columns or constants compared to the index during the lookup.

rows
: The estimated number of rows to read.

Extra
: Additional information:
- Distinct: MySQL stops searching for more rows once the first match is found
- Using index: answered from the index alone, with no row lookups
- Using where: rows are filtered with the WHERE clause after being read
- Using temporary: MySQL created a temporary table to process the query; consider adding an index
- Using filesort: an extra sort pass is done on the results; worth optimizing
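For contrast, a sketch of the same query once an index can be used; the column a and index idx_a are hypothetical, and the exact key_len and rows values will vary:
mysql> alter table t1 add index idx_a (a);
mysql> explain select * from t1 where a = 1;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
|  1 | SIMPLE      | t1    | ref  | idx_a         | idx_a | 5       | const |    1 | NULL  |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
The access type improves from ALL (the full scan above) to ref, and the row estimate drops accordingly.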
Spark Notes
Some Spark articles worth reading in depth:
spark-notes
- leave 1 core per node for Hadoop/YARN/OS daemons
- leave 1 GB of memory and 1 executor for the YARN ApplicationMaster
- 3-5 cores per executor for good HDFS throughput
Full memory requested from YARN per executor = spark.executor.memory + spark.yarn.executor.memoryOverhead
spark.yarn.executor.memoryOverhead = max(384MB, 7% of spark.executor.memory)
So if we request 15GB per executor, we actually get 15GB + 7% * 15GB ≈ 16GB.
Example cluster:
- 4 nodes
- 8 cores per node
- 50GB RAM per node
1. 5 cores per executor: --executor-cores = 5
2. num cores available per node: 8-1 = 7
3. total available cores in cluster: 7 * 4 = 28
4. available executors: (total cores/num-cores-per-executor), 28/5 = 5
5. leave one executor for Yarn ApplicationMaster: --num-executors = 5-1 = 4
6. number of executors per node: 4/4 = 1
7. memory per executor: 50GB/1 = 50GB
8. cut heap overhead: 50GB - 7%*50GB ≈ 46GB, --executor-memory=46GB
4 executors, 46GB and 5 cores each
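In flag form, this layout might be submitted like so (a sketch; the YARN master and your-app.jar are assumed placeholders):
spark-submit --master yarn \
  --num-executors 4 --executor-cores 5 --executor-memory 46G \
  your-app.jar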
1. 3 cores per executor: --executor-cores = 3
2. num cores available per node: 8-1 = 7
3. total available cores in cluster: 7 * 4 = 28
4. available executors: (total cores/num-cores-per-executor), 28/3 = 9
5. leave one executor for Yarn ApplicationMaster: --num-executors = 9-1 = 8
6. number of executors per node: 8/4 = 2
7. memory per executor: 50GB/2 = 25GB
8. cut heap overhead: 25GB * (1-7%) ≈ 23GB, --executor-memory=23GB
8 executors, 23GB and 3 cores each
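And the same sketch for this layout:
spark-submit --master yarn \
  --num-executors 8 --executor-cores 3 --executor-memory 23G \
  your-app.jar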
Spark + Cassandra, All You Need to Know: Tips and Optimizations
- Spark on HDFS is low cost and fits most cases
- running Spark and Cassandra in the same cluster gives the best throughput and the lowest latency
- Deploy Spark with an Apache Cassandra cluster
- Spark Cassandra Connector
- Cassandra Optimizations for Apache Spark
Spark Optimizations
- prefer narrow transformations over wide transformations
- minimize data shuffles and maximize data locality
- filter data as early as possible
- set the right number of partitions, roughly 4x the number of cores
- avoid data skew
- broadcast small tables in joins (see the sketch after this list)
- repartition before expensive or multiple joins
- repartition before writing to storage
- remember that repartition is itself an expensive operation
- set the right number of executors, cores, and memory
- replace the default Java serialization with Kryo serialization
- use the DataFrame or Dataset high-level APIs to take advantage of Spark's optimizations
- Apache Spark Internals: Tips and Optimizations
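A minimal PySpark sketch of two of these tips, broadcast joins and pre-write repartitioning; the paths, table names, and partition count are made up for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (SparkSession.builder
         .appName("optimization-sketch")
         # Kryo serialization instead of the default Java serialization
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .getOrCreate())

orders = spark.read.parquet("/data/orders")        # large table (hypothetical path)
countries = spark.read.parquet("/data/countries")  # small lookup table

# Hint Spark to ship the small table to every executor instead of shuffling both sides.
joined = orders.join(broadcast(countries), "country_code")

# Repartition once before writing so output files come out evenly sized.
joined.repartition(200).write.parquet("/data/joined")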
~/.forward
echo '[email protected]' > ~/.forward
This makes the local smtpd forward incoming mail to the address in the file. On AWS EC2, SES can be used to forward the mail on to your Gmail.
Patch Notes
- Create a patch file: diff -u file1 file2 > name.patch, or git diff > name.patch
- Apply a patch file: patch [-u] < name.patch
- Back up before applying a patch: patch -b < name.patch
- Validate a patch without applying it: patch --dry-run < name.patch
- Reverse an applied patch: patch -R < name.patch
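A quick end-to-end run, with old.c and new.c standing in for any two versions of a file (hypothetical names):
diff -u old.c new.c > fix.patch   # create the patch
patch --dry-run < fix.patch       # confirm it applies cleanly
patch -b < fix.patch              # apply it, keeping old.c.orig as a backup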
8:24
R.I.P. KOBE
audit.sh
Use PROMPT_COMMAND for bash, and precmd for zsh.
mkdir -p /var/log/.audit
touch /var/log/.audit/audit.log
chown nobody:nobody /var/log/.audit/audit.log
chmod 002 /var/log/.audit/audit.log   # write-only for others, so any user can append
chattr +a /var/log/.audit/audit.log   # append-only: entries cannot be edited or deleted
Save the following to /etc/profile.d/audit.sh:
HISTSIZE=500000
HISTTIMEFORMAT=" "
export HISTTIMEFORMAT
export HISTORY_FILE=/var/log/.audit/audit.log
export PROMPT_COMMAND='{ curr_hist=`history 1|awk "{print \\$1}"`;last_command=`history 1| awk "{\\$1=\"\" ;print}"`;user=`id -un`;user_info=(`who -u am i`);real_user=${user_info[0]};login_date=${user_info[2]};login_time=${user_info[3]};curr_path=`pwd`;login_ip=`echo $SSH_CONNECTION | awk "{print \\$1}"`;if [ ${login_ip}x == x ];then login_ip=- ; fi ;if [ ${curr_hist}x != ${last_hist}x ];then echo -E `date "+%Y-%m-%d %H:%M:%S"` $user\($real_user\) $login_ip [$login_date $login_time] [$curr_path] $last_command ;last_hist=$curr_hist;fi; } >> $HISTORY_FILE'
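Each command then lands in audit.log roughly like this (all values illustrative):
2020-01-26 08:24:01 root(admin) 203.0.113.7 [2020-01-26 08:00] [/home/admin] ls -la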
echo "local6.* /var/log/commands.log" > /etc/rsyslog.d/commands.conf
systemctl restart rsyslog.service
precmd() { eval 'RETRN_VAL=$?;logger -p local6.debug "$(whoami) [$$]: $(history | tail -n1 | sed "s/^[ ]*[0-9]\+[ ]*//" ) [$RETRN_VAL]"' }
GitHub Actions Canceled Unexpectedly
By default, GitHub cancels all in-progress matrix jobs if any one of them fails. Set fail-fast: false to fix this.
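In the workflow file, the flag sits under strategy; the matrix values and step below are placeholders:
jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - run: echo test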
How to Activate Noise Cancellation with One AirPod
On AirPods Pro: Settings - Accessibility - AirPods, then toggle on Noise Cancellation with One AirPod.
[Repost] The Evolution of Server-Side High-Concurrency Distributed Architecture
Original article: 服务端高并发分布式架构演进之路. Using Taobao as its example, it traces how a server-side architecture evolves from a hundred concurrent users to tens of millions, lists the technologies each stage introduces so the evolution can be seen as a whole, and closes with a summary of architecture design principles.
Architecture design principles:
- N+1 design: no component in the system should be a single point of failure.
- Rollback design: keep the system forward compatible, so a version can be rolled back after an upgrade.
- Disable design: provide configuration switches for individual features, so a faulty feature can be taken offline quickly.
- Monitoring design: plan how the system will be monitored at the design stage.
- Multi-site active-active design: if the system needs very high availability, run active data centers in multiple locations, so the system stays up even when one data center loses power.
- Adopt mature technology: newly built or open-source technology often hides bugs, and hitting a problem without commercial support can be a disaster.
- Resource isolation design: no single business line should be able to consume all resources.
- Design for horizontal scaling: only a horizontally scalable system can effectively avoid bottlenecks.
- Buy rather than build non-core features: if a non-core feature would consume significant engineering resources, consider buying a mature product.
- Use commodity hardware: it effectively lowers the odds of hardware failure.
- Iterate fast: develop small feature modules, ship them quickly for validation, and find problems early to greatly reduce delivery risk.
- Stateless design: make service interfaces stateless, so handling a request never depends on state left by a previous request.