MySQL EXPLAIN Notes

mysql> explain select * from t1;
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |   20 | NULL  |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
  1. EXPLAIN works with SELECT, DELETE, INSERT, REPLACE, and UPDATE statements
  2. id: the sequence number of the query; each SELECT in the statement gets its own id
  3. select_type: the type of the SELECT
    • SIMPLE: a simple query, with no subqueries or UNION
    • PRIMARY: the outermost SELECT of a complex query
    • UNION: the second and later SELECTs in a UNION
    • UNION RESULT: the SELECT that reads from the UNION's temporary table
    • SUBQUERY: the first SELECT in a subquery
    • DERIVED: a subquery in the FROM clause
    • UNCACHEABLE SUBQUERY: a subquery whose result cannot be cached and must be re-evaluated every time, which is very expensive
    • UNCACHEABLE UNION: a query in a UNION that cannot be cached
  4. table: the table the current row refers to; with subqueries or UNIONs it can take these forms
    • <unionM,N>
    • <derivedN>
    • <subqueryN>
  5. type: the access type, i.e. how MySQL looks up matching rows in this table. From best to worst (an indexed example follows this list):
    1. system
    2. const: the table has at most one matching row; the fastest access type
    3. eq_ref: all parts of a PRIMARY KEY or UNIQUE index are used by the join; at most one matching row is returned
    4. ref: a non-unique index is used instead of a unique one, so multiple matching rows may be found
    5. fulltext: appears only when a FULLTEXT index is used
    6. ref_or_null: like ref, but MySQL does an extra pass for rows containing NULL
    7. index_merge: the index merge optimization is used
    8. unique_subquery: like eq_ref, but for an IN subquery
    9. index_subquery: like unique_subquery, but the subquery uses a non-unique index
    10. range: a range scan over the index
    11. index: like ALL, except only the index is scanned
    12. ALL: a full table scan
  6. possible_keys: the indexes MySQL could choose from for this lookup
  7. key: the index actually used
  8. key_len: the length (in bytes) of the index actually used
  9. ref: the columns or constants compared against the index
  10. rows: the estimated number of rows to read
  11. Extra: additional information
    • Distinct: MySQL stops searching for more rows after the first match
    • Using index: the query is answered from the index alone, without reading the row
    • Using where: rows are filtered with the WHERE clause after being read
    • Using temporary: MySQL created a temporary table to process the query; a candidate for index tuning
    • Using filesort: an extra sort pass is done on the results; a candidate for optimization
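
For contrast with the full scan above, here is a sketch of the same table once an index is involved. The column a (a nullable INT) and the index idx_a are hypothetical, and the exact key_len/rows values depend on the schema:

mysql> alter table t1 add index idx_a (a);
mysql> explain select a from t1 where a = 10;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+
|  1 | SIMPLE      | t1    | ref  | idx_a         | idx_a | 5       | const |    1 | Using index |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+

type improves from ALL to ref, and Using index means the query is covered by the index alone; key_len is 5 because a nullable INT takes 4 bytes plus 1 NULL byte.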

Spark Notes

Some Spark articles worth a deep read:

spark-notes

  1. leave 1 core per node for the Hadoop/YARN/OS daemons
  2. leave 1 GB of memory and 1 executor for the YARN ApplicationMaster
  3. use 3-5 cores per executor for good HDFS throughput
Full memory requested from YARN per executor = spark.executor.memory + spark.yarn.executor.memoryOverhead

spark.yarn.executor.memoryOverhead = Max(384MB, 7% of spark.executor.memory)

So, if we request 15GB per executor, we actually get 15GB + 7% * 15GB ≈ 16GB.

(figure: MemoryOverhead)

Example cluster:

4 nodes
8 cores per node
50GB RAM per node


1. 5 cores per executor: --executor-cores = 5
2. cores available per node: 8 - 1 = 7
3. total cores available in the cluster: 7 * 4 = 28
4. executors available: total cores / cores per executor = 28 / 5 ≈ 5
5. leave one executor for the YARN ApplicationMaster: --num-executors = 5 - 1 = 4
6. executors per node: 4 / 4 = 1
7. memory per executor: 50GB / 1 = 50GB
8. cut the heap overhead: 50GB - 7% * 50GB ≈ 46GB, --executor-memory=46GB

4 executors, 46GB and 5 cores each
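
As a sketch, those numbers map onto a spark-submit invocation like this (com.example.MyApp and my-app.jar are placeholders):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-cores 5 \
  --executor-memory 46G \
  --class com.example.MyApp \
  my-app.jar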

1. 3 cores per executor: --executor-cores = 3
2. cores available per node: 8 - 1 = 7
3. total cores available in the cluster: 7 * 4 = 28
4. executors available: total cores / cores per executor = 28 / 3 ≈ 9
5. leave one executor for the YARN ApplicationMaster: --num-executors = 9 - 1 = 8
6. executors per node: 8 / 4 = 2
7. memory per executor: 50GB / 2 = 25GB
8. cut the heap overhead: 25GB * (1 - 7%) ≈ 23GB, --executor-memory=23GB

8 executors, 23GB and 3 cores each
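
The same arithmetic can be scripted. A minimal bash sketch of the steps above (assumes uniform nodes, 1 core per node for daemons, 1 executor for the ApplicationMaster, and 7% memoryOverhead, per these notes):

# usage: size_executors <nodes> <cores-per-node> <gb-per-node> <cores-per-executor>
size_executors() {
  local nodes=$1 cores=$2 mem=$3 cpe=$4
  local avail=$(( (cores - 1) * nodes ))        # reserve 1 core per node for daemons
  local execs=$(( avail / cpe - 1 ))            # reserve 1 executor for the ApplicationMaster
  local per_node=$(( (execs + nodes - 1) / nodes ))
  local gb=$(( mem / per_node * 93 / 100 ))     # cut 7% for memoryOverhead
  echo "--num-executors=$execs --executor-cores=$cpe --executor-memory=${gb}G"
}

size_executors 4 8 50 5   # --num-executors=4 --executor-cores=5 --executor-memory=46G
size_executors 4 8 50 3   # --num-executors=8 --executor-cores=3 --executor-memory=23G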

Spark + Cassandra, All You Need to Know: Tips and Optimizations

  1. Spark on HDFS is the low-cost option and is used in most cases
  2. Spark co-located with Cassandra in the same cluster gives the best throughput and lowest latency
  3. Deploy Spark with an Apache Cassandra cluster
  4. Spark Cassandra Connector
  5. Cassandra Optimizations for Apache Spark

Spark Optimizations

  1. prefer narrow transformations over wide transformations
  2. minimize data shuffles
  3. filter data as early as possible
  4. set the right number of partitions, about 4x the number of cores
  5. avoid data skew
  6. broadcast small tables in joins
  7. repartition before expensive or repeated joins
  8. repartition before writing to storage
  9. remember that repartition itself is an expensive operation
  10. set the right number of executors, cores, and memory
  11. replace the default Java serialization with Kryo serialization (see the sketch after this list)
  12. minimize data shuffles and maximize data locality
  13. use the DataFrame or Dataset high-level APIs to benefit from Spark's built-in optimizations
  14. Apache Spark Internals: Tips and Optimizations
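
A sketch of how a few of the items above are passed as configuration (values are illustrative, not recommendations; my-app.jar is a placeholder):

# item 11: Kryo instead of the default Java serialization
# item 4:  ~4x partitions per core (28 cores -> 112)
# item 6:  broadcast-join threshold in bytes (10MB here, illustrative)
spark-submit \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.shuffle.partitions=112 \
  --conf spark.sql.autoBroadcastJoinThreshold=10485760 \
  my-app.jar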

~/.forward

echo '[email protected]' > ~/.forward

This makes smtpd forward mail to the given address. On AWS EC2, SES can be used to forward email to your Gmail.
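
A quick sanity check, assuming the local MTA provides the usual sendmail-compatible CLI:

echo "forward test" | sendmail -v "$(whoami)"   # the message should land in the Gmail inbox configured above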

AWS Data Transfer Costs

Patch Notes

  1. Create a patch file: diff -u file1 file2 > name.patch, or git diff > name.patch (a full round trip is sketched below)
  2. Apply a patch file: patch [-u] < name.patch
  3. Back up before applying: patch -b < name.patch
  4. Validate a patch without applying it: patch --dry-run < name.patch
  5. Reverse an applied patch: patch -R < name.patch
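
A minimal round trip tying these together (old.c and new.c are hypothetical files):

diff -u old.c new.c > fix.patch        # create the patch
patch --dry-run -u old.c < fix.patch   # validate first, touch nothing
patch -b -u old.c < fix.patch          # apply, keeping old.c.orig as a backup
patch -R -u old.c < fix.patch          # revert the change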

8:24

R.I.P. KOBE

audit.sh

Use PROMPT_COMMAND for bash, and precmd for zsh.

mkdir -p /var/log/.audit
touch /var/log/.audit/audit.log
chown nobody:nobody /var/log/.audit/audit.log
chmod 002 /var/log/.audit/audit.log   # others may only write, not read
chattr +a /var/log/.audit/audit.log   # append-only: entries cannot be edited or deleted

Save the following to /etc/profile.d/audit.sh:

HISTSIZE=500000
HISTTIMEFORMAT=" "
export HISTTIMEFORMAT
export HISTORY_FILE=/var/log/.audit/audit.log
export PROMPT_COMMAND='{ curr_hist=`history 1|awk "{print \\$1}"`;last_command=`history 1| awk "{\\$1=\"\" ;print}"`;user=`id -un`;user_info=(`who -u am i`);real_user=${user_info[0]};login_date=${user_info[2]};login_time=${user_info[3]};curr_path=`pwd`;login_ip=`echo $SSH_CONNECTION | awk "{print \\$1}"`;if [ ${login_ip}x == x ];then login_ip=- ; fi ;if [ ${curr_hist}x != ${last_hist}x ];then echo -E `date "+%Y-%m-%d %H:%M:%S"` $user\($real_user\) $login_ip [$login_date $login_time] [$curr_path] $last_command ;last_hist=$curr_hist;fi; } >> $HISTORY_FILE'
echo "local6.*  /var/log/commands.log" > /etc/rsyslog.d/commands.conf
systemctl restart rsyslog.service
precmd() { eval 'RETRN_VAL=$?;logger -p local6.debug "$(whoami) [$$]: $(history | tail -n1 | sed "s/^[ ]*[0-9]\+[ ]*//" ) [$RETRN_VAL]"' }
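
To verify either variant, open a new shell, run a command, and check the sinks (as root, since the audit log is not world-readable):

sudo tail -n 5 /var/log/.audit/audit.log   # bash / PROMPT_COMMAND sink
sudo tail -n 5 /var/log/commands.log       # zsh / rsyslog sink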

GitHub Actions Canceled Unexpectedly

How to Activate Noise Cancellation with One AirPod

Settings - Accessibility - AirPods, then toggle on Noise Cancellation with One AirPod, so AirPods Pro noise cancellation works with a single AirPod in-ear.

[Repost] The Evolution of Server-Side High-Concurrency Distributed Architecture

Original article: 服务端高并发分布式架构演进之路. Using Taobao as the running example, it walks through how a server-side architecture evolves from a hundred concurrent users to tens of millions, lists the technologies encountered at each stage to give an overall picture of the evolution, and closes with a summary of architecture design principles.

(architecture diagrams 0-14 from the original article)

Architecture design principles:

  1. N+1 design. No component in the system should be a single point of failure.
  2. Design for rollback. Keep the system forward compatible so there is always a way to roll back a version after an upgrade.
  3. Design for disabling. Provide configuration switches for individual features, so a faulty feature can be taken offline quickly.
  4. Design for monitoring. Plan the monitoring approach at design time, not afterwards.
  5. Multi-active data centers. If the system needs extremely high availability, consider running active-active across multiple data centers, so it stays up even when an entire site loses power.
  6. Use mature technology. Newly built or newly open-sourced technology often hides bugs, and hitting one without commercial support can be a disaster.
  7. Isolate resources. No single business function should be able to consume all resources.
  8. Scale horizontally. Only a horizontally scalable architecture can reliably avoid bottlenecks.
  9. Buy what is non-core. If a non-core feature would take substantial engineering effort, consider buying a mature product instead.
  10. Use commodity hardware. Commodity hardware effectively reduces the odds of hardware failure.
  11. Iterate quickly. Build small feature modules fast, ship them early for validation, and surface problems early to greatly reduce delivery risk.
  12. Stateless design. Service interfaces should be stateless: handling the current request must not depend on state left by a previous one.