Power of g in Vim

:[range]g[!]/pattern/cmd

! inverts the match: cmd runs on every line that does NOT match pattern (same as :v). Common commands:

  • d: delete
  • m: move
  • t: copy (same as co)
  • s: substitute
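
A couple of concrete uses may help. The sketch below runs :g commands non-interactively through Vim's Ex mode (vim -es); the file notes.txt and its contents are made up for illustration, and inside Vim you would type only the part after -c, prefixed with a colon.

```shell
#!/bin/sh
# Hypothetical sample file, created only for this demo.
printf '%s\n' 'TODO write docs' 'done: tests' 'TODO fix bug' 'done: lint' > notes.txt

# :g!/^TODO/d             delete every line that does NOT start with TODO
# :g/^TODO/s/TODO/FIXME/  run a substitution only on the matching lines
vim -es -c 'g!/^TODO/d' -c 'g/^TODO/s/TODO/FIXME/' -c 'wq' notes.txt

# Prints the two remaining lines:
#   FIXME write docs
#   FIXME fix bug
cat notes.txt
```

Other commands from the list work the same way, for example :g/^TODO/m0 moves every matching line to the top of the file (in reverse order) and :g/^TODO/t$ copies each one to the end.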

For more info, see :help :global in Vim.

Perl Tutorial for Beginners

What you will learn:

  • Where is Perl used?
  • Download & Install Perl - Windows, Mac & Linux
  • Perl Variable
  • Perl Array
  • Perl Hashes
  • Perl Conditional Statements - If, If Else, Else if, Unless, Nested if
  • Perl Loops - Control Structures
  • Perl Operator
  • Perl Special Variables
  • Perl Regular Expression
  • Perl File I/O
  • Perl Subroutine
  • Perl Format- Getting perfect Output
  • Perl Coding Standards
  • Perl Error Handling
  • Perl Socket programming
  • Perl Modules and Packages
  • Object Oriented Programming in Perl
  • PERL V/s Shell Scripting

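Dynamically Adding/Removing Hadoop DataNodes

Adding a node

  1. On the NameNode, add the new node to etc/hadoop/slaves
  2. Sync the etc/hadoop configuration
  3. On the new node, run ./sbin/hadoop-daemon.sh start datanode

Removing a node

  1. Write the addresses of the nodes to remove into etc/hadoop/excludes
  2. Edit etc/hadoop/hdfs-site.xml:
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/home/web/hadoop/etc/hadoop/excludes</value>
  </property>
  3. Edit etc/hadoop/mapred-site.xml; this takes the NodeManager offline (on YARN clusters the NodeManager exclude list is usually configured through yarn.resourcemanager.nodes.exclude-path in yarn-site.xml instead):
  <property>
    <name>mapred.hosts.exclude</name>
    <value>/home/web/hadoop/etc/hadoop/excludes</value>
    <final>true</final>
  </property>
  4. Edit etc/hadoop/slaves and remove the nodes being deleted
  5. Sync etc/hadoop/excludes and etc/hadoop/slaves to all NameNodes
  6. On the NameNode, run ./bin/hadoop dfsadmin -refreshNodes
  7. Run ./bin/hadoop dfsadmin -report and watch the node's state change: Normal -> Decommission in progress -> Decommissioned
  8. On the node being removed, run ./sbin/hadoop-daemon.sh stop datanode, then wait for its Admin State to change to Dead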
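
The NameNode-side part of the removal steps above could be scripted roughly as follows. This is a sketch, not a tested tool: HADOOP_HOME defaults to the /home/web/hadoop path that appears in the XML above, the helper names add_exclude and decommission are made up here, the host name in the usage line is a placeholder, and syncing files to other hosts is left out.

```shell
#!/bin/sh
# Sketch only. HADOOP_HOME follows the exclude path shown above; the function
# names are hypothetical helpers, not part of Hadoop.
HADOOP_HOME=${HADOOP_HOME:-/home/web/hadoop}

# add_exclude FILE HOST: append HOST to the excludes file unless it is
# already listed, so re-running the script does not create duplicates.
add_exclude() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# decommission HOST: list the host, ask the NameNode to re-read its
# exclude file, then print the report so the node's state can be checked.
decommission() {
    add_exclude "$HADOOP_HOME/etc/hadoop/excludes" "$1" || return 1
    "$HADOOP_HOME/bin/hadoop" dfsadmin -refreshNodes || return 1
    "$HADOOP_HOME/bin/hadoop" dfsadmin -report
}

# Usage (run on the NameNode): decommission datanode3.example.com
```

Stopping the DataNode process on the removed host stays a manual step here, since it should only happen after the report shows Decommissioned.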

Using special SSH key for Git

In ~/.ssh/config:

host github.com
    HostName github.com
    IdentityFile ~/.ssh/id_rsa_github
    User git

Don't forget to run chmod 600 ~/.ssh/config.

Or use the GIT_SSH_COMMAND environment variable (Git 2.3 or later); -F /dev/null makes ssh ignore ~/.ssh/config:

export GIT_SSH_COMMAND="ssh -i ~/.ssh/id_rsa_example -F /dev/null"
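
If the key should apply to a single repository rather than to the whole shell session, Git's core.sshCommand setting (available since Git 2.10) stores the same ssh command in that repository's config. A minimal sketch, assuming a scratch repository named demo-repo and reusing the key path from the export line above:

```shell
#!/bin/sh
# demo-repo is a throwaway repository used only for illustration.
git init -q demo-repo

# Store the ssh command in demo-repo/.git/config; it applies to this
# repository only and needs no environment variable.
git -C demo-repo config core.sshCommand "ssh -i ~/.ssh/id_rsa_example -F /dev/null"

# Show the command Git will run for fetch and push over SSH here.
git -C demo-repo config --get core.sshCommand
```

When both are set, the GIT_SSH_COMMAND environment variable takes precedence over core.sshCommand.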

Regex Unicode Scripts

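  1. \p{Han} matches Han ideographs (Chinese hanzi, Japanese kanji), both simplified and traditional; it does not match kana.
  2. \p{Common} matches characters shared across scripts, such as punctuation and symbols.
  3. \p{Latin} matches Latin-script letters.
  4. With grep this needs Perl-compatible regex support, i.e. grep -P "\p{Han}"; rg and ag also work.
# rg -o prints only the matched parts, one match per line
echo '中文/繁體/片仮名/かたかな/カタカナ/katakana' | rg -o '\p{Han}+'     # matches: 中文 繁體 片仮名
echo '中文@mail.com' | rg -o '\p{Common}+'                               # matches: @ .
echo '中文@mail.com' | rg -o '\p{Latin}+'                                # matches: mail com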

See Unicode Scripts for more.

Octotree for Safari

brew install node@10
export PATH="/usr/local/opt/node@10/bin:$PATH"
# Make sure node and npm come from Node 10: octotree builds with gulp 3, which does not work on Node 12.

git clone https://github.com/ovity/octotree.git ~/src/octotree
cd ~/src/octotree
git checkout master

npm i
npm install [email protected]
npm run dist
# The built extension is in ~/src/octotree/tmp/safari/octotree.safariextension/

cd ~/Library/Safari/Extensions
mv ~/src/octotree/tmp/safari/octotree.safariextension .
  1. Enable the Develop menu in Safari (Preferences > Advanced)
  2. Develop > Show Extension Builder
  3. Add octotree.safariextension and click Run

MySQL Prefix Index

CREATE TABLE `t1` (
  `bundle` varchar(300) DEFAULT '' COMMENT 'pkg name',
  `domain` varchar(200) DEFAULT '',
  UNIQUE KEY `idx_bundle_domain` (`bundle`(100),`domain`(100))
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=utf8mb4;

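The key part is the prefix length, as in bundle(100): indexing only the first 100 characters avoids the "Specified key was too long; max key length is 767 bytes" error that a composite index on long utf8mb4 columns can hit. Note that a UNIQUE prefix index enforces uniqueness on the prefixes only, not on the full column values.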
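
The arithmetic behind the limit can be checked quickly. The sketch below assumes utf8mb4 needs up to 4 bytes per character and that the 767-byte limit applies to each indexed column, as it does for InnoDB's COMPACT and REDUNDANT row formats; key_bytes is a made-up helper name.

```shell
#!/bin/sh
BYTES_PER_CHAR=4   # utf8mb4 worst case
LIMIT=767          # per-column limit quoted in the error message

# key_bytes CHARS: worst-case bytes needed to index CHARS characters
key_bytes() {
    echo $(( $1 * BYTES_PER_CHAR ))
}

# Compare the full columns of t1 with their 100-character prefixes.
for spec in 'bundle 300' 'domain 200' 'bundle(100) 100' 'domain(100) 100'; do
    set -- $spec
    bytes=$(key_bytes "$2")
    if [ "$bytes" -gt "$LIMIT" ]; then verdict='too long'; else verdict='ok'; fi
    printf '%-12s %4d bytes  %s\n' "$1" "$bytes" "$verdict"
done

# Longest utf8mb4 prefix that still fits: 767 / 4, rounded down
echo "max prefix: $(( LIMIT / BYTES_PER_CHAR )) chars"
```

The full columns come out at 1200 and 800 bytes, both over the limit, while each 100-character prefix needs at most 400 bytes; the longest prefix that fits is 191 characters.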

Deployment with git

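#!/bin/sh

set -uex

PATH=$PATH:$HOME/bin
export PATH

DIR=/home/serv/project
cd "${DIR}"

# Record HEAD before and after the pull; identical hashes mean nothing new.
REV1=$(git rev-parse --verify HEAD)
git pull origin master
REV2=$(git rev-parse --verify HEAD)
if [ "${REV1}" = "${REV2}" ]; then
    echo "Already up to date"
    exit 0
fi

# Under set -e a bare failing make would abort before any message is
# printed, so handle the failure explicitly and exit non-zero.
make || { echo "make error" >&2; exit 1; }

# Ask the running service to reload.
kill -HUP "$(cat logs/run.pid)"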

The key idea is to read the current revision hash with git rev-parse --verify HEAD before and after the pull, and to continue only if the two differ.

logrotate

logrotate - rotates, compresses, and mails system logs

# 0 0 * * * /usr/sbin/logrotate --state=/home/serv/logrotate.state /home/serv/logrotate.log.conf
/home/serv/logs/dev.log
/home/serv/logs/access.log {
    rotate 10
    daily
    compress
    create
    copytruncate
    missingok
    dateext
    dateformat -%Y-%m-%d
    dateyesterday

    sharedscripts
    postrotate
        kill -USR1 `cat /var/run/nginx.pid`
    endscript
}
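  1. Either save the config under /etc and let the system schedule it, or schedule it yourself with crontab; in the latter case remember to pass --state so the rotation state is saved
  2. Services like nginx reopen their log files on kill -USR1. If a service cannot do that, use copytruncate, which copies the log and then truncates the original in place (the create option has no effect when copytruncate is set)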

Druid Query in JSON

Druid can be queried with SQL from Superset; it can also be queried over HTTP with a JSON body:

curl -X POST '<host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d @query.json
{
  "queryType": "timeseries",
  "dataSource": "cpm_log",
  "granularity": "hour",
  "aggregations": [
    {
      "type": "longSum",
      "name": "requests",
      "fieldName": "req_count_raw"
    },
    {
      "type": "longSum",
      "name": "impressions",
      "fieldName": "win_count"
    },
    {
      "type": "floatSum",
      "name": "revenues",
      "fieldName": "win_price"
    }
  ],
  "postAggregations": [
    {
      "type":"arithmetic",
      "name": "ecpm",
      "fn": "/",
      "fields": [
        {
          "type": "fieldAccess",
          "name": "postAgg_rev",
          "fieldName": "revenues"
        },
        {
          "type": "fieldAccess",
          "name": "postAgg_imps",
          "fieldName": "impressions"
        }
      ]
    }
  ],
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "selector",
        "dimension": "device_os",
        "value": "android"
      },
      {
        "type": "in",
        "dimension": "req_ad_type",
        "values": ["banner"]
      }
    ]
  },
  "context": {
    "grandTotal": true
  },
  "intervals": [
    "2019-04-09T00:00:00+08:00/2019-04-09T23:00:00+08:00"
  ]
}
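  1. queryType values include timeseries, topN, groupBy, search, and timeBoundary
  2. Use groupBy sparingly; it is comparatively inefficient
  3. topN results are sorted by the metric named in the query
  4. context can carry a queryId, which lets you cancel the query with DELETE /druid/v2/{queryId}
  5. Distinct count (approximate): {"type": "cardinality", "name": "distinct_pid", "fields": ["ad_pid"]}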
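
Notes 3 and 4 can be combined in one request. The following is a sketch only: it reuses the dataSource, dimension, aggregation, and interval from the query above, while the threshold, the queryId value, the file name topn.json, and the DRUID_URL variable are placeholders chosen for illustration.

```shell
#!/bin/sh
# DRUID_URL (e.g. http://<host>:<port>) and QUERY_ID are placeholders.
DRUID_URL=${DRUID_URL:-}
QUERY_ID=topn-by-os-example

# topN: rank device_os values by the "requests" metric and keep the top 5.
cat > topn.json <<EOF
{
  "queryType": "topN",
  "dataSource": "cpm_log",
  "dimension": "device_os",
  "metric": "requests",
  "threshold": 5,
  "granularity": "all",
  "aggregations": [
    { "type": "longSum", "name": "requests", "fieldName": "req_count_raw" }
  ],
  "intervals": ["2019-04-09T00:00:00+08:00/2019-04-09T23:00:00+08:00"],
  "context": { "queryId": "$QUERY_ID" }
}
EOF

# Only talk to a server when DRUID_URL is set.
if [ -n "$DRUID_URL" ]; then
    curl -X POST "$DRUID_URL/druid/v2/?pretty" \
         -H 'Content-Type: application/json' -d @topn.json
    # To cancel the query while it is still running, from another shell:
    #   curl -X DELETE "$DRUID_URL/druid/v2/$QUERY_ID"
fi
```

Setting the queryId explicitly is what makes cancellation possible from a second process, since the caller already knows the id before the query returns.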

RTFM, godruid