29/02/24

Since 29/02/08, this site has been running for 16 years.

Debounce and Throttle

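  1. debounce delays execution and collapses a burst of calls, each no more than the configured interval apart, into a single call. If two calls arrive within the interval, the first one is cancelled; once the interval passes with no further call, the pending call runs
  2. throttle works like a valve: when a function is called many times in a row, it runs at most once per configured interval
  3. Both debounce and throttle reduce how often an event handler is invoked, which improves performance
  4. During rapid continuous input, debounce waits until after the last input before running, which is useful for autocomplete
  5. throttle runs at a regular, steady rate, but with some delay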
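
The two behaviours above can be sketched in Python. This is only an illustration under a few assumptions: the decorator names debounce and throttle, the wait parameter, and the injectable clock argument are made up for this example, the debounce uses threading.Timer, and the throttle is the simple leading-edge variant that drops extra calls instead of queuing them.

```python
import threading
import time


def debounce(wait):
    """Run the wrapped function only after `wait` seconds pass without a new call."""
    def decorator(fn):
        timer = None
        lock = threading.Lock()

        def wrapper(*args, **kwargs):
            nonlocal timer
            with lock:
                if timer is not None:
                    timer.cancel()  # a newer call cancels the pending one
                timer = threading.Timer(wait, fn, args, kwargs)
                timer.daemon = True
                timer.start()
        return wrapper
    return decorator


def throttle(wait, clock=time.monotonic):
    """Run the wrapped function at most once per `wait` seconds; extra calls are dropped."""
    def decorator(fn):
        last_run = None

        def wrapper(*args, **kwargs):
            nonlocal last_run
            now = clock()
            if last_run is None or now - last_run >= wait:
                last_run = now
                return fn(*args, **kwargs)
            return None  # called too soon after the previous run
        return wrapper
    return decorator
```

With debounce(0.3) on a search handler, typing "abc" quickly triggers one lookup for "abc"; with throttle(0.3) on a scroll handler, the handler fires at most about three times per second.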

Connecting Sequel Ace to MySQL over SSH

After changing ~/.ssh/config, Sequel Ace still can’t connect to MySQL. Sequel Ace has to be configured so that it is granted access to the .ssh files:

image

RAG usage in devv.ai

How devv.ai builds an efficient RAG system

  1. https://twitter.com/Tisoga/status/1731478506465636749
  2. https://twitter.com/Tisoga/status/1736544319199478175

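How can an LLM generate answers from an external knowledge base? The earlier approach was to fine-tune the model whenever new knowledge was added. The drawback is that every knowledge update needs another round of fine-tuning, which makes training very expensive. The newer approach is RAG, Retrieval Augmented Generation, which hands the new knowledge to the LLM through the prompt. It has three parts:

  1. The LLM: GPT or an open-source model such as LLaMA
  2. A fixed collection of external knowledge
  3. The external knowledge needed for the current scenario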
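
As a rough illustration of how the three parts fit together, here is a toy Python sketch. Everything in it is an assumption made for the example: the embed function is a bag-of-words stand-in for a real embedding model, KNOWLEDGE is a tiny invented knowledge collection, and the LLM call itself is left out, so the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, passages):
    """Put the retrieved passages into the prompt so the LLM can cite them."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return (
        "Answer using only the context below and cite passages by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Part 2: the fixed external knowledge collection (hypothetical sample data).
KNOWLEDGE = [
    "pgvector is a PostgreSQL extension for vector similarity search.",
    "Debounce collapses a burst of calls into a single call.",
    "Chroma is an open-source embedding database.",
]

# Part 3: the knowledge needed for this question, found by retrieval.
question = "What is pgvector?"
prompt = build_prompt(question, retrieve(question, KNOWLEDGE, k=1))
# Part 1: `prompt` would now be sent to the LLM of your choice.
```

A real system would swap embed for an embedding model and the sorted call for a vector database query; the shape of the flow stays the same.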

Notes:

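  • Storing the external knowledge base: vectorize the knowledge data with an OpenAI embedding model
  • Keep the vectors in a vector database such as Chroma, Pinecone, or pgvector
  • Do the work up front: the more you do at encoding time, the faster and more accurate retrieval becomes
  • Process the data more carefully, for example by splitting knowledge documents into chunks and tuning the ranking
  • A search engine can be combined with retrieval to improve accuracy
  • Evaluation metrics
    1. fluency: whether the generated text is fluent and coherent
    2. perceived utility: whether the generated content is useful
    3. citation recall: the share of generated content that is fully supported by its citations
    4. citation precision: the share of citations that actually support the generated content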
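
The last two metrics are simple ratios once each generated statement has been judged. A minimal sketch, assuming the judgments already exist (from a human rater, for example); the Statement class and its field names are hypothetical and only serve the example.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    text: str
    fully_supported: bool  # judged: do its citations together fully support it?
    citation_supports: list = field(default_factory=list)  # one bool per citation


def citation_recall(statements):
    """Share of generated statements that are fully supported by their citations."""
    if not statements:
        return 0.0
    return sum(s.fully_supported for s in statements) / len(statements)


def citation_precision(statements):
    """Share of all citations that support the statement they are attached to."""
    flags = [flag for s in statements for flag in s.citation_supports]
    if not flags:
        return 0.0
    return sum(flags) / len(flags)


# Hypothetical judgments for one generated answer.
answer = [
    Statement("A", True, [True, True]),
    Statement("B", False, [False]),
    Statement("C", True, [True]),
    Statement("D", False, []),  # no citation at all
]
print(citation_recall(answer))     # 2 of 4 statements supported -> 0.5
print(citation_precision(answer))  # 3 of 4 citations supportive -> 0.75
```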

emerging LLM


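Three ways to use an LLM: Prompting, RAG, and Fine-Tuning. RAG is for extending the knowledge base, while fine-tuning is more about changing structure (behavior) than knowledge.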

Prompting-vs-RAG-Fine-Tuning

cd error with CDPATH

I’ve set $CDPATH in zsh for quick directory switching. But this causes problems with make and npm run:

/bin/sh: line 0: cd: src: No such file or directory

cd looks for the relative directory in the $CDPATH entries first, and when the current directory is not one of them, the lookup fails. Add . to $CDPATH to fix this:

export CDPATH=.:$HOME/src
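
To check a given value before exporting it, a small helper along these lines could be used. It is a sketch under the POSIX rule that an empty CDPATH entry also stands for the current directory; the function names are made up for this example, and entries spelled differently (such as ./) are not recognized.

```python
def cdpath_includes_cwd(cdpath):
    """Return True if the CDPATH value lets cd search the current directory."""
    if not cdpath:
        return True  # unset or empty: cd resolves relative paths as usual
    # "." or an empty entry (e.g. a leading ":") both mean the current directory
    return any(entry in ("", ".") for entry in cdpath.split(":"))


def with_cwd_first(cdpath):
    """Return a CDPATH value that searches the current directory first."""
    if cdpath_includes_cwd(cdpath):
        return cdpath
    return ".:" + cdpath


print(with_cwd_first("/home/me/src"))  # .:/home/me/src
```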

gnutls_handshake() failed: Error in the pull function

The error means that Git can’t establish a secure connection to the remote repository. This build of Git uses the GnuTLS library to set up TLS (encrypted) connections, so one workaround is to rebuild Git against a version of libcurl that uses OpenSSL:

sudo apt-get update
sudo apt-get install curl build-essential fakeroot dpkg-dev libcurl4-openssl-dev
sudo apt-get build-dep git
mkdir ~/git-openssl
cd ~/git-openssl
apt-get source git
cd git-2.17.1/

vim debian/control
# :%s/libcurl4-gnutls-dev/libcurl4-openssl-dev/g

vim debian/rules
# comment out the line "TEST =test" to skip the test suite during the build

sudo dpkg-buildpackage -rfakeroot -b -uc -us
sudo dpkg -i ../git_2.17.1-1ubuntu0.4_amd64.deb

Notes on front-end password encryption

https://blog.huli.tw/2023/01/10/security-of-encrypt-or-hash-password-in-client-side/

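  1. HTTPS is a must
  2. Problems when nothing is encrypted:
    1. The plaintext password can be seen through MITM and similar attacks, and then reused for credential stuffing on other sites
    2. The plaintext password can end up in logs by mistake
    3. Encryption avoids both problems
  3. Hashing keeps the plaintext password from being seen and prevents credential stuffing, but a weak hash can be reversed to the plaintext with rainbow tables
  4. Hashing does not stop the value from being used directly: whoever obtains the hash through MITM or logs can log in with it
  5. Encrypting with a public key on the client and decrypting with the private key on the server has the same problem
  6. SRP (Secure Remote Password protocol) is a better solution
  7. Or Passkeys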
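
Point 4 is easy to demonstrate. The toy sketch below is not taken from the linked article; the Server class and client_hash function are invented for the example, and unsalted SHA-256 is used only to keep it short. It shows that a client-side hash behaves like the password itself once it is captured.

```python
import hashlib
import hmac


def client_hash(password):
    """What the browser would send instead of the plaintext password."""
    return hashlib.sha256(password.encode()).hexdigest()


class Server:
    """Toy server that stores and compares whatever value the client sends."""

    def __init__(self):
        self.users = {}

    def register(self, user, hashed):
        self.users[user] = hashed

    def login(self, user, hashed):
        # Constant-time comparison of the stored value and the submitted one
        return hmac.compare_digest(self.users.get(user, ""), hashed)


server = Server()
server.register("alice", client_hash("correct horse"))

# An attacker who sees the request (MITM, leaked logs) captures only the hash...
captured = client_hash("correct horse")

# ...and can replay it without ever learning the password.
print(server.login("alice", captured))  # True
```

The plaintext stays hidden, which still protects the user's accounts on other sites, but for this server the hash is the credential.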

Shell file tests

List of common file test operators used in shell scripts:

  • -e file: Check if file exists
  • -f file: Check if file exists and is a regular file
  • -d file: Check if file exists and is a directory
  • -s file: Check if file exists and has size greater than 0
  • -r file: Check if file exists and is readable
  • -w file: Check if file exists and is writable
  • -x file: Check if file exists and is executable
  • -p file: Check if file exists and is a named pipe (FIFO)

See man test for more operators and details.
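
For comparison, the same checks can be approximated in Python with the standard library. This is a rough sketch rather than an exact reimplementation of test; the file_tests helper is a name invented for this example.

```python
import os
import stat


def file_tests(path):
    """Approximate the shell file test operators for one path."""
    exists = os.path.exists(path)
    return {
        "-e": exists,
        "-f": os.path.isfile(path),
        "-d": os.path.isdir(path),
        "-s": exists and os.path.getsize(path) > 0,
        "-r": os.access(path, os.R_OK),
        "-w": os.access(path, os.W_OK),
        "-x": os.access(path, os.X_OK),
        "-p": exists and stat.S_ISFIFO(os.stat(path).st_mode),
    }


if __name__ == "__main__":
    # Print the result of every test for the current directory
    for flag, result in file_tests(".").items():
        print(flag, result)
```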

How to know I'm using venv Python

  • Solution 1: check sys.prefix, which points to the venv directory when Python runs inside one
  • Solution 2 (the better way): check the VIRTUAL_ENV environment variable. When a virtual environment is activated, it is set to the venv’s directory; otherwise it is unset and os.environ.get returns None.

import os

# Prints the venv directory, or None when no venv is activated
print(os.environ.get('VIRTUAL_ENV'))
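
Solution 1 can be turned into a yes/no check by comparing sys.prefix with sys.base_prefix, which differ only inside a venv. A small sketch; the in_virtualenv helper is a name made up for this example.

```python
import os
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix is the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("running inside a venv:", in_virtualenv())
print("VIRTUAL_ENV:", os.environ.get("VIRTUAL_ENV"))
```

The two checks can disagree: calling the venv's python binary directly without activating it leaves VIRTUAL_ENV unset, while the prefix comparison still reports a venv.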

Special Characters in Bash