June 2024

Posted on Jun 1, 2024

ML/DS

https://openai.com/index/introducing-the-model-spec openai's new Model Spec
https://ai.stanford.edu/~kzliu/blog/unlearning machine unlearning
https://arxiv.org/abs/2303.12712 sparks of AGI
https://github.com/naklecha/llama3-from-scratch llama3 from scratch
https://github.com/stanfordnlp/dspy prompt framework
https://pivotal.substack.com/p/how-to-price-a-data-asset
https://hendersontrent.github.io/posts/2024/05/gaussian-process-time-series gaussian processes for bayesian time series modelling
https://hazyresearch.stanford.edu/blog/2024-05-12-tk gpus go brrr
https://jalammar.github.io/illustrated-word2vec illustrated word2vec
https://docs.databricks.com/en/generative-ai/vector-search.html databricks in the vector db space
https://dtkaplan.github.io/Lessons-in-statistical-thinking lessons in statistical thinking

~ ~ alignment zone ~ ~

https://www.alignmentforum.org/posts/2roZtSr5TGmLjXMnT/toward-a-mathematical-framework-for-computation-in superposition
https://www.lesswrong.com/posts/gTZ2SxesbHckJ3CkF/transformers-represent-belief-state-geometry-in-their
https://devinterp.com/projects ideas for alignment projects
https://www.lesswrong.com/posts/DZHmEmzujfuqfxbJY/open-call-for-research-assistants-in-developmental
https://www.lesswrong.com/s/SfFQE8DXbgkjk62JK/p/TjaeCWvLZtEDAS5Ex more devinterp stuff
https://arxiv.org/abs/2310.06301 dynamical vs bayesian phase transitions in toy model of superposition
https://arxiv.org/pdf/2405.05417 extension of glitch token work
https://www.perfectlynormal.co.uk/blog-induction-heads-illustrated and
https://www.perfectlynormal.co.uk/blog-kl-divergence and
https://www.perfectlynormal.co.uk/blog-svd and
https://www.youtube.com/watch?v=GkPhwnvRe-8&t=7133s
**https://www.lesswrong.com/posts/tkEQKrqZ6PdYPCD8F/computational-mechanics-hackathon-june-1-and-2 hackathon start of june**
https://arxiv.org/pdf/2405.00208 primer on the inner workings of transformer-based language models
https://huggingface.co/zero-gpu-explorers free GPU access for Spaces
https://arxiv.org/pdf/2310.10688 a decoder-only model for time series forecasting
https://www.lesswrong.com/posts/Quht2AY6A5KNeZFEA/timaeus-s-first-four-months timaeus retro
https://colab.research.google.com/github/google-research/big_vision/blob/main/big_vision/configs/proj/paligemma/finetune_paligemma.ipynb finetuning paligemma on a vision task
https://github.com/google-ai-edge/model-explorer/wiki/4.-API-Guide google’s model explorer
https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/chronicles/OPT175B_Logbook.pdf logbook for meta’s llm training
https://www.lesswrong.com/posts/YmkjnWtZGLbHRbzrP/ transcoders enable fine-grained interpretable circuit analysis for language models
https://www.math.brown.edu/streil/papers/LADW/LADW_2017-09-04.pdf linear algebra done wrong
https://lilianweng.github.io/posts/2021-07-11-diffusion-models diffusion model explainer
https://arxiv.org/pdf/2405.04517 xLSTM
https://x.com/deepseek_ai/status/1787478990665777589 deepseek MoE model
https://arxiv.org/pdf/2405.03003 some PEFT
https://www.deepdataspace.com/blog/Grounding-DINO-1.5-Pro a SOTA object detection model
https://scrollprize.org/2024_prizes the scroll prizes
https://vgel.me/posts/representation-engineering control vectors, v cool
https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html splashy anthropic result
https://www.alignmentforum.org/posts/pH6tyhEnngqWAXi9i/eis-xiii-reflections-on-anthropic-s-sae-research-circa-may critique of splashy anthropic result

non ML

https://matthewrocklin.com/feedback.html
https://near.blog/where-are-the-builders
https://tratt.net/laurie/blog/2024/what_factors_explain_the_nature_of_software.html explanation of software design
https://sashachapin.substack.com/p/50-things-i-know chapin banger
https://www.rsrch.space/ an extremely cool set of curated links (my bookshelf is based on this)
https://www.approachwithalacrity.com/what-are-you-getting-paid-in nice leila clark post