June 2024
ML/DS
- https://openai.com/index/introducing-the-model-spec OpenAI's new Model Spec
- https://ai.stanford.edu/~kzliu/blog/unlearning machine unlearning
- https://arxiv.org/abs/2303.12712 sparks of AGI
- https://github.com/naklecha/llama3-from-scratch llama3 from scratch
- https://github.com/stanfordnlp/dspy prompt framework
- https://pivotal.substack.com/p/how-to-price-a-data-asset
- https://hendersontrent.github.io/posts/2024/05/gaussian-process-time-series bayesian time series with gaussian processes
- https://hazyresearch.stanford.edu/blog/2024-05-12-tk gpus go brrr
- https://jalammar.github.io/illustrated-word2vec illustrated word2vec
- https://docs.databricks.com/en/generative-ai/vector-search.html databricks in the vector db space
- https://dtkaplan.github.io/Lessons-in-statistical-thinking lessons in statistical thinking
~ ~ alignment zone ~ ~
- https://www.alignmentforum.org/posts/2roZtSr5TGmLjXMnT/toward-a-mathematical-framework-for-computation-in superposition
- https://www.lesswrong.com/posts/gTZ2SxesbHckJ3CkF/transformers-represent-belief-state-geometry-in-their
- https://devinterp.com/projects ideas for alignment projects
- https://www.lesswrong.com/posts/DZHmEmzujfuqfxbJY/open-call-for-research-assistants-in-developmental
- https://www.lesswrong.com/s/SfFQE8DXbgkjk62JK/p/TjaeCWvLZtEDAS5Ex more devinterp stuff
- https://arxiv.org/abs/2310.06301 dynamical vs bayesian phase transitions in toy model of superposition
- https://arxiv.org/pdf/2405.05417 extension of glitch token work
- https://www.perfectlynormal.co.uk/blog-induction-heads-illustrated induction heads illustrated
- https://www.perfectlynormal.co.uk/blog-kl-divergence kl divergence
- https://www.perfectlynormal.co.uk/blog-svd svd
- https://www.youtube.com/watch?v=GkPhwnvRe-8&t=7133s
- **https://www.lesswrong.com/posts/tkEQKrqZ6PdYPCD8F/computational-mechanics-hackathon-june-1-and-2 hackathon at the start of June**
- https://arxiv.org/pdf/2405.00208 primer on the inner workings of transformer-based language models
- https://huggingface.co/zero-gpu-explorers free GPU access for Spaces
- https://arxiv.org/pdf/2310.10688 a decoder-only model for time series forecasting
- https://www.lesswrong.com/posts/Quht2AY6A5KNeZFEA/timaeus-s-first-four-months timaeus retro
- https://colab.research.google.com/github/google-research/big_vision/blob/main/big_vision/configs/proj/paligemma/finetune_paligemma.ipynb finetuning paligemma on a vision task
- https://github.com/google-ai-edge/model-explorer/wiki/4.-API-Guide google’s model explorer
- https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/chronicles/OPT175B_Logbook.pdf logbook for meta’s llm training
- https://www.lesswrong.com/posts/YmkjnWtZGLbHRbzrP/ transcoders enable fine-grained interpretable circuit analysis for language models
- https://www.math.brown.edu/streil/papers/LADW/LADW_2017-09-04.pdf linear algebra done wrong
- https://lilianweng.github.io/posts/2021-07-11-diffusion-models diffusion model explainer
- https://arxiv.org/pdf/2405.04517 xLSTM
- https://x.com/deepseek_ai/status/1787478990665777589 deepseek MoE model
- https://arxiv.org/pdf/2405.03003 some PEFT
- https://www.deepdataspace.com/blog/Grounding-DINO-1.5-Pro a SOTA object detection model
- https://scrollprize.org/2024_prizes the scroll prizes
- https://vgel.me/posts/representation-engineering control vectors, v cool
- https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html splashy anthropic result
- https://www.alignmentforum.org/posts/pH6tyhEnngqWAXi9i/eis-xiii-reflections-on-anthropic-s-sae-research-circa-may critique of splashy anthropic result
non-ML
- https://matthewrocklin.com/feedback.html
- https://near.blog/where-are-the-builders
- https://tratt.net/laurie/blog/2024/what_factors_explain_the_nature_of_software.html explanation of software design
- https://sashachapin.substack.com/p/50-things-i-know chapin banger
- https://www.rsrch.space/ an extremely cool set of curated links (my bookshelf is based on this)
- https://www.approachwithalacrity.com/what-are-you-getting-paid-in nice leila clark post