Working Notes: a commonplace notebook for recording & exploring ideas.

2024-04-21

LLaMa3

I've been helping out with infrastructure and tools for training LLaMa3 at Meta. I'm very happy to have been able to help: having something of LLaMa's quality easily available for hacking is one of the things that will shape how LLMs are generally applied and used, and contributing to that is very satisfying. I'm even in the model card, along with some very well known people.

At the same time, I was able to use ellama, ollama and LLaMa3 8B to run my very own local LLM -- which has been fairly helpful. Ellama's prompts for reviewing highlighted code, implementing functions, etc. are exactly what I'd dreamt of a long time ago and hadn't expected to come true so soon. The UX is still a bit rough and generating tokens on my laptop CPU is slow, but I expect that to improve constantly and inexorably, the way things have been going.

I'm now thinking about finetuning / distilling a LLaMa model down to something that can translate CLI commands on my behalf; e.g. "extract this tarfile". I think it should be very doable -- and maybe a good excuse to learn torchtune -- but I need more time and energy.
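Even before any finetuning, few-shot prompting the local model gets part of the way there. A minimal sketch, assuming the ollama Python client and an already-pulled llama3:8b model; the prompt and the to_command helper are hypothetical, purely for illustration:

    # Sketch: ask a local LLaMa3 8B (served by ollama) to turn a
    # natural-language request into a shell command. Assumes
    # `pip install ollama` and `ollama pull llama3:8b` have been run.
    import ollama

    PROMPT = (
        "Translate the request into a single POSIX shell command. "
        "Reply with only the command, no explanation.\n\n"
        "Request: {request}\n"
        "Command:"
    )

    def to_command(request: str) -> str:
        response = ollama.chat(
            model="llama3:8b",
            messages=[{"role": "user", "content": PROMPT.format(request=request)}],
        )
        return response["message"]["content"].strip()

    print(to_command("extract this tarfile"))
    # hopefully something like: tar -xf archive.tar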

Python & Emacs

As part of consolidating my .emacs I've been cleaning up my Python setup as well. I rebuilt and moved to the latest commit on Emacs's master branch -- the fact that I can smoothly run Emacs from master always amazes me -- and set up Jedi and Ruff (via LSP), while relying on some existing snippets for devdocs.io and Ellama integration.

All of this means I get some very cool autocompletion and almost instant error and syntax checking, with minimal setup or dependence on the repository I'm editing.

I still have some trouble with Auto-Complete mode and Company mode both turning on and both trying to complete what I'm typing; I'll dig in some more and then start publishing my configurations.

Penzai

JAX released some very interesting tools, including a visualization tool that is almost exactly what I was hoping to see for PyTorch. It also makes models much easier to explain -- though I'd probably go with a little more whitespace in the UI if I were designing it -- and seems pretty powerful.
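For reference, the setup is pleasantly small. A minimal sketch of turning on the treescope pretty-printer in a notebook, going off the Penzai README at release -- the function names are my best reading of it and may have drifted since:

    # Penzai/treescope in an IPython notebook: register treescope as the
    # default renderer and enable inline array visualizations.
    import jax.numpy as jnp
    from penzai import pz

    pz.ts.register_as_default()
    pz.ts.register_autovisualize_arrays()

    x = jnp.arange(64, dtype=jnp.float32).reshape(8, 8)
    x  # now renders as an interactive, collapsible view with array facets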

I need to find the time to hack on this and actually build an interactive UI or CLI around it. And a top- or below-style interface to TensorBoard.

Wax, and languages

Continuing the theme of looking for lisp-like homoiconic languages that compile down to C, I ran into some reddit posts and links -- particularly this list of lisp-like languages. There are several interesting ideas in there, but some day I'd like to implement my own, potentially working backwards from the C grammar to make sure everything can be expressed easily and cleanly, and then layering syntactic sugar on top of that.
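To make that concrete, here's a toy sketch (in Python, only to illustrate the shape of the idea) of the s-expression-over-C flavor I have in mind -- a hypothetical translator from s-expressions to C expressions, not any of the linked languages:

    # Toy, hypothetical sketch: lower a tiny s-expression tree into a C
    # expression string; operators become infix, everything else a call.
    def to_c(expr):
        if not isinstance(expr, list):
            return str(expr)  # atoms (numbers, identifiers, literals) pass through
        head, *args = expr
        if head in {"+", "-", "*", "/"}:
            return "(" + f" {head} ".join(to_c(a) for a in args) + ")"
        return f"{head}({', '.join(to_c(a) for a in args)})"

    print(to_c(["printf", '"%d\\n"', ["+", 1, ["*", 2, 3]]]))
    # -> printf("%d\n", (1 + (2 * 3)))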

As mechanisms for procrastination go, inventing the language to program in before actually getting around to programming seems unfortunately too far up my alley. I'll save this particular project for some slow months.

Stanford Lectures on Transformers

More rough notes from the lectures.

Nathan Lambert, Allen Institute for AI

History

Kunal