Local LLM with llamafiles

I just tried out llamafiles and I am blown away by how easy they are to set up and use. Very decent speed on Apple silicon. Very decent code completion.

I have a real-time speed example here:

Example integration

I want to be a bit more aggressive about auto-retrying LLM prompts in a loop, but the API cost is a bit prohibitive, so I wanted to explore my options. I am really excited by this one. My M1 Pro with 16GB is a little underpowered, though; I could not run every model. At the next laptop refresh I will beg for 32GB or more.
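For anyone curious, here is a minimal sketch of the retry-in-a-loop idea against a local llamafile. It assumes the llamafile was started in server mode (llamafile serves an OpenAI-compatible chat endpoint on localhost:8080 by default); the helper names and the backoff numbers are just my choices, not anything from the llamafile docs.

```python
import json
import time
import urllib.request


def with_retries(fn, retries=3, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * (2 ** attempt))


def ask_llamafile(prompt, url="http://localhost:8080/v1/chat/completions"):
    """One chat completion against a locally running llamafile server.

    Assumes you launched something like `./model.llamafile --server`
    so the OpenAI-compatible endpoint is available on port 8080.
    """
    def call():
        req = urllib.request.Request(
            url,
            data=json.dumps(
                {"messages": [{"role": "user", "content": prompt}]}
            ).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=120) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]

    # Since the model is local, retries cost time but not money.
    return with_retries(call)
```

The nice part is that `with_retries` is dirt cheap here: with a local model there is no per-token bill, so looping until the output parses is actually viable.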
