LLaMA an Open Large Language Model

The rise of ChatGPT has been a huge moment for public engagement with large language models, and I think its release impressed everyone. While the hype has died down a little at the time of writing, I think it may have been one of the largest showcases of machine learning to the general public ever to have occurred. Talking with an AI that gives generally good responses was once only a science fiction idea, and it has now been made real. The real-world utility of such a thing is perhaps still to be worked out, but the idea has excited the world.

What is perhaps less exciting is the proprietary nature of most of the current large language models on display. This puts them out of reach of all but in-house researchers. It also has a real environmental cost, as it means multiple parties each train their own extremely large language models. It is therefore nice to see an open, competitive model being developed. The LLaMA models represent a positive step in this direction and allow for further training for specific uses without the from-scratch training that necessitates enormous power consumption. Future users can start from the provided pretrained weights and fine-tune these models on their own datasets, developing solutions with comparatively minimal power usage, and can also serve models for inference without the same power draw.

The LLaMA model also incorporates some interesting features. The rotary positional embeddings (RoPE), used instead of absolute positional embeddings, and the SwiGLU activation function seem like iterative improvements over previous approaches, but every development in areas so core to the task of machine learning has an impact that is broader than it may first appear. I will need to investigate these things for my own experiments.
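To make these two ideas a little more concrete for myself, here is a minimal pure-Python sketch. It is not Meta's implementation: the function names (`swish`, `matvec`, `swiglu_ffn`, `apply_rope`), the adjacent-pair layout of the rotation, and the base of 10000 are my own assumptions for illustration, and real models do this with batched tensors rather than lists. The rotary part rotates each pair of dimensions by an angle proportional to the token position, and the SwiGLU part gates one linear projection with a Swish-activated second projection.

```python
import math

def swish(x):
    # Swish / SiLU: x * sigmoid(x), written to avoid overflow for large |x|.
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)

def matvec(W, x):
    # Plain matrix-vector product; W is a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def swiglu_ffn(x, W_gate, W_up, W_down):
    # SwiGLU feed-forward sketch: (swish(W_gate x) * (W_up x)) projected by W_down.
    # The weight names are hypothetical, chosen only for this example.
    gate = [swish(g) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)

def apply_rope(vec, position, base=10000.0):
    # Rotary embedding sketch: rotate each adjacent pair (vec[i], vec[i+1])
    # by the angle position * base**(-i/d). Assumes len(vec) is even.
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]
    return out
```

The property that makes the rotary approach appealing is that the dot product between a rotated query and a rotated key depends only on the distance between their positions, not on the absolute positions, so `apply_rope(q, 3)` against `apply_rope(k, 1)` scores the same as `apply_rope(q, 5)` against `apply_rope(k, 3)`.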

LLaMA: Open and Efficient Foundation Language Models
https://arxiv.org/pdf/2302.13971v1.pdf
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothee Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
Meta AI
https://github.com/facebookresearch/llama
