Building an AI App with Node.js and node-llama-cpp

This blog post will guide you through creating a Node.js application that interacts with a large language model (LLM) using the node-llama-cpp library.



Unleash the Power of AI

Large language models (LLMs) have taken the tech world by storm. These powerful AI models can generate realistic text, translate languages, write different kinds of creative content, and even answer your questions in an informative way. But how can you harness this power to build your own AI-powered application? This blog post will guide you through creating a Node.js application that interacts with an LLM using the `node-llama-cpp` library.

llama-cpp and GGUF: A Powerful Duo for Local LLM Inference

llama-cpp (the llama.cpp project) is a C/C++ library for running large language models (LLMs) locally, either through its own tools or embedded in your C/C++ applications. This means you can use LLMs for tasks like text generation, translation, and question answering, all on your own machine.

Here's what makes llama-cpp special:

Fast and Efficient: It boasts minimal setup and state-of-the-art performance, enabling you to run LLMs smoothly on various hardware configurations.

Wide Hardware Support: It runs on major operating systems (Windows, Mac, Linux) and even integrates with cloud environments.

Rich Ecosystem: It comes with bindings for popular languages like Python, Node.js, and Rust, making it easy to integrate with your existing projects.

Now, let's talk about GGUF. This is a file format specifically designed for storing LLM models used by llama-cpp. Compared to the previous format (GGML), GGUF offers several advantages:

Improved Tokenization: It ensures better handling of text during LLM processing.

Special Token Support: It allows for including special tokens that can enhance the LLM's understanding of your prompts.

Metadata Support: It can store additional information about the model, making it easier to manage and track different models.

Extensibility: The format is designed to be flexible, allowing for future improvements and functionalities.

In essence, llama-cpp provides the engine for running LLMs locally, while GGUF offers a streamlined and efficient way to store and manage the LLM models that power your applications. Together, they form a powerful duo for anyone looking to leverage the potential of LLMs in their C/C++ projects.
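To make the metadata point concrete, here is a minimal sketch that reads the fixed-size start of a GGUF file from a Buffer. It assumes the little-endian layout used by GGUF versions 2 and 3 (a 4-byte "GGUF" magic, a uint32 version, then two uint64 counts). The function name `readGgufHeader` is a hypothetical helper, not part of llama-cpp or node-llama-cpp, and the demo uses a synthetic header rather than a real model.

```javascript
// Parse the fixed-size start of a GGUF file from a Buffer.
// Assumed layout (GGUF v2/v3, little-endian): 4-byte magic "GGUF",
// uint32 version, uint64 tensor count, uint64 metadata key/value count.
function readGgufHeader(buffer) {
  if (buffer.length < 24) {
    throw new Error("need at least 24 bytes to read a GGUF header");
  }
  const magic = buffer.toString("ascii", 0, 4);
  if (magic !== "GGUF") {
    throw new Error(`not a GGUF file (magic was ${JSON.stringify(magic)})`);
  }
  return {
    version: buffer.readUInt32LE(4),
    tensorCount: buffer.readBigUInt64LE(8),    // BigInt
    metadataCount: buffer.readBigUInt64LE(16), // BigInt
  };
}

// Demo with a synthetic 24-byte header instead of a real model file.
const demo = Buffer.alloc(24);
demo.write("GGUF", 0, "ascii");
demo.writeUInt32LE(3, 4);        // pretend version
demo.writeBigUInt64LE(291n, 8);  // pretend tensor count
demo.writeBigUInt64LE(24n, 16);  // pretend metadata entry count
console.log(readGgufHeader(demo));
```

For a real model you would pass the first 24 bytes of the `.gguf` file. The metadata entries themselves follow this header and are not parsed in this sketch.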

Getting Started

Before diving in, you'll need Node.js and a package manager like npm or yarn installed on your system. For smooth operation with larger LLM models, at least 8GB of RAM (ideally 16GB or more) and an i3-class CPU are recommended; a GPU is not necessary.

Some knowledge of JavaScript and Node.js will help, but no prior knowledge of AI or ML is required.

Setting Up Your Project

We are going to build a small CLI tool that takes a query and prints the AI's response as output.

Let's create a new directory for your project. Open your terminal, navigate to your desired location, and run `npm init` or `yarn init` to initialize a basic Node.js project. This will create a `package.json` file that will store your project information and dependencies.

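mkdir nodeai
cd nodeai
npm init
# install the library: llama-cpp bindings for Node.js
npm i node-llama-cpp
# the code in this post uses the library's version 2 API (new LlamaModel(...));
# if a newer major version is installed and the imports fail, try: npm i node-llama-cpp@2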

Choosing Your LLM Model

`node-llama-cpp` supports various LLM models, each with its own strengths and weaknesses. Head over to resources like Hugging Face Model Hub to explore compatible models. Keep in mind that larger models generally require more powerful hardware to run smoothly. Once you've chosen your champion, download the model file and place it in a designated folder within your project.

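We will use Mistral-7B-Instruct-v0.2-GGUF (download here). Save the downloaded .gguf file in the root of the nodeai folder, since that is where the code in the next section looks for it.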
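Before wiring the model into code, it can help to confirm the file actually sits where the script will look for it. The following is a small sketch under that assumption; `findGgufFiles` is a hypothetical helper written for this post, not something node-llama-cpp provides.

```javascript
// List the .gguf files in a directory, sorted by name.
async function findGgufFiles(dir) {
  // Dynamic import keeps the snippet usable from both ES modules and CommonJS.
  const { readdir } = await import("node:fs/promises");
  const names = await readdir(dir);
  return names
    .filter((name) => name.toLowerCase().endsWith(".gguf"))
    .sort();
}

// Check the current working directory, i.e. the folder you run node from.
findGgufFiles(".").then((files) => {
  if (files.length === 0) {
    console.log("No .gguf file found in this folder yet.");
  } else {
    console.log("GGUF files found:", files);
  }
});
```

If the list is empty when run from the nodeai folder, the model has probably been saved somewhere else or is still downloading.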

Building the AI Engine

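Now comes the exciting part: building your application! Create a new JavaScript file named `app.mjs` (the `.mjs` extension lets Node.js treat it as an ES module, which the `import` statements below require), where you'll write the code for interacting with the LLM. Here's where the magic happens: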

import path from "path";
import {LlamaModel, LlamaContext, LlamaChatSession} from "node-llama-cpp";
import * as readline from "node:readline/promises";

// Load the model, then create a context and a chat session on top of it
const model = new LlamaModel({
    // the model file should be in the root of the nodeai folder
    modelPath: path.join("Mistral-7B-Instruct-v0_2.gguf")
});
const context = new LlamaContext({model});
const session = new LlamaChatSession({context});

async function AiResponse() {
  let rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });
  let query = await rl.question("query: ");
  if (query.includes('exit')) {
    // type "exit" to close
    rl.close();
  } else {
    let res = await session.prompt(query);
    console.log('ai : ', res);
    // close this interface before the next round creates a new one
    rl.close();
    AiResponse();
  }
}

// start the conversation loop
AiResponse();

1. Importing Libraries:

  • path: This module helps with manipulating file paths.

  • LlamaModel, LlamaContext, LlamaChatSession: These are classes from the node-llama-cpp library used for interacting with the LLM.

  • readline: This module is used to read user input from the console.

2. Initializing the LLM:

  • new LlamaModel: Creates a new instance of the LlamaModel class.

    • modelPath: This property specifies the location of the LLM model file. The path passed to path.join is relative, so it is resolved against the directory you run the script from; run the script from the root of the nodeai folder. Replace "Mistral-7B-Instruct-v0_2.gguf" with the actual filename of your LLM model.
  • new LlamaContext: Creates a new instance of the LlamaContext class, which provides context for the LLM.

    • model: This property references the previously created LlamaModel instance.
  • new LlamaChatSession: Creates a new instance of the LlamaChatSession class, which allows for an interactive conversation with the LLM.

    • context: This property references the previously created LlamaContext instance.

3. Defining the AiResponse Function:

  • This function handles the conversation loop with the LLM.

  • readline.createInterface: Creates a new readline interface for reading user input.

  • let query = await rl.question("query: "): Prompts the user with "query: ", reads the user's input, and stores it in the query variable.

  • if (query.includes('exit')): Checks if the user's query contains the word "exit" anywhere in it, so a question such as "how do I exit vim?" also ends the session.

    • If it does, the rl.close() method is called to close the readline interface, effectively ending the program.
  • else: If the query does not contain "exit":

    • let res = await session.prompt(query): Sends the user's query (text) to the LLM using the prompt method of the LlamaChatSession instance and stores the LLM's response in the res variable.

    • console.log: Prints the LLM's response with "ai : ".

    • The AiResponse function then calls itself recursively, starting the conversation loop again.

4. Running the Application:

  • AiResponse() is called at the end of the script, initiating the conversation loop. The user will be prompted for input, and the LLM will respond until the user types "exit".
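The same conversation can also be written as a plain loop instead of a recursive call. The sketch below is one possible variant, not the code from this post: `chatLoop` is a hypothetical name, it ends only on an exact "exit", and the session, input, and output are passed in so the loop can be tried without a model. The demo uses a stand-in object that merely echoes the query.

```javascript
// Loop-based variant of the conversation flow.
// `session` is anything with an async prompt(text) method (for example a
// LlamaChatSession); `ask` returns a promise for the next line of input;
// `print` receives each line of output.
async function chatLoop(session, ask, print) {
  while (true) {
    const query = (await ask("query: ")).trim();
    if (query.toLowerCase() === "exit") break; // exact match, not includes()
    if (query === "") continue;                // ignore empty input
    const res = await session.prompt(query);
    print("ai : " + res);
  }
}

// Demo with scripted input and a stand-in session (no model involved).
const scripted = ["hello", "exit"];
const echoSession = { prompt: async (text) => `you said: ${text}` };
chatLoop(echoSession, async () => scripted.shift(), console.log);
```

With the real objects from app.mjs, the call would look like `await chatLoop(session, (q) => rl.question(q), console.log)` followed by `rl.close()`, using a single readline interface for the whole conversation.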

Running Your AI Creation

With the code written, it's time to bring your app to life! Navigate to your project directory in the terminal and execute `node app.mjs`. Now you have a running Node.js application that can interact with your chosen LLM model.


The Future of AI Apps

This blog post has given you a taste of building AI applications with Node.js and `node-llama-cpp`. The possibilities are endless! You can create chatbots, generate creative content, or build tools that leverage the power of LLMs. Remember, this is just the beginning. As LLM technology continues to evolve, so will the capabilities of the applications you can build. So, keep exploring, experiment with different LLMs and prompts, and unleash the power of AI in your next project!


A Glimpse into Practical Applications

Want to see a real-world example? Imagine building a simple text summarization app. You could provide a long piece of text as a prompt, and the LLM would generate a concise summary for you. This is just one example of the many potential applications you can create with this powerful combination of Node.js and `node-llama-cpp`.
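As a rough sketch of that summarization idea, the snippet below wraps the text in an instruction and sends it through a chat session. The helper names `buildSummaryPrompt` and `summarize`, the prompt wording, and the `maxSentences` parameter are all illustrative assumptions; the only thing relied on is a session object with an async `prompt` method, which the demo replaces with a stand-in.

```javascript
// Build a summarization prompt around the given text.
function buildSummaryPrompt(text, maxSentences = 3) {
  return [
    `Summarize the following text in at most ${maxSentences} sentences.`,
    "Text:",
    text.trim(),
  ].join("\n\n");
}

// Send the prompt through a session and return the trimmed reply.
async function summarize(session, text, maxSentences = 3) {
  const reply = await session.prompt(buildSummaryPrompt(text, maxSentences));
  return reply.trim();
}

// Demo with a stand-in session; a real LlamaChatSession would go here instead.
const fakeSession = {
  prompt: async (prompt) => `(stand-in reply to a ${prompt.length}-character prompt)`,
};
summarize(fakeSession, "Node.js is a JavaScript runtime built on V8.")
  .then(console.log);
```

How good the summary is depends on the model and the prompt wording, so expect to experiment with both. Very long inputs may also exceed what the model's context can hold.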

Source Code

Project Source code here

My Portfolio

Did you find this article valuable?

Support Monu Shaw by becoming a sponsor. Any amount is appreciated!