Faking ChatGPT Function Calling
This post is from my drafts. I wrote the bulk of this post in June 2024, but am now getting around to publishing it.
AI Function Calling: an Introduction
OpenAI made a significant leap in AI capabilities when they introduced function calling on June 13, 2023.
I can't pinpoint my first exposure to AI function calling, but it was either: AutoGPT, which I discovered & played with as soon as I saw this post on hackernews in April 2023, or TypingMind, which added official support for plugins (which uses function calling) on June 26, 2023. TypingMind had unofficial support for plugins before that, and I suspect I first saw it while poking around TypingMind settings.
What is function calling?
Function calling allows developers to connect AI models like GPT directly with tools and APIs through a clearly defined JSON schema. Refer to OpenAI's function calling docs for a deeper understanding. In short, it's like teaching an AI to know when to fetch data or perform an action, and letting the AI tell the developer's code which action to perform and with what parameters.
What this means in layman's terms is that developers can now define tasks—like fetching weather data or sending an email—and have the AI determine when to call those tasks and how to do it smartly. Instead of having the AI just give you information, it can now take action on your behalf.
Function calling opens the door for AI models to interface hands-on with external systems by translating input into actions. For instance, an AI could automatically convert "Email Luke to see if he wants to get coffee next Friday" into an actionable function call like send_email(to: string, body: string). Similarly, a question like "What's the weather like in Boston?" becomes get_current_weather(location: string, unit: 'celsius' | 'fahrenheit').
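To make that concrete, here's a sketch of what a tool definition looks like in OpenAI's Chat Completions API, using the weather example. The descriptions and field values are illustrative, not from my actual code:

```typescript
// A tool definition as the OpenAI Chat Completions API expects it:
// a name, a description, and a JSON Schema describing the parameters.
const getCurrentWeatherTool = {
  type: "function",
  function: {
    name: "get_current_weather",
    description: "Get the current weather for a location",
    parameters: {
      type: "object",
      properties: {
        location: {
          type: "string",
          description: "City and state, e.g. Boston, MA",
        },
        unit: { type: "string", enum: ["celsius", "fahrenheit"] },
      },
      required: ["location"],
    },
  },
};

// Instead of prose, the model can respond with a tool call whose arguments
// arrive as a JSON string that your code parses and executes.
const exampleArguments = JSON.parse('{"location":"Boston, MA","unit":"fahrenheit"}');
console.log(exampleArguments.location); // "Boston, MA"
```

The model never runs the function itself; it only picks the tool and fills in the arguments, and your code does the rest.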
AI's Biggest Strength
Function calling is the perfect use for AI. Remember that GPT stands for "generative pre-trained transformer" and LLM stands for "large language model". The key is in the names: AI is good at language!
Function calling, that is, taking natural language input like "find my favorite contacts and send them an email inviting them over for dinner Friday night" and converting it into a series of steps a machine can take to achieve that goal, leans heavily on AI's greatest strength: taking natural language and "understanding" how to translate it into a machine-readable form.
Post Overview
Once I became privy to function calling in 2023, I baked support for it into my Shortcuts Toolbox Project by adding a new /ai/perform/?task= endpoint.
Because OpenAI hadn't yet added function calling to their API, I had to achieve the same thing with some clever system prompting.
Now in June 2024, I am rewriting that code to integrate the function calling API. I figured I'd write a post explaining my previous approach before I delete that code forever.
The Approach
My approach was inspired by TypingMind. In the early days of TypingMind, you could see the full system prompt, and while inspecting it, I noticed that it contained instructions saying something like "if you want to use the search tool, respond with json: {search: string} containing your query". A lightbulb immediately went off in my head. This was brilliant!
Clever Prompting and System Message
The system message I crafted for Shortcuts Toolbox:
The user will tell you what goal they wish to achieve, and optionally, the context in which you are to respond.
Pay special attention to the context when constructing the plan, and make sure that the final output will adhere
to the desired context. E.g. if the user asks for an html webpage, and you ask GPT for it, make sure to tell GPT
that it should respond with html, and nothing extra. Sometimes the context will mention an existing User Tool.
If this is the case, you should use that tool and send it what would otherwise be your final output.
If the response is unspecified, use your best judgement. A text response is usually most appropriate.
If the user asks you to access their data, do not hallucinate or make up data.
If you are unable to fetch the data using the tools, state as such in your response.
If you defer to GPT, relay the message to not hallucinate or make up data to GPT so that
it can provide a more appropriate response.
YOUR JOB
Return a list of steps that can be run, in sequence to piece together the available
tools to accomplish the user's goal.
At the bottom of this message is a list of User Tools that are available to you.
Your response (the steps) will be passed to a runner that executes the tools in
sequence and with the inputs that you specify.
The runner's code is below. Follow the RunnerSpecification format in your response.
STICK TO THIS SPEC:
${runnerContents}
Special Tools:
HANDLEBARS_RENDERER
Description:
This is a special tool that we ALWAYS run when processing the input and output properties.
Rules:
1. This does not exist as its own tool, it is automatically applied to all inputs and outputs, refer to the processInput implementation for a deep understanding of how the input and output properties are processed
2. When processing the input for a tool, or the output for the spec, its value may be a string or an object. For an object, we will recursively render each string as a handlebars template.
3. In a lot of cases, if a shortcut returns JSON, and the user wants you to use GPT to summarize the results, you do not need to map the response before passing it to GPT, you can simply pass along the json response from the previous step.
4. If you need to combine JSON from two actions, or pass a response as json, you can do so using a handlebars template.
e.g. step 1: fetchWeather step2: fetchCalendar output: \`{"weather": {{{json fetchWeather}}}, "calendar": {{{json fetchCalendar}}}}\`
5. Generally, use triple braces {{{data}}} instead of double braces {{data}}. We want double quotes rendered as double quotes, not &quot; etc.
6. If a tool outputs JSON and you want to pass it as input or output, you MUST do so via the handlebars json helper e.g.: {{{json fetchWeather}}}
ALWAYS REMEMBER THE ABOVE RULE
If your goal is to send a notification and return the response:
INCORRECT APPROACH, DO NOT DO THIS
step 1: sendNotification
output: '{{sendNotification}}'
CORRECT APPROACH
step 1: sendNotification
output: {{{json sendNotification}}}
Input:
This tool will always render a steps input as a handlebars template.
The context for the template will be {[stepID]: <output from step with id stepID>}.
Handlebars helpers:
There is only a single custom handlebars helper. You can use if, etc., but you cannot compare using eq etc.; handlebars only lets you check for truthiness.
- \`json\` - which render data directly as json in the results
Example usage: {{{json weatherConditions}}} or {{#each weatherConditions.daily}}{{{json this}}}{{/each}}
Output:
The resulting rendered template: either a string or a JSON object
The runner will attempt to parse the results as JSON, and if successful, will pass the resulting object, instead of a string, to the step's input.
---
ASK_GPT
Description:
Another special tool that queries GPT with the given prompt.
Input:
object of format { system: string, user: string, jsonMode?: boolean }
// jsonMode defaults to false, USE THIS IF YOU WANT JSON FROM GPT
Output: The AI assistant's response
---
User Tools:
${toolSummary}
Tips and Tricks:
The tool summary is passed along when running ASK_GPT. If you need it to configure inputs for a tool for you, use JSON mode and reference the tool by name.
e.g. if you need it to configure a notification for you, tell GPT to configure a notification for <notification tool name> about: blah
This guidance ensures the AI grasps user goals in context, converting them into actionable plans.
Note that for the ${runnerContents}, I inserted the contents of the TypeScript file I created that runs the spec the AI returns. This file contains the explicit type definitions and code for the runner, giving GPT an intimate understanding of what my code will do with its output.
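To illustrate the HANDLEBARS_RENDERER rules in the prompt above, here's a minimal stand-in for the rendering behavior. This is not the real Handlebars library, just a sketch of the two rules that matter here: double braces HTML-escape their value, triple braces emit it raw, and the `json` helper serializes a step's output:

```typescript
// Minimal stand-in for the Handlebars behavior described in the system prompt.
// Double braces escape HTML entities; triple braces emit the value raw.
type Context = Record<string, unknown>;

const htmlEscape = (s: string) =>
  s.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

function lookup(ctx: Context, expr: string): string {
  // Supports `stepId` or `json stepId`, mirroring the single custom helper.
  const parts = expr.trim().split(/\s+/);
  const useJson = parts.length === 2 && parts[0] === "json";
  const value = ctx[parts[parts.length - 1]];
  return useJson ? JSON.stringify(value) : String(value);
}

function render(template: string, ctx: Context): string {
  return template
    .replace(/\{\{\{(.+?)\}\}\}/g, (_, expr) => lookup(ctx, expr)) // raw
    .replace(/\{\{(.+?)\}\}/g, (_, expr) => htmlEscape(lookup(ctx, expr))); // escaped
}

const ctx = { fetchWeather: { temp: 72, conditions: "sunny" } };
console.log(render('{"weather": {{{json fetchWeather}}}}', ctx));
// → {"weather": {"temp":72,"conditions":"sunny"}}
```

This is why rule 5 insists on triple braces: with double braces, the quotes inside the serialized JSON would come out as &quot; and the runner's JSON.parse would fail.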
My Approach vs. Function Calling Paradigm
What's interesting is how my solution deviates from the function calling approach:
- Function Calling API: You make an API call to trigger a function call, execute it, make another call to GPT, and iterate until the AI determines it's done.
- My Method: The entire plan is laid out up front, then each step is executed in sequence. The full sequence of steps is fixed from the start; the runner never goes back to GPT to revise the plan mid-run.
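For contrast, here's a sketch of the iterative loop the official function calling API implies. A stubbed model stands in for the real OpenAI calls, and the tool names and messages are illustrative:

```typescript
// Sketch of the official function-calling loop: each round trip to the model
// either requests a tool call or produces the final answer, so the model can
// change course between steps. A stub replaces the real OpenAI API here.
type ToolCall = { name: string; arguments: Record<string, unknown> };
type ModelTurn = { toolCall?: ToolCall; content?: string };

// Stubbed "model": first asks for the weather, then answers using the result.
function stubModel(history: string[]): ModelTurn {
  if (!history.some((m) => m.startsWith("tool:"))) {
    return { toolCall: { name: "get_current_weather", arguments: { location: "Boston" } } };
  }
  return { content: "It is sunny in Boston." };
}

const tools: Record<string, (args: Record<string, unknown>) => string> = {
  get_current_weather: (args) => `sunny in ${String(args.location)}`,
};

function functionCallingLoop(userMessage: string): string {
  const history = [`user: ${userMessage}`];
  for (let i = 0; i < 5; i++) { // cap iterations defensively
    const turn = stubModel(history);
    if (turn.toolCall) {
      const result = tools[turn.toolCall.name](turn.toolCall.arguments);
      history.push(`tool: ${result}`); // feed the result back for another round trip
    } else {
      return turn.content ?? "";
    }
  }
  throw new Error("too many iterations");
}

console.log(functionCallingLoop("What's the weather in Boston?"));
// → It is sunny in Boston.
```

Note the cost implication: every tool call here is an extra round trip to the model, whereas my approach pays for one planning call regardless of how many steps the plan has.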
Expanded Function Calling Example
Sample Task: Fetch Weather Conditions and Notify My iPhone
Let's take a detailed look at how everything fits together with this expanded example:
Goal: Get the current weather and alert the iPhone.
Tool Descriptions:
- FetchWeather: Retrieves weather data for a designated locale.
- SendNotification: Alerts a selected device, such as an iPhone.
After passing the above to GPT with the system prompt, it might come up with the following plan:
- Plan:
- Use FetchWeather to grab the latest weather conditions.
- Use SendNotification to ping the iPhone with a weather update.
Plan as JSON:
{
"goal": "Fetch Weather Conditions and Notify My iPhone",
"steps": [
{
"id": "step1",
"tool": "FetchWeather",
"input": {
"location": "New York"
}
},
{
"id": "step2",
"tool": "SendNotification",
"input": {
"device": "iPhone",
"message": "The current weather in New York is {{step1.weatherDescription}} with a temperature of {{step1.temperature}}."
}
}
],
"output": "{{step2}}"
}
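To see how the runner threads outputs through, here's a sketch of step2's message being rendered against a context holding step1's output. The weather values are hypothetical, and the dotted-path substitution below is a simplified stand-in for the real handlebars rendering:

```typescript
// Hypothetical output from FetchWeather, keyed by step id in the runner context.
const context = {
  step1: { weatherDescription: "partly cloudy", temperature: "68°F" },
};

// step2's message template, referencing step1's output by dotted path.
const template =
  "The current weather in New York is {{step1.weatherDescription}} with a temperature of {{step1.temperature}}.";

// Substitute each {{path}} by walking the dotted path through the context.
const message = template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_m, path: string) => {
  let value: any = context;
  for (const key of path.split(".")) value = value?.[key];
  return String(value);
});

console.log(message);
// → The current weather in New York is partly cloudy with a temperature of 68°F.
```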
The Runner, Simplified
Here's a simplified TypeScript sketch of the runner, connecting planned steps with execution:
async function runner(spec: RunnerSpecification): Promise<unknown> {
  const context: Record<string, unknown> = {};
  for (const step of spec.steps) {
    // Render the step's input as a handlebars template against prior step outputs
    const input = processInput(step.input, context);
    if (step.tool === 'ASK_GPT') {
      // Query GPT with the rendered prompt
      context[step.id] = await queryGPT(input);
    } else {
      // Execute the user tool (e.g. a Shortcut) with the rendered input
      context[step.id] = await executeTool(step.tool, input);
    }
  }
  // Render the final output template; parsed as JSON when possible
  return processOutput(spec.output, context);
}
Pros and Cons
Pro: Fewer calls to ChatGPT = lower costs. My approach uses a single call to ChatGPT to create the plan and then executes that plan. The plan can make additional calls to GPT, but for most requests does so only once or twice, and those calls are never used to revise the plan, just to produce a response for a given step.
Con: The plan isn't always good. With my approach, the AI has one chance to come up with a plan, and no opportunity to modify that plan once it is set in motion. Sometimes it gets it wrong, and the whole task fails.
Con: Verbose system prompts. As you can see in the system message, I had to do a lot of coaching to get this right. Since GPT's official function calling support means the model was explicitly trained to understand functions, your system prompt can be much cleaner if you use the official API.
Future with Tool Calling
This method got by on clever system prompting, but native function calling is the future of AI agent software.
I'm eager to transition and grow alongside these emerging possibilities.
Thanks for reading! Hopefully you learned something new about building software with AI.