The Formula Insight API converts raw Excel files into structured, labeled JSON format, capturing semantic context for every cell. This allows for automating processes that rely on complex spreadsheet data. For example, the output can be processed and stored in a vector database for Retrieval-Augmented Generation (RAG), enabling subsequent usage with Large Language Models (LLMs).

Currently, the API works best for labeling financial models. We are working on expanding the capabilities to more general use cases.

Obtaining an API Key

We are offering limited access to our beta. To request an API key, please fill out the following form: https://forms.gle/R5RvCNYN8ANsamUFA. If your submission is approved, we’ll send you an API key via email.

Making Requests

You can paste the Python code below directly into your script. Replace 'your_api_key' with the API key that was emailed. Replace file_path with the path to your Excel file. To extract data from a specific sheet, specify the sheet_name. To extract data from all sheets, set sheet_name = 'all'.

import requests

# API Key given to user
apikey = 'your_api_key' 

# Define the URL of the endpoint
url = 'https://api.formulainsight.io/analyze-spreadsheet'

# Define the path to the Excel file you want to upload
file_path = '/Users/stefanraghavan/Desktop/local-github/example-spreadsheets/INSM Model.xlsx'

# Define the sheet name you want to process
sheet = 'IS'  # Replace 'IS' with the desired sheet name, or use 'all' for all sheets

# Make the POST request
with open(file_path, 'rb') as file:
    response = requests.post(url, headers={'X-API-Key': apikey}, files={'file': file}, data={'sheet_name': sheet})

# Print the response from the server
print(response.json())

Repsonse - Labels object

The labels object represents a comprehensive data structure for each numerical cell in the spreadsheet, providing both the cell’s value and its contextual information. This structure is designed to capture the hierarchical nature of financial data, from including categories for specific metrics. Here’s a detailed breakdown of each field:

  1. value: The actual numerical value of the cell, typically a floating-point number.
  2. displayed_value: A string representation of the value as it appears in the spreadsheet, which may include formatting (e.g., number of decimal places).
  3. formula: If applicable, the Excel formula used to calculate the cell’s value. This provides transparency into the data’s derivation.
  4. date: The time period associated with this data point, often a fiscal year or quarter (e.g., “2021A” for actual 2021 data).
  5. metric: The most specific label for this data point, typically the row header directly associated with the cell (e.g., “R&D Expenses”).
  6. category1, category2, etc.: These represent increasingly broader categories that the metric falls under, forming a hierarchy. The numbering goes from more specific (category1) to more general (category2, category3, etc.).
  7. description: A contextual sentence summarizing the key information about this data point, including its metric, categories, date, value, and source spreadsheet.
  8. last_updated: A timestamp indicating when this cell was last updated in the model.
  9. source: Detailed references including spreadsheet name, sheet name and cell address.
  "labels": {
    "IS": [
      {
        "value": 0.51,
        "displayed_value": "0.51",
        "formula": "=IFERROR(M21/H21-1,\"NM\")",
        "date": "2021A",
        "metric": "R&D Expenses",
        "category1": "Growth (y/y)",
        "category2": "Income Statement ($M)",
        "description": "R&D Expenses for Growth (y/y) for Income Statement ($M) for 2021A is 0.51 in INSM Model.xlsx",
        "last_updated": "2024-08-13T07:48:19.289",
        "source": {
          "spreadsheet": "INSM Model.xlsx",
          "sheet": "IS",
          "cell": "IS!M53"
        }
      },
      ...
      {
        "value": 0.23,
        "displayed_value": "0.23",
        "formula": "=IFERROR(E19/E11,\"NM\")",
        "date": "2Q20A",
        "metric": "Base COGs (% of Product Sales)",
        "category1": "Margin Analysis",
        "category2": "Income Statement ($M)",
        "description": "Base COGs (% of Product Sales) for Margin Analysis for Income Statement ($M) for 2Q20A is 0.23 in INSM Model.xlsx",
        "last_updated": "2024-08-13T07:48:19.289",
        "source": {
          "spreadsheet": "INSM Model.xlsx",
          "sheet": "IS",
          "cell": "IS!E60"
        }
      },
    ]
  }

Example Usage

Let’s use the API to label the following income statement data and store it in the labels object.

Running the API call returns the following label for the first numerical cell:

      {
        "value": 136.47,
        "displayed_value": "136.47",
        "formula": "=136.467",
        "date": "2019A",
        "metric": "ARIKAYCE",
        "category1": "Product Sales",
        "category2": "Income Statement ($M)",
        "description": "ARIKAYCE for Product Sales for Income Statement ($M) for 2019A is 136.47 in INSM Model.xlsx",
        "last_updated": "2024-08-13T07:48:19.289",
        "source": {
          "spreadsheet": "INSM Model.xlsx",
          "sheet": "IS",
          "cell": "IS!C5"
        }
      }