Introduction
The Formula Insight API converts raw Excel files into structured, labeled JSON that captures the semantic context of every cell. This makes it possible to automate processes that depend on complex spreadsheet data. For example, the output can be processed and stored in a vector database for Retrieval-Augmented Generation (RAG) and then used with Large Language Models (LLMs).
Currently, the API works best for labeling financial models. We are working on expanding the capabilities to more general use cases.
Obtaining an API Key
We are offering limited access to our beta. To request an API key, please fill out the following form: https://forms.gle/R5RvCNYN8ANsamUFA. If your submission is approved, we’ll send you an API key via email.
Making Requests
You can paste the Python code below directly into your script. Replace 'your_api_key' with the API key that was emailed. Replace file_path with the path to your Excel file. To extract data from a specific sheet, specify the sheet_name. To extract data from all sheets, set sheet_name = 'all'.
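As a complement to the request snippet, the following sketch only illustrates how the three inputs described above might be collected and checked before a call is made. The helper name prepare_inputs, the accepted file extensions, and the layout of the returned dictionary are assumptions made for illustration and are not part of the API. The request itself is deliberately left out.

```python
from pathlib import Path

# Hypothetical helper: not part of the Formula Insight API.
# The set of accepted extensions is an assumption.
EXCEL_SUFFIXES = {".xlsx", ".xlsm", ".xls"}


def prepare_inputs(api_key, file_path, sheet_name="all"):
    """Check the three user-supplied values and return them as a dict."""
    if not api_key or api_key == "your_api_key":
        raise ValueError("Replace 'your_api_key' with the key you received by email.")

    path = Path(file_path)
    if path.suffix.lower() not in EXCEL_SUFFIXES:
        raise ValueError(f"Expected an Excel file, got: {path.name}")

    if not sheet_name:
        raise ValueError("Give a sheet name, or 'all' to extract every sheet.")

    return {
        "api_key": api_key,
        "file_path": str(path),
        "sheet_name": sheet_name,  # 'all' means every sheet in the workbook
    }


# Example values are placeholders.
inputs = prepare_inputs("abc123", "income_statement.xlsx")
```

Leaving sheet_name at its default of 'all' mirrors the convention described above, and passing a specific sheet name restricts extraction to that sheet.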
Response - Labels object
The labels object holds one entry for each numerical cell in the spreadsheet, providing both the cell’s value and its contextual information. This structure is designed to capture the hierarchical nature of financial data, from broad categories down to specific metrics. Here’s a detailed breakdown of each field:
value: The actual numerical value of the cell, typically a floating-point number.
displayed_value: A string representation of the value as it appears in the spreadsheet, which may include formatting (e.g., number of decimal places).
formula: If applicable, the Excel formula used to calculate the cell’s value. This provides transparency into the data’s derivation.
date: The time period associated with this data point, often a fiscal year or quarter (e.g., “2021A” for actual 2021 data).
metric: The most specific label for this data point, typically the row header directly associated with the cell (e.g., “R&D Expenses”).
category1, category2, etc.: These represent increasingly broader categories that the metric falls under, forming a hierarchy. The numbering goes from more specific (category1) to more general (category2, category3, etc.).
description: A contextual sentence summarizing the key information about this data point, including its metric, categories, date, value, and source spreadsheet.
last_updated: A timestamp indicating when this cell was last updated in the model.
source: Detailed references including spreadsheet name, sheet name, and cell address.
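To make the field list concrete, here is a minimal sketch of how a single label might be turned into a text chunk plus metadata for a vector database. The field names come from the breakdown above. The sample values, the flat string used for source, and the helper names category_path and to_rag_chunk are assumptions made for illustration; the actual response may be shaped differently.

```python
import re


def category_path(label):
    """Return the label's categories ordered from most general to most specific."""
    numbered = []
    for key, value in label.items():
        match = re.fullmatch(r"category(\d+)", key)
        if match and value:
            numbered.append((int(match.group(1)), value))
    # category1 is the most specific, so a descending sort puts the
    # broadest category first.
    return [name for _, name in sorted(numbered, reverse=True)]


def to_rag_chunk(label):
    """Build a text chunk and metadata dict from one label (hypothetical layout)."""
    path = category_path(label) + [label["metric"]]
    # Prefer the API's own description; fall back to a text built from the fields.
    text = label.get("description") or (
        f"{' > '.join(path)} ({label['date']}): {label['displayed_value']}"
    )
    metadata = {
        "metric": label["metric"],
        "date": label["date"],
        "value": label["value"],
        "formula": label.get("formula"),
        "source": label.get("source"),
    }
    return {"text": text, "metadata": metadata}


# Illustrative label only: every value here is invented for the example.
sample_label = {
    "value": 1250.0,
    "displayed_value": "1,250.0",
    "formula": "=SUM(C5:C7)",
    "date": "2021A",
    "metric": "R&D Expenses",
    "category1": "Operating Expenses",
    "category2": "Income Statement",
    "description": "",
    "last_updated": "2024-01-15T10:30:00",
    "source": "model.xlsx, Sheet1, C8",
}

chunk = to_rag_chunk(sample_label)
```

Keeping the numeric value and the source reference in the metadata, rather than only in the text, lets a retrieval step filter by date or metric and trace an answer back to a cell.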
Example Usage
Let’s use the API to label the following income statement data and store it in the labels object.
Running the API call returns the following label for the first numerical cell: