MongoDB database schema

Why we chose MongoDB

We chose MongoDB because it is a NoSQL database that is easy to scale and manage. It is also very flexible and can store data in a variety of formats. This makes it ideal for storing the unstructured data that is generated by the Prompt Sail application.

Database schema

There are three main objects in the database: projects, transactions, and settings. Each of them has a similar structure.

The projects table is the most important because it is the parent of the other two.

The transactions table is the most numerous because each user request creates a new transaction.

Projects

Projects is the main object in the database. It contains all the information about a project, including the name, description, and providers.

{
    "_id": "Project ID",
    "name": "Project name", 
    "slug": "slugified-project-name",
    "description": "Project description",
    "ai_providers": [
        {
            "deployment_name": "Deployment name",
            "slug": "slugified-deoloyment-name",
            "api_base": "https://api.ai-provider.com",
            "description": "AI provider description",
            "provider_name": "AI provider name"
        },
        ...
    ],
    "tags": [
        "tag1",
        "tag2",
        ...
    ],
    "org_id": "Organization ID",
    "created_at": "2024-04-28T18:48:35+01:00",
    "owner": "User ID",
}

_id is the unique identifier for the project. It is generated by MongoDB when a new project is created.
name is the name of the project.
slug is a URL-friendly version of names fields.
ai_providers is an array of objects that contain information about the AI providers that are associated with the project.
tags is an array of strings that contain tags that are associated with the project. It might be an empty array.
org_id is the organization name that the project belongs to. It might be None if the project is not associated with an organization.
created_at is the date and time that the project was created.
owner is the user that created the project.

Transactions

Transactions is the object that contains all the information about a user request and server response.

{
    "_id": "Transaction ID",
    "project_id": "ID of used Project",
    "request": "User input",
    "response": "AI provider output",
    "tags": [
        "tag1",
        "tag2",
        ...
    ],
    "provider": "name of used AI provider",
    "model": "used AI model name",
    "type": "",
    "os": "Operating system",
    "input_tokens": 150,
    "output_tokens": 600,
    "library": "AI provider library",
    "status_code": 200,
    "messages": [
        {
            "message": "Message text",
            "type": "Message type"
        },
        ...
    ],
    "last_message": "Last response message text",
    "prompt": "Prompt text",
    "error_message": "Error message",
    "generation_speed": 0.5,
    "input_cost": 0.004,
    "output_cost": 0.021,
    "total_cost": 0.025,
    "request_time": "2024-04-28T18:48:35+01:00",
    "response_time": "2024-04-28T18:48:35+01:00",
} 

_id is the unique identifier for the transaction. It is generated by MongoDB when a new transaction is created.
project_id is the unique identifier of the project that the transaction is associated with.
request is the user request object that was sent to the AI provider.
response is the AI provider response object that was received from the AI provider.
tags is an array of strings that contain tags that are associated with the transaction. It might be an empty array.
provider is the name of the AI provider that was used for the transaction.
model is the name of the AI model that was used for the transaction.
type is the type of the transaction (chat, chat completion, embedding etc.). It might be an empty string.
os is the operating system that the user was using when the transaction was created. It might be an empty string or None.
input_tokens is the number of tokens in the user input.
output_tokens is the number of tokens in the AI provider output.
library is the name of the AI provider library that was used for the transaction.
status_code is the HTTP status code that was received from the AI provider.
messages is an array of objects that contain information about the messages that were sent and received during the transaction.
last_message is the last message that was received from the AI provider. It might be an empty string or None when transaction status_code is > 200.
prompt is the prompt that was sent to the AI provider.
error_message is the error message that was received from the AI provider. It might be an empty string or None when transaction status_code is 200.
generation_speed is the generation speed of the AI provider. It is calculated by dividing the output_tokens by the delta of response_time and request_time.
input_cost is the cost of the user input. It is calculated by multiplying the input_tokens by the cost (from price list for model).
output_cost is the cost of the AI provider output. It is calculated by multiplying the output_tokens by the cost (from price list for model).
total_cost is the total cost of the transaction. It is calculated by adding the input_cost and output_cost.
request_time is the date and time of registered user request.
response_time is the date and time that of registered AI provider response.