MongoDB database schema

Why we chose MongoDB

We chose MongoDB because it is a NoSQL database that is easy to scale and manage. It is also very flexible and can store data in a variety of formats. This makes it ideal for storing the unstructured data that is generated by the Prompt Sail application.

Database schema

There are three main objects in the database: projects, transactions, and settings. Each of them has a similar structure.

The projects table is the most important because it is the parent of the other two.

The transactions table is the most numerous because each user request creates a new transaction.

Projects

Projects is the main object in the database. It contains all the information about a project, including the name, description, and providers.

{
    "_id": "Project ID",
    "name": "Project name", 
    "slug": "slugified-project-name",
    "description": "Project description",
    "ai_providers": [
        {
            "deployment_name": "Deployment name",
            "slug": "slugified-deoloyment-name",
            "api_base": "https://api.ai-provider.com",
            "description": "AI provider description",
            "provider_name": "AI provider name"
        },
        ...
    ],
    "tags": [
        "tag1",
        "tag2",
        ...
    ],
    "org_id": "Organization ID",
    "created_at": "2024-04-28T18:48:35+01:00",
    "owner": "User ID",
}
  • _id is the unique identifier for the project. It is generated by MongoDB when a new project is created.
  • name is the name of the project.
  • slug is a URL-friendly version of names fields.
  • ai_providers is an array of objects that contain information about the AI providers that are associated with the project.
  • tags is an array of strings that contain tags that are associated with the project. It might be an empty array.
  • org_id is the organization name that the project belongs to. It might be None if the project is not associated with an organization.
  • created_at is the date and time that the project was created.
  • owner is the user that created the project.

Transactions

Transactions is the object that contains all the information about a user request and server response.

{
    "_id": "Transaction ID",
    "project_id": "ID of used Project",
    "request": "User input",
    "response": "AI provider output",
    "tags": [
        "tag1",
        "tag2",
        ...
    ],
    "provider": "name of used AI provider",
    "model": "used AI model name",
    "type": "",
    "os": "Operating system",
    "input_tokens": 150,
    "output_tokens": 600,
    "library": "AI provider library",
    "status_code": 200,
    "messages": [
        {
            "message": "Message text",
            "type": "Message type"
        },
        ...
    ],
    "last_message": "Last response message text",
    "prompt": "Prompt text",
    "error_message": "Error message",
    "generation_speed": 0.5,
    "input_cost": 0.004,
    "output_cost": 0.021,
    "total_cost": 0.025,
    "request_time": "2024-04-28T18:48:35+01:00",
    "response_time": "2024-04-28T18:48:35+01:00",
} 
  • _id is the unique identifier for the transaction. It is generated by MongoDB when a new transaction is created.
  • project_id is the unique identifier of the project that the transaction is associated with.
  • request is the user request object that was sent to the AI provider.
  • response is the AI provider response object that was received from the AI provider.
  • tags is an array of strings that contain tags that are associated with the transaction. It might be an empty array.
  • provider is the name of the AI provider that was used for the transaction.
  • model is the name of the AI model that was used for the transaction.
  • type is the type of the transaction (chat, chat completion, embedding etc.). It might be an empty string.
  • os is the operating system that the user was using when the transaction was created. It might be an empty string or None.
  • input_tokens is the number of tokens in the user input.
  • output_tokens is the number of tokens in the AI provider output.
  • library is the name of the AI provider library that was used for the transaction.
  • status_code is the HTTP status code that was received from the AI provider.
  • messages is an array of objects that contain information about the messages that were sent and received during the transaction.
  • last_message is the last message that was received from the AI provider. It might be an empty string or None when transaction status_code is > 200.
  • prompt is the prompt that was sent to the AI provider.
  • error_message is the error message that was received from the AI provider. It might be an empty string or None when transaction status_code is 200.
  • generation_speed is the generation speed of the AI provider. It is calculated by dividing the output_tokens by the delta of response_time and request_time.
  • input_cost is the cost of the user input. It is calculated by multiplying the input_tokens by the cost (from price list for model).
  • output_cost is the cost of the AI provider output. It is calculated by multiplying the output_tokens by the cost (from price list for model).
  • total_cost is the total cost of the transaction. It is calculated by adding the input_cost and output_cost.
  • request_time is the date and time of registered user request.
  • response_time is the date and time that of registered AI provider response.

Updated: