The real problem
Why this is harder than it looks
The Databricks REST API is clean and thoroughly documented. You can prototype SQL execution against your own workspace in an afternoon. The complexity arrives when you try to do this for real users in a multi-tenant product — and the authentication model is different from most SaaS connectors.
Databricks uses Service Principal OAuth 2.0. Unlike user-delegated OAuth flows, the agent authenticates as a service principal registered in the customer's Databricks workspace. This means each of your customers must register a service principal in their own workspace, grant it the right permissions, and supply a client ID and secret to your product. There is no OAuth redirect flow to standardize onboarding — you must build the credential collection UI, store those credentials encrypted, and associate each set with the right workspace URL. Every customer operates on a different workspace host (e.g. adb-1234567890.azuredatabricks.net), so every API call must route to the correct base URL using the credentials for that tenant.
Beyond setup, the operational failure modes are non-obvious. Service principal tokens expire and must be refreshed using the M2M OAuth flow — if your refresh logic fails, all SQL calls for that tenant silently return 401s. SQL warehouses can be stopped; calling databricksworkspace_sql_statement_execute against a stopped warehouse returns an error that looks identical to an auth failure. The Unity Catalog adds another layer: catalogs, schemas, and tables are permissioned independently, and a service principal with insufficient Unity Catalog grants returns 403s that are indistinguishable from missing warehouse access without careful error parsing.
Scalekit handles credential storage, per-tenant workspace routing, token refresh, and account status tracking. Your agent names a tool and passes parameters. The plumbing is not your problem.
Capabilities
What your agent can do with Databricks Workspace
Once connected, your agent has 26 pre-built tools covering SQL execution, Unity Catalog discovery, cluster and job management, and workspace administration:
- Execute and manage SQL statements: run queries against any SQL warehouse, poll for results, fetch paginated result chunks, and cancel running statements
- Discover schema and structure: list Unity Catalog catalogs, schemas, and tables; introspect column definitions, table constraints, and join keys via INFORMATION_SCHEMA
- Manage SQL warehouses: list, start, stop, and inspect warehouses — useful for ensuring a warehouse is running before executing a query
- Operate clusters: list, get, start, and terminate interactive clusters
- Trigger and monitor jobs: list jobs and runs, get job details, and trigger immediate job runs
- Inspect workspace users and secrets: list workspace users via SCIM, retrieve the authenticated service principal's identity, and list secret scopes
Setup context
What we're building
This guide connects a data assistant agent to Databricks Workspace — helping analysts run SQL, explore Unity Catalog schemas, and monitor jobs without leaving your product.
🤖
Example agent
Data assistant running SQL queries, exploring schemas, and triggering jobs on behalf of each analyst
🔐
Auth model
Service Principal OAuth 2.0 — each customer registers a service principal in their Databricks workspace; the identifier is your internal user ID
🔑
Databricks service principal
Each customer registers a service principal in their workspace and grants it SQL warehouse and Unity Catalog access
Setup
1 Setup: One SDK, one credential
Install the Scalekit SDK. The only credential your application manages is the Scalekit API key — no Databricks secrets, no service principal tokens, nothing belonging to your customers.
pip install scalekit-sdk-python
npm install @scalekit-sdk/node
import scalekit.client
import os
from dotenv import load_dotenv
load_dotenv()
scalekit = scalekit.client.ScalekitClient(
client_id=os.getenv("SCALEKIT_CLIENT_ID"),
client_secret=os.getenv("SCALEKIT_CLIENT_SECRET"),
env_url=os.getenv("SCALEKIT_ENV_URL"),
)
actions = scalekit.actions
import { ScalekitClient } from '@scalekit-sdk/node';
import 'dotenv/config';
const scalekit = new ScalekitClient(
process.env.SCALEKIT_ENV_URL,
process.env.SCALEKIT_CLIENT_ID,
process.env.SCALEKIT_CLIENT_SECRET
);
const actions = scalekit.actions;
Already have credentials?
Connected Accounts
2 Per-User Auth: Creating connected accounts
Databricks uses Service Principal OAuth 2.0 — there is no user-facing OAuth redirect. Each customer's connected account is provisioned with their workspace URL and service principal credentials. The identifier is any unique string from your system.
In the Scalekit dashboard, go to Agent Auth → Connected Accounts for your Databricks Workspace connection and click Add account. Supply the user's identifier, their Databricks workspace URL, and the service principal's client ID and secret. Scalekit stores credentials encrypted and routes every tool call to the correct workspace.
response = actions.get_or_create_connected_account(
connection_name="databricksworkspace",
identifier="user_dbx_123" # your internal user ID
)
connected_account = response.connected_account
print(f"Status: {connected_account.status}")
# Status: ACTIVE — credentials were supplied at account creation
const response = await actions.getOrCreateConnectedAccount({
connectionName: "databricksworkspace",
identifier: "user_dbx_123" // your internal user ID
});
const connectedAccount = response.connectedAccount;
console.log(`Status: ${connectedAccount.status}`);
// Status: ACTIVE — credentials were supplied at account creation
This call is idempotent — safe to call on every session start. Returns the existing account if one already exists.
Credential handling
3 Credential management
Because Databricks uses Service Principal credentials rather than user OAuth, there is no user-facing authorization step. Credentials are entered once in the Scalekit dashboard (or via API) per connected account.
Credential storage is automatic
Once provisioned, Scalekit stores credentials in its encrypted vault and the account is immediately ACTIVE.
Every tool call is routed to the correct Databricks workspace using the stored service principal credentials —
your agent code never touches them. If credentials become invalid (e.g. the customer rotates or deletes the
service principal), the account moves to REVOKED. Check account.status before critical operations and
surface a re-credentialing prompt.
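If you want to gate critical operations on account health, here is a minimal Python sketch — it reuses the same get_or_create_connected_account call shown above, and prompt_recredential() stands in for a hypothetical helper in your own product:
# Hedged sketch: verify the connected account before a critical operation.
# prompt_recredential() is a hypothetical helper in your product, not a Scalekit API.
response = actions.get_or_create_connected_account(
    connection_name="databricksworkspace",
    identifier="user_dbx_123"
)
account = response.connected_account
if account.status != "ACTIVE":
    # e.g. REVOKED after the customer rotated or deleted the service principal
    prompt_recredential("user_dbx_123")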
Bring Your Own Credentials
Each customer must register a service principal in their Databricks workspace and grant it the appropriate
permissions — SQL warehouse access, Unity Catalog privileges, and any cluster or job permissions your agent
requires. Supply the workspace URL, client ID, and client secret in the Scalekit dashboard under your
Databricks Workspace connection.
Calling Databricks Workspace
4 Calling Databricks Workspace: What your agent writes
With the connected account active, your agent calls Databricks actions using execute_tool(). Name the tool, pass parameters. Scalekit handles credential retrieval, workspace routing, and response parsing.
Execute a SQL statement
Run any SQL against a warehouse. Pass catalog and schema to set the default context. Results are returned inline for small result sets; use databricksworkspace_sql_statement_result_chunk_get for paginated large results.
result = actions.execute_tool(
identifier="user_dbx_123",
tool_name="databricksworkspace_sql_statement_execute",
tool_input={
"statement": "SELECT order_id, customer_id, total_amount FROM orders WHERE status = 'pending' ORDER BY created_at DESC LIMIT 50",
"warehouse_id": "abc123def456",
"catalog": "main",
"schema": "sales"
}
)
# Returns: column schema + rows, or a statement_id for async polling
const result = await actions.executeTool({
identifier: "user_dbx_123",
toolName: "databricksworkspace_sql_statement_execute",
toolInput: {
"statement": "SELECT order_id, customer_id, total_amount FROM orders WHERE status = 'pending' ORDER BY created_at DESC LIMIT 50",
"warehouse_id": "abc123def456",
"catalog": "main",
"schema": "sales"
}
});
// Returns: column schema + rows, or a statement_id for async polling
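When the statement runs long enough that only a statement_id comes back, the agent can poll for completion with databricksworkspace_sql_statement_get. A minimal Python sketch, assuming the execute response surfaces the statement ID; is_terminal() is a hypothetical helper that inspects whatever status field your response payload exposes:
import time

# Hedged sketch: poll an async statement until it reaches a terminal state.
# The statement ID comes from the execute_tool response above; field names vary by payload.
statement_id = "<statement_id from the execute response>"
for _ in range(30):  # cap the number of polls
    status = actions.execute_tool(
        identifier="user_dbx_123",
        tool_name="databricksworkspace_sql_statement_get",
        tool_input={"statement_id": statement_id}
    )
    # Stop once the statement reports SUCCEEDED, FAILED, or CANCELED
    if is_terminal(status):  # hypothetical helper that reads the status field
        break
    time.sleep(2)  # back off between polls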
Discover Unity Catalog schema
List all tables in a schema before running queries. Useful for letting the agent reason about available data before constructing SQL.
# List catalogs
catalogs = actions.execute_tool(
identifier="user_dbx_123",
tool_name="databricksworkspace_unity_catalog_catalogs_list",
tool_input={}
)
# List tables in a schema
tables = actions.execute_tool(
identifier="user_dbx_123",
tool_name="databricksworkspace_unity_catalog_tables_list",
tool_input={
"catalog_name": "main",
"schema_name": "sales"
}
)
# Returns: table names, types (MANAGED, EXTERNAL, VIEW), and comments
// List catalogs
const catalogs = await actions.executeTool({
identifier: "user_dbx_123",
toolName: "databricksworkspace_unity_catalog_catalogs_list",
toolInput: {}
});
// List tables in a schema
const tables = await actions.executeTool({
identifier: "user_dbx_123",
toolName: "databricksworkspace_unity_catalog_tables_list",
toolInput: {
"catalog_name": "main",
"schema_name": "sales"
}
});
// Returns: table names, types (MANAGED, EXTERNAL, VIEW), and comments
Introspect columns and constraints
Use INFORMATION_SCHEMA tools to get column types and primary/foreign key constraints — useful for the agent to auto-construct correct JOIN conditions.
# Get column definitions
columns = actions.execute_tool(
identifier="user_dbx_123",
tool_name="databricksworkspace_information_schema_columns",
tool_input={
"catalog": "main",
"schema": "sales",
"table": "orders",
"warehouse_id": "abc123def456"
}
)
# Returns: column name, data type, nullability, numeric precision, max char length
# Get PK/FK constraints
constraints = actions.execute_tool(
identifier="user_dbx_123",
tool_name="databricksworkspace_information_schema_table_constraints",
tool_input={
"catalog": "main",
"schema": "sales",
"warehouse_id": "abc123def456"
}
)
# Returns: constraint name, type (PRIMARY KEY / FOREIGN KEY), column and referenced table
// Get column definitions
const columns = await actions.executeTool({
identifier: "user_dbx_123",
toolName: "databricksworkspace_information_schema_columns",
toolInput: {
"catalog": "main",
"schema": "sales",
"table": "orders",
"warehouse_id": "abc123def456"
}
});
// Returns: column name, data type, nullability, numeric precision, max char length
// Get PK/FK constraints
const constraints = await actions.executeTool({
identifier: "user_dbx_123",
toolName: "databricksworkspace_information_schema_table_constraints",
toolInput: {
"catalog": "main",
"schema": "sales",
"warehouse_id": "abc123def456"
}
});
// Returns: constraint name, type (PRIMARY KEY / FOREIGN KEY), column and referenced table
Trigger a job run
Trigger an immediate job run by job ID. Useful for agents that orchestrate data pipeline execution after completing their analysis step.
result = actions.execute_tool(
identifier="user_dbx_123",
tool_name="databricksworkspace_job_run_now",
tool_input={
"job_id": 987654
}
)
# Returns: run_id for the triggered job run
const result = await actions.executeTool({
identifier: "user_dbx_123",
toolName: "databricksworkspace_job_run_now",
toolInput: {
"job_id": 987654
}
});
// Returns: run_id for the triggered job run
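To report back on the run it just triggered, the agent can list recent runs for the same job and match the run_id returned by job_run_now. A short Python sketch — treat the state inspection as an assumption about the response shape:
# Hedged sketch: look up recent runs for the triggered job, then match the
# run_id from job_run_now to surface its life-cycle state.
runs = actions.execute_tool(
    identifier="user_dbx_123",
    tool_name="databricksworkspace_job_runs_list",
    tool_input={
        "job_id": 987654,
        "limit": 5
    }
)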
Framework wiring
5 Wiring into your agent framework
Scalekit integrates directly with LangChain. The agent decides what to query or execute; Scalekit handles credential routing on every invocation. No credential plumbing in your agent logic.
from langchain_anthropic import ChatAnthropic
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from scalekit.langchain import get_tools
dbx_tools = get_tools(
connection_name="databricksworkspace",
identifier="user_dbx_123"
)
prompt = ChatPromptTemplate.from_messages([
("system", "You are a data assistant. Use the available tools to help explore Databricks schemas, run SQL queries, and manage jobs."),
MessagesPlaceholder("chat_history", optional=True),
("human", "{input}"),
MessagesPlaceholder("agent_scratchpad"),
])
agent = create_tool_calling_agent(ChatAnthropic(model="claude-sonnet-4-6"), dbx_tools, prompt)
result = AgentExecutor(agent=agent, tools=dbx_tools).invoke({
"input": "Show me the top 10 customers by total order value from the sales schema this quarter"
})
import { ChatAnthropic } from "@langchain/anthropic";
import { AgentExecutor, createToolCallingAgent } from "langchain/agents";
import { ChatPromptTemplate, MessagesPlaceholder } from "@langchain/core/prompts";
import { getTools } from "@scalekit-sdk/langchain";
const dbxTools = getTools({
connectionName: "databricksworkspace",
identifier: "user_dbx_123"
});
const prompt = ChatPromptTemplate.fromMessages([
["system", "You are a data assistant. Use the available tools to help explore Databricks schemas, run SQL queries, and manage jobs."],
new MessagesPlaceholder({ variableName: "chat_history", optional: true }),
["human", "{input}"],
new MessagesPlaceholder("agent_scratchpad"),
]);
const agent = await createToolCallingAgent({
llm: new ChatAnthropic({ model: "claude-sonnet-4-6" }),
tools: dbxTools,
prompt
});
const result = await AgentExecutor.fromAgentAndTools({
agent,
tools: dbxTools
}).invoke({
input: "Show me the top 10 customers by total order value from the sales schema this quarter"
});
Other frameworks supported
Tool reference
All 26 Databricks Workspace tools
Grouped by capability. Your agent calls tools by name — no API wrappers to write.
databricksworkspace_sql_statement_execute
Execute a SQL statement on a Databricks SQL warehouse. Supports optional catalog and schema context. Returns results inline or a statement_id for async polling on large result sets.
databricksworkspace_sql_statement_get
Get the status and results of a previously executed SQL statement by statement ID
databricksworkspace_sql_statement_result_chunk_get
Fetch a specific result chunk for a paginated SQL statement result. Use when a statement result has multiple chunks (large result sets).
databricksworkspace_sql_statement_cancel
Cancel a running SQL statement by its statement ID
databricksworkspace_sql_warehouses_list
List all SQL warehouses available in the Databricks workspace
databricksworkspace_sql_warehouse_get
Get details of a specific SQL warehouse by ID including state, size, and cluster config
databricksworkspace_sql_warehouse_start
Start a stopped SQL warehouse by ID before executing queries
databricksworkspace_sql_warehouse_stop
Stop a running SQL warehouse by ID to control compute costs
databricksworkspace_unity_catalog_catalogs_list
List all Unity Catalogs accessible to the service principal in the workspace
databricksworkspace_unity_catalog_schemas_list
List all schemas within a Unity Catalog by catalog name
databricksworkspace_unity_catalog_tables_list
List all tables and views within a schema. Returns table name, type (MANAGED, EXTERNAL, VIEW), and comments.
databricksworkspace_information_schema_schemata
List all schemas within a catalog using INFORMATION_SCHEMA.SCHEMATA. Used for schema discovery during setup.
databricksworkspace_information_schema_tables
List tables and views in a schema using INFORMATION_SCHEMA.TABLES. Returns table name, type, and comment.
databricksworkspace_information_schema_columns
List columns for a table using INFORMATION_SCHEMA.COLUMNS. Returns column name, data type, nullability, numeric precision/scale, and comment.
databricksworkspace_information_schema_table_constraints
List PRIMARY KEY and FOREIGN KEY constraints for tables in a schema. Used to auto-detect join keys.
databricksworkspace_clusters_list
List all clusters in the Databricks workspace
databricksworkspace_cluster_get
Get details of a specific cluster by cluster ID including state, driver, and worker config
databricksworkspace_cluster_start
Start a terminated cluster by cluster ID
databricksworkspace_cluster_terminate
Terminate a cluster by cluster ID. Releases all associated resources.
databricksworkspace_jobs_list
List all jobs in the workspace. Supports limit and offset for pagination.
databricksworkspace_job_get
Get details of a specific job by job ID including tasks, schedule, and cluster config
databricksworkspace_job_run_now
Trigger an immediate run of a job by job ID. Returns the run_id for status tracking.
databricksworkspace_job_runs_list
List job runs, optionally filtered by job ID. Supports limit and offset pagination.
databricksworkspace_scim_me_get
Retrieve information about the currently authenticated service principal in the workspace
databricksworkspace_scim_users_list
List all users in the workspace using the SCIM v2 API. Supports filter, count, and startIndex for pagination.
databricksworkspace_secrets_scopes_list
List all secret scopes available in the Databricks workspace
Connector notes
Databricks Workspace-specific behavior
SQL warehouses must be running before executing queries
Calling databricksworkspace_sql_statement_execute against a stopped warehouse returns an error that
resembles an auth failure. Use databricksworkspace_sql_warehouse_get to check state first, and
databricksworkspace_sql_warehouse_start to start it if needed. Auto-stop is configured per warehouse
in the Databricks workspace settings.
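A defensive pattern, sketched in Python — the parameter name, state string, and crude response inspection are assumptions; verify them against the tool schema and actual payload:
# Hedged sketch: check warehouse state and start it before querying.
warehouse = actions.execute_tool(
    identifier="user_dbx_123",
    tool_name="databricksworkspace_sql_warehouse_get",
    tool_input={"warehouse_id": "abc123def456"}  # parameter name assumed; check the tool schema
)
if "RUNNING" not in str(warehouse):  # crude state check; adapt to the actual response fields
    actions.execute_tool(
        identifier="user_dbx_123",
        tool_name="databricksworkspace_sql_warehouse_start",
        tool_input={"warehouse_id": "abc123def456"}
    )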
Unity Catalog permissions are separate from workspace access
A service principal with valid OAuth credentials can still receive 403 errors on Unity Catalog operations
if it has not been granted USE CATALOG, USE SCHEMA, or SELECT on the relevant objects. These grants must
be configured by the customer's Databricks admin before onboarding. Catalog-level 403s are distinct from
warehouse access errors — check both when debugging unexpected failures.
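One way to surface missing grants before they show up as confusing 403s is to probe what the service principal can actually see during onboarding, using the listing tools shown earlier:
# Hedged sketch: probe Unity Catalog visibility at onboarding.
# If the expected catalog or schema is absent from these listings, the
# customer's admin likely still needs to grant USE CATALOG / USE SCHEMA.
catalogs = actions.execute_tool(
    identifier="user_dbx_123",
    tool_name="databricksworkspace_unity_catalog_catalogs_list",
    tool_input={}
)
schemas = actions.execute_tool(
    identifier="user_dbx_123",
    tool_name="databricksworkspace_unity_catalog_schemas_list",
    tool_input={"catalog_name": "main"}
)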
Large SQL results are paginated via chunks
When a SQL statement produces a result set larger than a single inline response, Databricks returns a
statement_id with multiple chunk indices. Use databricksworkspace_sql_statement_result_chunk_get with
the statement_id and a 0-based chunk_index to retrieve each page. The total chunk count is included
in the initial statement result metadata.
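A Python sketch of the chunk loop, assuming the initial result exposes the statement ID and total chunk count (exact field names depend on the response payload):
# Hedged sketch: fetch every chunk of a paginated result.
# statement_id and the chunk count come from the initial statement result metadata.
statement_id = "<statement_id from the initial result>"
total_chunk_count = 3  # illustrative value; read it from the result metadata
rows = []
for chunk_index in range(total_chunk_count):
    chunk = actions.execute_tool(
        identifier="user_dbx_123",
        tool_name="databricksworkspace_sql_statement_result_chunk_get",
        tool_input={
            "statement_id": statement_id,
            "chunk_index": chunk_index
        }
    )
    rows.append(chunk)  # accumulate chunk payloads for downstream use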
Infrastructure decision
Why not build this yourself
The Databricks REST API is documented. Credential storage isn't technically hard. But here's what you're actually signing up for:
PROBLEM 01
Per-customer workspace routing — every API call must target the correct workspace host URL using the correct service principal credentials, with no shared base URL across tenants
PROBLEM 02
Service Principal OAuth token refresh — M2M tokens expire and must be refreshed before expiry; a failed refresh silently breaks all SQL execution for that tenant
PROBLEM 03
Per-user credential isolation — one customer's service principal credentials must never route calls to another customer's workspace, even within the same organization
PROBLEM 04
Encrypted credential storage, account status tracking, and a UI for collecting workspace URL and service principal credentials at onboarding — all before writing a single SQL tool call
That's one connector. Your agent product will eventually need Salesforce, Gmail, Snowflake, GitHub, and whatever else your customers ask for. Each has its own auth quirks and failure modes.
Scalekit maintains every connector. You maintain none of them.