The real problem
Why this is harder than it looks
The Databricks REST API is clean and thoroughly documented. You can prototype SQL execution against your own workspace in an afternoon. The complexity arrives when you try to do this for real users in a multi-tenant product — and the authentication model is different from most SaaS connectors.
Databricks uses Service Principal OAuth 2.0. Unlike user-delegated OAuth flows, the agent authenticates as a service principal registered in the customer's Databricks workspace. This means each of your customers must register a service principal in their own workspace, grant it the right permissions, and supply a client ID and secret to your product. There is no OAuth redirect flow to standardize onboarding — you must build the credential collection UI, store those credentials encrypted, and associate each set with the right workspace URL. Every customer operates on a different workspace host (e.g. adb-1234567890.azuredatabricks.net), so every API call must route to the correct base URL using the credentials for that tenant.
Beyond setup, the operational failure modes are non-obvious. Service principal tokens expire and must be refreshed using the M2M OAuth flow — if your refresh logic fails, all SQL calls for that tenant silently return 401s. SQL warehouses can be stopped; calling databricksworkspace_sql_statement_execute against a stopped warehouse returns an error that looks identical to an auth failure. The Unity Catalog adds another layer: catalogs, schemas, and tables are permissioned independently, and a service principal with insufficient Unity Catalog grants returns 403s that are indistinguishable from missing warehouse access without careful error parsing.
Scalekit handles credential storage, per-tenant workspace routing, token refresh, and account status tracking. Your agent names a tool and passes parameters. The plumbing is not your problem.
Capabilities
What your agent can do with Databricks Workspace
Once connected, your agent has 26 pre-built tools covering SQL execution, Unity Catalog discovery, cluster and job management, and workspace administration:
- Execute and manage SQL statements: run queries against any SQL warehouse, poll for results, fetch paginated result chunks, and cancel running statements
- Discover schema and structure: list Unity Catalog catalogs, schemas, and tables; introspect column definitions, table constraints, and join keys via INFORMATION_SCHEMA
- Manage SQL warehouses: list, start, stop, and inspect warehouses — useful for ensuring a warehouse is running before executing a query
- Operate clusters: list, get, start, and terminate interactive clusters
- Trigger and monitor jobs: list jobs and runs, get job details, and trigger immediate job runs
- Inspect workspace users and secrets: list workspace users via SCIM, retrieve the authenticated service principal's identity, and list secret scopes
Setup context
What we're building
This guide connects a data assistant agent to Databricks Workspace — helping analysts run SQL, explore Unity Catalog schemas, and monitor jobs without leaving your product.
🤖
Example agent
Data assistant running SQL queries, exploring schemas, and triggering jobs on behalf of each analyst
🔐
Auth model
Service Principal OAuth 2.0 — each customer registers a service principal in their Databricks workspace; the identifier is your internal user ID
🔑
Databricks service principal
Each customer registers a service principal in their workspace and grants it SQL warehouse and Unity Catalog access
Setup
1 Setup: One SDK, one credential
Install the Scalekit SDK. The only credential your application manages is the Scalekit API key — no Databricks secrets, no service principal tokens, nothing belonging to your customers.
pip install scalekit-sdk-python
npm install @scalekit-sdk/node
import scalekit.client
import os
from dotenv import load_dotenv
load_dotenv()
scalekit = scalekit.client.ScalekitClient(
client_id=os.getenv("SCALEKIT_CLIENT_ID"),
client_secret=os.getenv("SCALEKIT_CLIENT_SECRET"),
env_url=os.getenv("SCALEKIT_ENV_URL"),
)
actions = scalekit.actions
import { ScalekitClient } from '@scalekit-sdk/node';
import 'dotenv/config';
const scalekit = new ScalekitClient(
process.env.SCALEKIT_ENV_URL,
process.env.SCALEKIT_CLIENT_ID,
process.env.SCALEKIT_CLIENT_SECRET
);
const actions = scalekit.actions;
Already have credentials?
Connected Accounts
2 Per-User Auth: Creating connected accounts
Databricks uses Service Principal OAuth 2.0 — there is no user-facing OAuth redirect. Each customer's connected account is provisioned with their workspace URL and service principal credentials. The identifier is any unique string from your system.
In the Scalekit dashboard, go to Agent Auth → Connected Accounts for your Databricks Workspace connection and click Add account. Supply the user's identifier, their Databricks workspace URL, and the service principal's client ID and secret. Scalekit stores credentials encrypted and routes every tool call to the correct workspace.
response = actions.get_or_create_connected_account(
connection_name="databricksworkspace",
identifier="user_dbx_123" # your internal user ID
)
connected_account = response.connected_account
print(f"Status: {connected_account.status}")
# Status: ACTIVE — credentials were supplied at account creation
const response = await actions.getOrCreateConnectedAccount({
connectionName: "databricksworkspace",
identifier: "user_dbx_123" // your internal user ID
});
const connectedAccount = response.connectedAccount;
console.log(`Status: ${connectedAccount.status}`);
// Status: ACTIVE — credentials were supplied at account creation
This call is idempotent — safe to call on every session start. Returns the existing account if one already exists.
Credential handling
3 Credential management
Because Databricks uses Service Principal credentials rather than user OAuth, there is no user-facing authorization step. Credentials are entered once in the Scalekit dashboard (or via API) per connected account.
Credential storage is automatic
Once provisioned, Scalekit stores credentials in its encrypted vault and the account is immediately ACTIVE.
Every tool call is routed to the correct Databricks workspace using the stored service principal credentials —
your agent code never touches them. If credentials become invalid (e.g. the customer rotates or deletes the
service principal), the account moves to REVOKED. Check account.status before critical operations and
surface a re-credentialing prompt.
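If you want to gate critical operations on account health, here is a minimal Python sketch — it reuses the same get_or_create_connected_account call shown above, and prompt_recredential() stands in for a hypothetical helper in your own product:
# Hedged sketch: verify the connected account before a critical operation.
# prompt_recredential() is a hypothetical helper in your product, not a Scalekit API.
response = actions.get_or_create_connected_account(
    connection_name="databricksworkspace",
    identifier="user_dbx_123"
)
account = response.connected_account
if account.status != "ACTIVE":
    # e.g. REVOKED after the customer rotated or deleted the service principal
    prompt_recredential("user_dbx_123")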
Bring Your Own Credentials
Each customer must register a service principal in their Databricks workspace and grant it the appropriate
permissions — SQL warehouse access, Unity Catalog privileges, and any cluster or job permissions your agent
requires. Supply the workspace URL, client ID, and client secret in the Scalekit dashboard under your
Databricks Workspace connection.
Calling Databricks Workspace
4 Calling Databricks Workspace: What your agent writes
With the connected account active, your agent calls Databricks actions using execute_tool(). Name the tool, pass parameters. Scalekit handles credential retrieval, workspace routing, and response parsing.
Execute a SQL statement
Run any SQL against a warehouse. Pass catalog and schema to set the default context. Results are returned inline for small result sets; use databricksworkspace_sql_statement_result_chunk_get for paginated large results.
result = actions.execute_tool(
identifier="user_dbx_123",
tool_name="databricksworkspace_sql_statement_execute",
tool_input={
"statement": "SELECT order_id, customer_id, total_amount FROM orders WHERE status = 'pending' ORDER BY created_at DESC LIMIT 50",
"warehouse_id": "abc123def456",
"catalog": "main",
"schema": "sales"
}
)
# Returns: column schema + rows, or a statement_id for async polling
const result = await actions.executeTool({
identifier: "user_dbx_123",
toolName: "databricksworkspace_sql_statement_execute",
toolInput: {
"statement": "SELECT order_id, customer_id, total_amount FROM orders WHERE status = 'pending' ORDER BY created_at DESC LIMIT 50",
"warehouse_id": "abc123def456",
"catalog": "main",
"schema": "sales"
}
});
// Returns: column schema + rows, or a statement_id for async polling
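When the statement runs long enough that only a statement_id comes back, the agent can poll for completion with databricksworkspace_sql_statement_get. A minimal Python sketch, assuming the execute response surfaces the statement ID; is_terminal() is a hypothetical helper that inspects whatever status field your response payload exposes:
import time

# Hedged sketch: poll an async statement until it reaches a terminal state.
# The statement ID comes from the execute_tool response above; field names vary by payload.
statement_id = "<statement_id from the execute response>"
for _ in range(30):  # cap the number of polls
    status = actions.execute_tool(
        identifier="user_dbx_123",
        tool_name="databricksworkspace_sql_statement_get",
        tool_input={"statement_id": statement_id}
    )
    # Stop once the statement reports SUCCEEDED, FAILED, or CANCELED
    if is_terminal(status):  # hypothetical helper that reads the status field
        break
    time.sleep(2)  # back off between polls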
Discover Unity Catalog schema
List all tables in a schema before running queries. Useful for letting the agent reason about available data before constructing SQL.
# List catalogs
catalogs = actions.execute_tool(
identifier="user_dbx_123",
tool_name="databricksworkspace_unity_catalog_catalogs_list",
tool_input={}
)
# List tables in a schema
tables = actions.execute_tool(
identifier="user_dbx_123",
tool_name="databricksworkspace_unity_catalog_tables_list",
tool_input={
"catalog_name": "main",
"schema_name": "sales"
}
)
# Returns: table names, types (MANAGED, EXTERNAL, VIEW), and comments
// List catalogs
const catalogs = await actions.executeTool({
identifier: "user_dbx_123",
toolName: "databricksworkspace_unity_catalog_catalogs_list",
toolInput: {}
});
// List tables in a schema
const tables = await actions.executeTool({
identifier: "user_dbx_123",
toolName: "databricksworkspace_unity_catalog_tables_list",
toolInput: {
"catalog_name": "main",
"schema_name": "sales"
}
});
// Returns: table names, types (MANAGED, EXTERNAL, VIEW), and comments
Introspect columns and constraints
Use INFORMATION_SCHEMA tools to get column types and primary/foreign key constraints — useful for the agent to auto-construct correct JOIN conditions.
# Get column definitions
columns = actions.execute_tool(
identifier="user_dbx_123",
tool_name="databricksworkspace_information_schema_columns",
tool_input={
"catalog": "main",
"schema": "sales",
"table": "orders",
"warehouse_id": "abc123def456"
}
)
# Returns: column name, data type, nullability, numeric precision, max char length
# Get PK/FK constraints
constraints = actions.execute_tool(
identifier="user_dbx_123",
tool_name="databricksworkspace_information_schema_table_constraints",
tool_input={
"catalog": "main",
"schema": "sales",
"warehouse_id": "abc123def456"
}
)
# Returns: constraint name, type (PRIMARY KEY / FOREIGN KEY), column and referenced table
// Get column definitions
const columns = await actions.executeTool({
identifier: "user_dbx_123",
toolName: "databricksworkspace_information_schema_columns",
toolInput: {
"catalog": "main",
"schema": "sales",
"table": "orders",
"warehouse_id": "abc123def456"
}
});
// Returns: column name, data type, nullability, numeric precision, max char length
// Get PK/FK constraints
const constraints = await actions.executeTool({
identifier: "user_dbx_123",
toolName: "databricksworkspace_information_schema_table_constraints",
toolInput: {
"catalog": "main",
"schema": "sales",
"warehouse_id": "abc123def456"
}
});
// Returns: constraint name, type (PRIMARY KEY / FOREIGN KEY), column and referenced table
Trigger a job run
Trigger an immediate job run by job ID. Useful for agents that orchestrate data pipeline execution after completing their analysis step.
result = actions.execute_tool(
identifier="user_dbx_123",
tool_name="databricksworkspace_job_run_now",
tool_input={
"job_id": 987654
}
)
# Returns: run_id for the triggered job run
const result = await actions.executeTool({
identifier: "user_dbx_123",
toolName: "databricksworkspace_job_run_now",
toolInput: {
"job_id": 987654
}
});
// Returns: run_id for the triggered job run
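To report back on the run it just triggered, the agent can list recent runs for the same job and match the run_id returned by job_run_now. A short Python sketch — treat the state inspection as an assumption about the response shape:
# Hedged sketch: look up recent runs for the triggered job, then match the
# run_id from job_run_now to surface its life-cycle state.
runs = actions.execute_tool(
    identifier="user_dbx_123",
    tool_name="databricksworkspace_job_runs_list",
    tool_input={
        "job_id": 987654,
        "limit": 5
    }
)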
Framework wiring
5 Wiring into your agent framework
Scalekit integrates directly with LangChain. The agent decides what to query or execute; Scalekit handles credential routing on every invocation. No credential plumbing in your agent logic.
from langchain_anthropic import ChatAnthropic
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from scalekit.langchain import get_tools
dbx_tools = get_tools(
connection_name="databricksworkspace",
identifier="user_dbx_123"
)
prompt = ChatPromptTemplate.from_messages([
("system", "You are a data assistant. Use the available tools to help explore Databricks schemas, run SQL queries, and manage jobs."),
MessagesPlaceholder("chat_history", optional=True),
("human", "{input}"),
MessagesPlaceholder("agent_scratchpad"),
])
agent = create_tool_calling_agent(ChatAnthropic(model="claude-sonnet-4-6"), dbx_tools, prompt)
result = AgentExecutor(agent=agent, tools=dbx_tools).invoke({
"input": "Show me the top 10 customers by total order value from the sales schema this quarter"
})
import { ChatAnthropic } from "@langchain/anthropic";
import { AgentExecutor, createToolCallingAgent } from "langchain/agents";
import { ChatPromptTemplate, MessagesPlaceholder } from "@langchain/core/prompts";
import { getTools } from "@scalekit-sdk/langchain";
const dbxTools = getTools({
connectionName: "databricksworkspace",
identifier: "user_dbx_123"
});
const prompt = ChatPromptTemplate.fromMessages([
["system", "You are a data assistant. Use the available tools to help explore Databricks schemas, run SQL queries, and manage jobs."],
new MessagesPlaceholder({ variableName: "chat_history", optional: true }),
["human", "{input}"],
new MessagesPlaceholder("agent_scratchpad"),
]);
const agent = await createToolCallingAgent({
llm: new ChatAnthropic({ model: "claude-sonnet-4-6" }),
tools: dbxTools,
prompt
});
const result = await AgentExecutor.fromAgentAndTools({
agent,
tools: dbxTools
}).invoke({
input: "Show me the top 10 customers by total order value from the sales schema this quarter"
});
Other frameworks supported
Tool reference
All 26 Databricks Workspace tools
Grouped by capability. Your agent calls tools by name — no API wrappers to write.
databricksworkspace_sql_statement_execute
Execute a SQL statement on a Databricks SQL warehouse. Supports optional catalog and schema context. Returns results inline or a statement_id for async polling on large result sets.
databricksworkspace_sql_statement_get
Get the status and results of a previously executed SQL statement by statement ID
databricksworkspace_sql_statement_result_chunk_get
Fetch a specific result chunk for a paginated SQL statement result. Use when a statement result has multiple chunks (large result sets).
databricksworkspace_sql_statement_cancel
Cancel a running SQL statement by its statement ID
databricksworkspace_sql_warehouses_list
List all SQL warehouses available in the Databricks workspace
databricksworkspace_sql_warehouse_get
Get details of a specific SQL warehouse by ID including state, size, and cluster config
databricksworkspace_sql_warehouse_start
Start a stopped SQL warehouse by ID before executing queries
databricksworkspace_sql_warehouse_stop
Stop a running SQL warehouse by ID to control compute costs
databricksworkspace_unity_catalog_catalogs_list
List all Unity Catalogs accessible to the service principal in the workspace
databricksworkspace_unity_catalog_schemas_list
List all schemas within a Unity Catalog by catalog name
databricksworkspace_unity_catalog_tables_list
List all tables and views within a schema. Returns table name, type (MANAGED, EXTERNAL, VIEW), and comments.
databricksworkspace_information_schema_schemata
List all schemas within a catalog using INFORMATION_SCHEMA.SCHEMATA. Used for schema discovery during setup.
databricksworkspace_information_schema_tables
List tables and views in a schema using INFORMATION_SCHEMA.TABLES. Returns table name, type, and comment.
databricksworkspace_information_schema_columns
List columns for a table using INFORMATION_SCHEMA.COLUMNS. Returns column name, data type, nullability, numeric precision/scale, and comment.
databricksworkspace_information_schema_table_constraints
List PRIMARY KEY and FOREIGN KEY constraints for tables in a schema. Used to auto-detect join keys.
databricksworkspace_clusters_list
List all clusters in the Databricks workspace
databricksworkspace_cluster_get
Get details of a specific cluster by cluster ID including state, driver, and worker config
databricksworkspace_cluster_start
Start a terminated cluster by cluster ID
databricksworkspace_cluster_terminate
Terminate a cluster by cluster ID. Releases all associated resources.
databricksworkspace_jobs_list
List all jobs in the workspace. Supports limit and offset for pagination.
databricksworkspace_job_get
Get details of a specific job by job ID including tasks, schedule, and cluster config
databricksworkspace_job_run_now
Trigger an immediate run of a job by job ID. Returns the run_id for status tracking.
databricksworkspace_job_runs_list
List job runs, optionally filtered by job ID. Supports limit and offset pagination.
databricksworkspace_scim_me_get
Retrieve information about the currently authenticated service principal in the workspace
databricksworkspace_scim_users_list
List all users in the workspace using the SCIM v2 API. Supports filter, count, and startIndex for pagination.
databricksworkspace_secrets_scopes_list
List all secret scopes available in the Databricks workspace
Connector notes
Databricks Workspace-specific behavior
SQL warehouses must be running before executing queries
Calling databricksworkspace_sql_statement_execute against a stopped warehouse returns an error that
resembles an auth failure. Use databricksworkspace_sql_warehouse_get to check state first, and
databricksworkspace_sql_warehouse_start to start it if needed. Auto-stop is configured per warehouse
in the Databricks workspace settings.
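A defensive pattern, sketched in Python — the parameter name, state string, and crude response inspection are assumptions; verify them against the tool schema and actual payload:
# Hedged sketch: check warehouse state and start it before querying.
warehouse = actions.execute_tool(
    identifier="user_dbx_123",
    tool_name="databricksworkspace_sql_warehouse_get",
    tool_input={"warehouse_id": "abc123def456"}  # parameter name assumed; check the tool schema
)
if "RUNNING" not in str(warehouse):  # crude state check; adapt to the actual response fields
    actions.execute_tool(
        identifier="user_dbx_123",
        tool_name="databricksworkspace_sql_warehouse_start",
        tool_input={"warehouse_id": "abc123def456"}
    )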
Unity Catalog permissions are separate from workspace access
A service principal with valid OAuth credentials can still receive 403 errors on Unity Catalog operations
if it has not been granted USE CATALOG, USE SCHEMA, or SELECT on the relevant objects. These grants must
be configured by the customer's Databricks admin before onboarding. Catalog-level 403s are distinct from
warehouse access errors — check both when debugging unexpected failures.
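One way to surface missing grants before they show up as confusing 403s is to probe what the service principal can actually see during onboarding, using the listing tools shown earlier:
# Hedged sketch: probe Unity Catalog visibility at onboarding.
# If the expected catalog or schema is absent from these listings, the
# customer's admin likely still needs to grant USE CATALOG / USE SCHEMA.
catalogs = actions.execute_tool(
    identifier="user_dbx_123",
    tool_name="databricksworkspace_unity_catalog_catalogs_list",
    tool_input={}
)
schemas = actions.execute_tool(
    identifier="user_dbx_123",
    tool_name="databricksworkspace_unity_catalog_schemas_list",
    tool_input={"catalog_name": "main"}
)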
Large SQL results are paginated via chunks
When a SQL statement produces a result set larger than a single inline response, Databricks returns a
statement_id with multiple chunk indices. Use databricksworkspace_sql_statement_result_chunk_get with
the statement_id and a 0-based chunk_index to retrieve each page. The total chunk count is included
in the initial statement result metadata.
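A Python sketch of the chunk loop, assuming the initial result exposes the statement ID and total chunk count (exact field names depend on the response payload):
# Hedged sketch: fetch every chunk of a paginated result.
# statement_id and the chunk count come from the initial statement result metadata.
statement_id = "<statement_id from the initial result>"
total_chunk_count = 3  # illustrative value; read it from the result metadata
rows = []
for chunk_index in range(total_chunk_count):
    chunk = actions.execute_tool(
        identifier="user_dbx_123",
        tool_name="databricksworkspace_sql_statement_result_chunk_get",
        tool_input={
            "statement_id": statement_id,
            "chunk_index": chunk_index
        }
    )
    rows.append(chunk)  # accumulate chunk payloads for downstream use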
Infrastructure decision
Why not build this yourself
The Databricks REST API is documented. Credential storage isn't technically hard. But here's what you're actually signing up for:
PROBLEM 01
Per-customer workspace routing — every API call must target the correct workspace host URL using the correct service principal credentials, with no shared base URL across tenants
PROBLEM 02
Service Principal OAuth token refresh — M2M tokens expire and must be refreshed before expiry; a failed refresh silently breaks all SQL execution for that tenant
PROBLEM 03
Per-user credential isolation — one customer's service principal credentials must never route calls to another customer's workspace, even within the same organization
PROBLEM 04
Encrypted credential storage, account status tracking, and a UI for collecting workspace URL and service principal credentials at onboarding — all before writing a single SQL tool call
That's one connector. Your agent product will eventually need Salesforce, Gmail, Snowflake, GitHub, and whatever else your customers ask for. Each has its own auth quirks and failure modes.
Scalekit maintains every connector. You maintain none of them.