Many applications use CockroachDB (CDB) as their datastore. Here are a list of utilities that have been built to support working with CDB.
For desktop testing you can install CDB and run a local instance in single-node mode. Download from:
Make sure to download the version that corresponds to the version used in production, which can
be checked by looking at the YAML file in the
k8s-config
repo(This is v22.1 as of Apr 2024).
cd /tmp; cockroach start-single-node --insecure --listen-addr=127.0.0.1
Note that this will start and print out useful information, such as the location
of the admin dashboard (webui), and also how to connect to the running instance
using the cockroach
command line application (RPC client flags):
CockroachDB node starting at 2023-10-04 15:10:35.315566725 +0000 UTC m=+1.911432965 (took 1.6s)
build: CCL v22.2.3 @ 2023/01/23 19:11:57 (go1.19.1)
webui: http://127.0.0.1:8080
sql: postgresql://root@127.0.0.1:26257/defaultdb?sslmode=disable
sql (JDBC): jdbc:postgresql://127.0.0.1:26257/defaultdb?sslmode=disable&user=root
RPC client flags: cockroach <client cmd> --host=127.0.0.1:26257 --insecure
logs: /tmp/cockroach-data/logs
temp dir: /tmp/cockroach-data/cockroach-temp2650087369
external I/O path: /tmp/cockroach-data/extern
store[0]: path=/tmp/cockroach-data
storage engine: pebble
clusterID: dbef3f5f-ef2c-48ef-a39b-5cdbeb219e70
status: initialized new cluster
nodeID: 1
Defining the schema for the database is done in Go code. By doing this, we have the source of truth as a documented Go struct, which can be used in a more flexible way than having the SQL as the source of truth. For example see:
https://skia.googlesource.com/buildbot/+/refs/heads/main/perf/go/sql/tables.go
package sql
//go:generate bazelisk run --config=mayberemote //:go -- run ./tosql
...
// Tables represents the full schema of the SQL database.
type Tables struct {
Alerts []alertschema.AlertSchema
Commits []gitschema.Commit
ParamSets []traceschema.ParamSetsSchema
Postings []traceschema.PostingsSchema
Regressions []regressionschema.RegressionSchema
Shortcuts []shortcutschema.ShortcutSchema
SourceFiles []traceschema.SourceFilesSchema
TraceValues []traceschema.TraceValuesSchema
}
Each SQL table is defined by a public member of the Tables
struct.
Let's look at the gitschema.Commit
type:
// Commit represents a single commit stored in the database.
//
// JSON annotations make it serialize like the legacy cid.CommitDetail.
type Commit struct {
CommitNumber types.CommitNumber `sql:"commit_number INT PRIMARY KEY"`
GitHash string `sql:"git_hash TEXT UNIQUE NOT NULL"`
Timestamp int64 `sql:"commit_time INT"` // Unix timestamp, seconds from the epoch.
Author string `sql:"author TEXT"`
Subject string `sql:"subject TEXT"`
}
Note that struct tags that define how each element will be stored as SQL in the
database. For example, CommitNumber
will be stored as an INT
and also used as
a PRIMARY KEY
.
If you need to add other information to the schema, such as secondary indexes,
add them as struct tags on private members of the struct. In TraceValuesSchema
below you can see how a secondary index was added in the bySourceFileIndex
struct tag:
// TraceValuesSchema describes the SQL schema of the TraceValues table.
type TraceValuesSchema struct {
TraceID []byte `sql:"trace_id BYTES"`
CommitNumber int64 `sql:"commit_number INT"`
Value float32 `sql:"val REAL"`
SourceFileID int64 `sql:"source_file_id INT"`
primaryKey struct{} `sql:"PRIMARY KEY (trace_id, commit_number)"`
bySourceFileIndex struct{} `sql:"INDEX by_source_file_id (source_file_id, trace_id)"`
}
Looking back at the tables.go
file:
package sql
//go:generate bazelisk run --config=mayberemote //:go -- run ./tosql
...
// Tables represents the full schema of the SQL database.
type Tables struct {
Alerts []alertschema.AlertSchema
Commits []gitschema.Commit
ParamSets []traceschema.ParamSetsSchema
Postings []traceschema.PostingsSchema
Regressions []regressionschema.RegressionSchema
Shortcuts []shortcutschema.ShortcutSchema
SourceFiles []traceschema.SourceFilesSchema
TraceValues []traceschema.TraceValuesSchema
}
You can see to go:generate
command at the top of the file. This generates a
new Go file that contains the SQL schema as a Go string, and also provides the
column names of each table:
package sql
// Generated by //go/sql/exporter/
// DO NOT EDIT
const Schema = `CREATE TABLE IF NOT EXISTS Alerts (
id INT PRIMARY KEY DEFAULT unique_rowid(),
alert TEXT,
config_state INT DEFAULT 0,
last_modified INT
);
...
CREATE TABLE IF NOT EXISTS TraceValues (
trace_id BYTES,
commit_number INT,
val REAL,
source_file_id INT,
PRIMARY KEY (trace_id, commit_number),
INDEX by_source_file_id (source_file_id, trace_id)
);
`
var Alerts = []string{
"id",
"alert",
"config_state",
"last_modified",
}
...
var TraceValues = []string{
"trace_id",
"commit_number",
"val",
"source_file_id",
}
The full file can be found at:
https://skia.googlesource.com/buildbot/+/refs/heads/main/perf/go/sql/schema.go
The above section shows how we can generate the SQL commands to create all the tables in the database. Another aspect we need to care about is having the application check the schema on startup to make sure it has all the tables and columns the current version of the application expects.
To do that we need to export information about the tables, in this case the column names, their types, and any extra indexes.
https://pkg.go.dev/go.skia.org/infra/go/sql/schema#Description
The description can be exported to a JSON file:
https://skia.googlesource.com/buildbot/+/refs/heads/main/perf/go/sql/exportschema/main.go
Then the JSON file that describes the table structure can be then be embedded into the application:
// Package expectedschema contains the schema the database is expected to have.
package expectedschema
import (
"embed" // Enable go:embed.
"encoding/json"
"go.skia.org/infra/go/skerr"
"go.skia.org/infra/go/sql/schema"
)
// FS is a filesystem with the schema.json file.
//
//go:embed schema.json
var FS embed.FS
// Load returns the deserialized schema.Description stored in the schema.json file.
func Load() (schema.Description, error) {
var ret schema.Description
b, err := FS.ReadFile("schema.json")
if err != nil {
return ret, skerr.Wrap(err)
}
err = json.Unmarshal(b, &ret)
if err != nil {
return ret, skerr.Wrap(err)
}
return ret, nil
}
And finally that schema can be compared to the current schema running in production:
// Confirm the database has the right schema.
expectedSchema, err := expectedschema.Load()
if err != nil {
return nil, skerr.Wrap(err)
}
actual, err := schema.GetDescription(singletonPool, sql.Tables{})
if err != nil {
return nil, skerr.Wrap(err)
}
if diff := assertdeep.Diff(expectedSchema, *actual); diff != "" {
return nil, skerr.Fmt("Schema needs to be updated: %s.", diff)
}
https://skia.googlesource.com/buildbot/+/refs/heads/main/perf/go/builders/builders.go#93
This system does not offer any automatic generation of schema migrations, which would be incomplete and error prone, so migrations need to be manually written, but the above schema description and comparison functions can be used to test a schema migration.
For an example of how to do this see Test_LiveToNextSchemaMigration
in
The test below loads the previous version of the schema, applies the migrations, and then compares the resulting schema to the current schema:
func Test_LiveToNextSchemaMigration(t *testing.T) {
ctx := context.Background()
db := cdbtest.NewCockroachDBForTests(t, "desc")
expectedSchema := GetSchema(t, db)
_, err := db.Exec(ctx, "DROP TABLE IF EXISTS Description")
require.NoError(t, err)
_, err = db.Exec(ctx, "DROP TABLE IF EXISTS TaskResult")
require.NoError(t, err)
_, err = db.Exec(ctx, LiveSchema)
require.NoError(t, err)
_, err = db.Exec(ctx, FromLiveToNext)
require.NoError(t, err)
migratedSchema := GetSchema(t, db)
assertdeep.Equal(t, expectedSchema, migratedSchema)
// Test the test, make sure at least one known column is present.
require.Equal(t, "text def: nullable:NO", migratedSchema.ColumnNameAndType["description.machine_id"])
}
The sqlutil package has utilities that work with the generated column names to make writing some SQL queries easier.
For example, for a table with a lot of columns the insert statement may be
easier to write. In the following code presume that Description
contains
the column names of the Description table:
queryTemplate := fmt.Sprintf(`INSERT INTO Description (%s) VALUES %s`,
strings.Join(Description, ","), sqlutil.ValuesPlaceholders(len(Description), 1),