Name: PlanetScale Database Skills — The Complete Guide
Brand: PlanetScale
Availability: InStock

PlanetScale Database Skills — The Complete Guide

A comprehensive reference for MySQL, PostgreSQL, Vitess, and Neki — compiled from PlanetScale's open-source database-skills collection.

Author: PlanetScale
Version: 1.0.0 (February 2026)
Source: github.com/planetscale/database-skills

MySQL / InnoDB
- 1.1 Schema Design
- 1.2 Primary Keys
- 1.3 Indexing & Composite Indexes
- 1.4 Query Optimization & EXPLAIN Analysis
- 1.5 Transactions & Isolation Levels
- 1.6 Operations
PostgreSQL
- 2.1 Schema Design
- 2.2 Indexing Best Practices
- 2.3 Operations & Architecture
- 2.4 PlanetScale-Specific Features
Vitess
- 3.1 Overview & Key Concepts
- 3.2 VSchema Design & Configuration
- 3.3 SQL Compatibility & Limitations
- 3.4 References (Schema Changes, VReplication, Architecture, Query Serving)
Neki — Sharded Postgres
- 4.1 Overview
- 4.2 Pre-Sharding Readiness Guide
Cross-Cutting Best Practices

Chapter 1: MySQL / InnoDB

Use this chapter to make safe, measurable MySQL/InnoDB changes.

Workflow:

Define workload and constraints (read/write mix, latency target, data volume, MySQL version, hosting platform).
Propose the smallest change that can solve the problem, including trade-offs.
Validate with evidence (EXPLAIN, EXPLAIN ANALYZE, lock/connection metrics, and production-safe rollout steps).
For production changes, include rollback and post-deploy verification.

1.1 Schema Design

Prefer narrow, monotonic PKs (BIGINT UNSIGNED AUTO_INCREMENT) for write-heavy OLTP tables.
Avoid random UUID values as clustered PKs; if external IDs are required, keep UUID in a secondary unique column.
Always utf8mb4 / utf8mb4_0900_ai_ci. Prefer NOT NULL, DATETIME over TIMESTAMP.
Lookup tables over ENUM. Normalize to 3NF; denormalize only for measured hot paths.

Data Types Quick Reference

Use	Prefer
Character set	`utf8mb4` always
Collation	`utf8mb4_0900_ai_ci`
Timestamps	`DATETIME` over `TIMESTAMP`
Nullability	`NOT NULL` by default
Enums	Lookup tables instead

1.2 Primary Keys

InnoDB stores rows in primary key order (clustered index). This means:

Sequential keys = optimal inserts: new rows append, minimizing page splits and fragmentation.
Random keys = fragmentation: random inserts cause page splits to maintain PK order.
Secondary index lookups: secondary indexes store the PK value and use it to fetch the full row.

INT vs BIGINT

INT UNSIGNED: 4 bytes, max ~4.3B rows.
BIGINT UNSIGNED: 8 bytes, max ~18.4 quintillion rows.

Guideline: Default to BIGINT UNSIGNED unless you're certain the table will never approach the INT limit.

Avoid Random UUID as Clustered PK

UUID PK stored as BINARY(16): 16 bytes (vs 8 for BIGINT). Random inserts cause page splits. If external identifiers are required, store UUID as BINARY(16) in a secondary unique column:

CREATE TABLE users (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  public_id BINARY(16) NOT NULL,
  UNIQUE KEY idx_public_id (public_id)
);
INSERT INTO users (public_id) VALUES (UUID_TO_BIN(?, 1));

Prefer time-ordered variants such as UUIDv7 to reduce index fragmentation.

Alternative Ordered IDs

Snowflake-style: 64-bit integers (fits in BIGINT), time-ordered, compact.
ULID / UUIDv7: 128-bit (store as BINARY(16)), time-ordered.

Composite Primary Keys

Use for join/many-to-many tables. Most-queried column first:

CREATE TABLE user_roles (
  user_id BIGINT UNSIGNED NOT NULL,
  role_id BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, role_id)
);

1.3 Indexing & Composite Indexes

Core Principles

Composite order: equality first, then range/sort (leftmost prefix rule).
Range predicates stop index usage for subsequent columns.
Secondary indexes include PK implicitly. Prefix indexes for long strings.
Audit via performance_schema — drop indexes with count_read = 0.

Leftmost Prefix Rule

Index (a, b, c) is usable for:

WHERE a ✓
WHERE a AND b ✓
WHERE a AND b AND c ✓
WHERE a AND c → uses only column a
WHERE b alone → NOT usable

Column Order: Equality First, Then Range/Sort

-- Query: WHERE tenant_id = ? AND status = ? AND created_at > ?
CREATE INDEX idx_orders ON orders (tenant_id, status, created_at);

Critical: Range predicates (>, <, BETWEEN, LIKE 'prefix%') stop index usage for filtering subsequent columns. However, columns after a range predicate can still be useful for covering index reads.

Sort Order Must Match Index

ORDER BY status ASC, created_at ASC   -- ✓ matches
ORDER BY status DESC, created_at DESC -- ✓ full reverse OK
ORDER BY status ASC, created_at DESC  -- ⚠️ mixed (may use filesort)

-- MySQL 8.0+: descending index components
CREATE INDEX idx ON orders (status ASC, created_at DESC);

Composite vs Multiple Single-Column Indexes

A composite index is typically faster than index merge. Index merge is useful when queries filter on different column combinations that don't share a common prefix.

Design for Multiple Queries

-- One index covers: WHERE user_id=?, WHERE user_id=? AND status=?,
--   and WHERE user_id=? AND status=? ORDER BY created_at DESC
CREATE INDEX idx ON orders (user_id, status, created_at DESC);

1.4 Query Optimization & EXPLAIN Analysis

EXPLAIN Commands

EXPLAIN SELECT ...;                    -- estimated plan
EXPLAIN FORMAT=JSON SELECT ...;        -- detailed with cost estimates
EXPLAIN FORMAT=TREE SELECT ...;        -- tree format (8.0+)
EXPLAIN ANALYZE SELECT ...;            -- actual execution (8.0.18+, runs the query)

Access Types (Best → Worst)

system → const → eq_ref → ref → range → index → ALL

Target ref or better. ALL on >1000 rows almost always needs an index.

Key Extra Flags

Flag	Meaning	Action
`Using index`	Covering index (optimal)	None
`Using filesort`	Sort not via index	Index the ORDER BY columns
`Using temporary`	Temp table for GROUP BY	Index the grouped columns
`Using join buffer`	No index on join column	Add index on join column
`Using index condition`	ICP — engine filters at index level	Generally good

key_len — How Much of Composite Index Is Used

Byte sizes: TINYINT=1, INT=4, BIGINT=8, DATE=3, DATETIME=5, VARCHAR(N) utf8mb4: N×4+1 (or +2 when N×4>255). Add 1 byte per nullable column.

rows vs filtered

rows: estimated rows examined after index access
filtered: percent expected to pass the full WHERE
Rough output: rows × filtered / 100

EXPLAIN ANALYZE (MySQL 8.0.18+)

Actually executes the query. Metrics: actual time (ms), rows (actual), loops (iterations). Compare estimated vs actual to find optimizer misestimates. Refresh with ANALYZE TABLE your_table;.

Query Optimization Tips

Check EXPLAIN — red flags: type: ALL, Using filesort, Using temporary.
Cursor pagination, not OFFSET. Avoid functions on indexed columns in WHERE.
Batch inserts (500–5000 rows). UNION ALL over UNION when dedup unnecessary.

1.5 Transactions & Isolation Levels

Default: REPEATABLE READ

The InnoDB default, most tested, prevents phantom reads. Only change per-session with a measured reason.

SELECT @@transaction_isolation;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- per-session only

Locking vs Non-Locking Reads

Non-locking reads: plain SELECT uses MVCC snapshots. No locks, don't block writers.
Locking reads: SELECT ... FOR UPDATE (exclusive) or SELECT ... FOR SHARE (shared).
UPDATE and DELETE are implicitly locking reads.

REPEATABLE READ (Prefer This)

Snapshot at first read; all plain SELECTs read from that snapshot.
Locking reads/writes use next-key locks (row + gap) — prevents phantoms.
Use for: OLTP, check-then-insert, financial logic, consistent reports.
Avoid mixing locking and non-locking statements in the same transaction.

READ COMMITTED (Per-Session Only)

Fresh snapshot per SELECT; record locks only (no gap locks) — more concurrency, phantoms possible.
Switch only when: gap-lock deadlocks confirmed, bulk imports with contention, or high-write concurrency on overlapping ranges.
Never switch globally. Check-then-insert patterns break.

Decision Guide

Scenario	Recommendation
General OLTP / check-then-insert / reports	REPEATABLE READ (default)
Bulk import or gap-lock deadlocks	READ COMMITTED (per-session)
Need serializability	Explicit `FOR UPDATE` at REPEATABLE READ

Deadlock Prevention

Consistent row access order prevents deadlocks. Retry error 1213 with backoff.
Do I/O outside transactions. Use SELECT ... FOR UPDATE sparingly.

1.6 Operations

Use online DDL (ALGORITHM=INPLACE) when possible; test on replicas first.
Tune connection pooling — avoid max_connections exhaustion under load.
Monitor replication lag; avoid stale reads from replicas during writes.

Partitioning

Partition time-series (>50M rows) or large tables (>100M rows). Plan early — retrofit = full rebuild.
Include partition column in every unique/PK. Always add a MAXVALUE catch-all.

Guardrails

Prefer measured evidence over blanket rules of thumb.
Note MySQL-version-specific behavior when giving advice.
Ask for explicit human approval before destructive data operations.

Chapter 2: PostgreSQL

2.1 Schema Design

Primary Keys

Prefer BIGINT GENERATED ALWAYS AS IDENTITY. Avoid random UUIDs (UUIDv4) as primary keys; use uuidv7() when you need UUIDs.

CREATE TABLE user (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  email TEXT NOT NULL UNIQUE
);

Data Types

Use	Avoid
`TEXT`, `VARCHAR`	Extension-specific types
`JSONB`	Custom ENUMs (use CHECK instead)
`TIMESTAMPTZ`	`TIMESTAMP` without time zone
`BIGINT`, `INTEGER`	Platform-specific types

Prefer CHECK constraints over ENUM types — easier to modify:

CREATE TABLE order (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  status TEXT NOT NULL CHECK (status IN ('pending', 'shipped', 'delivered'))
);

Foreign Keys

Always index FK columns (PostgreSQL does not auto-create these)
Avoid circular FK dependencies
Use ON DELETE CASCADE or ON DELETE SET NULL explicitly

CREATE TABLE order (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  customer_id BIGINT NOT NULL REFERENCES customer(id) ON DELETE CASCADE
);
CREATE INDEX order_customer_id_idx ON order (customer_id);

Naming Conventions

Tables: singular snake_case (user_account, order_item)
Columns: singular snake_case (created_at, user_id)
Indexes: {table}_{column}_idx
Constraints: {table}_{column}_{type}

General Guidelines

Add NOT NULL to as many columns as possible
Add created_at TIMESTAMPTZ DEFAULT NOW() to all tables
Use BIGINT for all IDs and foreign keys, even on small tables
Keep tables normalized; denormalize only for proven hot read paths

2.2 Indexing Best Practices

Core Rules

Always index foreign key columns
Index columns in WHERE, JOIN, and ORDER BY
Don't over-index — each index slows writes
Verify with EXPLAIN ANALYZE

Composite Indexes

Put equality columns first, then range/sort columns:

CREATE INDEX order_status_created_idx ON order (status, created_at);

Partial Indexes

Reduce index size for common query patterns:

CREATE INDEX order_active_idx ON order (customer_id)
  WHERE status = 'active';

Index Types

Type	Use Case	Example
B-tree (default)	Equality, range, sorting	`WHERE id = 1`, `ORDER BY date`
GIN	Arrays, JSONB, full-text	`WHERE tags @> ARRAY['x']`
GiST	Geometric, range types, full-text	PostGIS, `tsrange`
BRIN	Large sequential/time-series	Append-only logs (requires row order correlation)

CREATE INDEX metadata_idx ON order USING GIN (metadata);
CREATE INDEX event_created_idx ON event USING BRIN (created_at);

2.3 Operations & Architecture

Topic	Description
Process Architecture	Multi-process model, connection pooling, auxiliary processes
Memory Architecture	Shared/private memory layout, OS page cache, OOM prevention
MVCC Transactions	Isolation levels, XID wraparound, serialization errors
WAL and Checkpoints	WAL internals, checkpoint tuning, durability, crash recovery
Replication	Streaming replication, slots, sync commit, failover
Storage Layout	PGDATA structure, TOAST, fillfactor, tablespaces
Monitoring	pg_stat views, logging, pg_stat_statements, host metrics
Backup and Recovery	pg_dump, pg_basebackup, PITR, WAL archiving
MVCC and VACUUM	Dead tuples, long transactions, xid wraparound prevention
Optimization Checklist	Pre-optimization audit, cleanup, readiness checks
Query Patterns	SQL anti-patterns, JOINs, pagination, batch queries

2.4 PlanetScale-Specific Features

Topic	Description
Connection Pooling	PgBouncer, pool sizing, pooled vs direct
Extensions	Supported extensions, compatibility
Connections	Connection troubleshooting, drivers, SSL
Insights	Slow queries, MCP server, pscale CLI
CLI Commands	pscale CLI reference, branches, deploy requests, auth
CLI API Insights	Query insights via `pscale api`, schema analysis

Chapter 3: Vitess

Vitess is a MySQL-compatible, cloud-native database system originally built at YouTube to scale MySQL. PlanetScale runs Vitess as a managed service.

3.1 Overview & Key Concepts

Core Capabilities

Horizontal sharding: Built-in, transparent to the application.
Connection pooling: VTGate multiplexes client connections far beyond native MySQL limits.
High availability: Automatic primary failure detection and repair. Near-zero downtime resharding.
Query rewriting and caching: VTGate rewrites and optimizes before routing.
Schema management: Apply changes across all shards consistently.
Materialized views and messaging: Cross-shard views and pub/sub via VStream.

Key Concepts

Concept	Description
Keyspace	Logical database mapping to one or more shards. Analogous to a MySQL schema.
Shard	A horizontal partition, each backed by a separate MySQL instance.
VSchema	Configuration defining how tables map to shards, vindex keys, and routing rules.
Vindex	Sharding function mapping column values to shards (`hash`, `xxhash`, `lookup`).
VTGate	Stateless proxy that plans and routes queries to the correct shard(s).
Online DDL	Non-blocking schema migrations. On PlanetScale, use deploy requests.

PlanetScale Specifics

Branching: Git-like database branches for development; deploy requests for production.
Connections: MySQL protocol, port 3306 (direct) or 443 (serverless). SSL always required.

3.2 VSchema Design & Configuration

The VSchema tells VTGate how to route queries. It defines table-to-shard mappings, vindex columns, and table relationships.

Structure

{ "sharded": true, "vindexes": { ... }, "tables": { ... } }

For unsharded: { "tables": { "product": {}, "my_seq": { "type": "sequence" } } }

Vindexes

Every sharded table needs a Primary Vindex (unique, immutable after insert).

Vindex Type	Use For
`xxhash`	Any column type (most common)
`unicode_loose_xxhash`	Text columns needing case-insensitive hashing
`binary_md5`	Any column type (MD5-based alternative)

Choosing a primary vindex column: pick the column most used in high-QPS WHERE clauses, that enables join co-location, keeps transactions single-shard, and has high cardinality.

Co-location Example

{
  "sharded": true,
  "vindexes": { "xxhash": { "type": "xxhash" } },
  "tables": {
    "customer": { "column_vindexes": [{ "column": "customer_id", "name": "xxhash" }] },
    "orders":   { "column_vindexes": [{ "column": "customer_id", "name": "xxhash" }] }
  }
}

Both tables shard on customer_id — rows with the same customer_id land on the same shard, enabling single-shard joins and transactions.

Lookup Vindexes

Provide secondary routing to avoid scatter queries:

"customer_email_lookup": {
  "type": "consistent_lookup_unique",
  "params": { "table": "product.customer_email_lookup", "from": "email", "to": "keyspace_id" },
  "owner": "customer"
}

Sequences

Replace MySQL AUTO_INCREMENT for sharded tables (per-shard auto-increment produces duplicates):

CREATE TABLE customer_seq (id BIGINT, next_id BIGINT, cache BIGINT, PRIMARY KEY (id))
  COMMENT 'vitess_sequence';
INSERT INTO customer_seq (id, next_id, cache) VALUES (0, 1, 1000);

Sharding Guidelines

250 GB per shard is the sweet spot.
Highest-QPS query's WHERE clause dictates primary vindex.
Co-locate joined tables; keep transactions local.
For multi-tenant apps, use multi-column vindexes.
MoveTables can change sharding keys later.

Troubleshooting Scatter Queries

Check: is WHERE filtering on primary vindex? Is a lookup vindex configured? Use VEXPLAIN PLAN to see routing. Use VEXPLAIN ALL for MySQL query plans, VEXPLAIN TRACE for row-level metrics.

3.3 SQL Compatibility & Limitations

Most MySQL syntax works. Known limitations:

Stored procedures / triggers / events: Not supported through VTGate.
LOCK TABLES / GET_LOCK: Not supported.
SELECT ... FOR UPDATE: Works within a single shard; cross-shard locking is not atomic.
Cross-shard joins: Supported but expensive (scatter-gather).
AUTO_INCREMENT: Use sequences or app-generated IDs on sharded tables.
Aggregations on sharded tables: Merge in VTGate memory; large result sets can be slow.
Foreign keys: Limited. Prefer application-level referential integrity.

Chapter 4: Neki — Sharded Postgres

Note: Neki is not yet a released product but will be available soon.

4.1 Overview

Neki is a sharded Postgres product built by PlanetScale — the company behind Vitess. It brings PlanetScale's expertise in horizontal scaling to the Postgres ecosystem.

Sharded Postgres — horizontal sharding beyond single-node limits.
Managed by PlanetScale — proven infrastructure and operational experience.
High availability — leveraging PlanetScale's track record of availability.

Learn more: neki.dev

4.2 Pre-Sharding Readiness Guide

Prepare your Postgres schema for future horizontal sharding with minimal rework.

Shard Key Design

Choose a shard key now, even if you're not sharding yet:

Present on every tenant/user-scoped table
Included in every frequent query's WHERE clause
High cardinality, evenly distributed, immutable

Common choices: tenant_id, org_id, user_id, account_id.

Primary Keys

-- Single-column PK that IS the shard key
CREATE TABLE users (user_id BIGINT PRIMARY KEY, ...);

-- Composite PK with shard key LEADING
CREATE TABLE orders (
  user_id BIGINT NOT NULL,
  id BIGINT GENERATED ALWAYS AS IDENTITY,
  PRIMARY KEY (user_id, id)
);

-- ❌ WRONG: shard key not leading
PRIMARY KEY (id, user_id)

Co-located Data

Tables frequently joined must share the same shard key:

-- Correct: shard-local join
SELECT o.id, oi.product_id FROM orders o
JOIN order_items oi ON oi.tenant_id = o.tenant_id AND oi.order_id = o.id
WHERE o.tenant_id = $1;

Reference Tables

Small, rarely-changing lookup tables (countries, currencies) get replicated across shards — no shard key needed.

Query Patterns

Every query on sharded tables must include the shard key:

-- ✓ Correct: routed to single shard
SELECT * FROM orders WHERE tenant_id = $1 AND status = 'pending';

-- ❌ Wrong: hits ALL shards
SELECT * FROM orders WHERE status = 'pending';

Indexes

Lead indexes with the shard key. Scope unique constraints to include it:

-- ✓ Correct
CREATE INDEX idx ON orders (tenant_id, status, created_at);
ALTER TABLE orders ADD CONSTRAINT uq UNIQUE (tenant_id, order_number);

-- ❌ Wrong: without shard key
CREATE INDEX idx ON orders (status, created_at);

Foreign Keys

Within same shard key: may be supported
Cross-shard-key: must move to application-level enforcement

Transactions

Keep transactions within a single shard key value. Cross-shard = 2PC = slow.

Aggregations

Global COUNT(*), SUM() across shards are expensive. Scope to shard key or use rollup tables.

Shard-Readiness Checklist

✅ Shard key present on every tenant-scoped table
✅ Composite PKs with shard key leading
✅ Shard key in all queries, indexes, and joins
✅ Unique constraints scoped to include shard key
✅ Cross-shard FKs audited
✅ Transactions scoped to single shard key
✅ Global aggregations identified; rollup plan in place
✅ Migrations avoid long locks
✅ Lookup/mapping paths hardened with backfill and monitoring

Chapter 5: Cross-Cutting Best Practices

These patterns apply regardless of which database engine you use.

Primary Key Design

Default to BIGINT (auto-increment or identity).
Avoid random UUIDs as clustered/primary keys; prefer time-ordered (UUIDv7, ULID, Snowflake).
Keep PKs narrow — they're embedded in every secondary index.

Indexing Strategy

Equality columns first, then range/sort.
Verify indexes are used with EXPLAIN / EXPLAIN ANALYZE.
Audit for unused indexes periodically.
Don't over-index — each index costs write performance.

Query Optimization

Avoid SELECT * — list needed columns.
Cursor pagination over OFFSET.
Avoid functions on indexed columns in WHERE.
Batch inserts for bulk operations.

Schema Discipline

NOT NULL by default.
Use proper timestamp types (DATETIME for MySQL, TIMESTAMPTZ for Postgres).
Normalize first, denormalize only for measured hot paths.

Operations

Test DDL on non-production first.
Use online/non-blocking DDL where available.
Monitor replication lag.
Tune connection pooling appropriately.

Sharding Readiness

Even if you're not sharding today, design with a shard key in mind.
Keep transactions, joins, and queries scoped to what could be a single shard.
It's 100x easier to prepare for sharding from day one than retrofit later.

Compiled from PlanetScale's database-skills repository. For the full reference documents with deeper detail on each topic, visit the source repository.

PlanetScale Database Skills — The Complete Guide

About

Preview