
PlanetScale Database Skills — The Complete Guide

by PlanetScale


About

Comprehensive reference for MySQL/InnoDB, PostgreSQL, Vitess, and Neki from PlanetScale's open-source database-skills collection. Covers schema design, primary keys, indexing, query optimization, EXPLAIN analysis, transactions, isolation levels, sharding, VSchema, and pre-sharding readiness.


PlanetScale Database Skills — The Complete Guide

A comprehensive reference for MySQL, PostgreSQL, Vitess, and Neki — compiled from PlanetScale's open-source database-skills collection.

Author: PlanetScale
Version: 1.0.0 (February 2026)
Source: github.com/planetscale/database-skills


Table of Contents

  1. MySQL / InnoDB
    • 1.1 Schema Design
    • 1.2 Primary Keys
    • 1.3 Indexing & Composite Indexes
    • 1.4 Query Optimization & EXPLAIN Analysis
    • 1.5 Transactions & Isolation Levels
    • 1.6 Operations
  2. PostgreSQL
    • 2.1 Schema Design
    • 2.2 Indexing Best Practices
    • 2.3 Operations & Architecture
    • 2.4 PlanetScale-Specific Features
  3. Vitess
    • 3.1 Overview & Key Concepts
    • 3.2 VSchema Design & Configuration
    • 3.3 SQL Compatibility & Limitations
    • 3.4 References (Schema Changes, VReplication, Architecture, Query Serving)
  4. Neki — Sharded Postgres
    • 4.1 Overview
    • 4.2 Pre-Sharding Readiness Guide
  5. Cross-Cutting Best Practices

Chapter 1: MySQL / InnoDB

Use this chapter to make safe, measurable MySQL/InnoDB changes.

Workflow:

  1. Define workload and constraints (read/write mix, latency target, data volume, MySQL version, hosting platform).
  2. Propose the smallest change that can solve the problem, including trade-offs.
  3. Validate with evidence (EXPLAIN, EXPLAIN ANALYZE, lock/connection metrics, and production-safe rollout steps).
  4. For production changes, include rollback and post-deploy verification.

1.1 Schema Design

  • Prefer narrow, monotonic PKs (BIGINT UNSIGNED AUTO_INCREMENT) for write-heavy OLTP tables.
  • Avoid random UUID values as clustered PKs; if external IDs are required, keep UUID in a secondary unique column.
  • Always use utf8mb4 with utf8mb4_0900_ai_ci. Default columns to NOT NULL; prefer DATETIME over TIMESTAMP.
  • Lookup tables over ENUM. Normalize to 3NF; denormalize only for measured hot paths.

Data Types Quick Reference

| Use | Prefer |
| --- | --- |
| Character set | utf8mb4 always |
| Collation | utf8mb4_0900_ai_ci |
| Timestamps | DATETIME over TIMESTAMP |
| Nullability | NOT NULL by default |
| Enums | Lookup tables instead |

1.2 Primary Keys

InnoDB stores rows in primary key order (clustered index). This means:

  • Sequential keys = optimal inserts: new rows append, minimizing page splits and fragmentation.
  • Random keys = fragmentation: random inserts cause page splits to maintain PK order.
  • Secondary index lookups: secondary indexes store the PK value and use it to fetch the full row.

INT vs BIGINT

  • INT UNSIGNED: 4 bytes, max value ~4.3 billion.
  • BIGINT UNSIGNED: 8 bytes, max value ~18.4 quintillion.

Guideline: Default to BIGINT UNSIGNED unless you're certain the table will never approach the INT limit.
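Those limits are just the ranges of 4- and 8-byte unsigned integers; a quick check:

```python
# Maximum values representable by MySQL's unsigned integer widths.
INT_UNSIGNED_MAX = 2**32 - 1     # 4 bytes -> ~4.3 billion
BIGINT_UNSIGNED_MAX = 2**64 - 1  # 8 bytes -> ~18.4 quintillion

print(INT_UNSIGNED_MAX)     # 4294967295
print(BIGINT_UNSIGNED_MAX)  # 18446744073709551615
```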

Avoid Random UUID as Clustered PK

UUID PK stored as BINARY(16): 16 bytes (vs 8 for BIGINT). Random inserts cause page splits. If external identifiers are required, store UUID as BINARY(16) in a secondary unique column:

CREATE TABLE users (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  public_id BINARY(16) NOT NULL,
  UNIQUE KEY idx_public_id (public_id)
);
INSERT INTO users (public_id) VALUES (UUID_TO_BIN(?, 1));

Prefer time-ordered variants such as UUIDv7 to reduce index fragmentation.

Alternative Ordered IDs

  • Snowflake-style: 64-bit integers (fits in BIGINT), time-ordered, compact.
  • ULID / UUIDv7: 128-bit (store as BINARY(16)), time-ordered.
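A minimal sketch of the Snowflake-style layout, assuming the common 41/10/12 bit split (real implementations differ in epoch and field widths):

```python
import time

# Hypothetical Snowflake-style layout (bit widths vary by implementation):
# 41 bits of milliseconds since a custom epoch | 10-bit worker ID | 12-bit sequence.
EPOCH_MS = 1_288_834_974_657  # any fixed past epoch works

def snowflake_id(worker_id, sequence, now_ms=None):
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    assert 0 <= worker_id < 1024 and 0 <= sequence < 4096
    return ((now_ms - EPOCH_MS) << 22) | (worker_id << 12) | sequence

# Later timestamps always produce larger IDs, so inserts stay append-friendly,
# and the result fits in a signed BIGINT.
a = snowflake_id(worker_id=1, sequence=0, now_ms=EPOCH_MS + 1_000)
b = snowflake_id(worker_id=1, sequence=0, now_ms=EPOCH_MS + 2_000)
assert a < b < 2**63
```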

Composite Primary Keys

Use for join/many-to-many tables. Most-queried column first:

CREATE TABLE user_roles (
  user_id BIGINT UNSIGNED NOT NULL,
  role_id BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, role_id)
);

1.3 Indexing & Composite Indexes

Core Principles

  • Composite order: equality first, then range/sort (leftmost prefix rule).
  • Range predicates stop index usage for subsequent columns.
  • Secondary indexes include PK implicitly. Prefix indexes for long strings.
  • Audit via performance_schema — drop indexes with count_read = 0.

Leftmost Prefix Rule

Index (a, b, c) is usable for:

  • WHERE a
  • WHERE a AND b
  • WHERE a AND b AND c
  • WHERE a AND c → uses only column a
  • WHERE b alone → NOT usable

Column Order: Equality First, Then Range/Sort

-- Query: WHERE tenant_id = ? AND status = ? AND created_at > ?
CREATE INDEX idx_orders ON orders (tenant_id, status, created_at);

Critical: Range predicates (>, <, BETWEEN, LIKE 'prefix%') stop index usage for filtering subsequent columns. However, columns after a range predicate can still be useful for covering index reads.

Sort Order Must Match Index

ORDER BY status ASC, created_at ASC   -- ✓ matches
ORDER BY status DESC, created_at DESC -- ✓ full reverse OK
ORDER BY status ASC, created_at DESC  -- ⚠️ mixed (may use filesort)

-- MySQL 8.0+: descending index components
CREATE INDEX idx ON orders (status ASC, created_at DESC);

Composite vs Multiple Single-Column Indexes

A composite index is typically faster than index merge. Index merge is useful when queries filter on different column combinations that don't share a common prefix.

Design for Multiple Queries

-- One index covers: WHERE user_id=?, WHERE user_id=? AND status=?,
--   and WHERE user_id=? AND status=? ORDER BY created_at DESC
CREATE INDEX idx ON orders (user_id, status, created_at DESC);

1.4 Query Optimization & EXPLAIN Analysis

EXPLAIN Commands

EXPLAIN SELECT ...;                    -- estimated plan
EXPLAIN FORMAT=JSON SELECT ...;        -- detailed with cost estimates
EXPLAIN FORMAT=TREE SELECT ...;        -- tree format (8.0+)
EXPLAIN ANALYZE SELECT ...;            -- actual execution (8.0.18+, runs the query)

Access Types (Best → Worst)

system → const → eq_ref → ref → range → index → ALL

Target ref or better. ALL on >1000 rows almost always needs an index.

Key Extra Flags

| Flag | Meaning | Action |
| --- | --- | --- |
| Using index | Covering index (optimal) | None |
| Using filesort | Sort not via index | Index the ORDER BY columns |
| Using temporary | Temp table for GROUP BY | Index the grouped columns |
| Using join buffer | No index on join column | Add index on join column |
| Using index condition | ICP — engine filters at index level | Generally good |

key_len — How Much of Composite Index Is Used

Byte sizes: TINYINT=1, INT=4, BIGINT=8, DATE=3, DATETIME=5, VARCHAR(N) utf8mb4: N×4+1 (or +2 when N×4>255). Add 1 byte per nullable column.
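That arithmetic is easy to script when reading key_len off an EXPLAIN; a rough helper (assumes utf8mb4 and the byte sizes listed above):

```python
# Rough key_len calculator for the byte sizes listed above (utf8mb4 assumed).
FIXED = {"TINYINT": 1, "INT": 4, "BIGINT": 8, "DATE": 3, "DATETIME": 5}

def column_key_len(col_type, nullable=False, varchar_len=0):
    if col_type == "VARCHAR":
        byte_len = varchar_len * 4                      # utf8mb4: 4 bytes per char
        size = byte_len + (2 if byte_len > 255 else 1)  # length-prefix bytes
    else:
        size = FIXED[col_type]
    return size + (1 if nullable else 0)                # NULL flag byte

# Index (tenant_id BIGINT NOT NULL, status VARCHAR(20) NOT NULL, created_at DATETIME NOT NULL)
parts = [column_key_len("BIGINT"),
         column_key_len("VARCHAR", varchar_len=20),
         column_key_len("DATETIME")]
print(parts, sum(parts))  # [8, 81, 5] 94 -> key_len=94 means all three columns are used
```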

rows vs filtered

  • rows: estimated rows examined after index access
  • filtered: percent expected to pass the full WHERE
  • Rough estimate of rows produced: rows × filtered / 100

EXPLAIN ANALYZE (MySQL 8.0.18+)

Actually executes the query. Metrics: actual time (ms), rows (actual), loops (iterations). Compare estimated vs actual to find optimizer misestimates. Refresh with ANALYZE TABLE your_table;.

Query Optimization Tips

  • Check EXPLAIN — red flags: type: ALL, Using filesort, Using temporary.
  • Cursor pagination, not OFFSET. Avoid functions on indexed columns in WHERE.
  • Batch inserts (500–5000 rows). UNION ALL over UNION when dedup unnecessary.
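The cursor-pagination tip can be sketched with Python's built-in sqlite3 standing in for MySQL; the keyset predicate below is plain SQL and works the same way against a (created_at, id) index in MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO orders (id, created_at) VALUES (?, ?)",
                 [(i, f"2026-01-{i:02d}") for i in range(1, 11)])

def next_page(last_created_at, last_id, size=3):
    # Keyset predicate: start strictly after the last row seen.
    # Unlike OFFSET, the index can seek straight to the cursor position.
    return conn.execute(
        """SELECT id, created_at FROM orders
           WHERE created_at > ? OR (created_at = ? AND id > ?)
           ORDER BY created_at, id
           LIMIT ?""",
        (last_created_at, last_created_at, last_id, size)).fetchall()

page1 = next_page("", 0)           # cursor that sorts before everything
last_id, last_ts = page1[-1]
page2 = next_page(last_ts, last_id)
print(page1)  # [(1, '2026-01-01'), (2, '2026-01-02'), (3, '2026-01-03')]
print(page2)  # [(4, '2026-01-04'), (5, '2026-01-05'), (6, '2026-01-06')]
```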

1.5 Transactions & Isolation Levels

Default: REPEATABLE READ

The InnoDB default, most tested, prevents phantom reads. Only change per-session with a measured reason.

SELECT @@transaction_isolation;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- per-session only

Locking vs Non-Locking Reads

  • Non-locking reads: plain SELECT uses MVCC snapshots. No locks, don't block writers.
  • Locking reads: SELECT ... FOR UPDATE (exclusive) or SELECT ... FOR SHARE (shared).
  • UPDATE and DELETE are implicitly locking reads.

REPEATABLE READ (Prefer This)

  • Snapshot at first read; all plain SELECTs read from that snapshot.
  • Locking reads/writes use next-key locks (row + gap) — prevents phantoms.
  • Use for: OLTP, check-then-insert, financial logic, consistent reports.
  • Avoid mixing locking and non-locking statements in the same transaction.

READ COMMITTED (Per-Session Only)

  • Fresh snapshot per SELECT; record locks only (no gap locks) — more concurrency, phantoms possible.
  • Switch only when: gap-lock deadlocks confirmed, bulk imports with contention, or high-write concurrency on overlapping ranges.
  • Never switch globally. Check-then-insert patterns break.

Decision Guide

| Scenario | Recommendation |
| --- | --- |
| General OLTP / check-then-insert / reports | REPEATABLE READ (default) |
| Bulk import or gap-lock deadlocks | READ COMMITTED (per-session) |
| Need serializability | Explicit FOR UPDATE at REPEATABLE READ |

Deadlock Prevention

  • Consistent row access order prevents deadlocks. Retry error 1213 with backoff.
  • Do I/O outside transactions. Use SELECT ... FOR UPDATE sparingly.
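Error 1213 retries are usually implemented with a small wrapper; a sketch (MySQLError and its errno attribute are stand-ins for your driver's actual exception type):

```python
import random
import time

class MySQLError(Exception):
    # Stand-in for a real driver exception (e.g. pymysql's OperationalError).
    def __init__(self, errno, msg=""):
        super().__init__(msg)
        self.errno = errno

def with_deadlock_retry(fn, retries=3, base_delay=0.01):
    for attempt in range(retries + 1):
        try:
            return fn()
        except MySQLError as e:
            if e.errno != 1213 or attempt == retries:
                raise
            # Exponential backoff with jitter before replaying the whole transaction.
            time.sleep(base_delay * 2**attempt * random.random())

calls = {"n": 0}
def txn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise MySQLError(1213, "Deadlock found when trying to get lock")
    return "committed"

result = with_deadlock_retry(txn)
print(result, calls["n"])  # committed 3
```

Note that the retry must replay the entire transaction, not just the failed statement, since the deadlock rolled the transaction back.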

1.6 Operations

  • Use online DDL (ALGORITHM=INPLACE) when possible; test on replicas first.
  • Tune connection pooling — avoid max_connections exhaustion under load.
  • Monitor replication lag; avoid stale reads from replicas during writes.

Partitioning

  • Partition time-series (>50M rows) or large tables (>100M rows). Plan early — retrofit = full rebuild.
  • Include partition column in every unique/PK. Always add a MAXVALUE catch-all.

Guardrails

  • Prefer measured evidence over blanket rules of thumb.
  • Note MySQL-version-specific behavior when giving advice.
  • Ask for explicit human approval before destructive data operations.

Chapter 2: PostgreSQL

2.1 Schema Design

Primary Keys

Prefer BIGINT GENERATED ALWAYS AS IDENTITY. Avoid random UUIDs (UUIDv4) as primary keys; use uuidv7() when you need UUIDs.

CREATE TABLE "user" (  -- "user" is a reserved word in Postgres, so it must be quoted
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  email TEXT NOT NULL UNIQUE
);

Data Types

| Use | Avoid |
| --- | --- |
| TEXT, VARCHAR | Extension-specific types |
| JSONB | Custom ENUMs (use CHECK instead) |
| TIMESTAMPTZ | TIMESTAMP without time zone |
| BIGINT, INTEGER | Platform-specific types |

Prefer CHECK constraints over ENUM types — easier to modify:

CREATE TABLE "order" (  -- "order" is a reserved word, so it must be quoted
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  status TEXT NOT NULL CHECK (status IN ('pending', 'shipped', 'delivered'))
);

Foreign Keys

  • Always index FK columns (PostgreSQL does not auto-create these)
  • Avoid circular FK dependencies
  • Use ON DELETE CASCADE or ON DELETE SET NULL explicitly

CREATE TABLE "order" (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  customer_id BIGINT NOT NULL REFERENCES customer(id) ON DELETE CASCADE
);
CREATE INDEX order_customer_id_idx ON "order" (customer_id);

Naming Conventions

  • Tables: singular snake_case (user_account, order_item)
  • Columns: singular snake_case (created_at, user_id)
  • Indexes: {table}_{column}_idx
  • Constraints: {table}_{column}_{type}

General Guidelines

  • Add NOT NULL to as many columns as possible
  • Add created_at TIMESTAMPTZ DEFAULT NOW() to all tables
  • Use BIGINT for all IDs and foreign keys, even on small tables
  • Keep tables normalized; denormalize only for proven hot read paths

2.2 Indexing Best Practices

Core Rules

  1. Always index foreign key columns
  2. Index columns in WHERE, JOIN, and ORDER BY
  3. Don't over-index — each index slows writes
  4. Verify with EXPLAIN ANALYZE

Composite Indexes

Put equality columns first, then range/sort columns:

CREATE INDEX order_status_created_idx ON "order" (status, created_at);

Partial Indexes

Reduce index size for common query patterns:

CREATE INDEX order_active_idx ON "order" (customer_id)
  WHERE status = 'active';

Index Types

| Type | Use Case | Example |
| --- | --- | --- |
| B-tree (default) | Equality, range, sorting | WHERE id = 1, ORDER BY date |
| GIN | Arrays, JSONB, full-text | WHERE tags @> ARRAY['x'] |
| GiST | Geometric, range types, full-text | PostGIS, tsrange |
| BRIN | Large sequential/time-series | Append-only logs (requires row order correlation) |

CREATE INDEX metadata_idx ON "order" USING GIN (metadata);
CREATE INDEX event_created_idx ON event USING BRIN (created_at);

2.3 Operations & Architecture

| Topic | Description |
| --- | --- |
| Process Architecture | Multi-process model, connection pooling, auxiliary processes |
| Memory Architecture | Shared/private memory layout, OS page cache, OOM prevention |
| MVCC Transactions | Isolation levels, XID wraparound, serialization errors |
| WAL and Checkpoints | WAL internals, checkpoint tuning, durability, crash recovery |
| Replication | Streaming replication, slots, sync commit, failover |
| Storage Layout | PGDATA structure, TOAST, fillfactor, tablespaces |
| Monitoring | pg_stat views, logging, pg_stat_statements, host metrics |
| Backup and Recovery | pg_dump, pg_basebackup, PITR, WAL archiving |
| MVCC and VACUUM | Dead tuples, long transactions, xid wraparound prevention |
| Optimization Checklist | Pre-optimization audit, cleanup, readiness checks |
| Query Patterns | SQL anti-patterns, JOINs, pagination, batch queries |

2.4 PlanetScale-Specific Features

| Topic | Description |
| --- | --- |
| Connection Pooling | PgBouncer, pool sizing, pooled vs direct |
| Extensions | Supported extensions, compatibility |
| Connections | Connection troubleshooting, drivers, SSL |
| Insights | Slow queries, MCP server, pscale CLI |
| CLI Commands | pscale CLI reference, branches, deploy requests, auth |
| CLI API Insights | Query insights via pscale api, schema analysis |

Chapter 3: Vitess

Vitess is a MySQL-compatible, cloud-native database system originally built at YouTube to scale MySQL. PlanetScale runs Vitess as a managed service.

3.1 Overview & Key Concepts

Core Capabilities

  • Horizontal sharding: Built-in, transparent to the application.
  • Connection pooling: VTGate multiplexes client connections far beyond native MySQL limits.
  • High availability: Automatic primary failure detection and repair. Near-zero downtime resharding.
  • Query rewriting and caching: VTGate rewrites and optimizes before routing.
  • Schema management: Apply changes across all shards consistently.
  • Materialized views and messaging: Cross-shard views and pub/sub via VStream.

Key Concepts

| Concept | Description |
| --- | --- |
| Keyspace | Logical database mapping to one or more shards. Analogous to a MySQL schema. |
| Shard | A horizontal partition, each backed by a separate MySQL instance. |
| VSchema | Configuration defining how tables map to shards, vindex keys, and routing rules. |
| Vindex | Sharding function mapping column values to shards (hash, xxhash, lookup). |
| VTGate | Stateless proxy that plans and routes queries to the correct shard(s). |
| Online DDL | Non-blocking schema migrations. On PlanetScale, use deploy requests. |

PlanetScale Specifics

  • Branching: Git-like database branches for development; deploy requests for production.
  • Connections: MySQL protocol, port 3306 (direct) or 443 (serverless). SSL always required.

3.2 VSchema Design & Configuration

The VSchema tells VTGate how to route queries. It defines table-to-shard mappings, vindex columns, and table relationships.

Structure

{ "sharded": true, "vindexes": { ... }, "tables": { ... } }

For unsharded: { "tables": { "product": {}, "my_seq": { "type": "sequence" } } }

Vindexes

Every sharded table needs a Primary Vindex (unique, immutable after insert).

| Vindex Type | Use For |
| --- | --- |
| xxhash | Any column type (most common) |
| unicode_loose_xxhash | Text columns needing case-insensitive hashing |
| binary_md5 | Any column type (MD5-based alternative) |

Choosing a primary vindex column: pick the column that appears most often in high-QPS WHERE clauses, enables join co-location, keeps transactions single-shard, and has high cardinality.

Co-location Example

{
  "sharded": true,
  "vindexes": { "xxhash": { "type": "xxhash" } },
  "tables": {
    "customer": { "column_vindexes": [{ "column": "customer_id", "name": "xxhash" }] },
    "orders":   { "column_vindexes": [{ "column": "customer_id", "name": "xxhash" }] }
  }
}

Both tables shard on customer_id — rows with the same customer_id land on the same shard, enabling single-shard joins and transactions.
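The co-location effect can be sketched in a few lines; md5-mod-N below is a simplified stand-in for the xxhash vindex (different hash, same principle — the vindex column alone determines the shard):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(customer_id):
    # Deterministic hash of the vindex column picks the shard. Vitess maps
    # xxhash output onto keyspace-id ranges; md5 mod N is a simplified stand-in.
    digest = hashlib.md5(customer_id.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

customer_row = ("customer", 42)
order_rows = [("orders", 42, "order-1"), ("orders", 42, "order-2")]

# Every row keyed by customer_id=42 routes to the same shard, so the
# customer/orders join and any transaction touching them stay single-shard.
shards = {shard_for(42) for _ in [customer_row, *order_rows]}
assert len(shards) == 1
```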

Lookup Vindexes

Provide secondary routing to avoid scatter queries:

"customer_email_lookup": {
  "type": "consistent_lookup_unique",
  "params": { "table": "product.customer_email_lookup", "from": "email", "to": "keyspace_id" },
  "owner": "customer"
}

Sequences

Replace MySQL AUTO_INCREMENT for sharded tables (per-shard auto-increment produces duplicates):

CREATE TABLE customer_seq (id BIGINT, next_id BIGINT, cache BIGINT, PRIMARY KEY (id))
  COMMENT 'vitess_sequence';
INSERT INTO customer_seq (id, next_id, cache) VALUES (0, 1, 1000);

Sharding Guidelines

  • 250 GB per shard is the sweet spot.
  • Highest-QPS query's WHERE clause dictates primary vindex.
  • Co-locate joined tables; keep transactions local.
  • For multi-tenant apps, use multi-column vindexes.
  • MoveTables can change sharding keys later.

Troubleshooting Scatter Queries

Check: is WHERE filtering on primary vindex? Is a lookup vindex configured? Use VEXPLAIN PLAN to see routing. Use VEXPLAIN ALL for MySQL query plans, VEXPLAIN TRACE for row-level metrics.

3.3 SQL Compatibility & Limitations

Most MySQL syntax works. Known limitations:

  • Stored procedures / triggers / events: Not supported through VTGate.
  • LOCK TABLES / GET_LOCK: Not supported.
  • SELECT ... FOR UPDATE: Works within a single shard; cross-shard locking is not atomic.
  • Cross-shard joins: Supported but expensive (scatter-gather).
  • AUTO_INCREMENT: Use sequences or app-generated IDs on sharded tables.
  • Aggregations on sharded tables: Merge in VTGate memory; large result sets can be slow.
  • Foreign keys: Limited. Prefer application-level referential integrity.

Chapter 4: Neki — Sharded Postgres

Note: Neki is not yet a released product but will be available soon.

4.1 Overview

Neki is a sharded Postgres product built by PlanetScale — the company behind Vitess. It brings PlanetScale's expertise in horizontal scaling to the Postgres ecosystem.

  • Sharded Postgres — horizontal sharding beyond single-node limits.
  • Managed by PlanetScale — proven infrastructure and operational experience.
  • High availability — leveraging PlanetScale's track record of availability.

Learn more: neki.dev

4.2 Pre-Sharding Readiness Guide

Prepare your Postgres schema for future horizontal sharding with minimal rework.

Shard Key Design

Choose a shard key now, even if you're not sharding yet:

  • Present on every tenant/user-scoped table
  • Included in every frequent query's WHERE clause
  • High cardinality, evenly distributed, immutable

Common choices: tenant_id, org_id, user_id, account_id.

Primary Keys

-- Single-column PK that IS the shard key
CREATE TABLE users (user_id BIGINT PRIMARY KEY, ...);

-- Composite PK with shard key LEADING
CREATE TABLE orders (
  user_id BIGINT NOT NULL,
  id BIGINT GENERATED ALWAYS AS IDENTITY,
  PRIMARY KEY (user_id, id)
);

-- ❌ WRONG: shard key not leading
PRIMARY KEY (id, user_id)

Co-located Data

Tables frequently joined must share the same shard key:

-- Correct: shard-local join
SELECT o.id, oi.product_id FROM orders o
JOIN order_items oi ON oi.tenant_id = o.tenant_id AND oi.order_id = o.id
WHERE o.tenant_id = $1;

Reference Tables

Small, rarely-changing lookup tables (countries, currencies) get replicated across shards — no shard key needed.

Query Patterns

Every query on sharded tables must include the shard key:

-- ✓ Correct: routed to single shard
SELECT * FROM orders WHERE tenant_id = $1 AND status = 'pending';

-- ❌ Wrong: hits ALL shards
SELECT * FROM orders WHERE status = 'pending';

Indexes

Lead indexes with the shard key. Scope unique constraints to include it:

-- ✓ Correct
CREATE INDEX idx ON orders (tenant_id, status, created_at);
ALTER TABLE orders ADD CONSTRAINT uq UNIQUE (tenant_id, order_number);

-- ❌ Wrong: without shard key
CREATE INDEX idx ON orders (status, created_at);

Foreign Keys

  • Within same shard key: may be supported
  • Cross-shard-key: must move to application-level enforcement

Transactions

Keep transactions within a single shard key value. Cross-shard = 2PC = slow.

Aggregations

Global COUNT(*), SUM() across shards are expensive. Scope to shard key or use rollup tables.
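The cost comes from merging per-shard partial results; a sketch of what the scatter-gather merge has to do (note AVG must be derived from the merged SUM and COUNT, never averaged directly):

```python
# Each shard returns a partial aggregate over its own rows; the router (or
# your application) merges them. COUNT and SUM merge by addition.
shard_partials = [
    {"count": 120, "sum": 9600.0},   # shard 0: SELECT COUNT(*), SUM(total) ...
    {"count": 80,  "sum": 5200.0},   # shard 1
    {"count": 200, "sum": 17200.0},  # shard 2
]

total_count = sum(p["count"] for p in shard_partials)
total_sum = sum(p["sum"] for p in shard_partials)
global_avg = total_sum / total_count  # derived, not an average of per-shard AVGs

print(total_count, total_sum, global_avg)  # 400 32000.0 80.0
```

A rollup table maintained at write time gives the same answer without fanning out to every shard on each read.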

Shard-Readiness Checklist

  1. ✅ Shard key present on every tenant-scoped table
  2. ✅ Composite PKs with shard key leading
  3. ✅ Shard key in all queries, indexes, and joins
  4. ✅ Unique constraints scoped to include shard key
  5. ✅ Cross-shard FKs audited
  6. ✅ Transactions scoped to single shard key
  7. ✅ Global aggregations identified; rollup plan in place
  8. ✅ Migrations avoid long locks
  9. ✅ Lookup/mapping paths hardened with backfill and monitoring

Chapter 5: Cross-Cutting Best Practices

These patterns apply regardless of which database engine you use.

Primary Key Design

  • Default to BIGINT (auto-increment or identity).
  • Avoid random UUIDs as clustered/primary keys; prefer time-ordered (UUIDv7, ULID, Snowflake).
  • Keep PKs narrow — they're embedded in every secondary index.

Indexing Strategy

  • Equality columns first, then range/sort.
  • Verify indexes are used with EXPLAIN / EXPLAIN ANALYZE.
  • Audit for unused indexes periodically.
  • Don't over-index — each index costs write performance.

Query Optimization

  • Avoid SELECT * — list needed columns.
  • Cursor pagination over OFFSET.
  • Avoid functions on indexed columns in WHERE.
  • Batch inserts for bulk operations.

Schema Discipline

  • NOT NULL by default.
  • Use proper timestamp types (DATETIME for MySQL, TIMESTAMPTZ for Postgres).
  • Normalize first, denormalize only for measured hot paths.

Operations

  • Test DDL on non-production first.
  • Use online/non-blocking DDL where available.
  • Monitor replication lag.
  • Tune connection pooling appropriately.

Sharding Readiness

  • Even if you're not sharding today, design with a shard key in mind.
  • Keep transactions, joins, and queries scoped to what could be a single shard.
  • It's 100x easier to prepare for sharding from day one than retrofit later.

Compiled from PlanetScale's database-skills repository. For the full reference documents with deeper detail on each topic, visit the source repository.
