PlanetScale Database Skills — The Complete Guide
by PlanetScale
Add to your library first to use in Claude Code
About
Comprehensive reference for MySQL/InnoDB, PostgreSQL, Vitess, and Neki from PlanetScale's open-source database-skills collection. Covers schema design, primary keys, indexing, query optimization, EXPLAIN analysis, transactions, isolation levels, sharding, VSchema, and pre-sharding readiness.
Preview
PlanetScale Database Skills — The Complete Guide
A comprehensive reference for MySQL, PostgreSQL, Vitess, and Neki — compiled from PlanetScale's open-source database-skills collection.
Author: PlanetScale
Version: 1.0.0 (February 2026)
Source: github.com/planetscale/database-skills
Table of Contents
- MySQL / InnoDB
- 1.1 Schema Design
- 1.2 Primary Keys
- 1.3 Indexing & Composite Indexes
- 1.4 Query Optimization & EXPLAIN Analysis
- 1.5 Transactions & Isolation Levels
- 1.6 Operations
- PostgreSQL
- 2.1 Schema Design
- 2.2 Indexing Best Practices
- 2.3 Operations & Architecture
- 2.4 PlanetScale-Specific Features
- Vitess
- 3.1 Overview & Key Concepts
- 3.2 VSchema Design & Configuration
- 3.3 SQL Compatibility & Limitations
- 3.4 References (Schema Changes, VReplication, Architecture, Query Serving)
- Neki — Sharded Postgres
- 4.1 Overview
- 4.2 Pre-Sharding Readiness Guide
- Cross-Cutting Best Practices
Chapter 1: MySQL / InnoDB
Use this chapter to make safe, measurable MySQL/InnoDB changes.
Workflow:
- Define workload and constraints (read/write mix, latency target, data volume, MySQL version, hosting platform).
- Propose the smallest change that can solve the problem, including trade-offs.
- Validate with evidence (
EXPLAIN,EXPLAIN ANALYZE, lock/connection metrics, and production-safe rollout steps). - For production changes, include rollback and post-deploy verification.
1.1 Schema Design
- Prefer narrow, monotonic PKs (
BIGINT UNSIGNED AUTO_INCREMENT) for write-heavy OLTP tables. - Avoid random UUID values as clustered PKs; if external IDs are required, keep UUID in a secondary unique column.
- Always
utf8mb4/utf8mb4_0900_ai_ci. PreferNOT NULL,DATETIMEoverTIMESTAMP. - Lookup tables over
ENUM. Normalize to 3NF; denormalize only for measured hot paths.
Data Types Quick Reference
| Use | Prefer |
|---|---|
| Character set | utf8mb4 always |
| Collation | utf8mb4_0900_ai_ci |
| Timestamps | DATETIME over TIMESTAMP |
| Nullability | NOT NULL by default |
| Enums | Lookup tables instead |
1.2 Primary Keys
InnoDB stores rows in primary key order (clustered index). This means:
- Sequential keys = optimal inserts: new rows append, minimizing page splits and fragmentation.
- Random keys = fragmentation: random inserts cause page splits to maintain PK order.
- Secondary index lookups: secondary indexes store the PK value and use it to fetch the full row.
INT vs BIGINT
- INT UNSIGNED: 4 bytes, max ~4.3B rows.
- BIGINT UNSIGNED: 8 bytes, max ~18.4 quintillion rows.
Guideline: Default to BIGINT UNSIGNED unless you're certain the table will never approach the INT limit.
Avoid Random UUID as Clustered PK
UUID PK stored as BINARY(16): 16 bytes (vs 8 for BIGINT). Random inserts cause page splits. If external identifiers are required, store UUID as BINARY(16) in a secondary unique column:
CREATE TABLE users (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
public_id BINARY(16) NOT NULL,
UNIQUE KEY idx_public_id (public_id)
);
INSERT INTO users (public_id) VALUES (UUID_TO_BIN(?, 1));
Prefer time-ordered variants such as UUIDv7 to reduce index fragmentation.
Alternative Ordered IDs
- Snowflake-style: 64-bit integers (fits in BIGINT), time-ordered, compact.
- ULID / UUIDv7: 128-bit (store as
BINARY(16)), time-ordered.
Composite Primary Keys
Use for join/many-to-many tables. Most-queried column first:
CREATE TABLE user_roles (
user_id BIGINT UNSIGNED NOT NULL,
role_id BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (user_id, role_id)
);
1.3 Indexing & Composite Indexes
Core Principles
- Composite order: equality first, then range/sort (leftmost prefix rule).
- Range predicates stop index usage for subsequent columns.
- Secondary indexes include PK implicitly. Prefix indexes for long strings.
- Audit via
performance_schema— drop indexes withcount_read = 0.
Leftmost Prefix Rule
Index (a, b, c) is usable for:
WHERE a✓WHERE a AND b✓WHERE a AND b AND c✓WHERE a AND c→ uses only columnaWHERE balone → NOT usable
Column Order: Equality First, Then Range/Sort
-- Query: WHERE tenant_id = ? AND status = ? AND created_at > ?
CREATE INDEX idx_orders ON orders (tenant_id, status, created_at);
Critical: Range predicates (>, <, BETWEEN, LIKE 'prefix%') stop index usage for filtering subsequent columns. However, columns after a range predicate can still be useful for covering index reads.
Sort Order Must Match Index
ORDER BY status ASC, created_at ASC -- ✓ matches
ORDER BY status DESC, created_at DESC -- ✓ full reverse OK
ORDER BY status ASC, created_at DESC -- ⚠️ mixed (may use filesort)
-- MySQL 8.0+: descending index components
CREATE INDEX idx ON orders (status ASC, created_at DESC);
Composite vs Multiple Single-Column Indexes
A composite index is typically faster than index merge. Index merge is useful when queries filter on different column combinations that don't share a common prefix.
Design for Multiple Queries
-- One index covers: WHERE user_id=?, WHERE user_id=? AND status=?,
-- and WHERE user_id=? AND status=? ORDER BY created_at DESC
CREATE INDEX idx ON orders (user_id, status, created_at DESC);
1.4 Query Optimization & EXPLAIN Analysis
EXPLAIN Commands
EXPLAIN SELECT ...; -- estimated plan
EXPLAIN FORMAT=JSON SELECT ...; -- detailed with cost estimates
EXPLAIN FORMAT=TREE SELECT ...; -- tree format (8.0+)
EXPLAIN ANALYZE SELECT ...; -- actual execution (8.0.18+, runs the query)
Access Types (Best → Worst)
system → const → eq_ref → ref → range → index → ALL
Target ref or better. ALL on >1000 rows almost always needs an index.
Key Extra Flags
| Flag | Meaning | Action |
|---|---|---|
Using index | Covering index (optimal) | None |
Using filesort | Sort not via index | Index the ORDER BY columns |
Using temporary | Temp table for GROUP BY | Index the grouped columns |
Using join buffer | No index on join column | Add index on join column |
Using index condition | ICP — engine filters at index level | Generally good |
key_len — How Much of Composite Index Is Used
Byte sizes: TINYINT=1, INT=4, BIGINT=8, DATE=3, DATETIME=5, VARCHAR(N) utf8mb4: N×4+1 (or +2 when N×4>255). Add 1 byte per nullable column.
rows vs filtered
rows: estimated rows examined after index accessfiltered: percent expected to pass the full WHERE- Rough output:
rows × filtered / 100
EXPLAIN ANALYZE (MySQL 8.0.18+)
Actually executes the query. Metrics: actual time (ms), rows (actual), loops (iterations). Compare estimated vs actual to find optimizer misestimates. Refresh with ANALYZE TABLE your_table;.
Query Optimization Tips
- Check
EXPLAIN— red flags:type: ALL,Using filesort,Using temporary. - Cursor pagination, not
OFFSET. Avoid functions on indexed columns inWHERE. - Batch inserts (500–5000 rows).
UNION ALLoverUNIONwhen dedup unnecessary.
1.5 Transactions & Isolation Levels
Default: REPEATABLE READ
The InnoDB default, most tested, prevents phantom reads. Only change per-session with a measured reason.
SELECT @@transaction_isolation;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED; -- per-session only
Locking vs Non-Locking Reads
- Non-locking reads: plain
SELECTuses MVCC snapshots. No locks, don't block writers. - Locking reads:
SELECT ... FOR UPDATE(exclusive) orSELECT ... FOR SHARE(shared). UPDATEandDELETEare implicitly locking reads.
REPEATABLE READ (Prefer This)
- Snapshot at first read; all plain SELECTs read from that snapshot.
- Locking reads/writes use next-key locks (row + gap) — prevents phantoms.
- Use for: OLTP, check-then-insert, financial logic, consistent reports.
- Avoid mixing locking and non-locking statements in the same transaction.
READ COMMITTED (Per-Session Only)
- Fresh snapshot per SELECT; record locks only (no gap locks) — more concurrency, phantoms possible.
- Switch only when: gap-lock deadlocks confirmed, bulk imports with contention, or high-write concurrency on overlapping ranges.
- Never switch globally. Check-then-insert patterns break.
Decision Guide
| Scenario | Recommendation |
|---|---|
| General OLTP / check-then-insert / reports | REPEATABLE READ (default) |
| Bulk import or gap-lock deadlocks | READ COMMITTED (per-session) |
| Need serializability | Explicit FOR UPDATE at REPEATABLE READ |
Deadlock Prevention
- Consistent row access order prevents deadlocks. Retry error 1213 with backoff.
- Do I/O outside transactions. Use
SELECT ... FOR UPDATEsparingly.
1.6 Operations
- Use online DDL (
ALGORITHM=INPLACE) when possible; test on replicas first. - Tune connection pooling — avoid
max_connectionsexhaustion under load. - Monitor replication lag; avoid stale reads from replicas during writes.
Partitioning
- Partition time-series (>50M rows) or large tables (>100M rows). Plan early — retrofit = full rebuild.
- Include partition column in every unique/PK. Always add a
MAXVALUEcatch-all.
Guardrails
- Prefer measured evidence over blanket rules of thumb.
- Note MySQL-version-specific behavior when giving advice.
- Ask for explicit human approval before destructive data operations.
Chapter 2: PostgreSQL
2.1 Schema Design
Primary Keys
Prefer BIGINT GENERATED ALWAYS AS IDENTITY. Avoid random UUIDs (UUIDv4) as primary keys; use uuidv7() when you need UUIDs.
CREATE TABLE user (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
email TEXT NOT NULL UNIQUE
);
Data Types
| Use | Avoid |
|---|---|
TEXT, VARCHAR | Extension-specific types |
JSONB | Custom ENUMs (use CHECK instead) |
TIMESTAMPTZ | TIMESTAMP without time zone |
BIGINT, INTEGER | Platform-specific types |
Prefer CHECK constraints over ENUM types — easier to modify:
CREATE TABLE order (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
status TEXT NOT NULL CHECK (status IN ('pending', 'shipped', 'delivered'))
);
Foreign Keys
- Always index FK columns (PostgreSQL does not auto-create these)
- Avoid circular FK dependencies
- Use
ON DELETE CASCADEorON DELETE SET NULLexplicitly
CREATE TABLE order (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customer(id) ON DELETE CASCADE
);
CREATE INDEX order_customer_id_idx ON order (customer_id);
Naming Conventions
- Tables: singular snake_case (
user_account,order_item) - Columns: singular snake_case (
created_at,user_id) - Indexes:
{table}_{column}_idx - Constraints:
{table}_{column}_{type}
General Guidelines
- Add
NOT NULLto as many columns as possible - Add
created_at TIMESTAMPTZ DEFAULT NOW()to all tables - Use
BIGINTfor all IDs and foreign keys, even on small tables - Keep tables normalized; denormalize only for proven hot read paths
2.2 Indexing Best Practices
Core Rules
- Always index foreign key columns
- Index columns in WHERE, JOIN, and ORDER BY
- Don't over-index — each index slows writes
- Verify with EXPLAIN ANALYZE
Composite Indexes
Put equality columns first, then range/sort columns:
CREATE INDEX order_status_created_idx ON order (status, created_at);
Partial Indexes
Reduce index size for common query patterns:
CREATE INDEX order_active_idx ON order (customer_id)
WHERE status = 'active';
Index Types
| Type | Use Case | Example |
|---|---|---|
| B-tree (default) | Equality, range, sorting | WHERE id = 1, ORDER BY date |
| GIN | Arrays, JSONB, full-text | WHERE tags @> ARRAY['x'] |
| GiST | Geometric, range types, full-text | PostGIS, tsrange |
| BRIN | Large sequential/time-series | Append-only logs (requires row order correlation) |
CREATE INDEX metadata_idx ON order USING GIN (metadata);
CREATE INDEX event_created_idx ON event USING BRIN (created_at);
2.3 Operations & Architecture
| Topic | Description |
|---|---|
| Process Architecture | Multi-process model, connection pooling, auxiliary processes |
| Memory Architecture | Shared/private memory layout, OS page cache, OOM prevention |
| MVCC Transactions | Isolation levels, XID wraparound, serialization errors |
| WAL and Checkpoints | WAL internals, checkpoint tuning, durability, crash recovery |
| Replication | Streaming replication, slots, sync commit, failover |
| Storage Layout | PGDATA structure, TOAST, fillfactor, tablespaces |
| Monitoring | pg_stat views, logging, pg_stat_statements, host metrics |
| Backup and Recovery | pg_dump, pg_basebackup, PITR, WAL archiving |
| MVCC and VACUUM | Dead tuples, long transactions, xid wraparound prevention |
| Optimization Checklist | Pre-optimization audit, cleanup, readiness checks |
| Query Patterns | SQL anti-patterns, JOINs, pagination, batch queries |
2.4 PlanetScale-Specific Features
| Topic | Description |
|---|---|
| Connection Pooling | PgBouncer, pool sizing, pooled vs direct |
| Extensions | Supported extensions, compatibility |
| Connections | Connection troubleshooting, drivers, SSL |
| Insights | Slow queries, MCP server, pscale CLI |
| CLI Commands | pscale CLI reference, branches, deploy requests, auth |
| CLI API Insights | Query insights via pscale api, schema analysis |
Chapter 3: Vitess
Vitess is a MySQL-compatible, cloud-native database system originally built at YouTube to scale MySQL. PlanetScale runs Vitess as a managed service.
3.1 Overview & Key Concepts
Core Capabilities
- Horizontal sharding: Built-in, transparent to the application.
- Connection pooling: VTGate multiplexes client connections far beyond native MySQL limits.
- High availability: Automatic primary failure detection and repair. Near-zero downtime resharding.
- Query rewriting and caching: VTGate rewrites and optimizes before routing.
- Schema management: Apply changes across all shards consistently.
- Materialized views and messaging: Cross-shard views and pub/sub via VStream.
Key Concepts
| Concept | Description |
|---|---|
| Keyspace | Logical database mapping to one or more shards. Analogous to a MySQL schema. |
| Shard | A horizontal partition, each backed by a separate MySQL instance. |
| VSchema | Configuration defining how tables map to shards, vindex keys, and routing rules. |
| Vindex | Sharding function mapping column values to shards (hash, xxhash, lookup). |
| VTGate | Stateless proxy that plans and routes queries to the correct shard(s). |
| Online DDL | Non-blocking schema migrations. On PlanetScale, use deploy requests. |
PlanetScale Specifics
- Branching: Git-like database branches for development; deploy requests for production.
- Connections: MySQL protocol, port
3306(direct) or443(serverless). SSL always required.
3.2 VSchema Design & Configuration
The VSchema tells VTGate how to route queries. It defines table-to-shard mappings, vindex columns, and table relationships.
Structure
{ "sharded": true, "vindexes": { ... }, "tables": { ... } }
For unsharded: { "tables": { "product": {}, "my_seq": { "type": "sequence" } } }
Vindexes
Every sharded table needs a Primary Vindex (unique, immutable after insert).
| Vindex Type | Use For |
|---|---|
xxhash | Any column type (most common) |
unicode_loose_xxhash | Text columns needing case-insensitive hashing |
binary_md5 | Any column type (MD5-based alternative) |
Choosing a primary vindex column: pick the column most used in high-QPS WHERE clauses, that enables join co-location, keeps transactions single-shard, and has high cardinality.
Co-location Example
{
"sharded": true,
"vindexes": { "xxhash": { "type": "xxhash" } },
"tables": {
"customer": { "column_vindexes": [{ "column": "customer_id", "name": "xxhash" }] },
"orders": { "column_vindexes": [{ "column": "customer_id", "name": "xxhash" }] }
}
}
Both tables shard on customer_id — rows with the same customer_id land on the same shard, enabling single-shard joins and transactions.
Lookup Vindexes
Provide secondary routing to avoid scatter queries:
"customer_email_lookup": {
"type": "consistent_lookup_unique",
"params": { "table": "product.customer_email_lookup", "from": "email", "to": "keyspace_id" },
"owner": "customer"
}
Sequences
Replace MySQL AUTO_INCREMENT for sharded tables (per-shard auto-increment produces duplicates):
CREATE TABLE customer_seq (id BIGINT, next_id BIGINT, cache BIGINT, PRIMARY KEY (id))
COMMENT 'vitess_sequence';
INSERT INTO customer_seq (id, next_id, cache) VALUES (0, 1, 1000);
Sharding Guidelines
- 250 GB per shard is the sweet spot.
- Highest-QPS query's WHERE clause dictates primary vindex.
- Co-locate joined tables; keep transactions local.
- For multi-tenant apps, use multi-column vindexes.
MoveTablescan change sharding keys later.
Troubleshooting Scatter Queries
Check: is WHERE filtering on primary vindex? Is a lookup vindex configured? Use VEXPLAIN PLAN to see routing. Use VEXPLAIN ALL for MySQL query plans, VEXPLAIN TRACE for row-level metrics.
3.3 SQL Compatibility & Limitations
Most MySQL syntax works. Known limitations:
- Stored procedures / triggers / events: Not supported through VTGate.
LOCK TABLES/GET_LOCK: Not supported.SELECT ... FOR UPDATE: Works within a single shard; cross-shard locking is not atomic.- Cross-shard joins: Supported but expensive (scatter-gather).
- AUTO_INCREMENT: Use sequences or app-generated IDs on sharded tables.
- Aggregations on sharded tables: Merge in VTGate memory; large result sets can be slow.
- Foreign keys: Limited. Prefer application-level referential integrity.
Chapter 4: Neki — Sharded Postgres
Note: Neki is not yet a released product but will be available soon.
4.1 Overview
Neki is a sharded Postgres product built by PlanetScale — the company behind Vitess. It brings PlanetScale's expertise in horizontal scaling to the Postgres ecosystem.
- Sharded Postgres — horizontal sharding beyond single-node limits.
- Managed by PlanetScale — proven infrastructure and operational experience.
- High availability — leveraging PlanetScale's track record of availability.
Learn more: neki.dev
4.2 Pre-Sharding Readiness Guide
Prepare your Postgres schema for future horizontal sharding with minimal rework.
Shard Key Design
Choose a shard key now, even if you're not sharding yet:
- Present on every tenant/user-scoped table
- Included in every frequent query's WHERE clause
- High cardinality, evenly distributed, immutable
Common choices: tenant_id, org_id, user_id, account_id.
Primary Keys
-- Single-column PK that IS the shard key
CREATE TABLE users (user_id BIGINT PRIMARY KEY, ...);
-- Composite PK with shard key LEADING
CREATE TABLE orders (
user_id BIGINT NOT NULL,
id BIGINT GENERATED ALWAYS AS IDENTITY,
PRIMARY KEY (user_id, id)
);
-- ❌ WRONG: shard key not leading
PRIMARY KEY (id, user_id)
Co-located Data
Tables frequently joined must share the same shard key:
-- Correct: shard-local join
SELECT o.id, oi.product_id FROM orders o
JOIN order_items oi ON oi.tenant_id = o.tenant_id AND oi.order_id = o.id
WHERE o.tenant_id = $1;
Reference Tables
Small, rarely-changing lookup tables (countries, currencies) get replicated across shards — no shard key needed.
Query Patterns
Every query on sharded tables must include the shard key:
-- ✓ Correct: routed to single shard
SELECT * FROM orders WHERE tenant_id = $1 AND status = 'pending';
-- ❌ Wrong: hits ALL shards
SELECT * FROM orders WHERE status = 'pending';
Indexes
Lead indexes with the shard key. Scope unique constraints to include it:
-- ✓ Correct
CREATE INDEX idx ON orders (tenant_id, status, created_at);
ALTER TABLE orders ADD CONSTRAINT uq UNIQUE (tenant_id, order_number);
-- ❌ Wrong: without shard key
CREATE INDEX idx ON orders (status, created_at);
Foreign Keys
- Within same shard key: may be supported
- Cross-shard-key: must move to application-level enforcement
Transactions
Keep transactions within a single shard key value. Cross-shard = 2PC = slow.
Aggregations
Global COUNT(*), SUM() across shards are expensive. Scope to shard key or use rollup tables.
Shard-Readiness Checklist
- ✅ Shard key present on every tenant-scoped table
- ✅ Composite PKs with shard key leading
- ✅ Shard key in all queries, indexes, and joins
- ✅ Unique constraints scoped to include shard key
- ✅ Cross-shard FKs audited
- ✅ Transactions scoped to single shard key
- ✅ Global aggregations identified; rollup plan in place
- ✅ Migrations avoid long locks
- ✅ Lookup/mapping paths hardened with backfill and monitoring
Chapter 5: Cross-Cutting Best Practices
These patterns apply regardless of which database engine you use.
Primary Key Design
- Default to
BIGINT(auto-increment or identity). - Avoid random UUIDs as clustered/primary keys; prefer time-ordered (UUIDv7, ULID, Snowflake).
- Keep PKs narrow — they're embedded in every secondary index.
Indexing Strategy
- Equality columns first, then range/sort.
- Verify indexes are used with
EXPLAIN/EXPLAIN ANALYZE. - Audit for unused indexes periodically.
- Don't over-index — each index costs write performance.
Query Optimization
- Avoid
SELECT *— list needed columns. - Cursor pagination over
OFFSET. - Avoid functions on indexed columns in WHERE.
- Batch inserts for bulk operations.
Schema Discipline
NOT NULLby default.- Use proper timestamp types (
DATETIMEfor MySQL,TIMESTAMPTZfor Postgres). - Normalize first, denormalize only for measured hot paths.
Operations
- Test DDL on non-production first.
- Use online/non-blocking DDL where available.
- Monitor replication lag.
- Tune connection pooling appropriately.
Sharding Readiness
- Even if you're not sharding today, design with a shard key in mind.
- Keep transactions, joins, and queries scoped to what could be a single shard.
- It's 100x easier to prepare for sharding from day one than retrofit later.
Compiled from PlanetScale's database-skills repository. For the full reference documents with deeper detail on each topic, visit the source repository.