SQL from Scratch: History, Theory, and Why It Remains the Language of Data

Few technologies have had such a long shelf life — and such a quietly pervasive influence — as SQL. While programming languages come and go with the tides of fashion, the Structured Query Language remains, more than half a century after its conception, the standard way that humans and applications ask questions of data.

Behind virtually any modern website, mobile application, banking system, or analytics dashboard, there is almost certainly a relational database and an engine that understands SQL. Understanding where SQL comes from, why it exists, what theoretical model underpins it, and what problem it set out to solve makes everything that follows — SELECT, JOIN, indexes, transactions — far more intuitive.

A 1960s Problem: Data Trapped Inside Its Own Shape

To understand why SQL was born, it helps to remember what databases looked like before it. In the 1960s, the dominant systems were the hierarchical model (such as IBM's IMS) and the network model (such as CODASYL-based systems). In both, data was organized in trees or pointer graphs, and to retrieve information a programmer had to navigate that structure explicitly: open a record, follow a pointer to the next, jump to another file, and so on.

This had two serious consequences. First, application code became tightly coupled to the physical way data was stored on disk — changing the storage structure meant rewriting programs. Second, accessing data required expert knowledge of the database's internal "map," making it practically impossible for non-programmers to ask questions about their organization's own information.
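
The coupling described above can be made concrete with a toy sketch. The record layout and every name in it (Customer, Order, next_order, order_totals) are hypothetical illustrations, not the API of IMS or of any CODASYL system; the point is only that the program itself has to know how records are chained together.

```python
# Toy illustration of navigational (pointer-chasing) data access.
# All class and field names are hypothetical, not taken from IMS or CODASYL.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Order:
    order_id: int
    total: float
    next_order: Optional["Order"] = None  # pointer to the customer's next order


@dataclass
class Customer:
    name: str
    first_order: Optional[Order] = None  # entry point into the order chain


def order_totals(customer: Customer) -> List[float]:
    """Walk the chain of orders by following pointers one record at a time."""
    totals = []
    current = customer.first_order
    while current is not None:
        totals.append(current.total)
        current = current.next_order
    return totals


# The program must know exactly how the records are linked.
second = Order(order_id=2, total=15.0)
first = Order(order_id=1, total=40.0, next_order=second)
alice = Customer(name="Alice", first_order=first)

print(order_totals(alice))  # [40.0, 15.0]
```

If the storage layout changed, for example by replacing the linked chain with an array of orders, order_totals would have to be rewritten, which is the maintenance burden the paragraph describes.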

1970: Edgar F. Codd and the Relational Model

The breakthrough came from IBM's Research Laboratory in San Jose, California. In June 1970, British mathematician Edgar Frank "Ted" Codd published in Communications of the ACM a paper now regarded as the founding text of relational databases: "A Relational Model of Data for Large Shared Data Banks."

Codd's proposal was, on the surface, simple and, at its core, revolutionary: data should be represented as mathematical relations — sets of typed tuples with named attributes — in other words, tables with rows and columns. Users and applications should be able to query and manipulate that data without knowing anything about how it is physically stored on disk, using a non-procedural data sublanguage.
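
A minimal sketch of that idea, assuming a made-up Employee relation: each tuple has named, typed attributes, the relation is a set, and two classic relational operations (selection and projection) are written as plain functions. The helper names select and project are illustrative choices, not part of Codd's paper or of any library.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    emp_id: int
    name: str
    city: str


# A relation is a set: no duplicate tuples and no inherent order.
employees = frozenset({
    Employee(1, "Ada", "London"),
    Employee(2, "Grace", "New York"),
    Employee(3, "Edgar", "New York"),
    Employee(3, "Edgar", "New York"),  # exact duplicate collapses into one tuple
})


def select(relation, predicate):
    """Selection: keep only the tuples that satisfy the predicate."""
    return frozenset(t for t in relation if predicate(t))


def project(relation, *attributes):
    """Projection: keep only the named attributes; the result is again a set."""
    return frozenset(tuple(getattr(t, a) for a in attributes) for t in relation)


new_yorkers = select(employees, lambda e: e.city == "New York")
cities = project(employees, "city")

print(len(employees))                        # 3
print(sorted(project(new_yorkers, "name")))  # [('Edgar',), ('Grace',)]
print(sorted(cities))                        # [('London',), ('New York',)]
```

Nothing in this sketch says how the tuples are stored or in what order, which is the separation between logical model and physical storage that the proposal calls for.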

This idea, called data independence, is something we take for granted today, but at the time it met strong resistance, even within IBM itself, which took more than a decade to turn the model into a commercial product (SQL/DS, released in 1981). For his contributions to the relational model, Codd received the Turing Award in 1981.

From SEQUEL to SQL: Chamberlin, Boyce, and System R

Codd's model was powerful, but its notation — based on relational algebra and tuple relational calculus, complete with quantifiers and Greek symbols — was intimidating for anyone without a mathematical background. Two young IBM researchers, Donald D. Chamberlin and Raymond F. Boyce, set out to design a more approachable way to express the same ideas.

In 1974 they presented at the ACM SIGFIDET conference (now SIGMOD) the paper "SEQUEL: A Structured English Query Language," proposing a language built around English keywords — SELECT, FROM, WHERE — that could be translated into relational model operations with expressive power equivalent to first-order predicate calculus.

The original name, SEQUEL (Structured English QUEry Language), had to be shortened to SQL because "Sequel" was already a registered trademark of the British aircraft manufacturer Hawker Siddeley. That anecdote explains why many professionals, including Chamberlin himself, still pronounce the acronym as "sequel" to this day. The language was implemented within IBM's System R project, a prototype that demonstrated that the relational model could sustain a production-grade database system.

By the late 1970s, a company then called Relational Software Inc. — today Oracle Corporation — recognized the commercial potential of the model and launched in 1979 the first commercial SQL-based RDBMS, Oracle V2, beating IBM to market.

Figure 1 — Key milestones in the history of SQL, from Codd's relational model (1970) to the current standard (SQL:2023).

Standardization: ANSI, ISO, and the Major Versions

As different vendors began shipping their own SQL dialects, the need for a common standard became obvious. Two standardization bodies took charge: ANSI (American National Standards Institute), which adopted the first standard in 1986 as ANSI X3.135, and ISO (International Organization for Standardization), which adopted it in 1987 as ISO/IEC 9075.

Version	Year	Key contributions
SQL-86 / SQL-87	1986–87	First version standardized by ANSI and ISO.
SQL-89	1989	Referential integrity: primary and foreign keys, `CHECK`, `DEFAULT`.
SQL-92	1992	Major revision. Explicit `JOIN` (LEFT, RIGHT, FULL), DATE/TIME types, set operations.
SQL:1999 (SQL3)	1999	User-defined types, triggers, regular expressions, recursion (`WITH RECURSIVE`).
SQL:2003	2003	Window functions (`OVER`), `MERGE`, XML support, auto-generated values (sequences and identity columns).
SQL:2008	2008	Standardized `TRUNCATE TABLE`, `FETCH FIRST`, `INSTEAD OF` triggers.
SQL:2011	2011	Temporal tables with time-based versioning.
SQL:2016	2016	Native JSON, row pattern recognition, polymorphic table functions.
SQL:2023	2023	Property graph queries (SQL/PGQ), JSON refinements.
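
Two of the features in the table can be tried with a short sketch using Python's built-in sqlite3 module. The payments table and its rows are invented for illustration, and the window-function part assumes the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQL:1999 recursion: generate the integers 1..5 with a recursive common table expression.
numbers = [row[0] for row in conn.execute("""
    WITH RECURSIVE counter(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM counter WHERE n < 5
    )
    SELECT n FROM counter ORDER BY n
""")]

# SQL:2003 window functions: a running total per customer, computed with OVER.
conn.execute("CREATE TABLE payments (customer TEXT, paid_on TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [("ann", "2024-01-01", 10), ("ann", "2024-01-02", 5), ("bob", "2024-01-01", 7)],
)
running = conn.execute("""
    SELECT customer, paid_on,
           SUM(amount) OVER (PARTITION BY customer ORDER BY paid_on) AS running_total
    FROM payments
    ORDER BY customer, paid_on
""").fetchall()

print(numbers)  # [1, 2, 3, 4, 5]
print(running)  # [('ann', '2024-01-01', 10), ('ann', '2024-01-02', 15), ('bob', '2024-01-01', 7)]
```

Unlike GROUP BY, the window function keeps one output row per payment and attaches the running total alongside it.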

What Is SQL, Conceptually?

SQL is a declarative, domain-specific language for managing data in relational systems. The two key words in that definition are declarative and relational.

Declarative means the programmer describes what result they want, not how to compute it. Where a procedural language like C or Python makes the programmer spell out loops, intermediate data structures, and the order of each step, SQL simply states something like: "give me all customers from New York ordered by account age." The database engine's query optimizer then decides the concrete strategy: which index to use, in what order to join tables, whether to parallelize the work.
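
The contrast can be sketched with Python's built-in sqlite3 module. The customers table, its columns, and the sample rows are assumptions made for this example, and "account age" is interpreted as oldest signup_date first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, signup_date TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, city, signup_date) VALUES (?, ?, ?)",
    [
        ("Ana", "New York", "2019-05-01"),
        ("Ben", "Boston", "2018-01-15"),
        ("Cy", "New York", "2015-09-30"),
    ],
)

# Declarative: state the result wanted; the engine picks the access path.
QUERY = "SELECT name FROM customers WHERE city = ? ORDER BY signup_date"
declarative = [row[0] for row in conn.execute(QUERY, ("New York",))]

# Procedural equivalent: fetch everything, then filter and sort by hand.
all_rows = conn.execute("SELECT name, city, signup_date FROM customers").fetchall()
matching = [r for r in all_rows if r[1] == "New York"]
matching.sort(key=lambda r: r[2])
procedural = [r[0] for r in matching]

# Adding an index may change how the engine executes the query,
# but neither the query text nor its result changes.
conn.execute("CREATE INDEX idx_customers_city ON customers (city, signup_date)")
after_index = [row[0] for row in conn.execute(QUERY, ("New York",))]

print(declarative)  # ['Cy', 'Ana']
print(declarative == procedural == after_index)  # True
```

The procedural version hard-codes one strategy (full scan, filter, sort), whereas the SQL text leaves that choice to the optimizer, which is free to use the new index without any change to the application.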

Relational Databases in One Minute

Figure 2 — Two tables linked by a primary key (PK) and a foreign key (FK).
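
The relationship in the figure can be sketched in a few lines with Python's built-in sqlite3 module. The customers and orders tables and their columns are hypothetical stand-ins for the two tables shown, not a transcription of the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        total       REAL NOT NULL
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")

# The foreign key lets a join reassemble related rows from the two tables.
joined = conn.execute("""
    SELECT c.name, o.order_id, o.total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
""").fetchall()

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 999, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(joined)    # [('Ada', 100, 25.0)]
print(rejected)  # True
```

The primary key identifies each customer row uniquely; the foreign key in orders can only hold values that exist as a primary key in customers, which is what keeps the two tables consistent.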

The Four Families of SQL Commands

DDL (Data Definition Language): CREATE · ALTER · DROP · TRUNCATE

Defines the schema structure: which tables exist and what columns they contain.

DML (Data Manipulation Language): SELECT · INSERT · UPDATE · DELETE

Reads and modifies the data stored in the tables.

DCL (Data Control Language): GRANT · REVOKE

Manages permissions and access control over database objects.

TCL (Transaction Control Language): BEGIN · COMMIT · ROLLBACK · SAVEPOINT

Marks where transactions begin and end, so the engine can apply a group of changes with ACID guarantees: all of them take effect, or none do.
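
Three of the four families can be exercised in a small sketch with Python's built-in sqlite3 module. The accounts table and its balances are invented for the example. DCL is left out because SQLite has no user accounts and does not implement GRANT or REVOKE.

```python
import sqlite3

# isolation_level=None puts the driver in autocommit mode,
# so BEGIN / COMMIT / ROLLBACK are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define the schema.
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")

# DML: write rows.
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")

# TCL: group two updates into one transaction, then undo both.
conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
conn.execute("ROLLBACK")
after_rollback = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()

# The same transfer, this time made permanent.
conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
conn.execute("COMMIT")
after_commit = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()

print(after_rollback)  # [(100,), (50,)]
print(after_commit)    # [(70,), (80,)]
```

After the ROLLBACK neither update is visible, and after the COMMIT both are, so the transfer is never observed half-done.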

The Major Engines That Speak SQL

PostgreSQL (Open Source)

A free engine with academic origins at UC Berkeley and strong standards conformance. Renowned for its extensibility: custom types, native JSON, geospatial search via PostGIS.

MySQL (Oracle, GPL)

Long billed as the world's most popular open-source RDBMS. The reference engine for the web (LAMP stack), pragmatic and performance-oriented.

SQLite (Embedded)

A serverless C library that stores everything in a single file. Present on every Android and iOS device; arguably the most widely deployed SQL engine on the planet.

SQL Server (Microsoft)

Microsoft's enterprise RDBMS, featuring the T-SQL dialect, deep .NET ecosystem integration, and advanced BI and analytics capabilities.

Oracle DB (Oracle)

The commercial pioneer since 1979. The historical benchmark for mission-critical enterprise workloads, with PL/SQL as its native procedural language.

Worth mentioning as well: MariaDB (a community fork of MySQL), IBM Db2 (the direct descendant of the original System R), and modern analytical engines such as Snowflake, BigQuery, and Amazon Redshift, which have extended the SQL paradigm into cloud-based data warehousing.

Main References

Codd, E. F. (1970). A Relational Model of Data for Large Shared Data Banks. Communications of the ACM, 13(6), 377–387.
https://dl.acm.org/doi/10.1145/362384.362685
Chamberlin, D. D., & Boyce, R. F. (1974). SEQUEL: A Structured English Query Language. Proceedings of the 1974 ACM SIGFIDET Workshop on Data Description, Access and Control, 249–264.
https://dl.acm.org/doi/10.1145/800296.811515
PostgreSQL Global Development Group. PostgreSQL Documentation — Appendix D: SQL Conformance.
https://www.postgresql.org/docs/current/features.html
American National Standards Institute. The SQL Standard — ISO/IEC 9075:2023.
https://blog.ansi.org/ansi/sql-standard-iso-iec-9075-2023-ansi-x3-135/