This series is not about SingleStore only. I'll test many NewSQL database to see how they support the basic SQL features with the simple EMP/DEPT schema, with a focus on Foreign Keys and Unique Secondary Indexes, common in OLTP System of Records. In the first post we have seen that SingleStore doesn't support referential integrity when distributed, and in the second post, I've explained the possible alternatives to enforce them from the application, with an example on YugabyteDB which supports all. The current post completes the first one to see which alternative is possible. The SingleStore Forum answer to "How to ensure strong consistency in MemSQL?" is "Referential integrity could be enforced by the application." so let's check what is actually possible.
I'm starting the same container as in the previous post and create the tables without any unique or foreign key constrains, as they are not supported:
CREATE rowstore TABLE dept (
deptno integer NOT NULL,
dname text,
loc text,
description text,
CONSTRAINT pk_dept PRIMARY KEY (deptno asc)
);
CREATE rowstore TABLE emp (
empno integer auto_increment NOT NULL,
ename text NOT NULL,
job text,
mgr integer,
hiredate date,
sal integer,
comm integer,
deptno integer NOT NULL,
email text,
other_info json,
CONSTRAINT pk_emp PRIMARY KEY (empno)
-- CONSTRAINT emp_email_uk UNIQUE (email),
-- CONSTRAINT fk_deptno FOREIGN KEY (deptno) REFERENCES dept(deptno),
-- CONSTRAINT fk_mgr FOREIGN KEY (mgr) REFERENCES emp(empno)
);
Serializable doesn't work
My demo example has a department DEPT.DEPTNO=40
with no employees. I insert one employee in it:
singlestore> set transaction isolation level serializable;
Query OK, 0 rows affected (0.04 sec)
singlestore> start transaction;
Query OK, 0 rows affected (0.03 sec)
singlestore> insert into emp(ename, deptno) values ('Franck',40);
Query OK, 1 row affected (0.14 sec)
I didn't commit yet. At the same time, another session queries the EMP table and sees no employee for EMP.DEPTNO=40
and then decides to delete this empty department:
singlestore> set transaction isolation level serializable;
Query OK, 0 rows affected (0.03 sec)
singlestore> start transaction;
Query OK, 0 rows affected (0.03 sec)
singlestore> SELECT @@TX_ISOLATION;
+----------------+
| @@TX_ISOLATION |
+----------------+
| SERIALIZABLE |
+----------------+
1 row in set (0.05 sec)
select count(*) from emp where deptno=40;
Query OK, 0 rows affected (0.04 sec)
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.07 sec)
singlestore> delete from dept where deptno=40;
Query OK, 1 row affected (0.31 sec)
singlestore> commit;
Query OK, 0 rows affected (0.03 sec)
This is the basic of SQL transaction isolation, the I in ACID, where one session cannot see what the others are doing until they commit their changes. If the Serializable isolation level was working, the first session should not be able to commit its insert because those two transactions are not serializable. They both read a state that conflicts with the other's final state. However, SingleStore allows me to commit the first transaction:
singlestore> commit;
Query OK, 0 rows affected (0.04 sec)
singlestore> select deptno, dname from dept;
+--------+------------+
| deptno | dname |
+--------+------------+
| 20 | RESEARCH |
| 30 | SALES |
| 10 | ACCOUNTING |
+--------+------------+
3 rows in set (0.09 sec)
singlestore> select empno, ename, deptno from emp where ename='Franck';
+------------------+--------+--------+
| empno | ename | deptno |
+------------------+--------+--------+
| 1125899906842626 | Franck | 40 |
+------------------+--------+--------+
1 row in set (0.07 sec)
My database is now logically corrupt: I have an employee in EMP.DEPTNO=40
but no department DEPT.DEPTNO=40
. If the application queries only EMP
it will see this employee. If it joins with the parent table, this employee will be invisible.
The reason, as far as I understand, is that the Serializable isolation level is not supported in SingleStore. However, it raises no error. Then, I guess many application developers apply the "_Referential integrity could be enforced by the application. _" and think it works (who covers all race conditions in his CI tests?). Until one day, those conditions happen in production a corrupt data.
Lock doesn't scale
I add again the empty department to fix the error:
insert into dept(deptno) values (40);
An alternative to Serializable is pessimistic locking. The most scalable is to lock the parent row in share mode, but this is not supported in SingleStore:
singlestore> start transaction;
Query OK, 0 rows affected (0.03 sec)
singlestore> select * from dept where deptno=40 for share;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'share' at line 1
singlestore>
Locking the parent row in exclusive mode is possible with SELECT ... FOR UPDATE
:
singlestore> start transaction;
Query OK, 0 rows affected (0.03 sec)
singlestore>
singlestore> select * from dept where deptno=40 for update;
+--------+-------+------+-------------+
| deptno | dname | loc | description |
+--------+-------+------+-------------+
| 40 | NULL | NULL | NULL |
+--------+-------+------+-------------+
1 row in set (0.04 sec)
singlestore>
singlestore> insert into emp(ename, deptno) values ('Franck',40);
Query OK, 1 row affected (0.03 sec)
This guarantees that no other transaction can delete the parent row until I commit mine, and then, they will see that there is a child row. Or the lock acquisition will timeout:
singlestore> start transaction;
Query OK, 0 rows affected (0.03 sec)
singlestore>
singlestore> select count(*) from emp where deptno=40;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.03 sec)
singlestore> delete from dept where deptno=40;
ERROR 1205 (HY000): Leaf Error (node-54f20996-9bb5-4d06-8308-4d84ec42fbf2-leaf-ag2-0.svc-54f20996-9bb5-4d06-8aab-b242130227f6:3306): Lock wait timeout exceeded; try restarting transaction. Row lock owned by connection id 163, query `open idle transaction`
This is correct for data consistency, but not for scalability. Locking the parent in exclusive mode each time we want to insert a child row means that those inserts will not scale:
singlestore> start transaction;
Query OK, 0 rows affected (0.03 sec)
singlestore> select * from dept where deptno=40 for update;
ERROR 1205 (HY000): Leaf Error (node-54f20996-9bb5-4d06-8308-4d84ec42fbf2-leaf-ag2-0.svc-54f20996-9bb5-4d06-8aab-b242130227f6:3306): Lock wait timeout exceeded; try restarting transaction. Row lock owned by connection id 163, query `open idle transaction`
In practice, this is possible only with a composition where you cannot expect concurrent DML for the same parent, like Orders and Orders Items for example, but in a scalable database this is probably stored as one JSON document.
The last possibility, locking the child tables in exclusive when deleting a parent is not scalable and, anyway, doesn't work in SingleStore:
singlestore> lock tables emp write;
Query OK, 0 rows affected, 1 warning (0.00 sec)
I don't really know if this is supposed to work or not. It says Query OK
but 1 warning
and I don't see any warning. However, I've run this while another transaction did insert into emp(ename, deptno) values ('Franck',40);
and it doesn't wait or fail so probably no lock was acquired. Easy to check by deleting the parent:
singlestore> delete from dept where deptno=40;
Query OK, 1 row affected (0.00 sec)
singlestore> -- Here I commit the other session
singlestore> select * from dept right outer join emp using(deptno) where deptno=40;
+--------+-------+--------+------+------+----------+------+------+-------+------------+-------+------+-------------+
| deptno | empno | ename | job | mgr | hiredate | sal | comm | email | other_info | dname | loc | description |
+--------+-------+--------+------+------+----------+------+------+-------+------------+-------+------+-------------+
| 40 | 3 | Franck | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+--------+-------+--------+------+------+----------+------+------+-------+------------+-------+------+-------------+
1 row in set (0.16 sec)
Again, this ends with corrupt data. Distributed or not, a relational database should guarantee referential integrity or you will get wrong result will come when joining data after some time in real life production. It can be acceptable to skip this for a datawharehouse with no concurrent modifications, but OLTP requires those basic SQL features. I've no experience in SingleStore and I'm not an expert in MySQL, so please comment if I missed something here. For the moment, I think that "Referential integrity could be enforced by the application" is impossible in a database that does not support Serializable isolation level, or at least share mode row lock.
By the way, I think SingleStore is great for analytics and is not the only one NewSQL without support for cross-shard integrity. In general, the lack of referential integrity is quite common in the MySQL-compatible databases. The PostgreSQL ones are more SQL compatible, and probably a better choice for OLTP. But no generalization. In the next posts, I will look at other databases and their support for SQL properties (this is about the C and I in ACID).