2011.06.20
Solving Sony's (and others') hacking woes
I won't pretend to be any kind of computer security expert, and I'm assuming here that the good folks at Sony, Codemasters et al. have some pretty robust systems in place to guard against attacks. However, the recent spate of attacks against large gaming firms suggests to me that what man can make, man can break. It would seem that if a group of people want to infiltrate a system and grab the personal details of customers, they will. So it strikes me that the only realistic way of making this data secure is to ensure that, should it fall into the wrong hands, it would be useless to them.
NB. All code shown here is purely pseudo-code and is for illustration only.
Take the following table as one might see in a database of any company:
| ID | Firstname | Surname | Address | Town | County | Postcode | Email | CreditCard_no | CC_Expires | Password_hash |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Tony | Hancock | 23 Railway Cuttings | East Cheam | Surrey | TUR N1P | tony@example.com | 4921849214 | 12/12 | gh94h849gh89 |
| 2 | Bill | Kerr | 5 High Street | East Cheam | Surrey | RAD 15H | bill@example.com | 8239585795 | 07/13 | whw4hjw9hhjw |
| 3 | Sidney | James | Cell B2 | Wormwood scrubs | London | PR1 50N | sid@example.com | 5748357845 | 08/11 | h4wh9wh94ww |
| 4 | Kenneth | Williams | 7 The Avenue | Mitcham | Surrey | MI3 9TR | ken@example.org | 5487583475 | 11/14 | h49wh9w9hh94 |
Table: Customers
Should an intruder find this table, they have all they need. Even encrypted, it isn't 100% safe. So I propose a method for ensuring that even if this information falls into the wrong hands it is essentially useless to them. I appreciate that what I am about to outline will have big performance overheads, but maybe that's the price that needs to be paid. Maybe the most commonly accessed information could be kept to one or two tables only.
I suggest splitting this information across three or more tables and using a hashing algorithm to link them. By also generating the unique ID instead of using auto-increment one can remove any chance of sorting and cross-referencing. So to use the example above, it would now look as follows:
| ID | Firstname | Postcode | CC_Expires |
|---|---|---|---|
| 437824 | Tony | TUR N1P | 12/12 |
| 763124 | Bill | RAD 15H | 07/13 |
| 587623 | Sidney | PR1 50N | 08/11 |
| 435645 | Kenneth | MI3 9TR | 11/14 |
Table: Customers_A
| ID | Address | Email | Password_hash |
|---|---|---|---|
| | 23 Railway Cuttings | tony@example.com | gh94h849gh89 |
| | 5 High Street | bill@example.com | whw4hjw9hhjw |
| | Cell B2 | sid@example.com | h4wh9wh94ww |
| | 7 The Avenue | ken@example.org | h49wh9w9hh94 |
Table: Customers_B
| ID | Surname | Town | County | CreditCard_no |
|---|---|---|---|---|
| | Hancock | East Cheam | Surrey | 4921849214 |
| | Kerr | East Cheam | Surrey | 8239585795 |
| | James | Wormwood scrubs | London | 5748357845 |
| | Williams | Mitcham | Surrey | 5487583475 |
Table: Customers_C
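As a rough illustration of generating the 'Customers_A' ID rather than relying on auto-increment, here is a minimal Python sketch. The function name `generate_customer_id` and the six-digit length are my own assumptions, chosen only to resemble the IDs in the table above; a real system would also have to enforce uniqueness in the database itself.

```python
import secrets

def generate_customer_id(existing_ids, digits=6):
    """Return a random ID with exactly `digits` digits that is not already in use.

    Hypothetical helper: the name and the six-digit default are assumptions.
    """
    low = 10 ** (digits - 1)
    span = 9 * low  # how many numbers have exactly `digits` digits
    while True:
        candidate = low + secrets.randbelow(span)
        if candidate not in existing_ids:
            return candidate

# Allocate four IDs, one per customer in the example table.
existing = set()
for _ in range(4):
    existing.add(generate_customer_id(existing))
```

Because the IDs are drawn at random, their order says nothing about the order in which customers were added.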
You'll notice I left the ID fields blank in tables 'Customers_B' and 'Customers_C'. I propose that these be populated using a hash of information from a preceding table, salted with other information.
For example, using MD5 (yes, I know it has flaws, this is just an example) to hash the postcode, salted with the firstname reversed:
Customers_B.ID = MD5(string_reverse(Customers_A.Firstname) + Customers_A.Postcode);
Giving us:
| ID | Address | Email | Password_hash |
|---|---|---|---|
| seijgiegjegj6e | 23 Railway Cuttings | tony@example.com | gh94h849gh89 |
| ejgiegijiegjeig | 5 High Street | bill@example.com | whw4hjw9hhjw |
| xhfyegyeeyey | Cell B2 | sid@example.com | h4wh9wh94ww |
| oegjegjoegjoo | 7 The Avenue | ken@example.org | h49wh9w9hh94 |
Table: Customers_B
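The 'Customers_B.ID' derivation above could be sketched in Python roughly as follows. This assumes the fields are plain strings encoded as UTF-8, and the function name `customers_b_id` is hypothetical. Note that a real MD5 digest is 32 hexadecimal characters, so the IDs in the table above are placeholders rather than actual hashes.

```python
import hashlib

def customers_b_id(firstname: str, postcode: str) -> str:
    """MD5 of the reversed first name followed by the postcode, as a hex string."""
    material = firstname[::-1] + postcode  # e.g. "ynoT" + "TUR N1P"
    return hashlib.md5(material.encode("utf-8")).hexdigest()

tony_id = customers_b_id("Tony", "TUR N1P")
bill_id = customers_b_id("Bill", "RAD 15H")
```

As in the post, MD5 is used purely for illustration; `hashlib.sha256` could be dropped in the same way.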
We could then use fields from 'Customers_B' to populate the ID field of 'Customers_C':
Customers_C.ID = MD5(string_reverse(Customers_B.Email) + Customers_B.Address);
Or even use a combination of fields from all other tables:
Customers_C.ID = MD5(string_reverse(Customers_A.CC_Expires) + Customers_B.Password_hash);
Giving us:
| ID | Surname | Town | County | CreditCard_no |
|---|---|---|---|---|
| vogjoejogeojh | Hancock | East Cheam | Surrey | 4921849214 |
| ceophgepjpea | Kerr | East Cheam | Surrey | 8239585795 |
| 4hjo4shjo4hj | James | Wormwood scrubs | London | 5748357845 |
| hh04j0h4jh0j | Williams | Mitcham | Surrey | 5487583475 |
Table: Customers_C
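To show how an application that knows the recipe might reassemble a full record, here is a small Python sketch. It uses plain dictionaries in place of database tables, and the helper names (`md5_hex`, `customers_b_id`, `customers_c_id`, `load_customer`) are my own assumptions. The 'Customers_C' key follows the second variant above (reversed CC_Expires plus Password_hash).

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def customers_b_id(row_a: dict) -> str:
    # Reversed first name + postcode, as in the Customers_B formula.
    return md5_hex(row_a["Firstname"][::-1] + row_a["Postcode"])

def customers_c_id(row_a: dict, row_b: dict) -> str:
    # Reversed expiry date + password hash, as in the combined formula.
    return md5_hex(row_a["CC_Expires"][::-1] + row_b["Password_hash"])

row_a = {"ID": 437824, "Firstname": "Tony", "Postcode": "TUR N1P", "CC_Expires": "12/12"}
row_b = {"Address": "23 Railway Cuttings", "Email": "tony@example.com",
         "Password_hash": "gh94h849gh89"}
row_c = {"Surname": "Hancock", "Town": "East Cheam", "County": "Surrey",
         "CreditCard_no": "4921849214"}

# Stand-ins for the Customers_B and Customers_C tables, keyed by derived ID.
customers_b = {customers_b_id(row_a): row_b}
customers_c = {customers_c_id(row_a, row_b): row_c}

def load_customer(row_a: dict, customers_b: dict, customers_c: dict) -> dict:
    """Follow the hash links from Customers_A through B to C and merge the rows."""
    b = customers_b[customers_b_id(row_a)]
    c = customers_c[customers_c_id(row_a, b)]
    return {**row_a, **b, **c}

customer = load_customer(row_a, customers_b, customers_c)
```

Each lookup costs an extra hash and an extra query, which is the performance overhead mentioned earlier.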
One major benefit of this is that even if the tables were compromised and sorted by the ID field, they would not match up. Here are the three tables again, but this time sorted by the ID field. You'll notice that the first record of 'Customers_A' no longer corresponds to the first record of 'Customers_B', and neither corresponds to the first record of 'Customers_C'. Of course there will be occasions when, by coincidence, they do correspond, but there will be no way of knowing this without knowing how the hashing algorithms work, what data is used and how (reversing, salting etc).
| ID | Firstname | Postcode | CC_Expires |
|---|---|---|---|
| 435645 | Kenneth | MI3 9TR | 11/14 |
| 437824 | Tony | TUR N1P | 12/12 |
| 587623 | Sidney | PR1 50N | 08/11 |
| 763124 | Bill | RAD 15H | 07/13 |
Table: Customers_A
| ID | Address | Email | Password_hash |
|---|---|---|---|
| ejgiegijiegjeig | 5 High Street | bill@example.com | whw4hjw9hhjw |
| oegjegjoegjoo | 7 The Avenue | ken@example.org | h49wh9w9hh94 |
| seijgiegjegj6e | 23 Railway Cuttings | tony@example.com | gh94h849gh89 |
| xhfyegyeeyey | Cell B2 | sid@example.com | h4wh9wh94ww |
Table: Customers_B
| ID | Surname | Town | County | CreditCard_no |
|---|---|---|---|---|
| 4hjo4shjo4hj | James | Wormwood scrubs | London | 5748357845 |
| ceophgepjpea | Kerr | East Cheam | Surrey | 8239585795 |
| hh04j0h4jh0j | Williams | Mitcham | Surrey | 5487583475 |
| vogjoejogeojh | Hancock | East Cheam | Surrey | 4921849214 |
Table: Customers_C
Can anyone see any flaws in my thinking?
arpanet
Witty rejoinders
Paul
2011.08.29, 12:19
p.s. the 'Website' link in the comments section prefixes everything with the URL for your site :o)

Paul
2011.08.29, 12:17
www.internet-tools.co.uk/blog
Whilst I agree that the weakness is the hashed password in the original table, you're not making it harder to dehash that really. That stays the same, they just need to dehash the relative IDs, which, given that they are all numeric, could probably be brute-forced in a fairly short space of time... Wouldn't it be better just to have a stupidly long private key to hash the passwords against?