Ensuring Data Storage Security in
Cloud Computing
Cong Wang
1
, Qian Wang
1
, Kui Ren
1
, and Wenjing Lou
2
1
ECE Department, Illinois Institute of Technology
2
ECE Department, Worcester Polytechnic Institute
Outline
Cloud Computing and Its Security Challenges
Data Storage Security in Cloud Computing
Our Approach
Evaluation
Concluding Remarks
Cloud Computing
Background
Cloud computing has been envisioned as the next-
generation architecture of IT enterprise.
on-demand self-service, ubiquitous network access, location independent resource
pooling, rapid resource elasticity, usage-based pricing and transference of risk
Prediction from Market-research firm IDC, cloud-computing revenue will increase from US $16.2 billion to
42.3 billion during the next few years.
Image from: Neal Leavitt, "Is Cloud Computing Really Ready for
Prime Time?," Computer, vol. 42, no. 1, pp. 15-20, January, 2009.
Cloud Computing
Background
Along with the coming of Cloud Computing is its untested
deployment, correlated adversarial models and
vulnerabilities:
Secure resource virtualization
Practical integrity mechanisms for data outsourcing
Secure computation outsourcing
Business and security risk models and clouds
Secure data management outsourcing
and many……
It is imperative that our community gets involved at
this early stage and do it right for the first time!
Outline
Cloud Computing Background
Data Storage Security in Cloud Computing
Our Approach
Evaluation
Concluding Remarks
Overview for Data
Storage in Cloud
From user’s perspective, data outsourcing brings:
Relief of the burden for storage management
universal access to data, independent of location
lower capital expenditure (CapEx) on hardware, software and services
Data outsourcing also eliminates users’ ultimate control over the fate
of their data.
Shall We Trust the Cloud
for data integrity?
Broad range of threats for data integrity still exist:
Internal: Byzantine failure, management errors, software bugs etc.
External: malicious malware, economically motivated attacks etc.
Motivation for the Cloud service providers to cheat:
Discard rarely accessed data for monetary reason
Hide data loss incident for reputation.
While cloud data storage is economically attractive for the
costs and complexity of long-term large-scale data storage,
it doesn’t offer guarantees on data integrity and availability.
Problem Description
Users should be equipped with security means so that they can
make continuous correctness assurance of their stored data.
Data integrity auditing tasks, if necessary, can be delegated to
an optional Third Party Auditor (TPA).
Challenges for ensuring
data integrity in Cloud
Traditional crypto primitives can not be directly adopted.
No local copy of data at user side.
Retrieving large amount data for checking is unpractical.
I/O burden on both servers and user, Huge network traffic, Expensive services
charge, by byte of I/O and byte transferred
Data dynamics should be considered
Cloud is not just a data warehouse: data may be frequently updated.
Most previous work on remote data integrity do not support data dynamics
Distributed protocols for storage correctness is demanded
Cloud is powered by data centers running in a simultaneous,
cooperated and distributed manner
Most previous work on distributed data storage only provide binary results for the
storage correctness.
Design Goals
Storage Correctness Verification
Distributed protocol for storage correctness assurance
Fast Data Error Localization (outperform the binary result)
Identifying misbehaving server(s)
Explicit Dynamic Data Operation Support
Data modification, deletion and append are considered
Dependability
Minimize the effect brought by data errors or server failures
Efficiency
Outline
Cloud Computing Background
Data Storage Security in Cloud Computing
Our Approach
Ensuring Cloud Data Storage
Supporting Data Dynamics
Evaluation
Concluding Remarks
Ensuring Cloud Data Storage
We rely on a (m + k, k) Reed-Solomon erasure-correcting code to disperse the
data file F redundantly across a set of n = m + k distributed servers.
The systematic layout with parity vectors is achieved with the information
dispersal matrix A:
(1) (2) ( ) ( 1) ( )
( | ) ( ) ( | )
()
m m m k
F F F
G G G G G

1 2 m
G F A F I P , I P
, , ,

f
11
f
12
f
13
... ... f
53
f
11
f
21
f
31
f
41
f
51
f
12
f
22
f
32
f
42
f
52
f
13
f
23
f
33
f
43
f
53
F
1
F
2
F
3
f
11
f
21
f
31
f
41
f
51
f
12
f
22
f
32
f
42
f
52
f
13
f
23
f
33
f
43
f
53
f
14
f
24
f
34
f
44
f
54
f
15
f
25
f
35
f
45
f
55
Original file block Parity block
G
(1)
G
(2)
G
(3)
G
(4)
G
(5)
A Reed-Solomon Codeword
Ensuring Cloud Data Storage
Based on the codeword relationship, we can verify the correctness of data block
in each “row” via information dispersal matrix A (or P).
Drawbacks: 1. need block retrieval at first, which is proportional to vector length.
2. large communication overhead.
3. only binary result about the storage state.
f
11
f
12
f
13
... ... f
53
f
11
f
21
f
31
f
41
f
51
f
12
f
22
f
32
f
42
f
52
f
13
f
23
f
33
f
43
f
53
F
1
F
2
F
3
f
11
f
21
f
31
f
41
f
51
f
12
f
22
f
32
f
42
f
52
f
13
f
23
f
33
f
43
f
53
f
14
f
24
f
34
f
44
f
54
f
15
f
25
f
35
f
45
f
55
Original file block Parity block
G
(1)
G
(2)
G
(3)
G
(4)
G
(5)
A Reed-Solomon Codeword
31 32 33 34 35
( , , ) ( , )f f f P f f
Can we do better?
Ensuring Cloud Data Storage
Random sampling + homomorphic token pre-computation(linear combination)
f
11
f
21
f
31
f
41
f
51
f
12
f
22
f
32
f
42
f
52
f
13
f
23
f
33
f
43
f
53
f
14
f
24
f
34
f
44
f
54
f
15
f
25
f
35
f
45
f
55
Original file block Parity block
G
(1)
G
(2)
G
(3)
G
(4)
G
(5)
α
α
2
α
3
v
(1)
v
(2)
v
(3)
v
(4)
v
(5)
Pre-computed token
R
(1)
R
(2)
R
(3)
R
(4)
R
(5)
Server Response
1. Check if it is a valid a Reed-Solomon
Codeword. If not, go to step 2.
2. Check if R
(j)
=v
(j)
(j = 1, …,5 ) to
identify the misbehaving server(s).
3
( ) ( )
1
* [ ],{ 1,3,5} {1,...,5}
j q j
qq
q
v G I I and j
(1) (2) (3) (4) (5)
( , , ) ( , )R R R P R R
Advantages: 1. only small constant block retrieval is required
2. Finding misbehaving server(s)
3. Efficiency
Outline
Cloud Computing Background
Data Storage Security in Cloud Computing
Our Approach
Ensuring Cloud Data Storage
Supporting Data Dynamics
Evaluation
Concluding Remarks
Supporting Data Dynamics
Cloud data storage is not only for archive purpose
General block-level operations: update, delete,
append…
Trivial way is to download all the data from the cloud
servers and re-compute parity blocks and tokens
Can we do better?
Supporting Data Dynamics
()
*
F A F F A F A F A
Logical representation of data dynamics,
including block update, append and delete
0 0 0 f
61
f
62
f
63
f
61
f
62
f
63
f
11
f
21
f
31
f
41
f
51
f
12
f
22
f
32
f
42
f
52
f
13
f
23
f
33
f
43
f
53
f
11
f
21*
f
31
f
41
f
51
f
12*
f
22
f
32
f
42
f
52*
f
13
f
23
f
33
f
43*
f
53
Original file block Modified block
F
1
F
2
F
3
0
Δf
21
0
0
0
Δf
12
0
0
0
Δf
52
0
0
0
Δf
43
0
ΔF
1
ΔF
2
ΔF
3
+
=
F
1*
F
2*
F
3*
Appended block
Due to the linear property of Reed-Solomon code, we
can amend” the parity blocks, without involving any of
the unchanged blocks.
Supporting Data Dynamics
Similarly: we can “amend” the tokens, without
retrieving any of the unchanged blocks.
Suppose a block G
(j)
[I
s
], which is covered by the specific
token v
(j)
, has been changed:
The token v
(j)
can be updated:
( ) ( ) ( )
[ ] [ ] [ ]
j j j
s s s
G I G I G I
( ) ( ) ( )
[ ] { }.
j j s j
s
v v G I s q
( ) ( )
1
* [ ],{ [1,..., ]|1 }
r
j q j
qq
q
v G I I l q r
Outline
Cloud Computing Background
Data Storage Security in Cloud Computing
Our Approach
Ensuring Cloud Data Storage
Supporting Data Dynamics
Evaluation
Concluding Remarks
Detection Probability: Assume the adversary modifies
or deletes the data blocks in z rows out of the total l
rows in the encoded file matrix.
Each time we samples r rows to check, the detection
probability will be:
Security Analysis
If z/l = 1% and r = 460, the detection probability P
d
is at least 99%.
1
0
1 (1 min{ ,1}) 1 ( )
r
r
d
i
z l z
p
l i l
Colluding Attack Resistance: Can we hide the secret
encoding matrix P without affecting the validity of the
checking results?
Yes! Make use of the linear property of Erasure
Correcting Coding.
Adding random perturbations to the encoded file
matrix and hence hide the secret matrix P.
The linear property of RS-code makes random perturbations easily
stripped away for verification purposes.
Security Analysis
File Pre-distribution Cost
Token Pre-computation Cost
To verify the data once per day for the next 5 years, the
average token pre-computation cost according to our
implementation is 51.97s per data vector, given r = 460.
Performance Evaluation
Set I
m=4
m=6
m=8
m=10
k = 2
567.45s
484.55s
437.22s
414.22s
Set II
k=1
k=2
k=3
k=4
m=8
358.90s
437.22s
584.55s
733.34s
The cost of parity generation in seconds for an 8GB data file on Intel Core 2 processor running at 1.86GHz.
For set I, the number of parity servers k is fixed; for Set II, the number of data servers m is constant.
Outline
Cloud Computing Background
Data Storage Security in Cloud Computing
Our Approach
Ensuring Cloud Data Storage
Supporting Data Dynamics
Evaluation
Concluding Remarks
Concluding Remarks
1) Instead of giving only binary results about the
storage state across the distributed servers, our work
further provides the localization of data error.
2) In addition to ensuring cloud data integrity, the new
scheme supports secure and efficient dynamic
operations on data blocks, including: update, delete
and append.
3) Extensive security and performance analysis shows
that the proposed scheme is highly efficient and
resilient under various untrusted server scenarios.
Thank You!