Lei Kristoffer R. Lactuan and Jaderick P. Pabico
Institute of Computer Science, University of the Philippines Los Baños, College 4031, Laguna
{lkrlactuan, jppabico}@up.edu.ph
Date Received: May 23, 2015; Date Revised: July 8, 2015
Unshredding of Shredded Documents Computational Framework and Implementation 609 KB 1 downloads
Lei Kristoffer R. Lactuan and Jaderick P. Pabico Institute of Computer Science, University...
A shredded document D is a document whose pages have been cut into strips for the purpose of destroying private, confidential, or sensitive information I contained in D. Shredding has become a standard means of government organizations, businesses, and private individuals to destroy archival records that have been officially classified for disposal. It can also be used to destroy documentary evidence of wrongdoings by entities who are trying to hide I (e.g., as alleged by the whistle blowers in the P10-Billion pork barrel-JLN NGOs scam). Shredding does not really destroy ID but just simply jumbles I by cutting the pages of D into strips. Let D be a set of n ordered pages {p0, p1, …, pn–1}, and let each ith (X×Y)-dimension page p0≤i<nD contains a 2-dimensional ordered arrangement of the information elements Ji,(x,y){0,1}, 0≤x<X, 0≤y<Y. When each Ji,(x,y) is presented to a reader R in correct order, then I is transferred to R. When pi is shredded into m vertical strips si,0, si,1, …, si,m–1, then the jth strip si,0≤j<m contains X/m×Y information elements, which has Ji,(a(j),0) in its upper left corner and Ji,(b(j),Y–1) in its lower right, where a(r)=(x–m)r/m, and b(r) = (r+1)x/m. However, what we are interested in are the information elements Ji,(a(j),0≤y<Y) and Ji,(b(j),0≤y<Y) along the left and right edges, respectively. If two strips si,p and si,q are actually adjacent to each other in the unshredded pi, then either Ji,b(p),0≤y<Y) ≡ Ji,(a(q),0≤y<Y) or Ji,(a(p),0≤y<Y) ≡ Ji,(b(q),0≤y<Y), where ≡ is a similarity function that we defined. In this paper, we present an optimal O((n×m)2) algorithm A that reconstructs an n-page D, where each page p is shredded into m strips. We also present the efficacy of A in reconstructing three document types: hand-written, machine typed-set, and images.
Keywords – shredding, unshredding, reconstruction, optimal algorithm, documents