Zip and Unzip files programmatically
Introduction
Recently one of the readers of DotNetBips.com posted a question on the discussion forums about
compressing and decompressing files via code. There is no obvious
answer to this question because C# and VB.NET lack this functionality. However, J#
does have a way to zip and unzip files programmatically. In this
article I explain how this can be achieved by developing a reusable class
library that can create, extract and alter ZIP files. Once developed, you can
use it in your Windows or web applications.
Background
Some applications need to compress files (documents, XML files or any other
type of file) on the fly and package them into a ZIP file. For example, a typical
requirement is that users should be able to select files and download the
selected files as a single ZIP file on the client machine. There is no out of the
box solution for this requirement in C# and VB.NET. Developers often turn to the
following alternatives:
- Use a third party component
- Use some open source component
- Implement the ZIP algorithm manually
The first option involves extra licensing cost, and developers are often
reluctant to use third party black box components in their
applications. The second option is certainly attractive as you get the complete
source code of the component. However, licensing, bugs (if any),
upgrades and support are still a big issue there. Finally, the last option
is difficult and requires considerable effort on the developer's part.
Luckily, J# (which is a part of the overall .NET
infrastructure) provides a handy way to compress and decompress files via code.
The advantages of using J# compression features are:
- J# is a part of the overall .NET infrastructure
- As J# is provided by Microsoft, future upgrades and support are assured
- No need to use any third party component
Considering this, it makes sense to use J# features
to compress and decompress files programmatically, and that is what I
am going to illustrate.
Creating a class library
We will create a C# class library that will
internally consume J# classes for compressing and decompressing files. This way,
once the library is developed, any C# or VB.NET developer can consume it. To
begin with, create a new class library project. Add a class to it called
ZipFileHelper. To use the J# compression classes you must reference the vjslib.dll assembly.
The following figure shows the Add Reference dialog of Visual Studio with this
assembly selected.
Once the reference to vjslib.dll is added, you also need
to import the following namespaces:
- java.util
- java.util.zip
- java.io
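As a rough sketch, the top of the ZipFileHelper source file might look like the following. The System.IO and System.Collections.Generic directives are assumptions inferred from the Path and List<T> usage in the listings of this article, and the namespace name ZipLibrary is hypothetical. Note that java.io and System.IO both define a class called File, which is presumably why the listings in this article write System.IO.File in full.

```csharp
using System;
using System.Collections.Generic; // List<T>, used by the helper methods
using System.IO;                  // Path (assumed from the Path.GetTempFileName() calls)
using java.util;                  // Enumeration
using java.util.zip;              // ZipFile, ZipEntry, ZipOutputStream
using java.io;                    // InputStream, OutputStream, FileInputStream, FileOutputStream

namespace ZipLibrary // hypothetical namespace name
{
    // Declared static here because every method in this article is static;
    // the original declaration may differ.
    public static class ZipFileHelper
    {
        // The helper and public methods developed in the following sections go here.
    }
}
```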
The java.util namespace contains some utility classes.
The java.util.zip namespace is the core namespace that contains classes related to
ZIP file creation. Finally, the java.io namespace provides some classes related
to file IO. The classes that we will use from the
above namespaces are:
- ZipFile
- ZipEntry
- InputStream
- OutputStream
- FileInputStream
- FileOutputStream
- ZipOutputStream
- Enumeration
The ZipFile class is a programmatic representation of a ZIP
file. A ZipFile contains zero or more ZipEntry objects and the actual
content of the zipped files. Each entry is nothing but metadata
about a zipped file.
The InputStream and OutputStream classes are the abstract base classes for
readable and writable streams respectively. The FileInputStream and
FileOutputStream classes are their file based implementations.
The ZipOutputStream class represents a writable stream pointing to a ZIP
file. This stream can be used to write ZipEntry objects and content
to the ZIP file.
Finally, Enumeration is the interface that the J# library
uses to iterate over collections.
Creating ZIP files
Before we actually write code to create or extract ZIP files let's create some
helper methods that we need later. We need to create the following
helper methods:
Obtaining a list of items inside a ZIP file
The GetZippedItems() method returns a
generic List of ZipEntry objects from a ZipFile. The GetZippedItems() method is
shown below:
private static List<ZipEntry> GetZippedItems(ZipFile file)
{
    List<ZipEntry> entries = new List<ZipEntry>();
    Enumeration e = file.entries();
    // Walk the Enumeration and collect each entry in a generic List
    while (e.hasMoreElements())
    {
        ZipEntry entry = (ZipEntry)e.nextElement();
        entries.Add(entry);
    }
    return entries;
}
The GetZippedItems() method accepts a ZipFile object and
returns a generic List of ZipEntry objects. The method creates a generic collection
of ZipEntry type. It then calls entries() method of
ZipFile class to return an Enumeration of ZipEntry objects. The code then iterates through the
enumeration and populates the List. Finally, the populated List is returned to
the caller.
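The same enumeration pattern can be tried on its own. The sketch below is a hedged illustration, not part of the library: the class ZipListingSketch and the method GetEntryNames() are hypothetical names, and the code assumes the project references vjslib.dll. It opens a ZIP file, collects the entry names and closes the file again.

```csharp
using System;
using System.Collections.Generic;
using java.util;
using java.util.zip;

public static class ZipListingSketch // hypothetical class for illustration
{
    // Returns the name of every entry in the ZIP file at zipPath.
    public static List<string> GetEntryNames(string zipPath)
    {
        List<string> names = new List<string>();
        ZipFile file = new ZipFile(zipPath);
        try
        {
            Enumeration e = file.entries();
            while (e.hasMoreElements())
            {
                ZipEntry entry = (ZipEntry)e.nextElement();
                names.Add(entry.getName());
            }
        }
        finally
        {
            file.close(); // release the handle on the ZIP file
        }
        return names;
    }
}
```

A caller could then loop over the returned list and print each name, for example to show the contents of an archive before extracting it.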
Copying streams
While adding or removing files from an existing ZIP file we need
to copy the contents of the constituent files from source to destination streams. Hence, we need a
helper method called CopyStream() to do that job. The CopyStream() method is
shown below:
private static void CopyStream(InputStream source,
    OutputStream destination)
{
    sbyte[] buffer = new sbyte[8000];
    int data;
    // read() returns the number of bytes read, or -1 at the end of the stream.
    // Exceptions are allowed to propagate to the caller: swallowing them
    // inside the loop could make a failing read repeat forever.
    while ((data = source.read(buffer, 0, buffer.Length)) > 0)
    {
        destination.write(buffer, 0, data);
    }
}
The CopyStream() method accepts source and destination streams in the form of InputStream
and OutputStream objects respectively. It then reads the source
stream using the read() method. The read() method reads data in chunks of up to 8000 signed
bytes (sbyte) and returns the number of bytes actually read. That many bytes are then written
to the destination stream using the write() method of the
OutputStream class. The loop ends when read() reports that no more data is available.
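To see the chunked copy in isolation, here is a minimal sketch that copies one file to another with the J# file streams. The class StreamCopySketch and the method CopyFile() are hypothetical names used only for illustration, and the project is assumed to reference vjslib.dll.

```csharp
using System;
using java.io;

public static class StreamCopySketch // hypothetical class for illustration
{
    // Copies sourcePath to destinationPath using a chunked read/write loop.
    public static void CopyFile(string sourcePath, string destinationPath)
    {
        FileInputStream source = new FileInputStream(sourcePath);
        try
        {
            FileOutputStream destination = new FileOutputStream(destinationPath);
            try
            {
                sbyte[] buffer = new sbyte[8000];
                int count;
                // read() returns the number of bytes read, or -1 at end of stream
                while ((count = source.read(buffer, 0, buffer.Length)) > 0)
                {
                    destination.write(buffer, 0, count);
                }
            }
            finally
            {
                destination.close();
            }
        }
        finally
        {
            source.close();
        }
    }
}
```

The nested try/finally blocks make sure both streams are closed even when a read or write fails.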
Copying ZipEntry objects
The J# compression classes do not allow you to add or
remove files from an existing ZIP file. The only way to add or remove
files from an existing ZIP file is to create a new ZIP file with required items and then replace original ZIP
file with this newly created ZIP file. Hence, we
need a helper method that copies ZipEntry objects from one ZIP file into the other.
CopyEntries() is such a method. The CopyEntries() method has two overloads as
shown below:
private static void CopyEntries(ZipFile source,
    ZipOutputStream destination)
{
    List<ZipEntry> entries = GetZippedItems(source);
    foreach (ZipEntry entry in entries)
    {
        destination.putNextEntry(entry);
        InputStream s = source.getInputStream(entry);
        CopyStream(s, destination);
        destination.closeEntry();
        s.close();
    }
}

private static void CopyEntries(ZipFile source,
    ZipOutputStream destination, string[] entryNames)
{
    List<ZipEntry> entries = GetZippedItems(source);
    for (int i = 0; i < entryNames.Length; i++)
    {
        foreach (ZipEntry entry in entries)
        {
            if (entry.getName() == entryNames[i])
            {
                destination.putNextEntry(entry);
                // Read the entry's content from the source ZIP file
                InputStream s = source.getInputStream(entry);
                CopyStream(s, destination);
                destination.closeEntry();
                s.close();
            }
        }
    }
}
The first overload of CopyEntries() method accepts two parameters.
The first parameter is the source ZipFile from which entries are to be copied. The
second parameter is the target ZipOutputStream to which the entries are to
be written.
The second overload of CopyEntries() method is intended to
copy only certain entries and accepts three parameters. The
significance of the first two parameters is the same as before. The third parameter is
an array of entry names that are to be copied to the
destination ZipOutputStream.
Both the overloads of CopyEntries() method
essentially retrieve a List of ZipEntries using GetZippedItems() helper method.
The entries are then transferred to the ZipOutputStream. The
putNextEntry() method of ZipOutputStream class accepts a ZipEntry to be added to
the ZIP file and writes it to the ZIP file. The getInputStream() method of
ZipFile class accepts a ZipEntry and returns an InputStream pointing to that
entry. This stream is used by CopyStream() helper method for reading the data from
that entry. Remember that ZipEntry simply provides metadata about
an entry whereas the stream obtained from getInputStream() method provides the actual content of the
file. Finally, closeEntry() method of ZipOutputStream class is called to finish writing
the entry.
Adding entries to an existing ZIP file
The AddEntries()
method adds ZipEntry objects to a ZIP file. The AddEntries() method is
shown below:
private static void AddEntries(ZipFile file, string[] newFiles)
{
    string fileName = file.getName();
    string tempFileName = Path.GetTempFileName();
    ZipOutputStream destination = new ZipOutputStream
        (new FileOutputStream(tempFileName));
    try
    {
        // Carry over whatever the ZIP file already contains
        CopyEntries(file, destination);
        if (newFiles != null)
        {
            foreach (string f in newFiles)
            {
                // Entry name is the full path minus its root (e.g. the drive)
                ZipEntry z = new ZipEntry(f.Remove
                    (0, Path.GetPathRoot(f).Length));
                z.setMethod(ZipEntry.DEFLATED);
                destination.putNextEntry(z);
                try
                {
                    FileInputStream s = new FileInputStream(f);
                    try
                    {
                        CopyStream(s, destination);
                    }
                    finally
                    {
                        s.close();
                    }
                }
                finally
                {
                    destination.closeEntry();
                }
            }
        }
    }
    finally
    {
        destination.close();
    }
    file.close();
    // Replace the original ZIP file with the temporary one
    System.IO.File.Copy(tempFileName, fileName, true);
    System.IO.File.Delete(tempFileName);
}
The code retrieves the full path of
the ZipFile by calling its getName() method. It also obtains a temporary
file name using the GetTempFileName() method of the System.IO.Path class. You might be
wondering why we need a temporary file here. AddEntries() is a helper method
that will be called while creating a new ZIP file as well as while adding
files to an existing ZIP file. The J# compression classes do not allow you to
modify ZIP files directly. Hence, we create a new ZIP file with the required items
and then overwrite the old ZIP file with it. For this temporary ZIP file we need a temporary
file name and hence we used the GetTempFileName() method. We then create
a new ZipOutputStream object pointing to the temporary ZIP file. Then the CopyEntries()
helper method is called. The CopyEntries() helper method copies entries
from the specified ZIP file (first parameter) to a ZipOutputStream (second
parameter). If you are creating a new ZIP file then CopyEntries() will
not copy any entries. However, if you are adding files to an existing ZIP file
then it will copy all the entries from the existing ZIP file to the new temporary
ZIP file.
Next, a foreach loop adds all the files to be zipped to the
ZipOutputStream. Each zipped file is represented by a ZipEntry object.
The constructor of the ZipEntry class accepts the name of the entry. Here the name is
the full path of the file with its root (for example, the drive) removed. The setMethod() method
sets the compression method to DEFLATED. The other possibility is STORED, which packages the file
in uncompressed format. The newly created ZipEntry is added
to the ZipOutputStream using its putNextEntry() method. A ZipEntry merely represents metadata of an
entry. You still need to add the actual contents of the
file into the ZIP file. This is done by the CopyStream()
helper method. Finally, both the stream and the original ZipFile are closed, the temporary file
is copied over the original ZIP file and the temporary file is deleted.
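The sequence of creating an entry, registering it with putNextEntry() and then writing its content can be sketched on its own as follows. This is a simplified illustration under stated assumptions, not the library method: SingleEntrySketch and ZipSingleFile() are hypothetical names, the entry name is just the bare file name rather than the path minus its root, and vjslib.dll is assumed to be referenced.

```csharp
using System;
using java.io;
using java.util.zip;

public static class SingleEntrySketch // hypothetical class for illustration
{
    // Creates (or overwrites) zipPath with one compressed entry holding sourceFile.
    public static void ZipSingleFile(string sourceFile, string zipPath)
    {
        ZipOutputStream zout = new ZipOutputStream(new FileOutputStream(zipPath));
        try
        {
            // The entry carries metadata only; here just the bare file name.
            ZipEntry entry = new ZipEntry(System.IO.Path.GetFileName(sourceFile));
            entry.setMethod(ZipEntry.DEFLATED);
            zout.putNextEntry(entry);

            // The file content is written separately, after putNextEntry().
            FileInputStream input = new FileInputStream(sourceFile);
            try
            {
                sbyte[] buffer = new sbyte[8000];
                int count;
                while ((count = input.read(buffer, 0, buffer.Length)) > 0)
                {
                    zout.write(buffer, 0, count);
                }
            }
            finally
            {
                input.close();
            }
            zout.closeEntry();
        }
        finally
        {
            zout.close();
        }
    }
}
```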
Removing entries from an existing ZIP file
In contrast to the AddEntries() method, the RemoveEntries() method
removes ZipEntry objects from a given ZIP file. The RemoveEntries() method is
shown below:
private static void RemoveEntries(ZipFile file, string[] items)
{
    string fileName = file.getName();
    string tempFileName = Path.GetTempFileName();
    ZipOutputStream destination = new ZipOutputStream
        (new FileOutputStream(tempFileName));
    try
    {
        List<ZipEntry> allItems = GetZippedItems(file);
        List<string> filteredItems = new List<string>();
        foreach (ZipEntry entry in allItems)
        {
            // Keep the entry unless its name appears in the removal list
            bool remove = false;
            foreach (string s in items)
            {
                if (s == entry.getName())
                {
                    remove = true;
                    break;
                }
            }
            if (!remove)
            {
                filteredItems.Add(entry.getName());
            }
        }
        CopyEntries(file, destination, filteredItems.ToArray());
    }
    finally
    {
        destination.close();
    }
    file.close();
    // Replace the original ZIP file with the temporary one
    System.IO.File.Copy(tempFileName, fileName, true);
    System.IO.File.Delete(tempFileName);
}
The RemoveEntries() method accepts the ZipFile from
which entries are to be removed and an array of entry names to be removed. The
code of the RemoveEntries() method is very similar to the AddEntries() method except
that it does not copy the specified entries. Notice the nested foreach loops.
They compare the list of all the entries with the list of the entries to be removed. The
difference between these two lists
is nothing but the list of entries to be
copied. The CopyEntries() method is then called by passing the list of entries to be
copied. Recall that the second overload of CopyEntries() is designed for copying only the
specified entries.
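The list difference described above can be sketched without any ZIP classes at all. The class EntryFilterSketch and the method GetNamesToKeep() are hypothetical names used for illustration only; the sketch works on plain strings so that the filtering rule is easy to verify.

```csharp
using System;
using System.Collections.Generic;

public static class EntryFilterSketch // hypothetical class for illustration
{
    // Returns the names in allNames that do not appear in namesToRemove.
    public static List<string> GetNamesToKeep(List<string> allNames, string[] namesToRemove)
    {
        List<string> keep = new List<string>();
        foreach (string name in allNames)
        {
            // Array.IndexOf returns -1 when the name is not marked for removal.
            if (Array.IndexOf(namesToRemove, name) < 0)
            {
                keep.Add(name);
            }
        }
        return keep;
    }
}
```

For instance, with the entries a.txt, docs/b.txt and c.txt, removing a.txt and c.txt leaves only docs/b.txt to be copied into the new ZIP file.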
Creating a new ZIP file
In order to create a new ZIP file we write a static method
named CreateZipFile() inside the ZipFileHelper class. The CreateZipFile() method
accepts two parameters viz. path and name of the ZIP file to be created and
array of file names that are to be zipped. The CreateZipFile() method is
shown below:
public static void CreateZipFile(string filename,
    string[] items)
{
    FileOutputStream fout = new FileOutputStream(filename);
    ZipOutputStream zout = new ZipOutputStream(fout);
    zout.close();
    ZipFile zipfile = new ZipFile(filename);
    AddEntries(zipfile, items);
}
The code creates an instance of the FileOutputStream class. The FileOutputStream class
represents a stream capable of writing to a file. The constructor of the FileOutputStream class
accepts the path of the file to which we wish
to write. This FileOutputStream instance is then supplied to an instance of the ZipOutputStream class. The
ZipOutputStream class represents a writable stream to a ZIP file. The ZipOutputStream is
then closed, which creates a new, empty ZIP file. An object of the ZipFile class is
then created. The ZipFile class represents a ZIP file in your code
and is used to manipulate the contents of the ZIP file. Finally, the AddEntries() helper method is called
by passing the ZipFile object and the names of the files to
be zipped.
Adding files to an existing ZIP file
There might be situations wherein you may wish to add files to an existing ZIP
file. The AddToZipFile() method does exactly that. The AddToZipFile() method is
shown below:
public static void AddToZipFile(string filename,
    string[] items)
{
    ZipFile file = new ZipFile(filename);
    AddEntries(file, items);
}
The AddToZipFile() method
accepts the path of the ZIP file and array of new files to be added. It then
creates an instance of ZipFile class and calls AddEntries() method we
created earlier.
Removing files from an existing ZIP file
The RemoveFromZipFile()
method removes specified entries from a ZIP file. The method is
shown below:
public static void RemoveFromZipFile(string filename,
    string[] items)
{
    ZipFile file = new ZipFile(filename);
    RemoveEntries(file, items);
}
The RemoveFromZipFile() method accepts name of the ZIP file from which
items are to be removed and an array of entry names that are to be removed.
It then calls RemoveEntries() method by passing ZipFile and entries to
be removed.
Extracting a ZIP file
So far you have learnt to
compress files into a ZIP file and modify existing ZIP files by
adding or removing items from them. Now it's time to learn how to extract ZIP files.
The ExtractZipFile() method is intended for doing this job and is
shown below:
public static void ExtractZipFile(string zipfilename,
    string destination)
{
    ZipFile zipfile = new ZipFile(zipfilename);
    try
    {
        List<ZipEntry> entries = GetZippedItems(zipfile);
        foreach (ZipEntry entry in entries)
        {
            if (!entry.isDirectory())
            {
                InputStream s = zipfile.getInputStream(entry);
                try
                {
                    string fname = System.IO.Path.GetFileName(entry.getName());
                    string dir = System.IO.Path.GetDirectoryName(entry.getName());
                    // Recreate the entry's directory structure under the destination
                    string newpath = destination + @"\" + dir;
                    System.IO.Directory.CreateDirectory(newpath);
                    FileOutputStream dest = new FileOutputStream
                        (System.IO.Path.Combine(newpath, fname));
                    try
                    {
                        CopyStream(s, dest);
                    }
                    finally
                    {
                        dest.close();
                    }
                }
                finally
                {
                    s.close();
                }
            }
        }
    }
    finally
    {
        // Release the handle on the ZIP file
        zipfile.close();
    }
}
The ExtractZipFile() method accepts the path of a ZIP
file to be extracted and the destination folder where the files will be extracted.
It then creates a ZipFile object and obtains the
entries within the ZIP file using the GetZippedItems() helper method. The foreach loop iterates
through all the entries. With each iteration the entry is extracted
to the specified folder. The getInputStream() method of the ZipFile class returns an InputStream
for that entry. This stream acts as the source stream.
The getName() method of the ZipEntry class returns the full name of the entry. Note that an
entry name doesn't contain drive information because the root was removed when the entry was added. Based on this entry
name the destination path and the name of the file are calculated. Unzipping a file must create the
same directory structure as was present while zipping it. This is done by
calling the CreateDirectory() method of the Directory class. A FileOutputStream is then created to write the extracted file onto
the disk. The CopyStream() method transfers data from the source InputStream to the
destination FileOutputStream.
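The path calculation can be sketched separately from the extraction itself. The class ExtractPathSketch and the method GetTargetPath() below are hypothetical names for illustration; the sketch uses Path.Combine() to join the parts, which is one possible way to build the path and not necessarily how the listing in this article does it.

```csharp
using System;
using System.IO;

public static class ExtractPathSketch // hypothetical class for illustration
{
    // Maps a ZIP entry name to the full path of the file to create under destination,
    // creating the target directory if it does not exist yet.
    public static string GetTargetPath(string destination, string entryName)
    {
        string fileName = Path.GetFileName(entryName);       // e.g. "report.txt"
        string directory = Path.GetDirectoryName(entryName); // e.g. "docs", or "" for a top-level entry
        string targetDirectory = Path.Combine(destination, directory);
        Directory.CreateDirectory(targetDirectory);          // does nothing if it already exists
        return Path.Combine(targetDirectory, fileName);
    }
}
```

For an entry named docs/report.txt, the method creates a docs folder under the destination and returns the path of report.txt inside it.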
That's it! This completes our
class library.
Using the class library
Using the class library is relatively easy. You simply need to call the
methods of the ZipFileHelper class as per your requirement. For example, to create a
new ZIP file you call the CreateZipFile() method and to extract a ZIP file you
call the ExtractZipFile() method. The accompanying source code contains a Windows
based client application that consumes the ZipFileHelper class we just created. Though we will not discuss
the client code in detail, here is what the client
looks like:
You can
simply run the client and test if your class library works
as expected.
Summary
C# and VB.NET
do not provide any ready made solution for compressing and
decompressing files programmatically. However, using J# classes you can accomplish this task. The java.util.zip namespace
from vjslib.dll provides classes such as ZipFile, ZipEntry and ZipOutputStream that allow you
to work with ZIP files. Our C# class ZipFileHelper encapsulates the J# classes so that your client
application need not have any J# specific class references. This way other
developers who do not know J# can also use our class library for compressing and decompressing files. Moreover,
you can use the class library in Windows as well as
web applications.