Archivey
Archivey is a library for reading many common archive formats through a simple, consistent interface. It uses several builtin modules and optional external packages for handling different formats, and also adds some features that are missing from them.
Features
- Automatic file format detection
- Support for ZIP, TAR (including
.tar.gz
,.tar.bz2
, etc.), RAR and 7z archives - Support for single-file compressed formats like gzip, bzip2, xz, zstd and lz4
- Consistent handling of symlinks, file times, permissions, and passwords
- Consistent exception hierarchy
- Optimized for sequential iteration over archive members
Installation
Install with pip with all external libraries:
If you'd rather manage dependencies yourself, install only the extras you need. RAR support requires the unrar
tool, which you may need to install separately.
Supported formats and required packages
Format | Extension | Builtin module | Python package | System requirement |
---|---|---|---|---|
ZIP archives | .zip |
zipfile |
||
TAR archives | .tar , .tar.* |
tarfile |
||
RAR archives | .rar |
rarfile cryptography (for encrypted headers) |
unrar binary |
|
7z archives | .7z |
py7zr |
||
Gzip | .gz |
gzip |
rapidgzip (multithreaded decompression and random access) |
|
Bzip2 | .bz2 |
bz2 |
indexed_bzip2 (multithreaded decompression and random access) |
|
XZ | .xz |
lzma |
python-xz (random access) |
|
Zstandard | .zst |
pyzstd (preferred) or zstandard |
||
LZ4 | .lz4 |
lz4 |
||
Unix compress | .Z |
uncompresspy |
Usage
These are the basic features of the library. For more details, see the User guide and API reference.
Extracting files
from archivey import open_archive
with open_archive("example.zip") as archive:
archive.extractall(path="/tmp/destpath", filter="data")
You can use filters when extracting to avoid security issues, similarly to tarfile.
Random access
from archivey import open_archive
with open_archive("example.zip") as archive:
members = archive.get_members()
# Read the contents of the last file in the archive
member_to_read = members[-1]
if member_to_read.is_file:
stream = archive.open(member_to_read)
data = stream.read()
print(member_to_read.filename, data[:20])
open_archive
can open standalone compressed files (e.g. example.gz
) as well. They are handled as archives containing a single member.
Streaming access
Some libraries may decompress parts of the archive multiple times if you list the members in advance or access files individually with archive.open()
. If you don't need to open arbitrary files, and just need to perform an operation on all (or some) files of an archive, iterating through the files avoids extra re-reads and decompressions:
from archivey import open_archive
with open_archive("example.tar.gz", streaming_only=True) as archive:
for member, stream in [archive.iter_members_with_streams()]:
data = stream and stream.read(20)
print(member.filename, member.file_size, data)
streaming_only
is an optional argument; if set, it disallows some methods to ensure your code doesn't accidentally perform expensive operations. (more details)
Single-file compressed streams
Open a compressed file (e.g., .gz
or .xz
) to work with the uncompressed stream:
from archivey import open_compressed_stream
with open_compressed_stream("example.txt.gz") as f:
data = f.read()
Configuration
You can enable optional features and libraries by passing an ArchiveyConfig
to open_archive
and open_compressed_stream
.
from archivey import (
open_archive,
ArchiveyConfig,
ExtractionFilter,
OverwriteMode,
)
config = ArchiveyConfig(
use_rar_stream=True,
use_rapidgzip=True,
overwrite_mode=OverwriteMode.SKIP,
extraction_filter=ExtractionFilter.TAR,
)
with open_archive("file.rar", config=config) as archive:
archive.extractall("out_dir")
Command line usage
Archivey contains a small command line tool simply called archivey
. If not installed by the package manager, you can also invoke it via python -m archivey
.
The CLI is primarily meant for testing and exploring the library, but can be used for basic archive listing and extraction.
You can filter member names using shell patterns placed after --
:
Documentation
For more detailed information on using and extending archivey
, please refer to the following resources:
- User Guide: how to use this library to open and interact with archives, configuration options and so on
- Developer Guide: if you'd like to add support for new archive formats or libraries
- API Reference: detailed documentation of all public classes, methods, and functions
Future plans
Some things on my radar for future versions. Feel free to pick some to contribute!
- Archive format support: ar archives (
.ar
,.deb
), ISO images (.iso
) - Compression format support: Brotli
- Add libarchive as a backend, see what it allows us to do
- Opening self-extracting (SFX) RAR and 7z archives
- Non-seeking access to ZIP archives (similar approach to
stream-unzip
) - Use builtin Zstandard in Python 3.14
- Auto-select libraries or implementations to use based on what is installed and/or required features
- Archive writing support
- Bug: ZIP filename decoding can be wrong in some cases (see sample archive
tests/test_archives_external/encoding_infozip_jules.zip
) - Test under Windows / Mac (there are CI tests, but with failures)
- There should be archives generated in Mac / Windows in test_archives
- Possibly: use oschmod for setting permissions properly under Windows
- Add additional metadata fields (uid, gid, Windows permissions (read-only) in 7z files)
- Add Pathlib-compatible wrapper that allows accessing files inside archives
- Try to read / extract all the test archives in unittests for underlying libraries, and old/weird files, to find bugs