Archivey User Guide
Archivey is a Python library that provides a consistent interface for reading and extracting files from many archive formats, including ZIP, TAR, RAR, 7z, and compressed formats like .gz, .bz2, .xz, .zst, and .lz4.
This guide covers the most common use cases. For full details, see the API reference.
📦 Opening an Archive
Use open_archive to open any supported archive:
from archivey import open_archive
with open_archive("data.zip") as archive:
    print("Opened archive with", len(archive.get_members()), "entries")
You can pass:
- A file path or binary stream
- config: an ArchiveyConfig object
- streaming_only=True: enables one-pass streaming mode
- pwd: password for encrypted archives
📤 Streaming-Safe Methods
Some archive formats (like .tar.gz, .tar.xz) don't include a central index, so listing or accessing members typically requires decompressing the entire archive. Similarly, solid archives (like some RAR and 7z files) store multiple files in a single compressed block, so accessing a file mid-archive may require decompressing everything before it. Underlying libraries often perform this extra decompression silently.
Streaming-safe methods let you read or extract relevant members in a single pass, avoiding redundant decompression. They also support non-seekable sources (e.g. pipes or network streams), if the underlying format or library allows.
When an archive is opened with streaming_only=True, random-access methods are disabled to prevent accidental re-decompression. Even outside streaming mode, the streaming-safe methods may still be more efficient.
extractall
Extracts all or selected members to a target directory.
Options:
- members: a list of names and/or ArchiveMember objects, or a predicate function that selects the entries to extract
- filter: a sanitization policy, or a callable that adjusts, rejects, or renames members
  - Predefined ExtractionFilter values: DATA, TAR, or FULLY_TRUSTED
  - Custom: (member, dest_path) -> member or None; useful for renaming files, skipping dangerous paths, adjusting permissions, etc.
- pwd: optional password for encrypted members. If omitted, the value passed to open_archive (if any) is used. You can override it here, which also lets you handle archives that use multiple passwords.
Returns a mapping of extracted paths to their corresponding ArchiveMember.
iter_members_with_streams
Iterates over each member, yielding (ArchiveMember, BinaryIO | None):
for member, stream in archive.iter_members_with_streams():
    print(member.filename)
    if stream is not None:
        data = stream.read()
- Accepts the same members, filter, and pwd arguments as extractall
- Streams are opened lazily and closed automatically as iteration advances, so read each stream before moving on to the next member
- stream is None for non-file entries (e.g. directories or symlinks)
get_members_if_available
Returns the member list if it's already known or can be read from a central directory (e.g. ZIP or 7z). Returns None if obtaining it would require scanning or decompressing the archive.
Useful for progress reporting or early inspection without triggering a full scan.
🗂️ Random-Access Methods
These methods are available only if the archive was not opened in streaming_only mode. You can check with:
get_members
Returns a complete list of archive entries.
Note: For some formats, this may involve scanning or decompressing large portions of the archive.
open
Opens a specific file in the archive for reading.
If the member is a symlink or hardlink, the link is resolved to its target and the stream reflects the target's contents. Raises an error if the member is a directory, or if it is a link pointing outside the archive or to a missing file.
extract
Extracts a single member to disk.
Returns the extracted file path.
get_member
Looks up a member by name, or validates an existing ArchiveMember object.
🧪 Filters and Sanitization
Archivey applies sanitization by default to prevent unsafe extraction:
- Strips absolute paths
- Blocks path traversal (../)
- Normalizes symlink targets
- Adjusts unsafe permissions
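The standalone check below makes the path-traversal rule concrete using only the standard library. It is not Archivey's implementation, just a sketch of the underlying idea: a member name is unsafe if, once joined to the destination, it resolves to a location outside that directory. The helper name is_within_directory is hypothetical.

```python
import os


def is_within_directory(dest_dir: str, member_name: str) -> bool:
    """Hypothetical helper: True if member_name stays inside dest_dir."""
    dest = os.path.realpath(dest_dir)
    # os.path.join discards dest if member_name is absolute
    target = os.path.realpath(os.path.join(dest, member_name))
    return os.path.commonpath([dest, target]) == dest


print(is_within_directory("/tmp/out", "docs/readme.txt"))  # True
print(is_within_directory("/tmp/out", "../etc/passwd"))    # False
print(is_within_directory("/tmp/out", "/etc/passwd"))      # False
```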
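You can override the default sanitization per call with the filter argument, or set it globally with the extraction_filter config option:
from archivey import ArchiveyConfig, ExtractionFilter, set_archivey_config
config = ArchiveyConfig(extraction_filter=ExtractionFilter.FULLY_TRUSTED)
set_archivey_config(config)  # applies to all archives opened afterwards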
Predefined filters:
- DATA: safe defaults (default)
- TAR: mimics tar behavior
- FULLY_TRUSTED: disables filtering (use with caution)
You can also pass a custom function with the signature (member, dest_path) -> member or None.
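As a sketch, the hypothetical filter below rejects any member whose path contains a hidden component (a name starting with ".") and keeps everything else unchanged. It assumes the member exposes filename, as in the iteration example earlier, and that returning None skips the member. SimpleNamespace objects stand in for real ArchiveMember instances so the logic can be checked without an archive.

```python
from types import SimpleNamespace


def skip_hidden(member, dest_path):
    """Reject members whose path has any component starting with '.'.

    Returns None to skip the member, or the member unchanged to keep it.
    dest_path matches the filter signature but is unused here.
    """
    parts = [part for part in member.filename.split("/") if part]
    if any(part.startswith(".") for part in parts):
        return None
    return member


# Stand-ins for ArchiveMember; only .filename is read
candidates = ["src/app.py", ".env", "src/.cache/data.bin", "docs/index.md"]
kept = [
    name
    for name in candidates
    if skip_hidden(SimpleNamespace(filename=name), "out") is not None
]
print(kept)  # ['src/app.py', 'docs/index.md']
```

With a real archive, it would be passed as filter=skip_hidden to extractall or iter_members_with_streams.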
⚙️ Configuration Options
You can control Archivey's behavior using an ArchiveyConfig object.
Pass it to open_archive, or set it globally with set_archivey_config (get_archivey_config returns the current global config).
from archivey import ArchiveyConfig, OverwriteMode, set_archivey_config
config = ArchiveyConfig(
use_rapidgzip=True,
overwrite_mode=OverwriteMode.SKIP,
)
set_archivey_config(config)
Common options:
- use_rar_stream: improves streaming performance for solid RAR archives by avoiding repeated decompression; uses unrar directly instead of rarfile
- use_rapidgzip, use_indexed_bzip2, etc.: enable faster or more flexible backends
- overwrite_mode: controls behavior when extracting over existing files
- extraction_filter: global sanitization policy for extracted entries
You can also use the archivey_config context manager to temporarily override the global config:
from archivey import archivey_config, get_archivey_config, open_archive
with archivey_config(
    use_rapidgzip=True, extraction_filter="data", overwrite_mode="skip"
):
    print(get_archivey_config())
    with open_archive(...):
        ...
🧵 Reading Compressed Streams
Use open_compressed_stream to read .gz, .bz2, .xz, .zst, or .lz4 files:
from archivey import open_compressed_stream
with open_compressed_stream("file.txt.gz") as f:
    print(f.read().decode())
🛑 Error Handling
All archive-related exceptions derive from ArchiveError.
Notable subtypes:
- ArchiveEncryptedError
- ArchiveCorruptedError
- ArchiveMemberNotFoundError
Example:
from archivey import open_archive, ArchiveError
try:
    with open_archive("file.7z") as archive:
        ...
except ArchiveError as e:
    print("Archive error:", e)