Archivey API reference

archivey

open_archive(path_or_stream, *, config=None, streaming_only=False, pwd=None, format=None)

Open an archive file and return an ArchiveReader instance.

Parameters:

Name Type Description Default
path_or_stream str | bytes | PathLike | ReadableBinaryStream

Path to the archive file (e.g., "my_archive.zip", "data.tar.gz") or a binary file-like object containing the archive data.

required
config ArchiveyConfig | None

Optional ArchiveyConfig object to customize behavior. If None, the default configuration (which may have been customized with set_archivey_config) is used.

None
streaming_only bool

If True, forces the archive to be opened in a streaming-only mode, even if it supports random access. This can be more efficient if you only need to extract the archive or iterate over its members once.

If set to True, disables random access methods like open() and extract() to avoid expensive seeks or rewinds. Calls to those methods will raise a ValueError.

False
pwd bytes | str | None

Optional password used to decrypt the archive if it is encrypted.

None
format ArchiveFormat | ContainerFormat | StreamFormat | None

Optional archive format to use. If None, the format is auto-detected.

None

Returns:

Type Description
ArchiveReader

An ArchiveReader instance for working with the archive.

Raises:

Type Description
FileNotFoundError

If path_or_stream points to a non-existent file.

ArchiveNotSupportedError

If the archive format is not supported or cannot be determined.

ArchiveCorruptedError

If the archive is detected as corrupted during opening.

ArchiveEncryptedError

If the archive is encrypted and no password is provided, or if the provided password is incorrect. This will only be raised here if the archive header is encrypted; otherwise, the incorrect password may only be detected when attempting to read an encrypted member.

TypeError

If path_or_stream or pwd has an invalid type.

Example
from archivey import open_archive, ArchiveError

try:
    with open_archive("my_data.zip", pwd="secret") as archive:
        print(f"Members: {archive.get_members()}")
        # Further operations with the archive
except FileNotFoundError:
    print("Error: Archive file not found.")
except ArchiveError as e:
    print(f"An archive error occurred: {e}")

open_compressed_stream(path_or_stream, *, config=None, format=None)

Open a single-file compressed stream and return the uncompressed stream.

If a stream is passed, this function ensures that decompression starts from the position the stream had at the time of the call, regardless of internal operations such as format detection, which may need to read from the beginning of the stream.

Parameters:

Name Type Description Default
path_or_stream BinaryIO | str | bytes | PathLike

Path to the compressed file (e.g., "my_data.gz", "data.bz2") or a binary file-like object containing the compressed data.

required
config ArchiveyConfig | None

Optional ArchiveyConfig object to customize behavior. If None, the default configuration (which may have been customized with set_archivey_config) is used.

None
format ArchiveFormat | StreamFormat | None

Optional archive format to use. If None, the format is auto-detected.

None

Returns:

Type Description
BinaryIO

A binary file-like object containing the uncompressed data.

Raises:

Type Description
FileNotFoundError

If path_or_stream points to a non-existent file.

ArchiveNotSupportedError

If the archive format is not supported or cannot be determined.

ArchiveCorruptedError

If the archive is detected as corrupted during opening.

TypeError

If path_or_stream has an invalid type.

ArchiveReader

Bases: ABC

Represents a readable archive, such as a ZIP or TAR file.

Provides a uniform interface for listing, reading, and extracting files from archives, regardless of format. Use open_archive() to obtain an instance of this class.

__init__(archive_path, format)

Initialize the ArchiveReader with a file path or stream and detected format.

Parameters:

Name Type Description Default
archive_path BinaryIO | str | bytes | PathLike

Path or binary stream of the archive.

required
format ArchiveFormat

ArchiveFormat indicating the archive type.

required

Raises:

Type Description
ValueError

If the input is not a supported type.

close() abstractmethod

Close the archive and release any underlying resources.

This method is idempotent (callable multiple times without error). It is automatically called when the reader is used as a context manager.

extract(member_or_filename, path=None, pwd=None) abstractmethod

Extract a single member to a target path.

Parameters:

Name Type Description Default
member_or_filename ArchiveMember | str

The member to extract.

required
path str | PathLike | None

The path to extract to. Defaults to the current working directory.

None
pwd bytes | str | None

Optional password to use for encrypted members, if needed; by default, the password passed when opening the archive is used.

None

Returns:

Type Description
str | None

The path of the extracted file, or None for non-file entries.

Raises:

Type Description
ArchiveMemberNotFoundError

If the member is not found.

ArchiveEncryptedError

If the member is encrypted and pwd is incorrect or not provided.

ArchiveCorruptedError

If the compressed data is corrupted.

ValueError

If the archive was opened in streaming mode.

extractall(path=None, members=None, *, pwd=None, filter=None) abstractmethod

Extract all (or selected) members to a given directory.

If the archive was opened in streaming mode, this method can only be called once.

Parameters:

Name Type Description Default
path str | PathLike | None

Target directory. Defaults to the current working directory if None. The directory will be created if it doesn't exist.

None
members Collection[ArchiveMember | str] | Callable[[ArchiveMember], bool] | None

Optional. A collection of member names or ArchiveMember objects to extract. If None, all members are extracted. Can also be a callable that takes an ArchiveMember and returns True if it should be extracted.

None
pwd bytes | str | None

Optional password to use for encrypted members, if needed; by default, the password passed when opening the archive is used.

None
filter ExtractFilterFunc | ExtractionFilter | None

Optional filter or sanitizer applied to each member. Either a predefined ExtractionFilter policy, or a callable that returns a sanitized member or None to exclude it.

None

Returns:

Type Description
dict[str, ArchiveMember]

A mapping from extracted file paths (including the target directory) to their corresponding ArchiveMember objects.

Raises:

Type Description
ArchiveEncryptedError

If a member is encrypted and pwd is invalid or missing.

ArchiveCorruptedError

If the archive is corrupted.

ArchiveIOError

If other I/O-related issues occur.

SameFileError

If extraction would overwrite a file in the archive itself.
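
The members parameter accepts a predicate, which is often simpler than building a collection of names up front. The sketch below keeps only regular .txt files. It relies on the documented is_file property; the filename attribute is an assumption, and FakeMember and FakeReader are hypothetical stand-ins that exist only so the example runs without an archive on disk.

```python
from dataclasses import dataclass


def is_text_file(member) -> bool:
    """Predicate for the `members` argument: keep regular files ending in .txt.

    `is_file` is documented; the `filename` attribute is an assumption.
    """
    return member.is_file and member.filename.lower().endswith(".txt")


def extract_text_files(reader, target_dir):
    """Extract only the members accepted by is_text_file."""
    return reader.extractall(target_dir, members=is_text_file)


# --- Hypothetical stand-ins, not part of the library ---
@dataclass
class FakeMember:
    filename: str
    is_file: bool = True


class FakeReader:
    """Mimics only the part of extractall() used above."""

    def __init__(self, members):
        self._members = members

    def extractall(self, path=None, members=None, *, pwd=None, filter=None):
        selected = [m for m in self._members if members is None or members(m)]
        return {f"{path}/{m.filename}": m for m in selected}


reader = FakeReader(
    [FakeMember("a.txt"), FakeMember("b.bin"), FakeMember("docs", is_file=False)]
)
extracted = extract_text_files(reader, "out")
print(sorted(extracted))
```

With a real reader, the same predicate could be passed to iter_members_with_streams(), which documents the same members parameter.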

get_archive_info() abstractmethod

Return metadata about the archive as an ArchiveInfo object.

Includes format, solidity, comments, and other archive-level information.

Returns:

Type Description
ArchiveInfo

An ArchiveInfo object.

get_member(member_or_filename) abstractmethod

Return an ArchiveMember for the given name or member.

If a filename (str) is provided, looks up the corresponding member. If an ArchiveMember is provided, it is returned as-is after validating that it belongs to this archive. This is useful when accepting either form in a user-facing API.

Parameters:

Name Type Description Default
member_or_filename ArchiveMember | str

A filename or an existing ArchiveMember.

required

Returns:

Type Description
ArchiveMember

The corresponding ArchiveMember.

Raises:

Type Description
ArchiveMemberNotFoundError

If the name does not match any member.

get_members() abstractmethod

Return a list of all members in the archive.

For some formats (e.g. TAR), this may require reading the entire archive if no central directory is available. Always raises ValueError in streaming mode to avoid misuse.

Returns:

Type Description
List[ArchiveMember]

A list of ArchiveMember objects.

Raises:

Type Description
ArchiveError

If member metadata cannot be read.

ValueError

If the archive was opened in streaming mode.

get_members_if_available() abstractmethod

Return a list of members if available without full archive traversal.

For formats with a central directory (e.g. ZIP), this is typically fast. Returns None if not readily available (e.g. TAR streams).

Returns:

Type Description
List[ArchiveMember] | None

A list of ArchiveMember objects, or None if unavailable.

has_random_access() abstractmethod

Return True if this archive supports random access to its members.

Random access allows methods like open(), get_members(), and extract() to be used freely. This returns False if the archive was opened in streaming mode or from a non-seekable source (e.g. a streamed .tar file), in which case only a single pass through iter_members_with_streams() or extractall() is supported.

Returns:

Type Description
bool

True if random access is available; False if in streaming mode.
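
One way to combine has_random_access(), get_members_if_available(), and get_members() is to try the cheapest listing first and fall back to a streaming pass. This is a minimal sketch rather than a recommended pattern: the filename attribute is an assumption, and FakeMember and FakeReader are hypothetical stand-ins used only to make the example runnable.

```python
def list_member_names(reader):
    """Return member names using the cheapest documented path available."""
    members = reader.get_members_if_available()
    if members is None and reader.has_random_access():
        # May require a full pass over the archive (e.g. TAR).
        members = reader.get_members()
    if members is None:
        # Streaming mode: this consumes the single allowed pass.
        members = [m for m, _stream in reader.iter_members_with_streams()]
    return [m.filename for m in members]


# --- Hypothetical stand-ins, not part of the library ---
class FakeMember:
    def __init__(self, filename):
        self.filename = filename


class FakeReader:
    def __init__(self, names, random_access):
        self._members = [FakeMember(n) for n in names]
        self._random_access = random_access

    def has_random_access(self):
        return self._random_access

    def get_members_if_available(self):
        return list(self._members) if self._random_access else None

    def get_members(self):
        if not self._random_access:
            raise ValueError("archive was opened in streaming mode")
        return list(self._members)

    def iter_members_with_streams(self):
        for member in self._members:
            yield member, None


print(list_member_names(FakeReader(["a.txt", "b.txt"], random_access=True)))
print(list_member_names(FakeReader(["a.txt", "b.txt"], random_access=False)))
```

In streaming mode the fallback uses up the only pass, so it suits cases where the names are all that is needed.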

iter_members_with_streams(members=None, *, pwd=None, filter=None) abstractmethod

Iterate over archive members, yielding each with a readable stream if applicable.

For each member, this yields a tuple (ArchiveMember, stream). The stream is a binary file-like object for regular files, and None for non-file members.

If the archive was opened in streaming mode, this method can only be called once.

Parameters:

Name Type Description Default
members Collection[ArchiveMember | str] | Callable[[ArchiveMember], bool] | None

A collection of ArchiveMember or filenames, or a predicate function that returns True for members to include. If None, all members are included.

None
pwd bytes | str | None

Optional password to use for encrypted members, if needed; by default, the password passed when opening the archive is used.

None
filter IteratorFilterFunc | ExtractionFilter | None

Optional filter or sanitizer applied to each member. Either a predefined ExtractionFilter policy, or a callable that returns a sanitized member or None to exclude it.

None

Yields:

Type Description
tuple[ArchiveMember, BinaryIO | None]

Tuples of (ArchiveMember, BinaryIO | None), one per selected member. For file members, the stream allows reading their content. For non-file members (e.g. directories or links), the stream is None. Streams are lazily opened only if accessed, so skipping unused members is efficient. Each stream is automatically closed when iteration advances to the next member or when the generator is closed.

Raises:

Type Description
ArchiveEncryptedError

If a member is encrypted and pwd is missing or incorrect. Raised only when attempting to read a returned stream.

ArchiveCorruptedError

If member data is found to be corrupted. May be raised when retrieving the next item, or when attempting to read a returned stream.

ArchiveIOError

If other I/O-related errors occur.
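
Because each stream is closed when iteration advances, a member's content has to be read inside the loop body. The sketch below sums the bytes of every file member in a single pass. The filename attribute is an assumption, and FakeMember and FakeReader are hypothetical stand-ins that imitate the documented yield shape and stream lifetime.

```python
import io


def total_file_bytes(reader, chunk_size=64 * 1024):
    """Map each file member's name to the number of bytes in its content."""
    sizes = {}
    for member, stream in reader.iter_members_with_streams():
        if stream is None:
            continue  # directories, links and other non-file members
        count = 0
        # Read now: the stream is closed once the loop advances.
        while chunk := stream.read(chunk_size):
            count += len(chunk)
        sizes[member.filename] = count
    return sizes


# --- Hypothetical stand-ins, not part of the library ---
class FakeMember:
    def __init__(self, filename):
        self.filename = filename


class FakeReader:
    def __init__(self, entries):
        self._entries = entries  # list of (name, bytes or None)

    def iter_members_with_streams(self, members=None, *, pwd=None, filter=None):
        for name, data in self._entries:
            stream = io.BytesIO(data) if data is not None else None
            try:
                yield FakeMember(name), stream
            finally:
                if stream is not None:
                    stream.close()  # closed when iteration advances


sizes = total_file_bytes(
    FakeReader([("a.txt", b"hello"), ("dir/", None), ("b.bin", b"\x00" * 10)])
)
print(sizes)
```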

open(member_or_filename, *, pwd=None) abstractmethod

Open a specific member for reading and return a binary stream.

Accepts either a filename (str) or an ArchiveMember. Filenames are resolved to members automatically. For symlinks, this returns the target file’s content.

Requires random access support (see has_random_access()).

Parameters:

Name Type Description Default
member_or_filename ArchiveMember | str

The member or its filename.

required
pwd bytes | str | None

Optional password to use for encrypted members, if needed. By default, the password passed when opening the archive is used.

None

Returns:

Type Description
BinaryIO

A binary stream for reading the member's content.

Raises:

Type Description
ArchiveMemberNotFoundError

If the member is not found.

ArchiveMemberCannotBeOpenedError

If the member is not a file or a link that points to a file.

ArchiveEncryptedError

If the member is encrypted and pwd is incorrect or not provided.

ArchiveCorruptedError

If the compressed data is corrupted.

ValueError

If the archive was opened in streaming mode.

Resolve a link member to its final non-link target.

If the input is not a link, returns the member itself. For symlinks or hardlinks, follows the chain to the real target. If the link points to a file that is not in the archive, returns None.

Parameters:

Name Type Description Default
member ArchiveMember

The ArchiveMember to resolve.

required

Returns:

Type Description
ArchiveMember | None

The resolved ArchiveMember, or None if resolution fails.

ArchiveInfo dataclass

Metadata about the archive format and container-level properties.

ArchiveMember dataclass

Represents a file within an archive.

Parameters:

Name Type Description Default
mtime Optional[datetime]

(computed property) Returns mtime_with_tz without timezone information, for compatibility.

required
member_id int

(computed property) Unique ID for this member within the archive.

Values are assigned in archive order and can be used to disambiguate identical filenames or preserve ordering.

required
archive_id str

(computed property) Unique ID for the archive this member belongs to.

required
date_time Optional[Tuple[int, int, int, int, int, int]]

(computed property) (year, month, day, hour, minute, second) tuple for zipfile compatibility.

required
is_file bool

(computed property) Convenience property returning True if the member is a regular file.

required
is_dir bool

(computed property) Convenience property returning True if the member represents a directory.

required
is_link bool

(computed property) Convenience property returning True if the member is a symbolic or hard link.

required
is_other bool

(computed property) Convenience property returning True if the member's type is neither file, directory nor link.

required
CRC Optional[int]

(computed property) Alias for crc32 (for zipfile compatibility).

required

replace(**kwargs)

Return a copy of this member with selected fields updated.

Used primarily by extraction filters to modify metadata without mutating the original object.

ArchiveFormat dataclass

Supported archive and compression formats.

Members:

NameValueDescription
ZIP None
RAR None
SEVENZIP None
GZIP None
BZIP2 None
XZ None
ZSTD None
LZ4 None
LZIP None
ZLIB None
BROTLI None
UNIX_COMPRESS None
TAR None
TAR_GZ None
TAR_BZ2 None
TAR_XZ None
TAR_ZSTD None
TAR_LZ4 None
TAR_Z None
ISO None
FOLDER None
UNKNOWN None

file_extension()

Return the file extension for the archive format.

ContainerFormat

Bases: StrEnum

Supported container formats.

Members:

NameValueDescription
ZIP 'zip'
RAR 'rar'
SEVENZIP '7z'
TAR 'tar'
ISO 'iso'
FOLDER 'folder'
RAW_STREAM 'raw_stream'
UNKNOWN 'unknown'

StreamFormat

Bases: StrEnum

Supported stream formats.

Members:

NameValueDescription
GZIP 'gz'
BZIP2 'bz2'
XZ 'xz'
ZSTD 'zstd'
LZ4 'lz4'
LZIP 'lz'
ZLIB 'zz'
BROTLI 'br'
UNIX_COMPRESS 'Z'
UNCOMPRESSED 'uncompressed'

MemberType

Bases: StrEnum

Possible types of archive members.

Members:

NameValueDescription
FILE 'file'

A regular file.

DIR 'dir'

A directory.

SYMLINK 'symlink'

A symbolic link.

HARDLINK 'hardlink'

A hard link.

OTHER 'other'

Any other type of member.

ExtractionFilter

Bases: StrEnum

Built-in sanitization policies for archive extraction.

These match Python's built-in tarfile named filters, and can be used to block unsafe paths, strip permissions, or restrict file types.

Members:

NameValueDescription
FULLY_TRUSTED 'fully_trusted'

No filtering or restrictions. Use only with fully trusted archives.

TAR 'tar'

Blocks absolute paths and files outside destination; strips setuid/setgid/sticky bits and group/other write permissions.

DATA 'data'

Stricter than 'tar': also blocks special files and unsafe links, and removes executable bits from regular files.

ArchiveyConfig dataclass

Configuration for archivey.open_archive().

extraction_filter = ExtractionFilter.DATA class-attribute instance-attribute

The filter applied to members when iterating over or extracting from an archive. It can be an ExtractionFilter policy, or a function that takes an ArchiveMember and returns a possibly modified ArchiveMember object, or None to skip the member.

overwrite_mode = OverwriteMode.ERROR class-attribute instance-attribute

What to do with existing files when extracting. OVERWRITE: overwrite existing files. SKIP: skip existing files. ERROR: raise an error if a file already exists, and stop extracting.

tar_check_integrity = True class-attribute instance-attribute

If a tar archive is corrupted in a metadata section, tarfile simply stops reading further and acts as if the file has ended. If set, we perform a check that the tar archive has actually been read fully, and raise an error if it's actually corrupted.

use_indexed_bzip2 = False class-attribute instance-attribute

Alternative library that can be used instead of the builtin bzip2 module to read bzip2 streams. Provides multithreaded decompression and random access support.

use_python_xz = False class-attribute instance-attribute

Alternative library that can be used instead of the builtin xz module to read xz streams. Provides random access support.

use_rapidgzip = False class-attribute instance-attribute

Alternative library that can be used instead of the builtin gzip module to read gzip streams. Provides multithreaded decompression and random access support (i.e. jumping to arbitrary positions in the stream without re-decompressing the entire stream), which is particularly useful for accessing random members in compressed tar files.

use_rar_stream = False class-attribute instance-attribute

If set, use an alternative approach instead of calling rarfile when iterating over RAR archive members. This supports decompressing multiple members in a solid archive by going through the archive only once, instead of once per member.

use_single_file_stored_metadata = False class-attribute instance-attribute

If set, data stored in compressed stream headers is set in the ArchiveMember object for single-file compressed archives, instead of basing it only on the file itself. (filename and modification time for gzip archives only)

use_zstandard = False class-attribute instance-attribute

If set, use an alternative library instead of pyzstd to read Zstandard streams. Its error reporting is not as good.

archivey_config(config=None, **overrides)

Temporarily use config and/or the given field overrides as the default configuration for open_archive() and open_compressed_stream().

Example:

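from archivey import archivey_config, open_archive

# The overridden defaults apply to every open_archive() call inside the block.
with archivey_config(use_rapidgzip=True):
    archive1 = open_archive("path/to/archive.zip")
    archive2 = open_archive("path/to/archive.zip")
    ...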

get_archivey_config()

Return the current default configuration.

set_archivey_config(config)

Set the default configuration for open_archive() and open_compressed_stream().

ArchiveError

Bases: Exception

Base exception for all archive-related errors raised by Archivey.

archivey.types

Common types and enums used internally by Archivey.

Most public types are exposed through the archivey module, but advanced or format-specific types can be imported from here as needed.

ContainerFormat

Bases: StrEnum

Supported container formats.

Members:

NameValueDescription
ZIP 'zip'
RAR 'rar'
SEVENZIP '7z'
TAR 'tar'
ISO 'iso'
FOLDER 'folder'
RAW_STREAM 'raw_stream'
UNKNOWN 'unknown'

StreamFormat

Bases: StrEnum

Supported stream formats.

Members:

NameValueDescription
GZIP 'gz'
BZIP2 'bz2'
XZ 'xz'
ZSTD 'zstd'
LZ4 'lz4'
LZIP 'lz'
ZLIB 'zz'
BROTLI 'br'
UNIX_COMPRESS 'Z'
UNCOMPRESSED 'uncompressed'

ArchiveFormat dataclass

Supported archive and compression formats.

Members:

NameValueDescription
ZIP None
RAR None
SEVENZIP None
GZIP None
BZIP2 None
XZ None
ZSTD None
LZ4 None
LZIP None
ZLIB None
BROTLI None
UNIX_COMPRESS None
TAR None
TAR_GZ None
TAR_BZ2 None
TAR_XZ None
TAR_ZSTD None
TAR_LZ4 None
TAR_Z None
ISO None
FOLDER None
UNKNOWN None

file_extension()

Return the file extension for the archive format.

MemberType

Bases: StrEnum

Possible types of archive members.

Members:

NameValueDescription
FILE 'file'

A regular file.

DIR 'dir'

A directory.

SYMLINK 'symlink'

A symbolic link.

HARDLINK 'hardlink'

A hard link.

OTHER 'other'

Any other type of member.

CreateSystem

Bases: IntEnum

Operating system that created the archive member, if known.

These values match the create_system field from the ZIP specification and the Python zipfile module. Other formats may report compatible values where applicable.

Members:

NameValueDescription
FAT 0
AMIGA 1
VMS 2
UNIX 3
VM_CMS 4
ATARI_ST 5
OS2_HPFS 6
MACINTOSH 7
Z_SYSTEM 8
CPM 9
TOPS20 10
NTFS 11
QDOS 12
ACORN_RISCOS 13
UNKNOWN 255

ArchiveInfo dataclass

Metadata about the archive format and container-level properties.

ArchiveMember dataclass

Represents a file within an archive.

Parameters:

Name Type Description Default
mtime Optional[datetime]

(computed property) Returns mtime_with_tz without timezone information, for compatibility.

required
member_id int

(computed property) Unique ID for this member within the archive.

Values are assigned in archive order and can be used to disambiguate identical filenames or preserve ordering.

required
archive_id str

(computed property) Unique ID for the archive this member belongs to.

required
date_time Optional[Tuple[int, int, int, int, int, int]]

(computed property) (year, month, day, hour, minute, second) tuple for zipfile compatibility.

required
is_file bool

(computed property) Convenience property returning True if the member is a regular file.

required
is_dir bool

(computed property) Convenience property returning True if the member represents a directory.

required
is_link bool

(computed property) Convenience property returning True if the member is a symbolic or hard link.

required
is_other bool

(computed property) Convenience property returning True if the member's type is neither file, directory nor link.

required
CRC Optional[int]

(computed property) Alias for crc32 (for zipfile compatibility).

required

replace(**kwargs)

Return a copy of this member with selected fields updated.

Used primarily by extraction filters to modify metadata without mutating the original object.

FilterFunc

Bases: Protocol

A callable that takes a member and its destination path, and returns a modified member or None to skip it during extraction or iteration.
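
A custom filter following this description might look like the sketch below: it returns None to skip a member and uses replace() to return a modified copy. The filename field is an assumption, FakeMember is a hypothetical stand-in for ArchiveMember, and dest_path is left optional because this page does not spell out exactly which arguments each method passes.

```python
from dataclasses import dataclass, replace as dc_replace
from typing import Optional


@dataclass(frozen=True)
class FakeMember:
    """Hypothetical stand-in for ArchiveMember; `filename` is an assumed field."""

    filename: str

    def replace(self, **kwargs):
        # Mirrors the documented replace(): a copy with selected fields updated.
        return dc_replace(self, **kwargs)


def skip_bytecode_and_normalize(member, dest_path=None) -> Optional[FakeMember]:
    """Drop *.pyc members and convert backslashes in names to forward slashes."""
    if member.filename.endswith(".pyc"):
        return None  # None tells the caller to skip this member
    normalized = member.filename.replace("\\", "/")
    if normalized == member.filename:
        return member
    return member.replace(filename=normalized)  # the original is left untouched


original = FakeMember("pkg\\mod.py")
result = skip_bytecode_and_normalize(original, "out")
print(result.filename, "|", original.filename)
```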

ExtractionFilter

Bases: StrEnum

Built-in sanitization policies for archive extraction.

These match Python's built-in tarfile named filters, and can be used to block unsafe paths, strip permissions, or restrict file types.

Members:

NameValueDescription
FULLY_TRUSTED 'fully_trusted'

No filtering or restrictions. Use only with fully trusted archives.

TAR 'tar'

Blocks absolute paths and files outside destination; strips setuid/setgid/sticky bits and group/other write permissions.

DATA 'data'

Stricter than 'tar': also blocks special files and unsafe links, and removes executable bits from regular files.

ReadableBinaryStream

Bases: Protocol

Protocol for a readable binary stream.

ReadableStreamLikeOrSimilar = ReadableBinaryStream | io.IOBase | IO[bytes] module-attribute

A readable binary stream or similar object (e.g. IO[bytes]).

archivey.exceptions

Custom exceptions raised by Archivey.

The base ArchiveError can be accessed from the archivey module. More specific subtypes are defined here to allow fine-grained error handling when needed.

ArchiveError

Bases: Exception

Base exception for all archive-related errors raised by Archivey.

ArchiveReadError

Bases: ArchiveError

Base class for errors while reading or decoding the archive contents.

ArchiveUnsupportedFeatureError

Bases: ArchiveReadError

Raised when an archive format or feature is not supported.

ArchiveCorruptedError

Bases: ArchiveReadError

Raised when an archive is detected as corrupted, incomplete, or invalid.

ArchiveEOFError

Bases: ArchiveCorruptedError

Raised when an unexpected end-of-file is encountered while reading an archive.

ArchiveStreamNotSeekableError

Bases: ArchiveReadError

Raised when a non-seekable stream is passed to open_archive() or open_compressed_stream(), but the archive format or backend library requires a seekable input stream.

ArchiveMemberError

Bases: ArchiveError

Base class for errors related to archive members.

ArchiveMemberNotFoundError

Bases: ArchiveMemberError

Raised when a requested member is not found within the archive.

ArchiveMemberCannotBeOpenedError

Bases: ArchiveMemberError

Raised when a member cannot be opened for reading, typically because it's a directory, special file, or unresolved link.

ArchiveLinkTargetNotFoundError

Bases: ArchiveMemberError

Raised when a symbolic or hard link within the archive points to a target that cannot be found within the same archive.

ArchiveExtractionError

Bases: ArchiveError

Base class for errors encountered during extraction to the filesystem.

ArchiveFileExistsError

Bases: ArchiveExtractionError

Raised during extraction if a file to be written already exists and the overwrite mode prevents overwriting it.

ArchiveEncryptedError

Bases: ArchiveError

Raised when an archive or member is encrypted and either no password was provided, or the provided password is incorrect.

ArchiveFilterError

Bases: ArchiveError

Raised when a filter rejects a member due to unsafe properties.

ArchiveNotSupportedError

Bases: ArchiveError

Raised when the detected archive format is not supported by Archivey.

PackageNotInstalledError

Bases: ArchiveError

Raised when a required third-party library or package for handling a specific archive format is not installed in the environment.
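
The base classes listed above determine the order in which except clauses should appear: a subclass has to be caught before its parent, or the parent clause will take it. The sketch below mirrors part of the documented hierarchy with local classes so that it runs without the library installed; real code would import the classes from archivey.exceptions instead. The classify helper and its labels are hypothetical.

```python
# Local mirror of part of the documented hierarchy (same names, same bases).
class ArchiveError(Exception): ...
class ArchiveReadError(ArchiveError): ...
class ArchiveCorruptedError(ArchiveReadError): ...
class ArchiveEOFError(ArchiveCorruptedError): ...
class ArchiveEncryptedError(ArchiveError): ...


def classify(action):
    """Run `action` and map failures to a coarse label, most specific first."""
    try:
        action()
    except ArchiveEOFError:
        return "truncated"
    except ArchiveCorruptedError:
        return "corrupted"
    except ArchiveEncryptedError:
        return "needs-password"
    except ArchiveError:
        return "other-archive-error"
    return "ok"


def raising(exc):
    """Build a zero-argument callable that raises `exc` (for demonstration)."""
    def action():
        raise exc
    return action


print(classify(raising(ArchiveEOFError("unexpected end of file"))))
print(classify(raising(ArchiveReadError("decode failure"))))
```

Swapping the first two clauses would report every truncated archive as merely corrupted, because ArchiveEOFError derives from ArchiveCorruptedError.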

archivey.filters

Custom filter functions for Archivey.

You don't need this module if the default filters are enough: pass one of the archivey.ExtractionFilter values to iter_members_with_streams() or extractall(), or set it in the archivey.ArchiveyConfig.extraction_filter field.

If you need a filter with custom options, use the create_filter function. You can also write your own filter function by implementing the archivey.FilterFunc type.

create_filter(*, for_data, sanitize_names, sanitize_link_targets, sanitize_permissions, raise_on_error)

Create a filter function with the given options.

The filter function can be passed to iter_members_with_streams or extractall.

Parameters:

Name Type Description Default
for_data bool

Whether the filter is for data members (files and directories).

required
sanitize_names bool

Whether to sanitize the names of members.

required
sanitize_link_targets bool

Whether to sanitize the link targets of members.

required
sanitize_permissions bool

Whether to sanitize the permissions of members.

required
raise_on_error bool

Whether to raise an error if a filter function returns None.

required