Archivey API reference

archivey

open_archive(path_or_stream, *, config=None, streaming_only=False, pwd=None, format=None)

Open an archive file and return an ArchiveReader instance.

Parameters:

Name Type Description Default
path_or_stream str | bytes | PathLike | ReadableBinaryStream

Path to the archive file (e.g., "my_archive.zip", "data.tar.gz") or a binary file-like object containing the archive data.

required
config ArchiveyConfig | None

Optional ArchiveyConfig object to customize behavior. If None, the default configuration (which may have been customized with set_archivey_config) is used.

None
streaming_only bool

If True, forces the archive to be opened in a streaming-only mode, even if it supports random access. This can be more efficient if you only need to extract the archive or iterate over its members once.

If set to True, disables random access methods like open() and extract() to avoid expensive seeks or rewinds. Calls to those methods will raise a ValueError.

False
pwd bytes | str | None

Optional password used to decrypt the archive if it is encrypted.

None
format ArchiveFormat | None

Optional archive format to use. If None, the format is auto-detected.

None

Returns:

Type Description
ArchiveReader

An ArchiveReader instance for working with the archive.

Raises:

Type Description
FileNotFoundError

If path_or_stream points to a non-existent file.

ArchiveNotSupportedError

If the archive format is not supported or cannot be determined.

ArchiveCorruptedError

If the archive is detected as corrupted during opening.

ArchiveEncryptedError

If the archive is encrypted and no password is provided, or if the provided password is incorrect. This will only be raised here if the archive header is encrypted; otherwise, the incorrect password may only be detected when attempting to read an encrypted member.

TypeError

If path_or_stream or pwd has an invalid type.

Example
from archivey import open_archive, ArchiveError

try:
    with open_archive("my_data.zip", pwd="secret") as archive:
        print(f"Members: {archive.get_members()}")
        # Further operations with the archive
except FileNotFoundError:
    print("Error: Archive file not found.")
except ArchiveError as e:
    print(f"An archive error occurred: {e}")

open_compressed_stream(path_or_stream, *, config=None, format=None)

Open a single-file compressed stream and return the uncompressed stream.

If a stream is passed, the returned data starts from the position the stream was at when this function was called. Internal operations such as format detection, which may need to read from the beginning of the stream, do not change where the returned data begins.

Parameters:

Name Type Description Default
path_or_stream BinaryIO | str | bytes | PathLike

Path to the compressed file (e.g., "my_data.gz", "data.bz2") or a binary file-like object containing the compressed data.

required
config ArchiveyConfig | None

Optional ArchiveyConfig object to customize behavior. If None, the default configuration (which may have been customized with set_archivey_config) is used.

None
format ArchiveFormat | None

Optional archive format to use. If None, the format is auto-detected.

None

Returns:

Type Description
BinaryIO

A binary file-like object containing the uncompressed data.

Raises:

Type Description
FileNotFoundError

If path_or_stream points to a non-existent file.

ArchiveNotSupportedError

If the archive format is not supported or cannot be determined.

ArchiveCorruptedError

If the archive is detected as corrupted during opening.

TypeError

If path_or_stream has an invalid type.
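
The stream-position behavior described for open_compressed_stream() can be illustrated with a standard-library analogue. This is a minimal sketch that uses gzip directly rather than Archivey; the prefix bytes and variable names are invented for illustration, and the peek-then-rewind step is only an assumption about how detection might be made invisible to the caller.

```python
import gzip
import io

payload = b"hello from a compressed stream"
prefix = b"HEADER--"  # unrelated bytes that precede the compressed data

# The caller positions the stream at the start of the gzip data.
raw = io.BytesIO(prefix + gzip.compress(payload))
raw.seek(len(prefix))

# Format detection peeks at the magic bytes, then restores the position
# so that decompression begins where the caller left the stream.
start = raw.tell()
magic = raw.read(2)
raw.seek(start)

with gzip.GzipFile(fileobj=raw) as stream:
    data = stream.read()

print(magic == b"\x1f\x8b", data == payload)
```

The prefix is never seen by the decompressor because reading begins at the caller's position, not at offset zero.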

ArchiveReader

Bases: ABC

Represents a readable archive, such as a ZIP or TAR file.

Provides a uniform interface for listing, reading, and extracting files from archives, regardless of format. Use open_archive() to obtain an instance of this class.

__init__(archive_path, format)

Initialize the ArchiveReader with a file path or stream and detected format.

Parameters:

Name Type Description Default
archive_path BinaryIO | str | bytes | PathLike

Path or binary stream of the archive.

required
format ArchiveFormat

ArchiveFormat indicating the archive type.

required

Raises:

Type Description
ValueError

If the input is not a supported type.

close() abstractmethod

Close the archive and release any underlying resources.

This method is idempotent (callable multiple times without error). It is automatically called when the reader is used as a context manager.

extract(member_or_filename, path=None, pwd=None) abstractmethod

Extract a single member to a target path.

Parameters:

Name Type Description Default
member_or_filename ArchiveMember | str

The member to extract.

required
path str | PathLike | None

The path to extract to. Defaults to the current working directory.

None
pwd bytes | str | None

Optional password to use for encrypted members, if needed; by default, the password passed when opening the archive is used.

None

Returns:

Type Description
str | None

The path of the extracted file, or None for non-file entries.

Raises:

Type Description
ArchiveMemberNotFoundError

If the member is not found.

ArchiveEncryptedError

If the member is encrypted and pwd is incorrect or not provided.

ArchiveCorruptedError

If the compressed data is corrupted.

ValueError

If the archive was opened in streaming mode.

extractall(path=None, members=None, *, pwd=None, filter=None) abstractmethod

Extract all (or selected) members to a given directory.

If the archive was opened in streaming mode, this method can only be called once.

Parameters:

Name Type Description Default
path str | PathLike | None

Target directory. Defaults to the current working directory if None. The directory will be created if it doesn't exist.

None
members Collection[ArchiveMember | str] | Callable[[ArchiveMember], bool] | None

Optional. A collection of member names or ArchiveMember objects to extract. If None, all members are extracted. Can also be a callable that takes an ArchiveMember and returns True if it should be extracted.

None
pwd bytes | str | None

Optional password to use for encrypted members, if needed; by default, the password passed when opening the archive is used.

None
filter ExtractFilterFunc | ExtractionFilter | None

Optional filter or sanitizer applied to each member. Either a predefined ExtractionFilter policy, or a callable that returns a sanitized member or None to exclude it.

None

Returns:

Type Description
dict[str, ArchiveMember]

A mapping from extracted file paths (including the target directory) to their corresponding ArchiveMember objects.

Raises:

Type Description
ArchiveEncryptedError

If a member is encrypted and pwd is invalid or missing.

ArchiveCorruptedError

If the archive is corrupted.

ArchiveIOError

If other I/O-related issues occur.

SameFileError

If extraction would overwrite a file in the archive itself.
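
The members parameter of extractall() accepts a collection of names or members, a predicate, or None. One way to think about this is that every form reduces to a single predicate. The sketch below is not Archivey's implementation; Member, MemberSelector, and to_predicate are hypothetical names used only to show the idea.

```python
from dataclasses import dataclass
from typing import Callable, Collection, Union


@dataclass(frozen=True)
class Member:
    """Hypothetical stand-in for ArchiveMember with only a filename."""

    filename: str


MemberSelector = Union[Collection[Union[Member, str]], Callable[[Member], bool], None]


def to_predicate(members: MemberSelector) -> Callable[[Member], bool]:
    """Turn any accepted form of `members` into a single predicate."""
    if members is None:
        return lambda m: True  # no selection: keep everything
    if callable(members):
        return members  # already a predicate
    # A collection may mix names and member objects; compare by filename.
    wanted = {m.filename if isinstance(m, Member) else m for m in members}
    return lambda m: m.filename in wanted


archive = [Member("a.txt"), Member("b.log"), Member("docs/")]

keep_named = to_predicate(["a.txt", Member("docs/")])
keep_logs = to_predicate(lambda m: m.filename.endswith(".log"))
keep_all = to_predicate(None)

by_name = [m.filename for m in archive if keep_named(m)]
by_rule = [m.filename for m in archive if keep_logs(m)]
everything = [m.filename for m in archive if keep_all(m)]
```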

get_archive_info() abstractmethod

Return metadata about the archive as an ArchiveInfo object.

Includes format, solidity, comments, and other archive-level information.

Returns:

Type Description
ArchiveInfo

An ArchiveInfo object.

get_member(member_or_filename) abstractmethod

Return an ArchiveMember for the given name or member.

If a filename (str) is provided, looks up the corresponding member. If an ArchiveMember is provided, it is returned as-is after validating that it belongs to this archive. This is useful when accepting either form in a user-facing API.

Parameters:

Name Type Description Default
member_or_filename ArchiveMember | str

A filename or an existing ArchiveMember.

required

Returns:

Type Description
ArchiveMember

The corresponding ArchiveMember.

Raises:

Type Description
ArchiveMemberNotFoundError

If the name does not match any member.

get_members() abstractmethod

Return a list of all members in the archive.

For some formats (e.g. TAR), this may require reading the entire archive if no central directory is available. Always raises ValueError in streaming mode to avoid misuse.

Returns:

Type Description
List[ArchiveMember]

A list of ArchiveMember objects.

Raises:

Type Description
ArchiveError

If member metadata cannot be read.

ValueError

If the archive was opened in streaming mode.

get_members_if_available() abstractmethod

Return a list of members if available without full archive traversal.

For formats with a central directory (e.g. ZIP), this is typically fast. Returns None if not readily available (e.g. TAR streams).

Returns:

Type Description
List[ArchiveMember] | None

A list of ArchiveMember objects, or None if unavailable.
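
A common pattern that follows from get_members_if_available() is to use the cheap member list when it exists and fall back to a single pass otherwise. The helper below calls only the two reader methods documented here; list_filenames and FakeStreamingReader are hypothetical, and the fake reader exists only so the sketch runs without a real archive.

```python
from types import SimpleNamespace
from typing import Any, List


def list_filenames(reader: Any) -> List[str]:
    """Prefer the cheap member list; otherwise make a single pass."""
    members = reader.get_members_if_available()
    if members is None:
        # Streams are ignored here, so no member data needs to be read.
        members = [member for member, _stream in reader.iter_members_with_streams()]
    return [member.filename for member in members]


class FakeStreamingReader:
    """Hypothetical stand-in that behaves like a streaming-only reader."""

    def __init__(self, names):
        self._names = names

    def get_members_if_available(self):
        return None  # no central directory available

    def iter_members_with_streams(self):
        for name in self._names:
            yield SimpleNamespace(filename=name), None


names = list_filenames(FakeStreamingReader(["a.txt", "b.txt"]))
```

Keep in mind that in streaming mode the fallback consumes the reader's single allowed pass.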

has_random_access() abstractmethod

Return True if this archive supports random access to its members.

Random access allows methods like open(), get_members(), and extract() to be used freely. This returns False if the archive was opened in streaming mode (for example, from a non-seekable source such as a streamed .tar file), in which case only a single pass through iter_members_with_streams() or extractall() is supported.

Returns:

Type Description
bool

True if random access is available; False if in streaming mode.

iter_members_with_streams(members=None, *, pwd=None, filter=None) abstractmethod

Iterate over archive members, yielding each with a readable stream if applicable.

For each member, this yields a tuple (ArchiveMember, stream). The stream is a binary file-like object for regular files, and None for non-file members.

If the archive was opened in streaming mode, this method can only be called once.

Parameters:

Name Type Description Default
members Collection[ArchiveMember | str] | Callable[[ArchiveMember], bool] | None

A collection of ArchiveMember or filenames, or a predicate function that returns True for members to include. If None, all members are included.

None
pwd bytes | str | None

Optional password to use for encrypted members, if needed; by default, the password passed when opening the archive is used.

None
filter IteratorFilterFunc | ExtractionFilter | None

Optional filter or sanitizer applied to each member. Either a predefined ExtractionFilter policy, or a callable that returns a sanitized member or None to exclude it.

None

Yields:

Type Description
tuple[ArchiveMember, BinaryIO | None]

Tuples of (ArchiveMember, BinaryIO | None), one per selected member. For file members, the stream allows reading their content. For non-file members (e.g. directories or links), the stream is None. Streams are lazily opened only if accessed, so skipping unused members is efficient. Each stream is automatically closed when iteration advances to the next member or when the generator is closed.

Raises:

Type Description
ArchiveEncryptedError

If a member is encrypted and pwd is missing or incorrect. (raised only when attempting to read a returned stream)

ArchiveCorruptedError

If member data is found to be corrupted. (may be raised when retrieving the next item, or when attempting to read a returned stream)

ArchiveIOError

If other I/O-related errors occur.
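
The stream lifetime described for iter_members_with_streams() (each stream is closed when iteration advances) can be sketched with a generator over the standard-library zipfile module. This is an analogue, not Archivey's code: iter_with_streams is a hypothetical name, and for simplicity it opens each stream eagerly rather than lazily.

```python
import io
import zipfile
from typing import BinaryIO, Iterator, Optional, Tuple


def iter_with_streams(
    zf: zipfile.ZipFile,
) -> Iterator[Tuple[zipfile.ZipInfo, Optional[BinaryIO]]]:
    """Yield (info, stream) pairs; the stream is None for directories."""
    for info in zf.infolist():
        stream = None if info.is_dir() else zf.open(info)
        try:
            yield info, stream
        finally:
            # Runs when the consumer asks for the next item or closes
            # the generator, so at most one stream is open at a time.
            if stream is not None:
                stream.close()


# Build a small archive in memory: one directory entry and one file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("docs/", "")
    zf.writestr("docs/readme.txt", "hello")

contents = {}
seen_streams = []
with zipfile.ZipFile(buffer) as zf:
    for info, stream in iter_with_streams(zf):
        if stream is None:
            continue  # directory entry: nothing to read
        seen_streams.append(stream)
        contents[info.filename] = stream.read()
```

Because of this lifetime, member data should be read inside the loop body; a stream kept for later use is already closed.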

open(member_or_filename, *, pwd=None) abstractmethod

Open a specific member for reading and return a binary stream.

Accepts either a filename (str) or an ArchiveMember. Filenames are resolved to members automatically. For symlinks, this returns the target file’s content.

Requires random access support (see has_random_access()).

Parameters:

Name Type Description Default
member_or_filename ArchiveMember | str

The member or its filename.

required
pwd bytes | str | None

Optional password to use for encrypted members, if needed. By default, the password passed when opening the archive is used.

None

Returns:

Type Description
BinaryIO

A binary stream for reading the member's content.

Raises:

Type Description
ArchiveMemberNotFoundError

If the member is not found.

ArchiveMemberCannotBeOpenedError

If the member is not a file or a link that points to a file.

ArchiveEncryptedError

If the member is encrypted and pwd is incorrect or not provided.

ArchiveCorruptedError

If the compressed data is corrupted.

ValueError

If the archive was opened in streaming mode.

Resolve a link member to its final non-link target.

If the input is not a link, returns the member itself. For symlinks or hardlinks, follows the chain to the real target. If the link points to a file that is not in the archive, returns None.

Parameters:

Name Type Description Default
member ArchiveMember

The ArchiveMember to resolve.

required

Returns:

Type Description
ArchiveMember | None

The resolved ArchiveMember, or None if resolution fails.
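
Following a chain of links to its final target, as described above, can be sketched as a loop over a name-to-member mapping. Entry and resolve are hypothetical names, targets are treated as plain archive-relative names (real symlink targets are relative to the link's directory), and returning None for a link cycle is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for ArchiveMember."""

    filename: str
    link_target: Optional[str] = None  # None means "not a link"


def resolve(entry: Entry, by_name: Dict[str, Entry]) -> Optional[Entry]:
    """Follow links until a non-link entry is reached, or return None."""
    seen = set()
    current: Optional[Entry] = entry
    while current is not None and current.link_target is not None:
        if current.filename in seen:
            return None  # link cycle: no final target exists
        seen.add(current.filename)
        current = by_name.get(current.link_target)  # None if target is missing
    return current


entries = [
    Entry("data.txt"),
    Entry("alias", link_target="data.txt"),
    Entry("alias2", link_target="alias"),
    Entry("broken", link_target="missing.txt"),
    Entry("loop_a", link_target="loop_b"),
    Entry("loop_b", link_target="loop_a"),
]
by_name = {e.filename: e for e in entries}

chained = resolve(by_name["alias2"], by_name)
plain = resolve(by_name["data.txt"], by_name)
broken = resolve(by_name["broken"], by_name)
cyclic = resolve(by_name["loop_a"], by_name)
```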

ArchiveInfo dataclass

Metadata about the archive format and container-level properties.

ArchiveMember dataclass

Represents a file within an archive.

Parameters:

Name Type Description Default
mtime Optional[datetime]

(computed property) Returns mtime_with_tz without timezone information, for compatibility.

required
member_id int

(computed property) Unique ID for this member within the archive.

Values are assigned in archive order and can be used to disambiguate identical filenames or preserve ordering.

required
archive_id str

(computed property) Unique ID for the archive this member belongs to.

required
date_time Optional[Tuple[int, int, int, int, int, int]]

(computed property) (year, month, day, hour, minute, second) tuple for zipfile compatibility.

required
is_file bool

(computed property) Convenience property returning True if the member is a regular file.

required
is_dir bool

(computed property) Convenience property returning True if the member represents a directory.

required
is_link bool

(computed property) Convenience property returning True if the member is a symbolic or hard link.

required
is_other bool

(computed property) Convenience property returning True if the member's type is neither file, directory nor link.

required
CRC Optional[int]

(computed property) Alias for crc32 (for zipfile compatibility).

required
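
The relationship between mtime_with_tz, mtime, and date_time can be shown with plain datetime objects. This sketch mirrors the property definitions in the ArchiveMember source listing in the archivey.types section; the timestamp itself is an arbitrary example value.

```python
from datetime import datetime, timezone

mtime_with_tz = datetime(2024, 5, 17, 9, 30, 15, tzinfo=timezone.utc)

# Dropping tzinfo keeps the wall-clock fields unchanged; no conversion happens.
mtime = mtime_with_tz.replace(tzinfo=None)

# The zipfile-style tuple is built from the naive value.
date_time = (
    mtime.year,
    mtime.month,
    mtime.day,
    mtime.hour,
    mtime.minute,
    mtime.second,
)
```

Since no timezone conversion is applied, mtime is only directly comparable across members whose mtime_with_tz values share the same timezone.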

replace(**kwargs)

Return a copy of this member with selected fields updated.

Used primarily by extraction filters to modify metadata without mutating the original object.

ArchiveFormat

Bases: StrEnum

Supported archive and compression formats.

MemberType

Bases: StrEnum

ExtractionFilter

Bases: StrEnum

Built-in sanitization policies for archive extraction.

These match Python's built-in tarfile named filters, and can be used to block unsafe paths, strip permissions, or restrict file types.

DATA = 'data' class-attribute instance-attribute

Stricter than 'tar': also blocks special files and unsafe links, and removes executable bits from regular files.

FULLY_TRUSTED = 'fully_trusted' class-attribute instance-attribute

No filtering or restrictions. Use only with fully trusted archives.

TAR = 'tar' class-attribute instance-attribute

Blocks absolute paths and files outside destination; strips setuid/setgid/sticky bits and group/other write permissions.

ArchiveyConfig dataclass

Configuration for open_archive().

extraction_filter = ExtractionFilter.DATA class-attribute instance-attribute

The filter applied to members when iterating over or extracting from an archive. It can be one of the ExtractionFilter policies, or a function that takes an ArchiveMember and returns a possibly modified ArchiveMember, or None to skip the member.

overwrite_mode = OverwriteMode.ERROR class-attribute instance-attribute

What to do with existing files when extracting. OVERWRITE: overwrite existing files. SKIP: skip existing files. ERROR: raise an error if a file already exists, and stop extracting.
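
The three overwrite policies can be sketched as a small decision in front of each file write. This is an illustration only: the OverwriteMode enum below is a hypothetical stand-in (its values are invented), write_member is not an Archivey function, and the built-in FileExistsError stands in for ArchiveFileExistsError.

```python
import enum
import tempfile
from pathlib import Path


class OverwriteMode(enum.Enum):
    """Hypothetical stand-in mirroring the three documented policies."""

    OVERWRITE = "overwrite"
    SKIP = "skip"
    ERROR = "error"


def write_member(target: Path, data: bytes, mode: OverwriteMode) -> bool:
    """Write data to target; return True if the file was written."""
    if target.exists():
        if mode is OverwriteMode.SKIP:
            return False  # leave the existing file alone
        if mode is OverwriteMode.ERROR:
            raise FileExistsError(str(target))  # stops the extraction
    target.write_bytes(data)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "report.txt"
    target.write_bytes(b"old")

    skipped = write_member(target, b"new", OverwriteMode.SKIP)
    after_skip = target.read_bytes()

    written = write_member(target, b"new", OverwriteMode.OVERWRITE)
    after_overwrite = target.read_bytes()

    try:
        write_member(target, b"newer", OverwriteMode.ERROR)
        raised = False
    except FileExistsError:
        raised = True
```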

tar_check_integrity = True class-attribute instance-attribute

If a tar archive is corrupted in a metadata section, tarfile simply stops reading further and acts as if the file has ended. If set, we perform a check that the tar archive has actually been read fully, and raise an error if it's actually corrupted.

use_indexed_bzip2 = False class-attribute instance-attribute

If set, use the indexed_bzip2 library instead of the built-in bz2 module to read bzip2 streams. It provides multithreaded decompression and random access support.

use_python_xz = False class-attribute instance-attribute

If set, use the python-xz library instead of the built-in lzma module to read xz streams. It provides random access support.

use_rapidgzip = False class-attribute instance-attribute

If set, use the rapidgzip library instead of the built-in gzip module to read gzip streams. It provides multithreaded decompression and random access support (i.e. jumping to arbitrary positions in the stream without re-decompressing the entire stream), which is particularly useful for accessing random members in compressed tar files.

use_rar_stream = False class-attribute instance-attribute

If set, use an alternative approach instead of calling rarfile when iterating over RAR archive members. This supports decompressing multiple members in a solid archive by going through the archive only once, instead of once per member.

use_single_file_stored_metadata = False class-attribute instance-attribute

If set, metadata stored in the compressed stream's header is used to populate the ArchiveMember object for single-file compressed archives, instead of relying only on the file itself. This applies to the filename and modification time, for gzip archives only.

use_zstandard = False class-attribute instance-attribute

If set, use the zstandard library instead of pyzstd to read Zstandard streams. Its error reporting is not as good.

archivey_config(config=None, **overrides)

Temporarily use config and/or override fields as the default configuration for open_archive() and open_compressed_stream().

Example:

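from archivey import archivey_config, open_archive

# Inside the block, use_rapidgzip=True is part of the default configuration.
with archivey_config(use_rapidgzip=True):
    archive1 = open_archive("path/to/first.tar.gz")
    archive2 = open_archive("path/to/second.tar.gz")
    ...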

get_archivey_config()

Return the current default configuration.

set_archivey_config(config)

Set the default configuration for open_archive() and open_compressed_stream().
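
How the three configuration helpers relate to each other can be sketched with a module-level default and a context manager that restores it on exit. This is not Archivey's implementation: Config, get_config, set_config, and temporary_config are hypothetical names, and a real implementation may store the default differently (for example, in a context variable).

```python
import contextlib
from dataclasses import dataclass, replace
from typing import Iterator, Optional


@dataclass(frozen=True)
class Config:
    """Hypothetical stand-in for ArchiveyConfig with two of its fields."""

    use_rapidgzip: bool = False
    tar_check_integrity: bool = True


_state = {"default": Config()}


def get_config() -> Config:
    return _state["default"]


def set_config(config: Config) -> None:
    _state["default"] = config


@contextlib.contextmanager
def temporary_config(config: Optional[Config] = None, **overrides) -> Iterator[Config]:
    """Apply config and/or field overrides, then restore the old default."""
    previous = get_config()
    base = config if config is not None else previous
    set_config(replace(base, **overrides))
    try:
        yield get_config()
    finally:
        set_config(previous)  # restored even if the body raises


before = get_config().use_rapidgzip
with temporary_config(use_rapidgzip=True) as active:
    inside = get_config().use_rapidgzip
after = get_config().use_rapidgzip
```

Fields that are not overridden keep the value they had in the base configuration.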

ArchiveError

Bases: Exception

Base exception for all archive-related errors raised by Archivey.

archivey.types

Common types and enums used internally by Archivey.

Most public types are exposed through the archivey module, but advanced or format-specific types can be imported from here as needed.

ArchiveFormat

Bases: StrEnum

Supported archive and compression formats.

Source code in src/archivey/types.py
class ArchiveFormat(StrEnum):
    """Supported archive and compression formats."""

    ZIP = "zip"
    RAR = "rar"
    SEVENZIP = "7z"

    GZIP = "gz"
    BZIP2 = "bz2"
    XZ = "xz"
    ZSTD = "zstd"
    LZ4 = "lz4"
    UNIX_COMPRESS = "Z"

    TAR = "tar"
    TAR_GZ = "tar.gz"
    TAR_BZ2 = "tar.bz2"
    TAR_XZ = "tar.xz"
    TAR_ZSTD = "tar.zstd"
    TAR_LZ4 = "tar.lz4"
    TAR_Z = "tar.Z"

    ISO = "iso"
    FOLDER = "folder"

    UNKNOWN = "unknown"

CreateSystem

Bases: IntEnum

Operating system that created the archive member, if known.

These values match the create_system field from the ZIP specification and the Python zipfile module. Other formats may report compatible values where applicable.

Source code in src/archivey/types.py
class CreateSystem(IntEnum):
    """
    Operating system that created the archive member, if known.

    These values match the `create_system` field from the ZIP specification
    and the Python `zipfile` module. Other formats may report compatible values
    where applicable.
    """

    FAT = 0
    AMIGA = 1
    VMS = 2
    UNIX = 3
    VM_CMS = 4
    ATARI_ST = 5
    OS2_HPFS = 6
    MACINTOSH = 7
    Z_SYSTEM = 8
    CPM = 9
    TOPS20 = 10
    NTFS = 11
    QDOS = 12
    ACORN_RISCOS = 13
    UNKNOWN = 255
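
Because these values match the create_system field used by the zipfile module, a raw integer from a ZipInfo can be mapped onto the enum. The sketch repeats a subset of the values so it is self-contained; to_create_system is a hypothetical helper, and falling back to UNKNOWN for unlisted values is an assumption of this sketch.

```python
import zipfile
from enum import IntEnum


class CreateSystem(IntEnum):
    """Subset of the values listed above, repeated for a standalone example."""

    FAT = 0
    UNIX = 3
    NTFS = 11
    UNKNOWN = 255


def to_create_system(value: int) -> CreateSystem:
    """Map a raw create_system integer to the enum."""
    try:
        return CreateSystem(value)
    except ValueError:
        return CreateSystem.UNKNOWN  # assumption: unlisted values become UNKNOWN


# zipfile sets create_system to 0 on Windows and 3 elsewhere for new entries.
info = zipfile.ZipInfo("example.txt")
system = to_create_system(info.create_system)
```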

ArchiveInfo dataclass

Metadata about the archive format and container-level properties.

Source code in src/archivey/types.py
@dataclass
class ArchiveInfo:
    """Metadata about the archive format and container-level properties."""

    format: ArchiveFormat = field(metadata={"description": "The archive format type"})
    version: Optional[str] = field(
        default=None,
        metadata={
            "description": 'The version of the archive format. Format-dependent (e.g. "4" for RAR4, "5" for RAR5).'
        },
    )
    is_solid: bool = field(
        default=False,
        metadata={
            "description": "Whether the archive is solid, i.e. decompressing a member may require decompressing others before it."
        },
    )
    extra: dict[str, Any] = field(
        # Using a lambda instead of "dict" to avoid a mkdocstrings error
        default_factory=lambda: {},
        metadata={
            "description": "Extra format-specific information about the archive."
        },
    )
    comment: Optional[str] = field(
        default=None,
        metadata={
            "description": "A comment associated with the archive. Supported by some formats."
        },
    )

ArchiveMember dataclass

Represents a file within an archive.

Parameters:

Name Type Description Default
mtime Optional[datetime]

(computed property) Returns mtime_with_tz without timezone information, for compatibility.

required
member_id int

(computed property) Unique ID for this member within the archive.

Values are assigned in archive order and can be used to disambiguate identical filenames or preserve ordering.

required
archive_id str

(computed property) Unique ID for the archive this member belongs to.

required
date_time Optional[Tuple[int, int, int, int, int, int]]

(computed property) (year, month, day, hour, minute, second) tuple for zipfile compatibility.

required
is_file bool

(computed property) Convenience property returning True if the member is a regular file.

required
is_dir bool

(computed property) Convenience property returning True if the member represents a directory.

required
is_link bool

(computed property) Convenience property returning True if the member is a symbolic or hard link.

required
is_other bool

(computed property) Convenience property returning True if the member's type is neither file, directory nor link.

required
CRC Optional[int]

(computed property) Alias for crc32 (for zipfile compatibility).

required
Source code in src/archivey/types.py
@dataclass
class ArchiveMember:
    """Represents a file within an archive."""

    filename: str = field(
        metadata={
            "description": "The name of the member. Directory names always end with a slash."
        }
    )
    file_size: Optional[int] = field(
        metadata={"description": "The size of the member's data in bytes, if known."}
    )
    compress_size: Optional[int] = field(
        metadata={
            "description": "The size of the member's compressed data in bytes, if known."
        }
    )
    mtime_with_tz: Optional[datetime] = field(
        metadata={
            "description": "The modification time of the member. May include a timezone (likely UTC) if the archive format uses global time, or be a naive datetime if the archive format uses local time."
        }
    )
    type: MemberType = field(metadata={"description": "The type of the member."})
    mode: Optional[int] = field(
        default=None, metadata={"description": "Unix permissions of the member."}
    )
    crc32: Optional[int] = field(
        default=None,
        metadata={"description": "The CRC32 checksum of the member's data, if known."},
    )
    compression_method: Optional[str] = field(
        default=None,
        metadata={
            "description": "The compression method used for the member, if known. Format-dependent."
        },
    )
    comment: Optional[str] = field(
        default=None,
        metadata={
            "description": "A comment associated with the member. Supported by some formats."
        },
    )
    create_system: Optional[CreateSystem] = field(
        default=None,
        metadata={
            "description": "The operating system on which the member was created, if known."
        },
    )
    encrypted: bool = field(
        default=False,
        metadata={"description": "Whether the member's data is encrypted, if known."},
    )
    extra: dict[str, Any] = field(
        # Using a lambda instead of "dict" to avoid a mkdocstrings error
        default_factory=lambda: {},
        metadata={"description": "Extra format-specific information about the member."},
    )
    link_target: Optional[str] = field(
        default=None,
        metadata={
            "description": "The target of the link, if the member is a symbolic or hard link. For hard links, this is the path of another file in the archive; for symbolic links, this is the target path relative to the directory containing the link. In some formats, the link target is stored in the member's data, and may not be available when getting the member list, and/or may be encrypted. In those cases, the link target will be filled when iterating through the archive."
        },
    )
    raw_info: Optional[Any] = field(
        default=None,
        metadata={"description": "The raw info object returned by the archive reader."},
    )
    _member_id: Optional[int] = field(
        default=None,
    )

    # A flag indicating whether the member has been modified by a filter.
    _edited_by_filter: bool = field(
        default=False,
    )

    @property
    def mtime(self) -> Optional[datetime]:
        """Returns `mtime_with_tz` without timezone information, for compatibility."""
        if self.mtime_with_tz is None:
            return None
        return self.mtime_with_tz.replace(tzinfo=None)

    @property
    def member_id(self) -> int:
        """Unique ID for this member within the archive.

        Values are assigned in archive order and can be used to
        disambiguate identical filenames or preserve ordering.
        """
        if self._member_id is None:
            raise ValueError("Member index not yet set")
        return self._member_id

    _archive_id: Optional[str] = field(
        default=None,
    )

    @property
    def archive_id(self) -> str:
        """Unique ID for the archive this member belongs to."""
        if self._archive_id is None:
            raise ValueError("Archive ID not yet set")
        return self._archive_id

    # Properties for zipfile compatibility (and others, as much as possible)
    @property
    def date_time(self) -> Optional[Tuple[int, int, int, int, int, int]]:
        """(year, month, day, hour, minute, second) tuple for `zipfile` compatibility."""
        if self.mtime is None:
            return None
        return (
            self.mtime.year,
            self.mtime.month,
            self.mtime.day,
            self.mtime.hour,
            self.mtime.minute,
            self.mtime.second,
        )

    @property
    def is_file(self) -> bool:
        """Convenience property returning ``True`` if the member is a regular file."""
        return self.type == MemberType.FILE

    @property
    def is_dir(self) -> bool:
        """Convenience property returning ``True`` if the member represents a directory."""
        return self.type == MemberType.DIR

    @property
    def is_link(self) -> bool:
        """Convenience property returning ``True`` if the member is a symbolic or hard link."""
        return self.type == MemberType.SYMLINK or self.type == MemberType.HARDLINK

    @property
    def is_other(self) -> bool:
        """Convenience property returning ``True`` if the member's type is neither file, directory nor link."""
        return self.type == MemberType.OTHER

    @property
    def CRC(self) -> Optional[int]:
        """Alias for `crc32` (for `zipfile` compatibility)."""
        return self.crc32

    def replace(self, **kwargs: Any) -> "ArchiveMember":
        """Return a copy of this member with selected fields updated.

        Used primarily by extraction filters to modify metadata without
        mutating the original object.
        """
        replaced = replace(self, **kwargs)
        replaced._edited_by_filter = True
        return replaced

replace(**kwargs)

Return a copy of this member with selected fields updated.

Used primarily by extraction filters to modify metadata without mutating the original object.

Source code in src/archivey/types.py
def replace(self, **kwargs: Any) -> "ArchiveMember":
    """Return a copy of this member with selected fields updated.

    Used primarily by extraction filters to modify metadata without
    mutating the original object.
    """
    replaced = replace(self, **kwargs)
    replaced._edited_by_filter = True
    return replaced

FilterFunc

Bases: Protocol

A callable that takes a member and its destination path, and returns a modified member or None to skip it during extraction or iteration.

Source code in src/archivey/types.py
class FilterFunc(Protocol):
    """A callable that takes a member and its destination path, and returns a modified
    member or `None` to skip it during extraction or iteration."""

    @overload
    def __call__(self, member: ArchiveMember) -> ArchiveMember | None: ...

    @overload
    def __call__(
        self, member: ArchiveMember, dest_path: str
    ) -> ArchiveMember | None: ...

    def __call__(
        self, member: ArchiveMember, dest_path: str | None = None
    ) -> ArchiveMember | None: ...
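
A callable that fits the shape of this protocol might look like the sketch below. Member is a hypothetical stand-in for ArchiveMember, strip_and_skip is an invented name, and the specific rules (skip .tmp files, strip leading slashes) are examples rather than anything Archivey prescribes.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Member:
    """Hypothetical stand-in for ArchiveMember."""

    filename: str
    mode: Optional[int] = None


def strip_and_skip(member: Member, dest_path: Optional[str] = None) -> Optional[Member]:
    """Skip temporary files and strip leading slashes from other names."""
    if member.filename.endswith(".tmp"):
        return None  # None tells the caller to skip this member
    cleaned = member.filename.lstrip("/")
    if cleaned != member.filename:
        # Return a modified copy instead of mutating the original.
        return replace(member, filename=cleaned)
    return member


original = Member("/etc/motd", mode=0o644)
sanitized = strip_and_skip(original)
skipped = strip_and_skip(Member("cache.tmp"), "/out")
untouched = strip_and_skip(Member("notes.txt"))
```

With a real ArchiveMember, the copy would be made with its own replace() method rather than dataclasses.replace.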

ExtractionFilter

Bases: StrEnum

Built-in sanitization policies for archive extraction.

These match Python's built-in tarfile named filters, and can be used to block unsafe paths, strip permissions, or restrict file types.

Source code in src/archivey/types.py
class ExtractionFilter(StrEnum):
    """Built-in sanitization policies for archive extraction.

    These match Python's built-in [`tarfile` named filters](https://docs.python.org/3/library/tarfile.html#default-named-filters),
    and can be used to block unsafe paths, strip permissions, or restrict file types.
    """

    FULLY_TRUSTED = "fully_trusted"
    """No filtering or restrictions. Use only with fully trusted archives."""

    TAR = "tar"
    """Blocks absolute paths and files outside destination; strips setuid/setgid/sticky bits and group/other write permissions."""

    DATA = "data"
    """Stricter than 'tar': also blocks special files and unsafe links, and removes executable bits from regular files."""

FULLY_TRUSTED = 'fully_trusted' class-attribute instance-attribute

No filtering or restrictions. Use only with fully trusted archives.

TAR = 'tar' class-attribute instance-attribute

Blocks absolute paths and files outside destination; strips setuid/setgid/sticky bits and group/other write permissions.

DATA = 'data' class-attribute instance-attribute

Stricter than 'tar': also blocks special files and unsafe links, and removes executable bits from regular files.

ReadableBinaryStream

Bases: Protocol

Protocol for a readable binary stream.

Source code in src/archivey/types.py
@runtime_checkable
class ReadableBinaryStream(Protocol):
    """Protocol for a readable binary stream."""

    def read(self, n: int = -1, /) -> bytes: ...

ReadableStreamLikeOrSimilar = ReadableBinaryStream | io.IOBase | IO[bytes] module-attribute

A readable binary stream or similar object (e.g. IO[bytes]).
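
Since the protocol is decorated with runtime_checkable, any object with a read method passes an isinstance check against it, whether or not it derives from io.IOBase. The sketch below repeats the protocol definition so it runs on its own; Chunked is a hypothetical class. Note that a runtime protocol check only tests that the method exists, not its signature or return type.

```python
import io
from typing import Protocol, runtime_checkable


@runtime_checkable
class ReadableBinaryStream(Protocol):
    """Protocol for a readable binary stream (repeated from the listing above)."""

    def read(self, n: int = -1, /) -> bytes: ...


class Chunked:
    """Hypothetical minimal stream: only read() is implemented."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def read(self, n: int = -1, /) -> bytes:
        if n < 0:
            n = len(self._data)
        chunk, self._data = self._data[:n], self._data[n:]
        return chunk


checks = [
    isinstance(io.BytesIO(b"x"), ReadableBinaryStream),
    isinstance(Chunked(b"x"), ReadableBinaryStream),
    isinstance("not a stream", ReadableBinaryStream),
]
first_four = Chunked(b"abcdef").read(4)
```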

archivey.exceptions

Custom exceptions raised by Archivey.

The base ArchiveError can be accessed from the archivey module. More specific subtypes are defined here to allow fine-grained error handling when needed.

ArchiveError

Bases: Exception

Base exception for all archive-related errors raised by Archivey.

ArchiveReadError

Bases: ArchiveError

Base class for errors while reading or decoding the archive contents.

ArchiveUnsupportedFeatureError

Bases: ArchiveReadError

Raised when an archive format or feature is not supported.

ArchiveCorruptedError

Bases: ArchiveReadError

Raised when an archive is detected as corrupted, incomplete, or invalid.

ArchiveEOFError

Bases: ArchiveCorruptedError

Raised when an unexpected end-of-file is encountered while reading an archive.

ArchiveStreamNotSeekableError

Bases: ArchiveReadError

Raised when a non-seekable stream is passed to open_archive() or open_compressed_stream(), but the archive format or backend library requires a seekable input stream.

ArchiveMemberError

Bases: ArchiveError

Base class for errors related to archive members.

ArchiveMemberNotFoundError

Bases: ArchiveMemberError

Raised when a requested member is not found within the archive.

ArchiveMemberCannotBeOpenedError

Bases: ArchiveMemberError

Raised when a member cannot be opened for reading, typically because it's a directory, special file, or unresolved link.

ArchiveLinkTargetNotFoundError

Bases: ArchiveMemberError

Raised when a symbolic or hard link within the archive points to a target that cannot be found within the same archive.

ArchiveExtractionError

Bases: ArchiveError

Base class for errors encountered during extraction to the filesystem.

ArchiveFileExistsError

Bases: ArchiveExtractionError

Raised during extraction if a file to be written already exists and the overwrite mode prevents overwriting it.

ArchiveEncryptedError

Bases: ArchiveError

Raised when an archive or member is encrypted and either no password was provided, or the provided password is incorrect.

ArchiveFilterError

Bases: ArchiveError

Raised when a filter rejects a member due to unsafe properties.

ArchiveNotSupportedError

Bases: ArchiveError

Raised when the detected archive format is not supported by Archivey.

PackageNotInstalledError

Bases: ArchiveError

Raised when a required third-party library or package for handling a specific archive format is not installed in the environment.
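
The base classes listed above determine which except clause catches which error. The sketch redefines part of the hierarchy locally so that it runs without Archivey installed; the class names and bases are copied from this page, while classify is a hypothetical helper.

```python
class ArchiveError(Exception): ...
class ArchiveReadError(ArchiveError): ...
class ArchiveCorruptedError(ArchiveReadError): ...
class ArchiveEOFError(ArchiveCorruptedError): ...
class ArchiveMemberError(ArchiveError): ...
class ArchiveMemberNotFoundError(ArchiveMemberError): ...


def classify(exc: ArchiveError) -> str:
    """Show which handler a given exception reaches."""
    try:
        raise exc
    except ArchiveCorruptedError:
        return "corrupted"  # also catches ArchiveEOFError
    except ArchiveMemberError:
        return "member"  # also catches ArchiveMemberNotFoundError
    except ArchiveError:
        return "other"  # everything else raised by the library


results = [
    classify(ArchiveEOFError("truncated")),
    classify(ArchiveMemberNotFoundError("missing.txt")),
    classify(ArchiveReadError("generic read failure")),
]
```

Handlers are listed from most to least specific; a leading except ArchiveError clause would swallow everything below it.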

archivey.filters

Custom filter functions for Archivey.

You don't need this module if you only want to use the default filters. Just pass one of the archivey.ExtractionFilter values to the iter_members_with_streams or extractall methods, or set it in the archivey.ArchiveyConfig.extraction_filter field.

If you need a filter with custom options, you can use the create_filter function. You can also write your own filter function by implementing the archivey.FilterFunc type.

create_filter(*, for_data, sanitize_names, sanitize_link_targets, sanitize_permissions, raise_on_error)

Create a filter function with the given options.

The filter function can be passed to iter_members_with_streams or extractall.

Parameters:

Name Type Description Default
for_data bool

Whether the filter is for data members (files and directories).

required
sanitize_names bool

Whether to sanitize the names of members.

required
sanitize_link_targets bool

Whether to sanitize the link targets of members.

required
sanitize_permissions bool

Whether to sanitize the permissions of members.

required
raise_on_error bool

Whether to raise an error if a filter function returns None.

required