Archivey API reference
archivey
open_archive(path_or_stream, *, config=None, streaming_only=False, pwd=None, format=None)
Open an archive file and return an ArchiveReader instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path_or_stream
|
str | bytes | PathLike | ReadableBinaryStream
|
Path to the archive file (e.g., "my_archive.zip", "data.tar.gz") or a binary file-like object containing the archive data. |
required |
config
|
ArchiveyConfig | None
|
Optional ArchiveyConfig object to customize
behavior. If |
None
|
streaming_only
|
bool
|
If set to |
False
|
pwd
|
bytes | str | None
|
Optional password used to decrypt the archive if it is encrypted. |
None
|
format
|
ArchiveFormat | None
|
Optional archive format to use. If |
None
|
Returns:
Type | Description |
---|---|
ArchiveReader
|
An ArchiveReader instance for working with the archive. |
Raises:
Type | Description |
---|---|
FileNotFoundError
|
If |
ArchiveNotSupportedError
|
If the archive format is not supported or cannot be determined. |
ArchiveCorruptedError
|
If the archive is detected as corrupted during opening. |
ArchiveEncryptedError
|
If the archive is encrypted and no password is provided, or if the provided password is incorrect. This will only be raised here if the archive header is encrypted; otherwise, the incorrect password may only be detected when attempting to read an encrypted member. |
TypeError
|
If |
Example
from archivey import open_archive, ArchiveError
try:
    with open_archive("my_data.zip", pwd="secret") as archive:
        print(f"Members: {archive.get_members()}")
        # Further operations with the archive
except FileNotFoundError:
    print("Error: Archive file not found.")
except ArchiveError as e:
    print(f"An archive error occurred: {e}")
open_compressed_stream(path_or_stream, *, config=None, format=None)
Open a single-file compressed stream and return the uncompressed stream.
This function ensures that if a stream is passed, reading starts from the stream's current position at the time of the call, after any internal operations like format detection (which might require reading from the beginning of the stream).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path_or_stream
|
BinaryIO | str | bytes | PathLike
|
Path to the compressed file (e.g., "my_data.gz", "data.bz2") or a binary file-like object containing the compressed data. |
required |
config
|
ArchiveyConfig | None
|
Optional ArchiveyConfig object to customize
behavior. If |
None
|
format
|
ArchiveFormat | None
|
Optional archive format to use. If |
None
|
Returns:
Type | Description |
---|---|
BinaryIO
|
A binary file-like object containing the uncompressed data. |
Raises:
Type | Description |
---|---|
FileNotFoundError
|
If |
ArchiveNotSupportedError
|
If the archive format is not supported or cannot be determined. |
ArchiveCorruptedError
|
If the archive is detected as corrupted during opening. |
TypeError
|
If |
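As a rough illustration of the call shape described above, the sketch below reads the first decompressed bytes of a single-file compressed stream. The helper name `read_head` and its `opener` parameter are hypothetical; they exist only so the sketch can be exercised with the standard library's `gzip.open` when Archivey is not installed. By default it uses `archivey.open_compressed_stream`, and it assumes the returned object supports `read()` and `close()` like any binary file object.

```python
import gzip
import io


def read_head(path_or_stream, size=16, opener=None):
    """Return the first `size` decompressed bytes of a compressed file or stream."""
    if opener is None:
        # Assumption: the archivey package is installed. Imported lazily on purpose.
        from archivey import open_compressed_stream as opener
    stream = opener(path_or_stream)
    try:
        return stream.read(size)
    finally:
        stream.close()


# Stand-in demonstration that uses only the standard library.
payload = io.BytesIO(gzip.compress(b"hello world"))
head = read_head(payload, size=5, opener=gzip.open)
```

With Archivey installed, calling `read_head("my_data.gz")` without an `opener` would go through `open_compressed_stream` instead.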
ArchiveReader
Bases: ABC
Represents a readable archive, such as a ZIP or TAR file.
Provides a uniform interface for listing, reading, and extracting files from archives, regardless of format. Use open_archive() to obtain an instance of this class.
__init__(archive_path, format)
Initialize the ArchiveReader with a file path or stream and detected format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
archive_path
|
BinaryIO | str | bytes | PathLike
|
Path or binary stream of the archive. |
required |
format
|
ArchiveFormat
|
ArchiveFormat indicating the archive type. |
required |
Raises:
Type | Description |
---|---|
ValueError
|
If the input is not a supported type. |
close()
abstractmethod
Close the archive and release any underlying resources.
This method is idempotent (callable multiple times without error). It is automatically called when the reader is used as a context manager.
extract(member_or_filename, path=None, pwd=None)
abstractmethod
Extract a single member to a target path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
member_or_filename
|
ArchiveMember | str
|
The member to extract. |
required |
path
|
str | PathLike | None
|
The path to extract to. Defaults to the current working directory. |
None
|
pwd
|
bytes | str | None
|
Optional password to use for encrypted members, if needed; by default, the password passed when opening the archive is used. |
None
|
Returns:
Type | Description |
---|---|
str | None
|
The path of the extracted file, or None for non-file entries. |
Raises:
Type | Description |
---|---|
ArchiveMemberNotFoundError
|
If the member is not found. |
ArchiveEncryptedError
|
If the member is encrypted and |
ArchiveCorruptedError
|
If the compressed data is corrupted. |
ValueError
|
If the archive was opened in streaming mode. |
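The return value and the streaming-mode restriction can be handled as in the following sketch. `extract_member` is a hypothetical helper, and `_StubReader` is a stand-in that only mimics the documented return values so the example runs without a real archive; a real ArchiveReader from open_archive() would be passed instead.

```python
def extract_member(reader, name, dest_dir):
    """Extract one member and describe the outcome."""
    if not reader.has_random_access():
        # extract() is documented to raise ValueError in streaming mode,
        # so fail early with a clearer message.
        raise ValueError("extract() needs an archive opened with random access")
    extracted = reader.extract(name, path=dest_dir)
    if extracted is None:
        return f"{name}: no file path returned (non-file entry)"
    return f"{name}: extracted to {extracted}"


class _StubReader:
    """Hypothetical stand-in mimicking the documented extract() contract."""

    def has_random_access(self):
        return True

    def extract(self, member_or_filename, path=None, pwd=None):
        if member_or_filename.endswith("/"):
            return None  # directories are non-file entries
        return f"{path}/{member_or_filename}"


reader = _StubReader()
file_msg = extract_member(reader, "a.txt", "out")
dir_msg = extract_member(reader, "docs/", "out")
```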
extractall(path=None, members=None, *, pwd=None, filter=None)
abstractmethod
Extract all (or selected) members to a given directory.
If the archive was opened in streaming mode, this method can only be called once.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path
|
str | PathLike | None
|
Target directory. Defaults to the current working directory if |
None
|
members
|
Collection[ArchiveMember | str] | Callable[[ArchiveMember], bool] | None
|
Optional. A collection of member names or |
None
|
pwd
|
bytes | str | None
|
Optional password to use for encrypted members, if needed; by default, the password passed when opening the archive is used. |
None
|
filter
|
ExtractFilterFunc | ExtractionFilter | None
|
Optional filter or sanitizer applied to each member. Either
a predefined |
None
|
Returns:
Type | Description |
---|---|
dict[str, ArchiveMember]
|
A mapping from extracted file paths (including the target directory) to their corresponding ArchiveMember objects. |
Raises:
Type | Description |
---|---|
ArchiveEncryptedError
|
If a member is encrypted and |
ArchiveCorruptedError
|
If the archive is corrupted. |
ArchiveIOError
|
If other I/O-related issues occur. |
SameFileError
|
If extraction would overwrite a file in the archive itself. |
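The `members` argument accepts a predicate, which the sketch below uses to extract only some files. `extract_matching` is a hypothetical helper, the `filename` attribute on members is an assumption (it is not listed in the tables on this page), and `_StubReader` is a stand-in that imitates the documented return mapping.

```python
from types import SimpleNamespace


def extract_matching(reader, dest_dir, suffix=".txt"):
    """Extract only regular files whose name ends with `suffix`."""

    def wanted(member):
        # `is_file` is documented; `filename` is an assumed attribute name.
        return member.is_file and member.filename.endswith(suffix)

    extracted = reader.extractall(path=dest_dir, members=wanted)
    # Keys are extracted paths, values are the matching members.
    return sorted(extracted)


class _StubReader:
    """Hypothetical stand-in that applies the predicate like extractall() would."""

    def __init__(self, members):
        self._members = members

    def extractall(self, path=None, members=None, *, pwd=None, filter=None):
        chosen = [m for m in self._members if members is None or members(m)]
        return {f"{path}/{m.filename}": m for m in chosen}


stub = _StubReader([
    SimpleNamespace(filename="a.txt", is_file=True),
    SimpleNamespace(filename="b.bin", is_file=True),
    SimpleNamespace(filename="docs.txt", is_file=False),
])
paths = extract_matching(stub, "out")
```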
get_archive_info()
abstractmethod
Return metadata about the archive as an ArchiveInfo object.
Includes format, solidity, comments, and other archive-level information.
Returns:
Type | Description |
---|---|
ArchiveInfo
|
An ArchiveInfo object. |
get_member(member_or_filename)
abstractmethod
Return an ArchiveMember for the given name or member.
If a filename (str) is provided, looks up the corresponding member. If an ArchiveMember is provided, it is returned as-is after validating that it belongs to this archive. This is useful when accepting either form in a user-facing API.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
member_or_filename
|
ArchiveMember | str
|
A filename or an existing ArchiveMember. |
required |
Returns:
Type | Description |
---|---|
ArchiveMember
|
The corresponding ArchiveMember. |
Raises:
Type | Description |
---|---|
ArchiveMemberNotFoundError
|
If the name does not match any member. |
get_members()
abstractmethod
Return a list of all members in the archive.
For some formats (e.g. TAR), this may require reading the entire archive if no central directory is available. Always raises ValueError in streaming mode to avoid misuse.
Returns:
Type | Description |
---|---|
List[ArchiveMember]
|
A list of ArchiveMember objects. |
Raises:
Type | Description |
---|---|
ArchiveError
|
If member metadata cannot be read. |
ValueError
|
If the archive was opened in streaming mode. |
get_members_if_available()
abstractmethod
Return a list of members if available without full archive traversal.
For formats with a central directory (e.g. ZIP), this is typically fast. Returns None if not readily available (e.g. TAR streams).
Returns:
Type | Description |
---|---|
List[ArchiveMember] | None
|
A list of ArchiveMember objects, or None if unavailable. |
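One way to combine get_members_if_available(), get_members(), and has_random_access() is sketched below. `list_members` is a hypothetical helper, and `_StubReader` is a stand-in that reproduces the documented behavior of the three methods so the logic can be run without an archive.

```python
def list_members(reader):
    """Return the member list as cheaply as possible, or None when streaming."""
    members = reader.get_members_if_available()
    if members is not None:
        return members  # fast path, e.g. formats with a central directory
    if reader.has_random_access():
        # May need to read the whole archive (e.g. TAR).
        return reader.get_members()
    # Streaming mode: get_members() would raise ValueError.
    return None


class _StubReader:
    """Hypothetical stand-in for the three documented methods."""

    def __init__(self, quick, full, random_access):
        self._quick = quick
        self._full = full
        self._random_access = random_access

    def get_members_if_available(self):
        return self._quick

    def get_members(self):
        if not self._random_access:
            raise ValueError("archive was opened in streaming mode")
        return self._full

    def has_random_access(self):
        return self._random_access


zip_like = list_members(_StubReader(["a"], ["a"], True))
tar_like = list_members(_StubReader(None, ["a", "b"], True))
streamed = list_members(_StubReader(None, ["a", "b"], False))
```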
has_random_access()
abstractmethod
Return True if this archive supports random access to its members.
Random access allows methods like open(), get_members(), and extract() to be used freely. This returns False if the archive was opened in streaming mode, in which case only a single pass through iter_members_with_streams() or extractall() is supported.
It also returns False if the archive was opened from a non-seekable source (e.g. a streamed .tar file), in which case only a single pass through iter_members_with_streams() is allowed.
Returns:
Type | Description |
---|---|
bool
|
|
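The sketch below shows one way to branch on has_random_access(): use open() when it is available, otherwise fall back to a single forward pass. `read_member` is a hypothetical helper, the `filename` attribute on members is an assumption, and `_StubReader` is a stand-in for a real ArchiveReader.

```python
import io
from types import SimpleNamespace


def read_member(reader, name):
    """Return the bytes of `name`, or None if it is not found while streaming."""
    if reader.has_random_access():
        with reader.open(name) as stream:
            return stream.read()
    # Streaming mode: only one pass over the archive is allowed.
    for member, stream in reader.iter_members_with_streams():
        if member.filename == name and stream is not None:
            return stream.read()
    return None


class _StubReader:
    """Hypothetical stand-in exposing the methods used above."""

    def __init__(self, files, random_access):
        self._files = files
        self._random_access = random_access

    def has_random_access(self):
        return self._random_access

    def open(self, member_or_filename, *, pwd=None):
        return io.BytesIO(self._files[member_or_filename])

    def iter_members_with_streams(self, members=None, *, pwd=None, filter=None):
        for name, data in self._files.items():
            yield SimpleNamespace(filename=name), io.BytesIO(data)


files = {"a.txt": b"alpha", "b.txt": b"beta"}
seekable = read_member(_StubReader(files, True), "b.txt")
streamed = read_member(_StubReader(files, False), "b.txt")
```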
iter_members_with_streams(members=None, *, pwd=None, filter=None)
abstractmethod
Iterate over archive members, yielding each with a readable stream if applicable.
For each member, this yields a tuple (ArchiveMember, stream). The stream is a binary file-like object for regular files, and None for non-file members.
If the archive was opened in streaming mode, this method can only be called once.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
members
|
Collection[ArchiveMember | str] | Callable[[ArchiveMember], bool] | None
|
A collection of |
None
|
pwd
|
bytes | str | None
|
Optional password to use for encrypted members, if needed; by default, the password passed when opening the archive is used. |
None
|
filter
|
IteratorFilterFunc | ExtractionFilter | None
|
Optional filter or sanitizer applied to each member. Either
a predefined |
None
|
Yields:
Type | Description |
---|---|
tuple[ArchiveMember, BinaryIO | None]
|
Tuples of (ArchiveMember, stream). For file members, the stream allows reading their content. For non-file members (e.g. directories or links), the stream is None. Streams are lazily opened only if accessed, so skipping unused members is efficient. Each stream is automatically closed when iteration advances to the next member or when the generator is closed. |
Raises:
Type | Description |
---|---|
ArchiveEncryptedError
|
If a member is encrypted and |
ArchiveCorruptedError
|
If member data is found to be corrupted. This may be raised when retrieving the next item or when reading a returned stream. |
ArchiveIOError
|
If other I/O-related errors occur. |
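A single pass over the archive might look like the sketch below, which hashes every regular file and skips members whose stream is None. `checksum_members` is a hypothetical helper, the `filename` attribute is an assumption, and `_StubReader` is a stand-in that yields tuples in the documented shape.

```python
import hashlib
import io
from types import SimpleNamespace


def checksum_members(reader, chunk_size=65536):
    """Return {filename: sha256 hex digest} for every member that has a stream."""
    digests = {}
    for member, stream in reader.iter_members_with_streams():
        if stream is None:
            continue  # directories, links and other non-file members
        digest = hashlib.sha256()
        # Consume the stream before advancing: it is closed on the next iteration.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
        digests[member.filename] = digest.hexdigest()
    return digests


class _StubReader:
    """Hypothetical stand-in yielding (member, stream) tuples."""

    def iter_members_with_streams(self, members=None, *, pwd=None, filter=None):
        yield SimpleNamespace(filename="docs"), None
        yield SimpleNamespace(filename="a.txt"), io.BytesIO(b"alpha")


digests = checksum_members(_StubReader())
```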
open(member_or_filename, *, pwd=None)
abstractmethod
Open a specific member for reading and return a binary stream.
Accepts either a filename (str) or an ArchiveMember. Filenames are resolved to members automatically. For symlinks, this returns the target file’s content.
Requires random access support (see has_random_access()).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
member_or_filename
|
ArchiveMember | str
|
The member or its filename. |
required |
pwd
|
bytes | str | None
|
Optional password to use for encrypted members, if needed. By default, the password passed when opening the archive is used. |
None
|
Returns:
Type | Description |
---|---|
BinaryIO
|
A binary stream for reading the member's content. |
Raises:
Type | Description |
---|---|
ArchiveMemberNotFoundError
|
If the member is not found. |
ArchiveMemberCannotBeOpenedError
|
If the member is not a file or a link that points to a file. |
ArchiveEncryptedError
|
If the member is encrypted and |
ArchiveCorruptedError
|
If the compressed data is corrupted. |
ValueError
|
If the archive was opened in streaming mode. |
resolve_link(member)
abstractmethod
Resolve a link member to its final non-link target.
If the input is not a link, returns the member itself. For symlinks or hardlinks, follows the chain to the real target. If the link points to a file that is not in the archive, returns None.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
member
|
ArchiveMember
|
The ArchiveMember to resolve. |
required |
Returns:
Type | Description |
---|---|
ArchiveMember | None
|
The resolved ArchiveMember, or None if resolution fails. |
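Because resolve_link() can return None, callers usually need to handle three cases. The sketch below does so; `describe_link` is a hypothetical helper, the `filename` attribute is an assumption, and `_StubReader` is a stand-in that follows the documented contract.

```python
from types import SimpleNamespace


def describe_link(reader, member):
    """Summarize where a member points, handling unresolved links."""
    if not member.is_link:
        return f"{member.filename}: not a link"
    target = reader.resolve_link(member)
    if target is None:
        return f"{member.filename}: target not in archive"
    return f"{member.filename} -> {target.filename}"


class _StubReader:
    """Hypothetical stand-in: non-links resolve to themselves, unknown targets to None."""

    def __init__(self, targets):
        self._targets = targets

    def resolve_link(self, member):
        if not member.is_link:
            return member
        return self._targets.get(member.filename)


real = SimpleNamespace(filename="data.txt", is_link=False)
good = SimpleNamespace(filename="latest", is_link=True)
dangling = SimpleNamespace(filename="broken", is_link=True)
reader = _StubReader({"latest": real})
lines = [describe_link(reader, m) for m in (real, good, dangling)]
```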
ArchiveInfo
dataclass
Metadata about the archive format and container-level properties.
ArchiveMember
dataclass
Represents a file within an archive.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mtime
|
Optional[datetime]
|
(computed property) Returns |
required |
member_id
|
int
|
(computed property) Unique ID for this member within the archive. Values are assigned in archive order and can be used to disambiguate identical filenames or preserve ordering. |
required |
archive_id
|
str
|
(computed property) Unique ID for the archive this member belongs to. |
required |
date_time
|
Optional[Tuple[int, int, int, int, int, int]]
|
(computed property) (year, month, day, hour, minute, second) tuple for |
required |
is_file
|
bool
|
(computed property) Convenience property returning |
required |
is_dir
|
bool
|
(computed property) Convenience property returning |
required |
is_link
|
bool
|
(computed property) Convenience property returning |
required |
is_other
|
bool
|
(computed property) Convenience property returning |
required |
CRC
|
Optional[int]
|
(computed property) Alias for |
required |
replace(**kwargs)
Return a copy of this member with selected fields updated.
Used primarily by extraction filters to modify metadata without mutating the original object.
ArchiveFormat
Bases: StrEnum
Supported archive and compression formats.
MemberType
Bases: StrEnum
ExtractionFilter
Bases: StrEnum
Built-in sanitization policies for archive extraction.
These match Python's built-in tarfile named filters, and can be used to block unsafe paths, strip permissions, or restrict file types.
DATA = 'data'
class-attribute
instance-attribute
Stricter than 'tar': also blocks special files and unsafe links, and removes executable bits from regular files.
FULLY_TRUSTED = 'fully_trusted'
class-attribute
instance-attribute
No filtering or restrictions. Use only with fully trusted archives.
TAR = 'tar'
class-attribute
instance-attribute
Blocks absolute paths and files outside destination; strips setuid/setgid/sticky bits and group/other write permissions.
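Since these policies are described as matching the tarfile named filters, the standard library's own 'data' filter gives a feel for the behavior. The sketch below calls `tarfile.data_filter` directly; it says nothing about how Archivey implements its filters, and `check_with_data_filter` is a hypothetical helper. `tarfile.data_filter` only exists on Python versions that ship extraction filters (3.12, and patched releases of earlier versions), so the demonstration is guarded.

```python
import tarfile


def check_with_data_filter(name, dest="extract_here"):
    """Return the sanitized name, or the rejection reason, under tarfile's 'data' filter."""
    member = tarfile.TarInfo(name)
    try:
        return tarfile.data_filter(member, dest).name
    except tarfile.FilterError as exc:
        return f"rejected: {type(exc).__name__}"


if hasattr(tarfile, "data_filter"):
    results = {
        name: check_with_data_filter(name)
        for name in ("docs/readme.txt", "/etc/passwd", "../escape.txt")
    }
else:
    results = None  # this Python build has no extraction filters
```

In this sketch a leading slash is stripped rather than rejected, while a path that escapes the destination raises an error.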
ArchiveyConfig
dataclass
Configuration for archivey.open_archive().
extraction_filter = ExtractionFilter.DATA
class-attribute
instance-attribute
A filter function that can be used to filter members when iterating over an archive. It can be a function that takes an ArchiveMember and returns a possibly-modified ArchiveMember object, or None to skip the member.
overwrite_mode = OverwriteMode.ERROR
class-attribute
instance-attribute
What to do with existing files when extracting. OVERWRITE: overwrite existing files. SKIP: skip existing files. ERROR: raise an error if a file already exists, and stop extracting.
tar_check_integrity = True
class-attribute
instance-attribute
If a tar archive is corrupted in a metadata section, tarfile simply stops reading further and acts as if the file has ended. If set, we perform a check that the tar archive has actually been read fully, and raise an error if it's actually corrupted.
use_indexed_bzip2 = False
class-attribute
instance-attribute
Alternative library that can be used instead of the builtin bzip2 module to read bzip2 streams. Provides multithreaded decompression and random access support.
use_python_xz = False
class-attribute
instance-attribute
Alternative library that can be used instead of the builtin xz module to read xz streams. Provides random access support.
use_rapidgzip = False
class-attribute
instance-attribute
Alternative library that can be used instead of the builtin gzip module to read gzip streams. Provides multithreaded decompression and random access support (i.e. jumping to arbitrary positions in the stream without re-decompressing the entire stream), which is particularly useful for accessing random members in compressed tar files.
use_rar_stream = False
class-attribute
instance-attribute
If set, use an alternative approach instead of calling rarfile when iterating over RAR archive members. This supports decompressing multiple members in a solid archive by going through the archive only once, instead of once per member.
use_single_file_stored_metadata = False
class-attribute
instance-attribute
If set, data stored in compressed stream headers is set in the ArchiveMember object for single-file compressed archives, instead of basing it only on the file itself. (filename and modification time for gzip archives only)
use_zstandard = False
class-attribute
instance-attribute
An alternative to pyzstd. Not as good at error reporting.
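A configuration could be built and passed through the `config` parameter of open_archive() roughly as follows. This is a sketch under assumptions: that ArchiveyConfig accepts its fields as keyword arguments (as dataclasses normally do), and that enabling `use_rapidgzip` requires the optional rapidgzip library to be installed. `FAST_TAR_GZ_OPTIONS` and `open_tar_gz` are hypothetical names.

```python
# Field names are taken from the attribute list above.
FAST_TAR_GZ_OPTIONS = {
    "use_rapidgzip": True,        # assumption: needs the optional rapidgzip library
    "tar_check_integrity": True,  # the documented default, stated explicitly
}


def open_tar_gz(path, options=None):
    """Open an archive with the options above applied."""
    # Assumption: the archivey package is installed. Imported lazily on purpose.
    from archivey import ArchiveyConfig, open_archive

    config = ArchiveyConfig(**(FAST_TAR_GZ_OPTIONS if options is None else options))
    return open_archive(path, config=config)
```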
archivey_config(config=None, **overrides)
get_archivey_config()
Return the current default configuration.
set_archivey_config(config)
Set the default configuration for open_archive() and open_compressed_stream().
ArchiveError
Bases: Exception
Base exception for all archive-related errors raised by Archivey.
archivey.types
Common types and enums used internally by Archivey.
Most public types are exposed through the archivey module, but advanced or format-specific types can be imported from here as needed.
ArchiveFormat
Bases: StrEnum
Supported archive and compression formats.
Source code in src/archivey/types.py
CreateSystem
Bases: IntEnum
Operating system that created the archive member, if known.
These values match the create_system field from the ZIP specification and the Python zipfile module. Other formats may report compatible values where applicable.
Source code in src/archivey/types.py
ArchiveInfo
dataclass
Metadata about the archive format and container-level properties.
Source code in src/archivey/types.py
ArchiveMember
dataclass
Represents a file within an archive.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mtime
|
Optional[datetime]
|
(computed property) Returns |
required |
member_id
|
int
|
(computed property) Unique ID for this member within the archive. Values are assigned in archive order and can be used to disambiguate identical filenames or preserve ordering. |
required |
archive_id
|
str
|
(computed property) Unique ID for the archive this member belongs to. |
required |
date_time
|
Optional[Tuple[int, int, int, int, int, int]]
|
(computed property) (year, month, day, hour, minute, second) tuple for |
required |
is_file
|
bool
|
(computed property) Convenience property returning |
required |
is_dir
|
bool
|
(computed property) Convenience property returning |
required |
is_link
|
bool
|
(computed property) Convenience property returning |
required |
is_other
|
bool
|
(computed property) Convenience property returning |
required |
CRC
|
Optional[int]
|
(computed property) Alias for |
required |
Source code in src/archivey/types.py
replace(**kwargs)
Return a copy of this member with selected fields updated.
Used primarily by extraction filters to modify metadata without mutating the original object.
Source code in src/archivey/types.py
FilterFunc
Bases: Protocol
A callable that takes a member and its destination path, and returns a modified member or None to skip it during extraction or iteration.
Source code in src/archivey/types.py
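A custom filter following this protocol might look like the sketch below: it skips dot-files and clears executable bits through replace(), leaving the original member untouched. The `_Member` dataclass is a hypothetical stand-in for ArchiveMember, and the `filename` and `mode` field names are assumptions made for illustration.

```python
from dataclasses import dataclass, replace as dc_replace
from typing import Optional


@dataclass
class _Member:
    """Hypothetical stand-in for ArchiveMember with only the fields used here."""

    filename: str
    mode: Optional[int] = None

    def replace(self, **kwargs):
        # Mirrors the documented replace(): return a copy with fields updated.
        return dc_replace(self, **kwargs)


def skip_hidden_and_drop_exec(member, dest_path=None):
    """Filter: skip dot-files, clear executable bits on everything else."""
    basename = member.filename.rstrip("/").rsplit("/", 1)[-1]
    if basename.startswith("."):
        return None  # None tells the caller to skip this member
    if member.mode is not None and member.mode & 0o111:
        return member.replace(mode=member.mode & ~0o111)
    return member


script = _Member("bin/run.sh", 0o755)
filtered = skip_hidden_and_drop_exec(script, "out")
hidden = skip_hidden_and_drop_exec(_Member(".env"), "out")
```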
ExtractionFilter
Bases: StrEnum
Built-in sanitization policies for archive extraction.
These match Python's built-in tarfile named filters, and can be used to block unsafe paths, strip permissions, or restrict file types.
Source code in src/archivey/types.py
FULLY_TRUSTED = 'fully_trusted'
class-attribute
instance-attribute
No filtering or restrictions. Use only with fully trusted archives.
TAR = 'tar'
class-attribute
instance-attribute
Blocks absolute paths and files outside destination; strips setuid/setgid/sticky bits and group/other write permissions.
DATA = 'data'
class-attribute
instance-attribute
Stricter than 'tar': also blocks special files and unsafe links, and removes executable bits from regular files.
ReadableBinaryStream
ReadableStreamLikeOrSimilar = ReadableBinaryStream | io.IOBase | IO[bytes]
module-attribute
A readable binary stream or similar object (e.g. IO[bytes]).
archivey.exceptions
Custom exceptions raised by Archivey.
The base ArchiveError can be accessed from the archivey module. More specific subtypes are defined here to allow fine-grained error handling when needed.
ArchiveError
Bases: Exception
Base exception for all archive-related errors raised by Archivey.
ArchiveReadError
ArchiveUnsupportedFeatureError
ArchiveCorruptedError
ArchiveEOFError
Bases: ArchiveCorruptedError
Raised when an unexpected end-of-file is encountered while reading an archive.
ArchiveStreamNotSeekableError
Bases: ArchiveReadError
Raised when a non-seekable stream is passed to open_archive() or open_compressed_stream(), but the archive format or backend library requires a seekable input stream.
ArchiveMemberError
ArchiveMemberNotFoundError
ArchiveMemberCannotBeOpenedError
Bases: ArchiveMemberError
Raised when a member cannot be opened for reading, typically because it's a directory, special file, or unresolved link.
ArchiveLinkTargetNotFoundError
Bases: ArchiveMemberError
Raised when a symbolic or hard link within the archive points to a target that cannot be found within the same archive.
ArchiveExtractionError
ArchiveFileExistsError
Bases: ArchiveExtractionError
Raised during extraction if a file to be written already exists and the overwrite mode prevents overwriting it.
ArchiveEncryptedError
Bases: ArchiveError
Raised when an archive or member is encrypted and either no password was provided, or the provided password is incorrect.
ArchiveFilterError
ArchiveNotSupportedError
PackageNotInstalledError
Bases: ArchiveError
Raised when a required third-party library or package for handling a specific archive format is not installed in the environment.
archivey.filters
Custom filter functions for Archivey.
You don't need to use this package if you just want to use the default filters.
Just pass one of the archivey.ExtractionFilter values to the iter_members_with_streams or extractall methods, or set it in the archivey.ArchiveyConfig.extraction_filter field.
If you need a filter with custom options, you can use the create_filter function. Or you can create your own filter function by implementing the archivey.FilterFunc type.
create_filter(*, for_data, sanitize_names, sanitize_link_targets, sanitize_permissions, raise_on_error)
Create a filter function with the given options.
The filter function can be passed to iter_members_with_streams or extractall.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
for_data
|
bool
|
Whether the filter is for data members (files and directories). |
required |
sanitize_names
|
bool
|
Whether to sanitize the names of members. |
required |
sanitize_link_targets
|
bool
|
Whether to sanitize the link targets of members. |
required |
sanitize_permissions
|
bool
|
Whether to sanitize the permissions of members. |
required |
raise_on_error
|
bool
|
Whether to raise an error if a filter function returns None. |
required |
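All five options are keyword-only and required, so a call has to spell out each one. The sketch below builds a filter with every option enabled. `build_filter_all_enabled` and its `factory` parameter are hypothetical; the parameter exists so the call can be checked with a recording stand-in when Archivey is not installed.

```python
def build_filter_all_enabled(factory=None):
    """Call create_filter with every documented option switched on."""
    if factory is None:
        # Assumption: the archivey package is installed. Imported lazily on purpose.
        from archivey.filters import create_filter as factory
    return factory(
        for_data=True,
        sanitize_names=True,
        sanitize_link_targets=True,
        sanitize_permissions=True,
        raise_on_error=True,
    )


def _recording_factory(**kwargs):
    """Hypothetical stand-in that returns the keyword arguments it received."""
    return kwargs


recorded = build_filter_all_enabled(factory=_recording_factory)
```

With Archivey installed, the returned filter could then be passed as the `filter` argument of extractall or iter_members_with_streams.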