wget, Wget2 and Modern File Downloading

A comprehensive guide covering wget fundamentals, Wget2's multi-threaded architecture, HTTP/2 support, TLS 1.3 security, bot protection strategies, a curl comparison, and Python subprocess integration.

Ceyhun Enki Aksan, Entrepreneur, Maker

TL;DR

| Topic | 2019 | 2026 |
|-------|------|------|
| Version | wget 1.20.x | Wget2 2.2.1 / wget 1.25.0 |
| Protocol | HTTP/1.1 | HTTP/2 (Wget2) |
| Downloading | Single-threaded | Multi-threaded (5 threads) |
| Compression | gzip, deflate | brotli, zstd, lzip, xz |
| TLS | TLS 1.2 common | TLS 1.3 mandatory |
| HSTS | Basic | Full RFC 6797 (Wget2) |
| Bot Protection | User-Agent sufficient | JA3 fingerprint, JS challenge |

The first version of this article was published in 2019. Since then, the transformation at the internet's protocol level (HTTP/2, TLS 1.3), developments in the bot protection ecosystem, and Wget2 reaching stability have turned wget from merely a "file downloading tool" into a client that must adapt to modern web architecture.

wget Fundamentals

GNU Wget is a download manager that enables file downloading from the internet via the command line. Its name is derived from World Wide Web and get. It can perform download operations over HTTP, HTTPS, and FTP protocols, offering fundamental capabilities such as recursive site downloading, resuming interrupted downloads, and bandwidth limiting [1].

Its most basic usage:

wget [URL]

Commonly Used Parameters

| Parameter | Description | Example |
|-----------|-------------|---------|
| -O | Specify output filename | wget -O data.zip [URL] |
| -c | Resume interrupted download | wget -c [URL] |
| -r | Recursive download | wget -r [URL] |
| -m | Site mirroring | wget -m [URL] |
| -P | Specify target directory | wget -P /downloads/ [URL] |
| -i | Download from file list | wget -i list.txt |
| -q | Quiet mode (no output) | wget -q [URL] |
| --limit-rate | Limit bandwidth | wget --limit-rate=500k [URL] |
| -A | Download only specific types | wget -r -A jpg,png [URL] |
| -R | Exclude specific types | wget -r -R tar.gz [URL] |
| --user-agent | Specify User-Agent string | wget --user-agent="..." [URL] |
| -np | Do not ascend to parent directory | wget -r -np [URL] |
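
These flags compose naturally. As a quick sketch (list.txt and the target directory are placeholders), a quiet, resumable, rate-limited batch download might look like:

# Resume (-c) each URL in list.txt into ./downloads, capped at 500 KB/s
wget -c -q --limit-rate=500k -P ./downloads/ -i list.txt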

Site Mirroring

One of wget’s most powerful features is the ability to create an offline copy of a website:

wget --mirror --convert-links --page-requisites --no-parent \
  -P ./site-mirror https://example.com/documentation/

The --convert-links parameter converts links in downloaded HTML files to local paths, enabling offline viewing. This capability, which curl does not offer, makes wget indispensable in certain scenarios.


Wget2: The Silent Revolution

The most significant development in the wget ecosystem since 2019 has been the stabilization of Wget2, the successor to the original wget. With its latest version 2.2.1 (January 2026), it continues active development [2].

wget vs Wget2

| Feature | wget (1.x) | Wget2 (2.x) |
|---------|------------|-------------|
| HTTP/2 | Not supported | Full support (nghttp2) |
| Multi-threaded downloading | None | Default 5 threads |
| Compression | gzip, deflate | brotli, zstd, lzip, bzip2, xz |
| HSTS | Basic | Full RFC 6797, on by default |
| HPKP | None | RFC 7469 (persistent database) |
| TCP Fast Open | None | Supported |
| TLS Session Resumption | None | With persistent cache |
| Single-file chunked download | None | Via --chunk-size |
| RSS/Atom/Sitemap | None | Supported |
| FTP | Full support | Limited |
| WARC output | Supported | Not yet |

Multi-threaded Downloading

Wget2’s most prominent advantage is multi-threaded download support. While the original wget downloads files through a single channel, Wget2 uses 5 concurrent connections by default:

# Parallel download (default 5 threads)
wget2 https://example.com/large-file.tar.gz

# Download single file in chunks
wget2 --chunk-size=10M https://example.com/large-file.tar.gz

The --chunk-size parameter splits a single large file into specified-size chunks for parallel downloading. This feature provides significant performance gains on high-bandwidth connections.
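
The thread count itself is tunable. A sketch assuming Wget2's --max-threads option (list.txt is a placeholder):

# Raise concurrency from the default 5 to 8 threads
wget2 --max-threads=8 -i list.txt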

HTTP/2 Support

Wget2 offers full HTTP/2 support via nghttp2 and GnuTLS ALPN. Thanks to HTTP/2’s multiplexing capability, multiple requests can be sent simultaneously over a single TCP connection:

# HTTP/2 is used automatically (if server supports it)
wget2 https://example.com/file.zip

HTTP/3 (QUIC) support exists in Wget2’s development branch but has not yet been included in a stable release.
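
For servers that misbehave under HTTP/2, Wget2 can be pinned to HTTP/1.1. A sketch, assuming the GNU-style --no-http2 negation of Wget2's --http2 option (URL is a placeholder):

# Force HTTP/1.1 for a problematic server
wget2 --no-http2 https://example.com/file.zip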

Compression Support

Wget2 supports modern compression algorithms including brotli and zstd. The Accept-Encoding header automatically advertises supported algorithms:

# Supported algorithms: brotli, zstd, lzip, bzip2, xz, gzip, deflate
wget2 --compression=zstd https://example.com/data.json

Distribution Adoption

Fedora 40 and later ship Wget2 as the default wget command via the wget2-wget package [3]. This is the strongest signal that Wget2 is production-ready. It is also available in Debian, Ubuntu, Alpine, and Arch Linux repositories:

# Fedora (ships by default)
dnf install wget2-wget

# Debian/Ubuntu
apt install wget2

# macOS
brew install wget2

Security and Modern TLS

While TLS 1.1 and 1.2 were common in 2019, TLS 1.3 has become the standard as of 2026. This change directly affects the wget ecosystem.

TLS 1.3 Support

wget has supported TLS 1.3 via GnuTLS since version 1.19.5. Command-line TLS version selection became possible with wget 1.21.3. Wget2 offers more advanced security features with TLS False Start and persistent-cache TLS Session Resumption [4].
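
With wget 1.21.3 or later, the TLS version can be pinned from the command line via --secure-protocol (URL is a placeholder):

# Require TLS 1.3 for this connection
wget --secure-protocol=TLSv1_3 https://example.com/file.zip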

HSTS (HTTP Strict Transport Security)

Wget2 enables full RFC 6797 HSTS compliance by default and stores HSTS information in a persistent database. When a site issues a “call me only over HTTPS” directive, subsequent requests are automatically redirected to HTTPS.
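
Both tools expose knobs for the HSTS store; a sketch assuming wget's --hsts-file and --no-hsts options (paths are illustrative):

# Keep HSTS state in a custom database file
wget --hsts-file=/tmp/hsts.db https://example.com/

# Disable HSTS enforcement (testing only, weakens security)
wget --no-hsts http://example.com/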

--no-check-certificate Is Now Bad Practice

With the proliferation of Let’s Encrypt, the number of servers without valid TLS certificates has decreased significantly. --no-check-certificate completely disables certificate validation, opening the door to man-in-the-middle (MITM) attacks:

# Bad practice
wget --no-check-certificate https://example.com/file.zip

# Correct approach: Update system certificates
sudo update-ca-certificates

# Or specify a custom CA file
wget --ca-certificate=/path/ca-bundle.crt https://example.com/file.zip

Security Vulnerabilities

| CVE | CVSS | Affected | Description |
|-----|------|----------|-------------|
| CVE-2024-38428 | 9.1 (Critical) | wget 1.24.5 and earlier | Semicolons in URI parsing misinterpreted as a host separator, enabling SSRF [5] |
| CVE-2024-10524 | Medium | wget 1.24.5 and earlier | HTTP shorthand SSRF vulnerability, fixed in 1.25.0 |
| CVE-2025-69194 | 8.8 (High) | Wget2 | Metalink file overwrite vulnerability, fixed in 2.2.1 [6] |

wget 1.25.0 (November 2024) patches the two wget 1.x vulnerabilities while also introducing a breaking change: the shorthand URL format (wget user:password@server) has been removed; full URL specification is now mandatory.
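
In practice, credentials must now travel in a full URL or via explicit options; a sketch with placeholder credentials and host:

# Full URL form (the old "wget user:password@server" shorthand is gone)
wget https://user:password@server.example.com/file.zip

# Or with explicit options
wget --user=user --password=password https://server.example.com/file.zip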


Understanding Bot Protections

E-commerce sites and data sources use increasingly aggressive protections to detect tools like wget. Modern bot detection systems operate at four layers [7]:

1. User-Agent Analysis

wget sends Wget/1.25.0 (or Wget2/2.2.1) by default. This string is instantly recognized by WAF systems like Cloudflare and Akamai:

# High risk of blocking
wget https://example.com/data.csv

# With modern browser string
wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0" \
  https://example.com/data.csv

2. TLS/JA3 Fingerprint

Modern bot protection systems create a unique “JA3 fingerprint” from cipher suites, extensions, and TLS version sent during the TLS handshake. wget and curl have fingerprints distinct from browsers, and anti-bot services maintain databases of these fingerprints.

3. HTTP/2 Frame Ordering

Even tools supporting HTTP/2 (like curl, httpx) can be fingerprinted via HTTP/2 frame ordering and settings.

4. JavaScript Challenge

Since wget cannot execute JavaScript, JS-based protections like Cloudflare Turnstile or Akamai sensor data block wget entirely.

Human-like Behavior Strategy

wget --continue --tries=10 \
  --wait=2 --random-wait \
  --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0" \
  --limit-rate=500k \
  [URL]

The --random-wait parameter uses a random value between 0.5x and 1.5x of the time specified with --wait, making request timing more human-like. --limit-rate throttles bandwidth to avoid overwhelming the target server.

robots.txt: wget respects robots.txt by default during recursive downloads. For analysis within ethical boundaries, it can be disabled with -e robots=off, but this approach requires responsible usage.
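
Where you have permission to ignore robots.txt, a polite recursive fetch might look like this sketch (URL is a placeholder):

# Recursive, no parent directories, robots.txt ignored, randomized 2 s waits
wget -r -np -e robots=off --wait=2 --random-wait https://example.com/docs/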


wget vs curl: 2026 Perspective

In the developer world, wget and curl are frequently compared, but the two tools are built on different philosophies [8].

Protocol Support

| Protocol | curl | wget (1.x) | Wget2 |
|----------|------|------------|-------|
| HTTP/1.x | Yes | Yes | Yes |
| HTTP/2 | Yes | No | Yes |
| HTTP/3 (QUIC) | Experimental | No | In development |
| FTP/FTPS | Yes | Yes | Limited |
| SCP/SFTP | Yes | No | No |
| SMTP/POP3/IMAP | Yes | No | No |
| Total protocols | ~26 | 3 | ~3 |

Philosophical Difference

curl is a pipe-oriented tool like cat: it writes data to stdout and chains with other tools. wget is file-copy oriented like cp: it writes files to disk, resumes on interruption, and preserves directory structure.
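
The difference fits in two shell lines (URLs are placeholders; jq is used for illustration):

# curl: stream to stdout and chain, cat-style
curl -s https://api.example.com/data.json | jq '.items | length'

# wget: land a file on disk, resumable, cp-style
wget -c -O data.json https://example.com/data.json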

Behind curl is the libcurl library, used by thousands of applications. Wget2’s libwget library has not yet reached this level of adoption.

Which Tool for Which Scenario?

| Scenario | wget / Wget2 | curl |
|----------|--------------|------|
| Recursive site downloading | --mirror, --convert-links | Not supported |
| Offline site copy | --mirror -p --convert-links | Not supported |
| Resumable large files | -c (continue) | --continue-at - |
| API testing | Limited | Full support (all HTTP methods) |
| JSON send/receive | Difficult | --json, -d |
| Pipeline integration | Possible with -O - | Default (stdout) |
| Multi-protocol | HTTP/FTP | 26 protocols |
| List-based downloading | -i list.txt | Not supported |

In short: wget for file downloading and site mirroring, curl for API interaction and pipeline operations.


Modern Alternatives

aria2

aria2 is a lightweight download manager offering multi-protocol (HTTP, FTP, SFTP, BitTorrent, Metalink) and multi-source download support [9]. Its latest version is 1.37.0 (November 2023). It can increase speed by downloading a single file from multiple sources simultaneously. Remote control is possible via JSON-RPC and XML-RPC interfaces:

# Multi-source download
aria2c -x 16 https://example.com/large-file.tar.gz

# BitTorrent
aria2c file.torrent

# RPC mode (control via web interface)
aria2c --enable-rpc

aria2 excels in pure download performance (especially multi-source and torrent), while wget2 holds the advantage in web crawling, site mirroring, and modern HTTP features (HTTP/2, brotli).

curl-impersonate

Developed to bypass modern bot protections, curl-impersonate is a fork of curl that mimics browser TLS fingerprints. It provides an effective solution against JA3-based detection.
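
Rather than patching flags onto plain curl, curl-impersonate ships browser-named wrapper scripts; exact names vary by release, so the one below is illustrative:

# Wrapper that applies a Chrome-like TLS/JA3 and HTTP/2 fingerprint
curl_chrome116 https://example.com/data.csv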

Comparison Table

| Tool | HTTP/2 | Multi-threaded | Recursive | BitTorrent | Compression |
|------|--------|----------------|-----------|------------|-------------|
| wget 1.x | No | No | Yes | No | gzip |
| Wget2 | Yes | Yes (5) | Yes | No | brotli, zstd |
| curl | Yes | No | No | No | brotli, zstd |
| aria2 | No | Yes (16) | No | Yes | No |

Python subprocess Integration

Calling wget from Python can be a meaningful approach in specific scenarios. However, native Python libraries should be preferred by default.

When Is subprocess with wget Meaningful?

Recursive site downloading: Python libraries cannot adequately replicate wget’s --mirror capability:

import subprocess

# Site mirroring
result = subprocess.run([
    "wget", "--mirror", "--convert-links",
    "--page-requisites", "--no-parent",
    "-P", "./site-mirror",
    "https://example.com/documentation/"
], capture_output=True, text=True, timeout=600)

Resumable large file transfer: wget’s -c parameter is a battle-tested mechanism:

import subprocess

result = subprocess.run([
    "wget", "-c", "--tries=10",
    "--limit-rate=500k",
    "-O", "data.tar.gz",
    "https://example.com/large-data.tar.gz"
], capture_output=True, text=True, timeout=3600)

When Should Native Python Libraries Be Preferred?

For scenarios requiring API interaction, async request management, and direct response data processing, native libraries are more suitable:

import asyncio

import httpx

async def fetch_data() -> dict:
    # HTTP/2-enabled async request (requires the httpx[http2] extra)
    async with httpx.AsyncClient(http2=True) as client:
        response = await client.get("https://api.example.com/data")
        return response.json()

data = asyncio.run(fetch_data())

Library Comparison

| Library | Async | HTTP/2 | Best Scenario |
|---------|-------|--------|---------------|
| requests | No | No | Simple scripts, quick automation |
| httpx | Yes | Yes | Modern async code, HTTP/2 |
| aiohttp | Yes | No | High-concurrency workloads |
| subprocess+wget | No | No (wget1), Yes (wget2) | Recursive downloads, site mirroring |
| subprocess+aria2 | Via RPC | No | Multi-source, torrents |

Pipeline Approach

When you want to feed files downloaded with wget directly into a data processing pipeline, wrapping with subprocess makes sense:

import subprocess
import pandas as pd
from pathlib import Path

def download_and_process(url: str, target: Path) -> pd.DataFrame:
    """Download with wget, then process with pandas."""
    result = subprocess.run(
        ["wget", "-q", "-O", str(target), url],
        capture_output=True, text=True, timeout=120
    )
    if result.returncode != 0:
        raise RuntimeError(f"wget error: {result.stderr}")
    return pd.read_csv(target)

# Usage
df = download_and_process(
    "https://data.example.com/dataset.csv",
    Path("/tmp/dataset.csv")
)

This approach combines wget’s resumable downloading and bandwidth control capabilities with Python’s data processing power.


2026 Best Practice Guide

Modern wget Command

wget --continue --tries=10 \
  --wait=2 --random-wait \
  --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0" \
  --limit-rate=500k \
  [URL]

Modern Downloading with Wget2

wget2 --chunk-size=10M --progress=bar \
  https://example.com/large-file.tar.gz

Secure Site Mirroring

wget --mirror --convert-links --page-requisites \
  --no-parent --wait=1 --random-wait \
  --user-agent="Mozilla/5.0 Chrome/120.0.0.0" \
  -P ./site-mirror https://example.com/

Security Checklist

# Version check
wget --version | head -1
# Prefer wget2 if available
wget2 --version 2>/dev/null && echo "wget2 available"

# Do NOT use --no-check-certificate
# Instead, update system certificates
sudo update-ca-certificates

Conclusion

wget maintains its position as the GNU ecosystem’s fundamental downloading tool. However, the requirements of modern web architecture (HTTP/2, TLS 1.3, bot protections) have exceeded the capabilities of the original wget. Wget2 fills this gap with multi-threaded downloading, brotli/zstd compression, and full HSTS support, proving its production readiness through Fedora’s adoption as the default wget.

In developer workflows, wget and curl are not alternatives but complements. wget should be used for site mirroring and large file downloads, curl for API interaction and pipeline operations. In the Python ecosystem, subprocess with wget integration stands out as a meaningful approach for recursive downloading and resumable transfer scenarios.

For related topics, see rsync, grep, crontab, and data scraping.

Footnotes

  1. GNU Wget
  2. GNU Wget2 2.2.1 Release
  3. Fedora Wget2asWget Change
  4. rsync 3.2.0 NEWS - TLS and compression reference
  5. CVE-2024-38428 - JFrog Analysis
  6. CVE-2025-69194 - Wget2 Metalink Vulnerability
  7. Cloudflare User Agent Blocking
  8. curl vs wget - Daniel Stenberg
  9. aria2 - Multi-protocol Download Utility