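dateparser parses natural language date strings in many languages. Install it with pip install dateparser.

- Basic: import dateparser; dateparser.parse("yesterday") returns a datetime, or None when nothing can be parsed. More examples: dateparser.parse("next Friday"), dateparser.parse("3 days ago"), dateparser.parse("Jan 5 2024"), dateparser.parse("5 janvier 2024") (French), dateparser.parse("今天") (Chinese for “today”).
- Settings are passed as a dict: dateparser.parse("05/03/24", settings={"PREFER_DAY_OF_MONTH":"first","PREFER_DATES_FROM":"future"}).
- PREFER_DATES_FROM: “past”, “future”, or “current_period” (the default). It decides how incomplete dates resolve, e.g. whether "March" means the most recent March or the next one.
- PREFER_DAY_OF_MONTH: “first”, “last”, or “current”. This is the day used when the string has none, e.g. "March 2024".
- DATE_ORDER: “DMY”, “MDY” (the default), or “YMD”. It controls how numeric dates such as "05/03/24" are read.
- PREFER_LOCALE_DATE_ORDER (default True): when DATE_ORDER is not set explicitly, use the detected language's conventional order (French dates are read day first).
- RETURN_TIME_AS_PERIOD: set to True and DateDataParser.get_date_data() reports the period as “time” for strings that include a time of day.
- TIMEZONE: settings={"TIMEZONE":"US/Eastern"} interprets the input in that timezone.
- RETURN_AS_TIMEZONE_AWARE: settings={"RETURN_AS_TIMEZONE_AWARE":True} returns timezone-aware datetimes.
- RELATIVE_BASE: settings={"RELATIVE_BASE":datetime(2024,1,1)} anchors “yesterday”/“tomorrow” to that datetime instead of now.
- STRICT_PARSING: settings={"STRICT_PARSING":True} returns None when the day, month, or year is missing instead of filling the gap.
- Search: from dateparser.search import search_dates. search_dates("Meeting on Jan 5 at 2pm and follow-up March 10") returns a list of (matched_string, datetime) tuples, or None if it finds no dates.
- Languages: dateparser.parse(text, languages=["de","fr","es"]) restricts parsing to those languages. languages is a keyword argument of parse() and search_dates(), not a settings key. When the languages are known up front, dateparser does not need to detect them.
- Pandas: df["date"] = df["date_str"].map(dateparser.parse). parse() raises TypeError on non-string values such as NaN, so fill or drop missing values first.

Claude Code generates dateparser extractors, pandas date parsers, and multilingual date normalizers.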
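DATE_ORDER matters because a purely numeric string gives no clue about which field is which. The sketch below uses only the standard library, with no dateparser involved. It reads "05/03/24" under each of the three orders with datetime.strptime and gets three different dates. The names read_numeric_date and ORDER_FORMATS are illustrative assumptions and not part of dateparser.

```python
from datetime import date, datetime

# Illustrative mapping from dateparser-style DATE_ORDER codes to strptime
# formats (an assumption for this sketch). "%y" is a two-digit year.
ORDER_FORMATS = {
    "MDY": "%m/%d/%y",
    "DMY": "%d/%m/%y",
    "YMD": "%y/%m/%d",
}


def read_numeric_date(text: str, order: str) -> date:
    """Interpret a slash-separated numeric date under one component order."""
    return datetime.strptime(text, ORDER_FORMATS[order]).date()


if __name__ == "__main__":
    for order in ORDER_FORMATS:
        print(order, read_numeric_date("05/03/24", order))
```

dateparser applies the same idea through settings={"DATE_ORDER": ...}. The sketch only makes the ambiguity visible.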
CLAUDE.md for dateparser
```markdown
## dateparser Stack
- Version: dateparser >= 1.2 | pip install dateparser
- Parse: dateparser.parse("3 days ago") | dateparser.parse("next Monday") | parse("Jan 5")
- Settings: {"PREFER_DATES_FROM":"future"} | {"DATE_ORDER":"DMY"} | {"TIMEZONE":"UTC"}
- Languages: dateparser.parse(text, languages=["fr", "de"]) (keyword argument, not a settings key)
- Timezone: {"RETURN_AS_TIMEZONE_AWARE":True, "TIMEZONE":"US/Eastern"}
- Relative base: {"RELATIVE_BASE":datetime(2024,1,1)} (anchor for "yesterday"/"next week")
- Search: search_dates(text) → list of (string, datetime) from free text, or None if nothing is found
- Strict: {"STRICT_PARSING":True} (return None when the day, month, or year is missing)
```
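Recent dateparser releases validate the settings dict and raise an error on keys they do not recognize. A stray "LANGUAGES" key therefore fails at call time. A project that builds settings from config files may want a friendlier check before that point.

The helper below is a hypothetical sketch: preflight_settings is not a dateparser function, and it does not replace dateparser's own validation. It only knows the keys and values listed in this guide and reports anything else as unknown.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any

# Hypothetical helper, not part of dateparser. It knows only the settings used
# in this guide; dateparser accepts more keys, which this sketch flags.
ENUM_SETTINGS: dict[str, set[str]] = {
    "PREFER_DATES_FROM": {"past", "future", "current_period"},
    "PREFER_DAY_OF_MONTH": {"first", "last", "current"},
    # The six possible day/month/year orders.
    "DATE_ORDER": {"DMY", "DYM", "MDY", "MYD", "YDM", "YMD"},
}
TYPED_SETTINGS: dict[str, type] = {
    "RETURN_AS_TIMEZONE_AWARE": bool,
    "STRICT_PARSING": bool,
    "PREFER_LOCALE_DATE_ORDER": bool,
    "RETURN_TIME_AS_PERIOD": bool,
    "TIMEZONE": str,
    "RELATIVE_BASE": datetime,
}


def preflight_settings(settings: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the dict looks usable."""
    problems: list[str] = []
    for key, value in settings.items():
        if key in ENUM_SETTINGS:
            allowed = ENUM_SETTINGS[key]
            # Check the type first so unhashable values (e.g. lists) don't crash.
            if not isinstance(value, str) or value not in allowed:
                problems.append(f"{key}={value!r}; expected one of {sorted(allowed)}")
        elif key in TYPED_SETTINGS:
            expected = TYPED_SETTINGS[key]
            if not isinstance(value, expected):
                problems.append(
                    f"{key} should be {expected.__name__}, got {type(value).__name__}"
                )
        elif key == "LANGUAGES":
            problems.append("LANGUAGES is not a setting; pass languages=[...] to parse()")
        else:
            problems.append(f"{key!r} is not a setting this guide uses")
    return problems


if __name__ == "__main__":
    print(preflight_settings({"DATE_ORDER": "DMY", "LANGUAGES": ["fr"]}))
```

Calling preflight_settings(settings) before dateparser.parse(text, settings=settings) turns a typo into a readable message rather than an exception deep inside the parser.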
dateparser Natural Language Date Pipeline
```python
# app/date_parse.py: dateparser natural language date parsing and extraction
from __future__ import annotations

from datetime import datetime
from typing import Any

import dateparser
from dateparser.search import search_dates


# ─────────────────────────────────────────────────────────────────────────────
# 1. Basic parsing helpers
# ─────────────────────────────────────────────────────────────────────────────

def parse_date(
    text: str,
    prefer: str = "past",
    tz: str | None = None,
    date_order: str = "MDY",
    base: datetime | None = None,
    strict: bool = False,
) -> datetime | None:
    """
    Parse a natural language date string; return None if it cannot be parsed.
    prefer: "past", "future" or "current_period"; how incomplete dates such as
        "March" are resolved.
    tz: timezone the input is interpreted in (e.g. "America/New_York"); when
        set, the result is timezone-aware.
    date_order: "MDY" (US), "DMY" (European) or "YMD" (ISO) for numeric dates.
    base: anchor for relative dates like "yesterday" (defaults to now).
    strict=True: return None when the day, month or year is missing instead
        of filling it in.
    """
    settings: dict[str, Any] = {
        "PREFER_DATES_FROM": prefer,
        "DATE_ORDER": date_order,
        "RETURN_AS_TIMEZONE_AWARE": bool(tz),
        "STRICT_PARSING": strict,
    }
    if tz:
        settings["TIMEZONE"] = tz
    if base is not None:
        settings["RELATIVE_BASE"] = base
    return dateparser.parse(text, settings=settings)


def parse_date_utc(text: str) -> datetime | None:
    """Parse a date string into a UTC-aware datetime (input without a zone is read as UTC)."""
    return parse_date(text, tz="UTC", prefer="past")


def parse_date_strict(text: str, date_order: str = "MDY") -> datetime | None:
    """
    Strict parsing: return None when the day, month or year is missing
    (e.g. "March 2024") instead of filling in the gap.
    Use when you need to reject incomplete dates.
    """
    return dateparser.parse(text, settings={
        "STRICT_PARSING": True,
        "DATE_ORDER": date_order,
    })


# ─────────────────────────────────────────────────────────────────────────────
# 2. Multilingual parsing
# ─────────────────────────────────────────────────────────────────────────────

def parse_multilingual(text: str, languages: list[str] | None = None) -> datetime | None:
    """
    Parse a date in any of the specified languages (codes such as "fr", "de").
    dateparser detects the language automatically if languages=None.
    Restrict to a list for better accuracy when you know the input language.
    Examples:
        "5 janvier 2024"     → French
        "5. Januar 2024"     → German
        "5 de enero de 2024" → Spanish
        "5 января 2024"      → Russian
        "2024年1月5日"        → Japanese
    """
    # languages is a keyword argument of parse(), not a settings key;
    # settings={"LANGUAGES": [...]} is not a valid dateparser setting.
    return dateparser.parse(text, languages=languages)


def parse_eu_date(text: str) -> datetime | None:
    """Parse a European date string (day-first order: 5/3/2024 = 5 March 2024)."""
    return dateparser.parse(text, settings={"DATE_ORDER": "DMY"})


def parse_iso_date(text: str) -> datetime | None:
    """Parse an ISO/Asian date string (year-first: 2024/1/5)."""
    return dateparser.parse(text, settings={"DATE_ORDER": "YMD"})


# ─────────────────────────────────────────────────────────────────────────────
# 3. Relative date parsing with custom base
# ─────────────────────────────────────────────────────────────────────────────

def parse_relative(text: str, base: datetime) -> datetime | None:
    """
    Parse a relative date expression anchored to a specific base datetime.
    Essential for reproducible tests and for processing historical documents
    where "next week" should be relative to the document date, not today.
    """
    return dateparser.parse(text, settings={
        "RELATIVE_BASE": base,
        "PREFER_DATES_FROM": "future",
        "RETURN_AS_TIMEZONE_AWARE": False,
    })


def relative_dates_from_base(
    expressions: list[str],
    base: datetime,
) -> list[datetime | None]:
    """Parse a list of relative date expressions from the same base."""
    return [parse_relative(expr, base) for expr in expressions]


# ─────────────────────────────────────────────────────────────────────────────
# 4. Search: extract multiple dates from free text
# ─────────────────────────────────────────────────────────────────────────────

def find_dates_in_text(
    text: str,
    languages: list[str] | None = None,
    add_detected_language: bool = False,
) -> list[dict[str, Any]]:
    """
    Extract all date references from free text.
    search_dates() returns [(matched_string, datetime), ...], with the
    detected language appended to each tuple when add_detected_language=True.
    Useful for parsing meeting notes, emails, and documents.
    """
    # search_dates() returns None, not an empty list, when nothing is found.
    results = search_dates(
        text,
        languages=languages,
        add_detected_language=add_detected_language,
    ) or []
    if add_detected_language:
        return [
            {"text": r[0], "date": r[1], "language": r[2]}
            for r in results
        ]
    return [{"text": r[0], "date": r[1]} for r in results]


def extract_first_date(text: str) -> datetime | None:
    """Extract the first date found in a block of text."""
    found = find_dates_in_text(text)
    return found[0]["date"] if found else None


def extract_date_range(text: str) -> tuple[datetime | None, datetime | None]:
    """
    Extract the first two dates from text as (start, end).
    Useful for "between Jan 5 and Jan 10" or "from Monday to Friday".
    """
    found = find_dates_in_text(text)
    dates = [f["date"] for f in found if f["date"]]
    start = dates[0] if len(dates) > 0 else None
    end = dates[1] if len(dates) > 1 else None
    return start, end


# ─────────────────────────────────────────────────────────────────────────────
# 5. Batch / pandas integration
# ─────────────────────────────────────────────────────────────────────────────

def parse_series(
    values: list[str | None],
    prefer: str = "past",
    date_order: str = "MDY",
    tz: str | None = None,
) -> list[datetime | None]:
    """
    Parse a list of date strings; unparseable or missing entries become None.
    For pandas: df["parsed"] = parse_series(df["date_str"].tolist())
    """
    settings: dict[str, Any] = {
        "PREFER_DATES_FROM": prefer,
        "DATE_ORDER": date_order,
    }
    if tz:
        settings["TIMEZONE"] = tz
        settings["RETURN_AS_TIMEZONE_AWARE"] = True
    results: list[datetime | None] = []
    for v in values:
        # None, NaN and other non-strings, plus blank strings, become None
        # (dateparser.parse raises TypeError on non-string input).
        if not isinstance(v, str) or not v.strip():
            results.append(None)
            continue
        try:
            results.append(dateparser.parse(v, settings=settings))
        except Exception:  # treat any parser failure as "unparseable"
            results.append(None)
    return results


def normalize_dates_dataframe(df, column: str, new_column: str | None = None, **kwargs):
    """
    Parse a pandas DataFrame column of mixed date strings into datetimes.
    new_column=None overwrites the source column; kwargs go to parse_series().
    The DataFrame is modified in place and also returned.
    """
    out_col = new_column or column
    df[out_col] = parse_series(df[column].tolist(), **kwargs)
    return df


# ─────────────────────────────────────────────────────────────────────────────
# 6. Validation and normalization
# ─────────────────────────────────────────────────────────────────────────────

def is_valid_date_string(text: str) -> bool:
    """Return True if the string parses as a complete date (day, month and year present)."""
    return parse_date_strict(text) is not None


def normalize_to_iso(text: str, prefer: str = "past") -> str | None:
    """Parse a date string and return an ISO 8601 date (YYYY-MM-DD) or None."""
    dt = parse_date(text, prefer=prefer)
    return dt.strftime("%Y-%m-%d") if dt else None


def normalize_to_timestamp(text: str) -> float | None:
    """Parse a date string and return a Unix timestamp or None."""
    dt = parse_date_utc(text)
    return dt.timestamp() if dt else None


# ─────────────────────────────────────────────────────────────────────────────
# Demo
# ─────────────────────────────────────────────────────────────────────────────

if __name__ == "__main__":
    print("=== Natural language parsing ===")
    expressions = [
        "yesterday",
        "next Monday",
        "3 days ago",
        "in 2 weeks",
        "last month",
        "January 5, 2024",
        "Jan 5 at 2:30pm",
        "2024-01-05",
        "5/3/2024",
        "first day of next month",
    ]
    for expr in expressions:
        dt = parse_date(expr)
        print(f"  {expr!r:35} → {dt}")

    print("\n=== Multilingual ===")
    ml_samples = [
        ("5 janvier 2024", ["fr"]),
        ("5. Januar 2024", ["de"]),
        ("5 enero 2024", ["es"]),
        ("5 января 2024", ["ru"]),
    ]
    for text, langs in ml_samples:
        dt = parse_multilingual(text, langs)
        print(f"  {text!r:30} → {dt}")

    print("\n=== Relative base ===")
    base = datetime(2024, 3, 15)
    relative_exprs = ["yesterday", "next week", "in 3 days", "last Friday"]
    for expr in relative_exprs:
        dt = parse_relative(expr, base)
        print(f"  Base 2024-03-15 + {expr!r:15} → {dt}")

    print("\n=== Search in text ===")
    email = """
    Hi team, please note the quarterly review is on March 15, 2024.
    Expenses are due by end of day February 28.
    The next team sync is every Monday at 10am starting January 8.
    """
    found = find_dates_in_text(email)
    for f in found:
        print(f"  {f['text']!r:30} → {f['date']}")

    print("\n=== ISO normalization ===")
    for s in ["yesterday", "Jan 5 2024", "last Tuesday", "invalid text"]:
        iso = normalize_to_iso(s)
        print(f"  {s!r:20} → {iso!r}")

    print("\n=== Batch parsing ===")
    dates = ["Jan 5 2024", "yesterday", "next Friday", None, "invalid", "2024-03-15"]
    parsed = parse_series(dates)
    for raw, dt in zip(dates, parsed):
        print(f"  {str(raw):20} → {dt}")
```
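parse_relative() and the demo pin relative phrases to a fixed base, so results do not drift with the wall clock. The toy resolver below makes that arithmetic concrete without depending on dateparser. It handles only a handful of English phrases. resolve_relative is hypothetical and far narrower than dateparser's grammar.

One rule in it is an assumption: "last"/"next" plus a weekday always means strictly before or after the base. Under that rule, "last Friday" on a Friday goes back a full week. dateparser's own choice may differ, so check the demo output rather than relying on this sketch.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
UNIT_DAYS = {"day": 1, "week": 7}
# "N day(s)/week(s) ago" or "in N day(s)/week(s)"
_OFFSET = re.compile(r"^(?:(\d+) (day|week)s? ago|in (\d+) (day|week)s?)$")


def resolve_relative(text: str, base: datetime) -> datetime | None:
    """Toy resolver (hypothetical): anchor a few English relative phrases to base."""
    phrase = text.strip().lower()
    fixed = {"today": 0, "yesterday": -1, "tomorrow": 1}
    if phrase in fixed:
        return base + timedelta(days=fixed[phrase])
    m = _OFFSET.match(phrase)
    if m:
        if m.group(1):  # "N units ago": move backwards from base
            return base - timedelta(days=int(m.group(1)) * UNIT_DAYS[m.group(2)])
        return base + timedelta(days=int(m.group(3)) * UNIT_DAYS[m.group(4)])
    parts = phrase.split()
    if len(parts) == 2 and parts[0] in ("last", "next") and parts[1] in WEEKDAYS:
        target = WEEKDAYS.index(parts[1])
        if parts[0] == "next":
            # Assumed rule: strictly after base, so the same weekday is 7 days ahead.
            return base + timedelta(days=(target - base.weekday()) % 7 or 7)
        # Assumed rule: strictly before base, so the same weekday is 7 days back.
        return base - timedelta(days=(base.weekday() - target) % 7 or 7)
    return None  # unsupported phrase; dateparser.parse also returns None on failure


if __name__ == "__main__":
    base = datetime(2024, 3, 15)  # a Friday
    for expr in ["yesterday", "3 days ago", "in 2 weeks", "last Friday", "next Monday"]:
        print(f"{expr!r:15} -> {resolve_relative(expr, base)}")
```

Because base is a parameter rather than datetime.now(), the same assertions pass on any day. RELATIVE_BASE gives dateparser-backed code the same property.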
For the python-dateutil alternative: dateutil.parser.parse() handles ISO 8601 and many standard formats. Its fuzzy=True and fuzzy_with_tokens=True modes skip non-date words to pull a single date out of a sentence, and fuzzy_with_tokens=True also returns the skipped tokens. dateparser adds several things:

- Multilingual support: "5 janvier 2024", "5. Januar 2024", and "5 de enero 2024" all produce the same datetime.
- Relative expression parsing: “next Monday”, “3 days ago”.
- search_dates() for extracting multiple dates from a paragraph.
- The RELATIVE_BASE setting, which anchors “yesterday” to a specific datetime instead of now. This is necessary for reproducible test fixtures and historical document processing.

For the spaCy NER alternative: spaCy's Named Entity Recognition can tag DATE and TIME entities in text, but it returns the matched span text, not a datetime object. dateparser's search_dates() returns (string, datetime) pairs without a spaCy model download or pipeline setup. That makes it the better fit when the goal is datetime objects rather than token classification.

The Claude Skills 360 bundle includes dateparser skill sets covering dateparser.parse() with PREFER_DATES_FROM/DATE_ORDER/TIMEZONE settings, parse_date_utc() and parse_date_strict(), parse_multilingual() with a languages list, parse_eu_date() DMY and parse_iso_date() YMD order, parse_relative() with RELATIVE_BASE anchor, search_dates() text extraction, find_dates_in_text() structured output, extract_date_range() first/second date pair, parse_series() for batch list processing, normalize_dates_dataframe() pandas integration, normalize_to_iso() for YYYY-MM-DD output, and is_valid_date_string() validator. Start with the free tier to try natural language date parsing code generation.