Evidence Handling & Metadata for Investigators

Evidence handling in investigative research is the process of collecting, preserving, documenting, and storing digital findings in a manner that maintains their integrity, establishes their provenance, and keeps them usable if the investigation leads to legal proceedings. Metadata is the embedded data layer within digital files that records how, when, where, and by whom a file was created or modified. It is invisible in normal use but readable by anyone who examines the file’s properties. Investigators work with both at once: the digital evidence they collect contains metadata about itself, and the tools and devices they use to collect it generate metadata of their own. This guide covers how to handle digital evidence correctly from the moment of collection through storage and reporting, and how to manage metadata in both directions: stripping it from files before sharing, and reading it from files collected during an investigation.

Quick Answer: Digital evidence handling requires four controls applied at every stage: archive on first contact to establish a timestamped record, document the collection method and source, strip investigator-generated metadata before sharing files or storing them where outside parties can reach them, and keep all findings in encrypted storage with a consistent naming convention and a chain of custody record. Files that skip any of these steps may be challenged in legal proceedings or may inadvertently expose the investigator’s identity, device, or location.

⚠️ Legal Notice: This guide covers evidence handling for lawful investigative research only. Chain of custody procedures described here are general best practices and do not constitute legal advice. If investigation findings may be used in litigation, consult with qualified legal counsel regarding jurisdiction-specific evidentiary standards before collecting or presenting digital evidence.


Why Evidence Handling Matters in Digital Investigations

Digital evidence is inherently fragile. Web pages change or disappear. Social media profiles are deleted. Platform records are updated. A finding that existed at the moment of discovery may look different — or not exist at all — by the time it needs to be presented.

Digital evidence is also inherently traceable. Most files carry information about their origin. Downloads leave records on the platform and on the device. Screenshots can carry device and software information. Investigators who collect evidence without controlling these traces create two problems at once: findings that may not hold up to scrutiny, and files that may reveal who collected them and how.

The standard that applies to digital evidence in legal contexts is chain of custody — a documented record showing that evidence was collected at a specific time, by a specific person, using a specific method, and has not been altered since collection. Digital evidence that cannot meet this standard is vulnerable to challenges that have nothing to do with what the evidence actually shows.


Archiving: The First Step in Every Collection

The most important evidence handling decision happens at first contact with a finding. The correct action is to archive before anything else — before screenshots, before note-taking, before deeper investigation of the same page.

Archiving creates a timestamped, independently hosted snapshot of a page as it appeared at a specific moment. That snapshot becomes the evidentiary record. Everything that follows references the archive, not the live page.

archive.today

archive.today is the primary archiving tool for investigative use. It creates a timestamped snapshot of any publicly accessible page, stored on archive.today’s servers and accessible via its own stable URL. The snapshot captures the page as rendered, including dynamic content that had loaded at the moment of capture.

For evidence handling purposes, archive.today provides three things a screenshot cannot: an independently hosted record not stored on the investigator’s device, a timestamp recorded by a third-party server rather than by the investigator, and a URL that can be cited and independently checked by anyone reviewing the investigation record.

The submission process requires no account. The archive URL, the date and time of capture, and the live source URL should all be recorded in the investigation log at the moment of archiving.
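The logging step described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed format: the function name log_archive_capture, the JSON Lines layout, and the field names are all assumptions made for the example, and the archive URL used with it would be whatever archive.today returns at submission.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_archive_capture(log_path, live_url, archive_url, captured_at=None):
    """Append one archive capture to a JSON Lines investigation log.

    Hypothetical helper: the log layout and field names are illustrative.
    """
    # Default to the current time in UTC so the timezone is never ambiguous.
    captured_at = captured_at or datetime.now(timezone.utc)
    if captured_at.tzinfo is None:
        raise ValueError("captured_at must be timezone-aware")
    entry = {
        "live_url": live_url,
        "archive_url": archive_url,
        "captured_at": captured_at.isoformat(),
        "method": "archive.today",
    }
    # Open in append mode: earlier entries are never rewritten.
    with Path(log_path).open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

Calling the function once per capture, at the moment of archiving, keeps the log contemporaneous. An append-only text file is chosen here because each line stands alone and the file can itself be hashed later.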

Wayback Machine

The Internet Archive’s Wayback Machine serves a different function from archive.today. Although it can also capture a page on demand through its Save Page Now feature, its main value for investigators is retrieving historical snapshots from its own crawl index: pages captured automatically over time, in some cases going back decades.

For evidence handling, the Wayback Machine is most useful for establishing what a page looked like before the investigation began — prior versions of a subject’s website, earlier versions of a social media profile, or a business listing as it appeared before a dispute arose. Wayback Machine captures are timestamped and independently hosted, meeting the same evidentiary standard as archive.today captures for historical content.
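Because a Wayback Machine snapshot URL embeds its capture time as a 14-digit stamp (YYYYMMDDhhmmss), the timestamp can be read straight from the URL when logging a historical capture. The sketch below assumes the usual web.archive.org/web/ URL layout and treats the stamp as UTC; the function name parse_wayback_url is hypothetical, and the result should still be checked against the capture date shown on the snapshot page.

```python
import re
from datetime import datetime, timezone

# Snapshot URLs look like:
#   https://web.archive.org/web/20200115123000/https://example.com/about
# The stamp may be followed by a modifier such as "id_" or "if_".
_WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{14})(?:[a-z]{2}_)?/(.+)$"
)


def parse_wayback_url(snapshot_url):
    """Return (capture_time, original_url) from a Wayback snapshot URL."""
    match = _WAYBACK_RE.match(snapshot_url)
    if match is None:
        raise ValueError(f"not a full Wayback snapshot URL: {snapshot_url!r}")
    stamp, original_url = match.groups()
    # Assumption: the embedded stamp is expressed in UTC.
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(
        tzinfo=timezone.utc
    )
    return captured, original_url
```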

What Archiving Does Not Cover

Archiving captures publicly accessible pages. It does not capture content behind login walls, dynamically loaded content that requires interaction to display, or content that has already been deleted before the investigator’s first contact. For content of this nature, a contemporaneous screenshot with documented metadata is the closest available substitute — with the understanding that a screenshot is a weaker evidentiary record than an independently archived snapshot.


Screenshots: When and How

Screenshots are a secondary evidence collection method — appropriate when archiving is not possible, and as a supplement to archived records when additional detail is needed. A screenshot alone is the weakest form of digital evidence because it exists only on the investigator’s device, carries no independent timestamp, and can be altered without detection by anyone examining only the image file.

When screenshots are necessary, the following practices strengthen their evidentiary value.

Include the full browser window. A screenshot that shows only the content of a page, cropped to remove the browser chrome, provides no visible confirmation of the URL, the browser, or the date. A full-window screenshot showing the URL bar, the page content, and the system clock visible in the taskbar or menu bar provides more context and is harder to challenge.

Document immediately. Record the URL, the date, the time, and the platform in the investigation log at the moment the screenshot is taken — not later. Reconstruction from memory introduces inaccuracy.

Do not edit the screenshot. Annotating, cropping, or otherwise modifying a screenshot before it is stored creates a version of the file that is different from what the investigator originally captured. Store the original unmodified file. Annotations and highlights belong in the investigation report, not in the evidence file itself.

Strip device metadata before sharing. Image files created on a smartphone or tablet can embed metadata such as the device model, a timestamp, and, for camera photos in particular, GPS coordinates. Screenshots taken on a desktop or laptop can embed screen resolution, software, and in some cases OS information. What is actually present varies by device and capture tool, so inspect the file with ExifTool rather than assuming. Strip this metadata from a copy, leaving the stored original unmodified, before the file is shared with a client, included in a report, or stored in any location accessible to parties outside the investigation.


Reading Metadata from Collected Files

Metadata is not only a liability to manage in files the investigator creates — it is also an investigative resource in files collected during research. Documents, images, and other files obtained from subjects, platforms, or third parties may contain metadata that reveals information not visible in the file’s content.

What to Read and Where to Find It

PDF files — Author name, creating organization, creating application, creation date, modification date, and in some cases prior authors from revision history. A PDF submitted as an official document from one party that shows a different author name or organization in its metadata than the stated source is a significant investigative finding.

Microsoft Word and Excel documents — Author name, company name, last modified by, revision count, total editing time, and the file path where the document was saved — which often includes the username of the account that created it. A document produced in litigation or submitted as evidence that shows unexpected authorship or editing history in its metadata may be relevant to authenticity questions.

JPEG and PNG images — Device model, software, creation timestamp, and GPS coordinates if location services were active on the device that created the image. Images submitted as evidence of presence at a location, or images from a subject’s social media that carry GPS data, may confirm or contradict stated facts.

Email headers — Not embedded file metadata, but a parallel concept. Email headers contain the originating IP address, the mail server path, timestamps at each relay, and the email client used. Headers are accessible through most email clients and provide source information that the visible email content does not.
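For email headers, the relay path can be pulled out of a saved raw message with Python’s standard email module. This is a rough sketch under stated assumptions: relay_path is a hypothetical helper, the IP pattern only looks for bracketed IPv4 addresses, and the sample addresses in any test data are documentation ranges. Received headers added before the first server the investigator trusts can be forged, and some providers omit the sender’s IP entirely, so the output is a lead to corroborate rather than proof.

```python
import re
from email import message_from_string

# Matches a bracketed IPv4 address such as [203.0.113.42].
_IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def relay_path(raw_message):
    """List Received hops oldest-first as (ip_or_None, header_text)."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    hops = []
    # Each relay prepends its own Received header, so the headers appear
    # newest-first; reverse them to follow the message from its origin.
    for header in reversed(received):
        text = " ".join(header.split())  # unfold continuation lines
        found = _IP_RE.search(text)
        hops.append((found.group(1) if found else None, text))
    return hops
```

Recording the full header text alongside each extracted address keeps the exact source material in the investigation record.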

ExifTool for Metadata Reading

ExifTool reads metadata from virtually all common file types and outputs it in a readable format.

Reading all metadata from a single file:

exiftool filename.pdf

Reading specific metadata fields:

exiftool -Author -CreateDate -ModifyDate filename.pdf

Reading GPS data from an image:

exiftool -GPSLatitude -GPSLongitude -GPSPosition filename.jpg

Reading metadata from all files in a directory:

exiftool /path/to/directory/
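By default ExifTool prints GPS fields in degrees, minutes, and seconds (for example 40 deg 26' 46.00" N), while most mapping tools expect signed decimal degrees. ExifTool’s -n option prints numeric values directly; where only the default output was saved, the conversion is simple arithmetic, sketched below. The function name dms_to_decimal is hypothetical and the pattern assumes the default output format just described.

```python
import re

# Matches ExifTool's default coordinate format: 40 deg 26' 46.00" N
_DMS_RE = re.compile(r"""(\d+)\s*deg\s*(\d+)'\s*([\d.]+)"\s*([NSEW])""")


def dms_to_decimal(value):
    """Convert a degrees/minutes/seconds string to signed decimal degrees."""
    match = _DMS_RE.search(value)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {value!r}")
    degrees, minutes, seconds, ref = match.groups()
    decimal = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and West are negative in signed decimal notation.
    return -decimal if ref in "SW" else decimal
```

The original ExifTool output, not the converted value, remains the record; the decimal form is only a convenience for plotting.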

Metadata findings should be documented in the investigation record with the exact ExifTool output, the file the output was generated from, and the date the analysis was performed.
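One way to capture those three items together is to run ExifTool with its -json option and wrap the raw output in a small record. The sketch below assumes exiftool is installed and on the PATH; run_exiftool_json and build_metadata_record are hypothetical names, and the record layout is illustrative only.

```python
import json
import subprocess
from datetime import datetime, timezone


def run_exiftool_json(path):
    """Run `exiftool -json` on one file and return its raw stdout."""
    # Assumes the exiftool executable is installed and on PATH.
    result = subprocess.run(
        ["exiftool", "-json", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_metadata_record(raw_output, analyzed_at=None):
    """Wrap exact ExifTool output with the file name and analysis date."""
    analyzed_at = analyzed_at or datetime.now(timezone.utc)
    parsed = json.loads(raw_output)  # -json prints an array, one object per file
    first = parsed[0]
    return {
        "file": first.get("SourceFile"),
        "analyzed_at": analyzed_at.isoformat(),
        "exiftool_output": raw_output,  # kept verbatim for the record
        "fields": first,
    }
```

A typical call would be build_metadata_record(run_exiftool_json("filename.pdf")). Keeping the verbatim output next to the parsed fields means the record shows exactly what the tool reported, not a retyped summary.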


Stripping Metadata from Investigator-Generated Files

Every file the investigator creates or downloads during a research session carries metadata about the investigator’s device, software, and in some cases location. That metadata must be stripped before any file is stored in a location accessible to outside parties, shared with a client, or included in a report.

What to Strip and Why

Files downloaded from research platforms carry the investigator’s download timestamp and may carry IP address information embedded by the platform at download. Screenshots carry device and OS information. Documents created by the investigator carry the author name and organization from the device’s system settings.

The exposure risk is not hypothetical. A report delivered to a client that contains screenshots with embedded GPS coordinates from the investigator’s mobile device reveals the investigator’s location at the time of collection. A PDF report with the investigator’s real name embedded in the author metadata reveals their identity to anyone who examines the file properties — including a subject who receives the report indirectly.

ExifTool for Metadata Removal

Removing all metadata from a single file:

exiftool -all= filename.pdf

Removing all metadata from all files in a directory:

exiftool -all= /path/to/directory/

Removing metadata and saving to a new file rather than overwriting:

exiftool -all= -o cleaned_filename.pdf filename.pdf

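ExifTool creates a backup of the original file by default (appending _original to the filename) when overwriting. That backup still contains the original metadata, so it must never be shared in place of the cleaned file. For investigation files where the original should be preserved under its own name, use the -o flag to write to a new file rather than overwriting. One caveat for PDFs: ExifTool’s documentation states that its PDF edits are reversible, meaning the old metadata stays in the file and can be recovered until the file is rewritten by another tool (the documentation suggests qpdf with its --linearize option). Treat a PDF cleaned with ExifTool alone as not yet safe to share.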
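After stripping, it is worth confirming that the cleaned file really is clean. Re-running exiftool on the output is the most complete check. As a supplementary illustration for PNG screenshots only, the sketch below walks the file’s chunk structure and reports any chunk types that commonly carry text, EXIF data, or a modification time. The function name png_metadata_chunks and the chosen chunk list are assumptions for the example, and an empty result does not prove a file carries no identifying information.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Ancillary chunk types that can carry text, EXIF data, or a modification time.
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"eXIf", b"tIME"}


def png_metadata_chunks(data):
    """Return the metadata-bearing chunk types found in PNG file bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = []
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        if chunk_type in METADATA_CHUNKS:
            found.append(chunk_type.decode("ascii"))
        if chunk_type == b"IEND":
            break
        offset += 8 + length + 4
    return found
```

Reading the file with open(path, "rb").read() and passing the bytes in is enough for screenshot-sized images.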

MAT2

MAT2 (Metadata Anonymisation Toolkit) is an alternative to ExifTool for investigators who prefer not to work in the command line. It is itself a command-line tool, but graphical use is available through file-manager integration and separate front ends built on it. It supports PDF, Office documents, images, and several other formats. For batch processing and scripting, ExifTool remains the more flexible option. For single-file cleaning on a Linux-based system, MAT2 is a practical alternative.


Chain of Custody for Digital Evidence

Chain of custody is the documented record that tracks evidence from the moment of collection through every subsequent transfer, storage location, and use. In physical evidence, chain of custody is maintained through physical seals, evidence bags, and sign-out logs. In digital evidence, the equivalent is a documented record that can answer four questions about any piece of evidence: who collected it, when it was collected, how it was collected, and whether it has been modified since collection.

What to Document at Collection

For every piece of digital evidence collected, the investigation record should capture:

Source — The full URL of the page or the platform from which the file was obtained. For archived pages, both the live source URL and the archive URL.

Collection date and time — The exact timestamp at the moment of collection, including timezone. For archived pages, the timestamp shown by archive.today or the Wayback Machine is the authoritative record.

Collection method — Whether the evidence was archived, captured by screenshot, downloaded, or obtained through another method. The tool used (archive.today, a screenshot utility, a browser download) should be noted.

Collector — The identity of the investigator who collected the evidence. In solo investigations this is implicit; in multi-investigator cases it must be explicit.

Hash value — For downloaded files, a cryptographic hash (SHA-256) of the file at the moment of collection provides a verifiable fingerprint. If the file is later questioned as altered, the hash can confirm whether the file matches its original state. Generating a SHA-256 hash:

macOS / Linux:

shasum -a 256 filename.pdf

Windows (Command Prompt or PowerShell):

certutil -hashfile filename.pdf SHA256

The hash value and the filename it was generated from should be recorded in the investigation log at the time of collection, not reconstructed later.
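The collection fields above can be gathered into a single record at the moment a file is saved. The following is a minimal sketch, not a standard format: CustodyRecord, record_collection, and is_unmodified are hypothetical names, and the only technique relied on is Python’s standard hashlib SHA-256.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


@dataclass(frozen=True)
class CustodyRecord:
    source_url: str
    collected_at: str  # ISO 8601 with timezone offset
    method: str        # e.g. "archived", "screenshot", "downloaded"
    collector: str
    filename: str
    sha256: str


def record_collection(path, source_url, method, collector, collected_at=None):
    """Build a custody record for a file at the moment of collection."""
    collected_at = collected_at or datetime.now(timezone.utc)
    return CustodyRecord(
        source_url=source_url,
        collected_at=collected_at.isoformat(),
        method=method,
        collector=collector,
        filename=str(path),
        sha256=sha256_of_file(path),
    )


def is_unmodified(path, record):
    """True if the file still matches the hash recorded at collection."""
    return sha256_of_file(path) == record.sha256
```

The record is frozen so that it cannot be edited in place after creation; if the file is later questioned, is_unmodified re-hashes it and compares against the value logged at collection.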

Storage and Access Control

Investigation files should be stored in encrypted storage — a VeraCrypt volume, an encrypted disk image, or an encrypted folder on a device with full-disk encryption active. Unencrypted storage of investigation files on a standard desktop or in an unencrypted cloud service creates unnecessary exposure for both the investigator and the subjects of the investigation.

Access to investigation files should be limited to parties with a legitimate need. Files shared with clients, attorneys, or other parties should be transmitted through encrypted channels — not unencrypted email.

File naming should follow a consistent convention that does not include the subject’s real name in plaintext in the filename itself. Investigation files stored on a device or in a cloud service with the subject’s name in the filename create an unencrypted index of investigative activity that is readable without opening any of the files.
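A naming convention of this kind can be generated rather than typed by hand. The sketch below is one possible scheme, not a standard: it assumes an opaque case identifier (here "CASE-0042" is a made-up example), a running sequence number, and a UTC timestamp, and it includes a crude check that no part of a subject’s name has slipped into a filename. Both function names are hypothetical.

```python
import re
from datetime import datetime, timezone


def evidence_filename(case_id, sequence, collected_at, kind, extension):
    """Build a name like CASE-0042_0007_20240102T030405Z_archive.pdf."""
    if collected_at.tzinfo is None:
        raise ValueError("collected_at must be timezone-aware")
    stamp = collected_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{case_id}_{sequence:04d}_{stamp}_{kind}.{extension.lstrip('.')}"


def leaks_name(filename, subject_names):
    """True if any part of a subject's name appears in the filename."""
    normalized = re.sub(r"[^a-z0-9]", "", filename.lower())
    for name in subject_names:
        for part in name.lower().split():
            token = re.sub(r"[^a-z0-9]", "", part)
            # Skip very short tokens to avoid matching initials by accident.
            if len(token) > 2 and token in normalized:
                return True
    return False
```

The mapping from case identifier to subject lives inside the encrypted investigation log, so the filenames alone reveal nothing about who is being researched.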


Reporting: Evidence in the Final Product

The investigation report is the final context in which evidence handling decisions become visible. A report that presents findings without documenting their source, collection method, and timestamp is a report that cannot be defended if its findings are challenged.

Each finding presented in a report should be supported by:

A citation to the archived or collected source — the archive URL, the platform, the document name — not just a description of what was found.

A collection date — when the evidence was collected, not when the report was written.

A note on the collection method — archived, downloaded, screenshot — sufficient to explain how the finding was obtained and how it can be independently verified.
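The three supporting elements listed above can be enforced mechanically when a report is assembled, so that no finding reaches the final document without them. This is a small illustrative sketch; format_citation and its plain-text layout are assumptions, not a required report format.

```python
def format_citation(finding, source, collected_at, method, archive_url=None):
    """Render one finding with its source, collection date, and method.

    Raises ValueError if any required element is missing, so an
    unsupported finding cannot be formatted for the report.
    """
    required = {
        "finding": finding,
        "source": source,
        "collected_at": collected_at,
        "method": method,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError("citation is missing: " + ", ".join(missing))
    lines = [f"Finding: {finding}", f"Source: {source}"]
    if archive_url:
        lines.append(f"Archive: {archive_url}")
    lines.append(f"Collected: {collected_at}")
    lines.append(f"Method: {method}")
    return "\n".join(lines)
```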

For investigations that may produce litigation, the investigation report itself should be treated as a document that will be scrutinized. The report’s own metadata — author, creation date, modification history — should reflect its actual provenance. A report delivered with a creation date that predates the investigation it documents, or with an author name that does not match the investigator of record, creates an authenticity problem that is difficult to explain.


Where to Go Next

For the tool stack that supports these evidence handling procedures: OPSEC Tools for Investigators — ExifTool setup, archiving tools, and the complete investigative stack.

For operational security during the collection phase: Complete OPSEC Guide for Investigators — the layered framework covering network, browser, identity, and device controls.

For pre-collection verification: OPSEC Checklist for Investigators — confirms the environment is clean before evidence collection begins.

For the OSINT workflow that evidence handling supports: OSINT Workflow: The 8-Phase Investigation Framework — where collection, archiving, and documentation fit in the full investigation process.


Disclaimer: This article is for informational purposes only and does not constitute legal advice. Evidence handling standards vary by jurisdiction and by the nature of the proceeding in which findings may be used. Consult qualified legal counsel regarding evidentiary requirements before collecting or presenting digital evidence in any legal context.
