scaffolded · January 1, 1970

SEC EDGAR Full-Text Filings Mining

sec · edgar · nlp · risk-factors · public-data · text-analysis

Every public company files 10-Ks, 10-Qs, 8-Ks, and proxy statements. The text itself — what gets added, what gets removed, what language spreads between companies — tells stories that headline numbers don't.

This project tracks seed phrases like 'generative AI', 'tariff escalation', and 'GLP-1 substitution' from their first appearance in SEC filings through their spread across the S&P 500 over 8 quarters. It also diffs successive 10-Ks to find risks that quietly disappeared, and scores companies on disclosure originality.
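The three analyses described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code. The `(ticker, filing_date, text)` input shape, all function names, the 0.6 match threshold, and the use of word-shingle Jaccard similarity as an "originality" measure are hypothetical choices made for the example.

```python
import re
from difflib import SequenceMatcher


def quarter_of(date_iso):
    """Map 'YYYY-MM-DD' to a calendar-quarter label such as '2023Q1'."""
    year, month = int(date_iso[:4]), int(date_iso[5:7])
    return f"{year}Q{(month - 1) // 3 + 1}"


def phrase_spread(filings, phrase):
    """Return (first_quarter, {quarter: cumulative count of adopting companies}).

    filings: iterable of (ticker, 'YYYY-MM-DD', text) tuples (assumed shape).
    """
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    first_use = {}  # ticker -> earliest quarter in which the phrase appears
    for ticker, date_iso, text in filings:
        if pattern.search(text):
            q = quarter_of(date_iso)
            first_use[ticker] = min(q, first_use.get(ticker, q))
    cumulative, total = {}, 0
    for q in sorted(set(first_use.values())):
        total += sum(1 for v in first_use.values() if v == q)
        cumulative[q] = total
    first_quarter = min(cumulative) if cumulative else None
    return first_quarter, cumulative


def dropped_risks(old_paragraphs, new_paragraphs, threshold=0.6):
    """Return paragraphs from the older 10-K with no close match in the newer one.

    Similarity is a word-level difflib ratio; the threshold is an assumption.
    """
    dropped = []
    for old in old_paragraphs:
        old_words = old.lower().split()
        best = max(
            (SequenceMatcher(None, old_words, new.lower().split()).ratio()
             for new in new_paragraphs),
            default=0.0,
        )
        if best < threshold:
            dropped.append(old)
    return dropped


def shingles(text, k=3):
    """Set of overlapping k-word tuples from the lowercased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def originality(text, peer_texts, k=3):
    """1 minus the highest Jaccard shingle overlap with any peer (1.0 = no overlap)."""
    own = shingles(text, k)
    best = 0.0
    for peer in peer_texts:
        other = shingles(peer, k)
        union = own | other
        if union:
            best = max(best, len(own & other) / len(union))
    return 1.0 - best
```

Comparing words rather than characters keeps the diff from flagging a risk as removed when only a word or two changed. Real filings would need the risk-factor section extracted and split into paragraphs first, which this sketch leaves out.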

Data source: SEC EDGAR full-text search (free; automated access is limited to 10 requests per second). Analysis covers 2018-2024.
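Staying under the 10 requests per second limit could be handled with a small client-side throttle like the one below. This is a minimal sketch, not the project's implementation. `MinIntervalLimiter` and `fetch` are hypothetical names, the caller supplies the request URL because the project description does not name an endpoint, and the `User-Agent` value is a placeholder to replace with your own contact details, since SEC asks automated clients to identify themselves.

```python
import time
import urllib.request


class MinIntervalLimiter:
    """Space calls at least 1/max_per_sec seconds apart (single-threaded use)."""

    def __init__(self, max_per_sec=10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_sec
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next request is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._clock()
        self._next_allowed = now + self.min_interval


def fetch(url, limiter, user_agent="Example Research admin@example.com"):
    """Fetch a URL after waiting on the limiter; returns the raw response bytes.

    The default user_agent is a placeholder, not a real contact.
    """
    limiter.wait()
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Sharing one limiter across every call keeps the whole run under the cap. A multi-threaded crawler would also need a lock around `wait`.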