Structured data formats
From documents to data
iXBRL (Inline XBRL)
Required for UK annual reports since 2021
UK and EU regulators now require listed companies to file annual reports in iXBRL. The format embeds machine-readable XBRL tags in a human-readable XHTML document, so key financial figures can be extracted automatically.
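As a rough illustration of what "machine-readable tags within human-readable documents" means, the sketch below pulls numeric facts out of a small iXBRL fragment using only the Python standard library. The `SAMPLE` fragment, the `extract_facts` helper, and the figures in it are hypothetical; real filings are full XHTML documents with context and unit definitions, and a production parser would also honour the `format` attribute rather than assuming plain dot-decimal numbers.

```python
import xml.etree.ElementTree as ET

# Namespace used by Inline XBRL 1.1 elements such as ix:nonFraction.
IX_NS = "http://www.xbrl.org/2013/inlineXBRL"

# Hypothetical fragment for illustration only; not taken from a real filing.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL">
  <body>
    <p>Revenue for the year was <ix:nonFraction name="ifrs-full:Revenue"
        contextRef="FY2023" unitRef="GBP" scale="6"
        decimals="-5">12.5</ix:nonFraction> million pounds.</p>
  </body>
</html>"""


def extract_facts(xhtml: str) -> list[dict]:
    """Return the numeric facts tagged with ix:nonFraction."""
    root = ET.fromstring(xhtml)
    facts = []
    for el in root.iter(f"{{{IX_NS}}}nonFraction"):
        # Assumption: numbers use '.' as the decimal mark and ',' for grouping.
        text = "".join(el.itertext()).replace(",", "").strip()
        # 'scale' is a power of ten applied to the displayed number.
        value = float(text) * 10 ** int(el.get("scale", "0"))
        if el.get("sign") == "-":
            value = -value
        facts.append({
            "name": el.get("name"),
            "context": el.get("contextRef"),
            "unit": el.get("unitRef"),
            "value": value,
        })
    return facts


for fact in extract_facts(SAMPLE):
    print(fact["name"], fact["unit"], fact["value"])
```

The displayed text ("12.5 ... million") and the tagged value (12,500,000 GBP) come from the same element, which is what lets one document serve both human readers and software.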
LEI (Legal Entity Identifier)
Widely adopted
Global standard for identifying legal entities in financial transactions. Every UK listed company has an LEI, enabling unambiguous entity matching across systems.
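Part of what makes an LEI dependable for matching is its built-in checksum: the code is 20 alphanumeric characters, and the last two are check digits computed with the ISO 7064 MOD 97-10 scheme. A minimal sketch of that check follows; the function names are illustrative, and the 18-character prefix in the usage lines is made up, not a real entity's identifier.

```python
def _to_digits(code: str) -> str:
    """Map 0-9 to themselves and A-Z to 10-35, then concatenate."""
    return "".join(str(int(ch, 36)) for ch in code)


def lei_check_digits(base18: str) -> str:
    """Compute the two check digits for an 18-character LEI prefix."""
    remainder = int(_to_digits(base18.upper() + "00")) % 97
    return f"{98 - remainder:02d}"


def is_valid_lei(lei: str) -> bool:
    """Check length, alphabet, and the MOD 97-10 checksum."""
    lei = lei.strip().upper()
    if len(lei) != 20 or not (lei.isascii() and lei.isalnum()):
        return False
    if not lei[18:].isdigit():
        return False
    return int(_to_digits(lei)) % 97 == 1


# Hypothetical prefix, used only to demonstrate the round trip.
prefix = "213800ABCDEFGHIJKL"
candidate = prefix + lei_check_digits(prefix)
print(candidate, is_valid_lei(candidate))
```

A checksum only catches typing and transcription errors. Confirming that an LEI is actually registered to a given entity still requires a lookup against the official register.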
ISIN / SEDOL
Standard practice
Security identifiers that allow instruments to be tracked across platforms. Essential for linking announcements to the right securities.
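Both identifiers carry a check digit, which is useful when linking announcements to securities from messy inputs. The sketch below validates an ISIN with the Luhn algorithm (after converting letters to numbers) and a SEDOL with its weighted sum. The function names are illustrative, and the SEDOL check is simplified: it does not enforce the rule that SEDOLs exclude vowels.

```python
def is_valid_isin(isin: str) -> bool:
    """Validate a 12-character ISIN using the Luhn check digit."""
    isin = isin.strip().upper()
    if len(isin) != 12 or not (isin.isascii() and isin.isalnum()):
        return False
    if not (isin[:2].isalpha() and isin[-1].isdigit()):
        return False
    # Letters become two-digit numbers (A=10 ... Z=35) before applying Luhn.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


SEDOL_WEIGHTS = (1, 3, 1, 7, 3, 9, 1)


def is_valid_sedol(sedol: str) -> bool:
    """Validate a 7-character SEDOL using its weighted-sum check digit."""
    sedol = sedol.strip().upper()
    if len(sedol) != 7 or not (sedol.isascii() and sedol.isalnum()):
        return False
    if not sedol[-1].isdigit():
        return False
    total = sum(int(ch, 36) * w for ch, w in zip(sedol, SEDOL_WEIGHTS))
    return total % 10 == 0


print(is_valid_isin("GB0002634946"))  # BAE Systems ordinary shares
print(is_valid_sedol("0263494"))      # the SEDOL embedded in that ISIN
```

For UK securities the ISIN is typically built from the SEDOL ("GB", padding zeros, the SEDOL, then an ISIN check digit), which is why the two identifiers can be cross-checked against each other.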
AI and language models
Understanding unstructured content
Document parsing
Production-ready
Large language models can extract structured information from narrative text, such as revenue figures, guidance changes, and key events described in announcement prose.
Summarisation
Production-ready
AI can generate concise summaries of lengthy announcements, so readers can see what matters without reading every word.
Sentiment analysis
Production-ready
Models can assess the tone of announcements and management commentary, flagging changes in sentiment that might not be obvious from headline numbers.
Question answering
Emerging
Natural language interfaces allow investors to ask questions about companies and receive synthesised answers drawing from multiple sources.
APIs and data infrastructure
Programmatic access to market data
REST APIs
Available from multiple providers
Standard web APIs allow applications to query company data, announcements, and analytics programmatically. This enables integration with spreadsheets, trading systems, and custom tools.
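To make "query programmatically" concrete, here is a minimal sketch of the two halves of a REST call: building the request URL and turning the JSON response into plain records. The base URL, the `/announcements` path, the query parameters, and the response field names are all assumptions for illustration; each provider defines its own.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the provider's documented base URL.
BASE_URL = "https://api.example.com/v1"


def announcements_url(isin: str, since: str, limit: int = 20) -> str:
    """Build a query URL for announcements on one security."""
    query = urlencode({"isin": isin, "since": since, "limit": limit})
    return f"{BASE_URL}/announcements?{query}"


def parse_announcements(body: str) -> list[dict]:
    """Keep only the fields a downstream tool needs (assumed response shape)."""
    payload = json.loads(body)
    return [
        {
            "id": item["id"],
            "headline": item["headline"],
            "published": item["published_at"],
        }
        for item in payload.get("items", [])
    ]


print(announcements_url("GB0002634946", "2024-01-01"))
```

In real use the URL would be fetched with `urllib.request.urlopen` or an HTTP client library, usually with an API key in a header, and the response body passed to `parse_announcements`.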
Webhooks
Growing adoption
Real-time notifications pushed to a subscriber's endpoint when new announcements are published, enabling immediate processing without polling.
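Because a webhook endpoint is called by someone else's server, receivers commonly verify that each delivery is genuine before processing it. The sketch below shows one widely used pattern, an HMAC-SHA256 signature over the raw request body. The shared secret, the hex-encoded signature, and the event fields are assumptions; a given provider may sign deliveries differently or not at all.

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC of the raw body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def handle_webhook(secret: bytes, body: bytes, signature_hex: str):
    """Return the parsed event, or None if the signature does not match."""
    if not verify_signature(secret, body, signature_hex):
        return None
    return json.loads(body)


# Simulated delivery with a hypothetical event shape.
secret = b"shared-secret"
body = json.dumps({"type": "announcement.published", "isin": "GB0002634946"}).encode()
signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
print(handle_webhook(secret, body, signature))
```

The signature is checked against the raw bytes, before any JSON parsing, so that re-serialisation differences cannot invalidate a genuine delivery.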
GraphQL
Emerging
Flexible query language allowing clients to request exactly the data they need, reducing bandwidth and enabling more efficient applications.
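The idea of requesting "exactly the data they need" is easiest to see in a query. In the sketch below the client names the fields it wants and receives nothing else. The schema (`company`, `announcements`, `headline`, `publishedAt`) is hypothetical; only the request envelope, a JSON object with `query` and `variables`, follows the usual GraphQL-over-HTTP convention.

```python
import json

# Hypothetical schema: the field names are assumptions for illustration.
QUERY = """
query CompanyResults($isin: String!, $first: Int!) {
  company(isin: $isin) {
    name
    announcements(first: $first) {
      headline
      publishedAt
    }
  }
}
"""


def build_graphql_request(isin: str, first: int = 5) -> bytes:
    """Serialise the query and its variables as a JSON POST body."""
    payload = {"query": QUERY, "variables": {"isin": isin, "first": first}}
    return json.dumps(payload).encode("utf-8")


print(build_graphql_request("GB0002634946").decode("utf-8"))
```

A comparable REST design would often need one call for the company and another for its announcements, each returning every field the endpoint exposes.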
Model Context Protocol (MCP)
A new paradigm for AI tool integration
What is MCP?
The Model Context Protocol is an open standard for connecting AI assistants to external data sources and tools. Rather than relying only on static training data, a model using MCP can query live APIs and perform actions in real time.
For financial data, this means AI assistants can work from current company information, recent announcements, and live market data instead of potentially outdated training data.
Why MCP matters for markets
- AI assistants can answer questions with current data, not stale training data
- Natural language queries can translate to structured API calls
- Tools can be composed, combining company data with news and analytics
- Developers can build on standardised interfaces rather than proprietary APIs
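To illustrate how a natural language question becomes a structured call, the sketch below shows the two pieces MCP standardises: a tool description (name, description, and a JSON Schema for its inputs) and the JSON-RPC 2.0 `tools/call` request a client sends to invoke it. The `search_announcements` tool and its arguments are hypothetical, and the required-argument check is a simplification of full JSON Schema validation.

```python
import json

# Hypothetical tool as a server might advertise it in a tools/list response.
SEARCH_TOOL = {
    "name": "search_announcements",
    "description": "Find regulatory announcements for a company.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "company": {"type": "string"},
            "since": {"type": "string", "description": "ISO 8601 date"},
        },
        "required": ["company"],
    },
}


def tool_call_request(request_id: int, tool: dict, arguments: dict) -> str:
    """Build the JSON-RPC 2.0 message for an MCP tools/call request."""
    required = tool["inputSchema"].get("required", [])
    missing = [key for key in required if key not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": arguments},
    })


# "What did Company X announce last week?" might map to a call like this.
print(tool_call_request(1, SEARCH_TOOL, {"company": "Company X", "since": "2024-05-01"}))
```

The model chooses the tool and fills in the arguments from the user's question; the client sends the request, and the server's result is passed back to the model to compose its answer.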
Example capabilities
- "What did Company X announce last week?"
- "Summarise the latest results for my portfolio"
- "Find companies that raised guidance this month"
- "Compare revenue growth across this sector"
Where this is heading
Documents → Data
Announcements will increasingly be structured at source, with machine-readable formats becoming the norm rather than the exception.
Fragmented → Connected
APIs and standard protocols will enable information to flow between systems, reducing the need for manual aggregation.
Expensive → Accessible
AI-powered analysis and natural language interfaces will make institutional-grade insights available at consumer price points.
The precedent from other industries
Other industries have made this transition:
- Legal documents became searchable databases (LexisNexis, Westlaw)
- Medical records became interoperable systems (HL7 FHIR)
- Banking statements became real-time data (Open Banking APIs)
- E-commerce moved from catalogues to searchable, comparable listings
Public markets are following the same trajectory: from static documents to queryable data, and from fragmented sources to connected systems.