Structured extraction is the safety boundary between messy email and an agent action. Instead of passing raw email to a tool, extract intent, entities, risk, confidence, and requested action into a narrow schema first.
Last updated 2026-05-07 · 4 sections
section 01
Extraction pipeline
The pipeline should preserve the raw message, normalize text and HTML, isolate the latest reply, extract structured fields, validate the schema, then route the result to an agent or human review queue.
step | output | guardrail
Capture | Raw MIME or provider payload. | Store message ID and retention policy.
Normalize | Clean text, HTML, headers, and attachments. | Separate quoted history from latest reply.
Extract | Intent, entities, confidence, requested action. | Use a schema with required fields.
Validate | Accepted or rejected extraction object. | Reject missing identifiers or unsafe action types.
Route | Agent task or review item. | Human review for low confidence or high risk.
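The five stages can be sketched end to end with the Python standard library. This is a minimal illustration under stated assumptions, not a production pipeline: it assumes the message arrives as raw RFC 5322 bytes, and it treats quoted history as starting at the first line that begins with `>` or looks like an `On ... wrote:` header, which is a naive heuristic (real mail clients quote in many ways). `extract_fields` is a hypothetical stand-in for a model call; it uses keyword matching only so that the validation and routing logic can be exercised.

```python
import re
from email import policy
from email.parser import BytesParser

# Hypothetical heuristic for where quoted history begins.
QUOTE_START = re.compile(r"^(>|On .+ wrote:\s*$)")
ALLOWED_ACTIONS = {"send_reply", "create_ticket", "update_crm"}


def capture(raw: bytes) -> dict:
    """Capture: keep the raw payload and message ID for dedupe and audit."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    return {"message_id": msg["Message-ID"], "raw": raw, "msg": msg}


def normalize(captured: dict) -> str:
    """Normalize: take the plain-text body and cut it at quoted history."""
    body = captured["msg"].get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    latest = []
    for line in text.splitlines():
        if QUOTE_START.match(line):
            break  # everything from here down is treated as quoted history
        latest.append(line)
    return "\n".join(latest).strip()


def extract_fields(message_id, text: str) -> dict:
    """Extract: hypothetical placeholder for a model call that fills the schema."""
    lowered = text.lower()
    if "refund" in lowered:
        intent, risks = "refund_request", ["money movement"]
    elif "status" in lowered:
        intent, risks = "order_status", []
    else:
        intent, risks = "unknown", []
    return {
        "message_id": message_id,
        "intent": intent,
        "confidence": "low" if intent == "unknown" else "high",
        "risks": risks,
        "requested_action": "send_reply",
    }


def validate(obj: dict) -> bool:
    """Validate: reject missing identifiers or unsupported action types."""
    return bool(obj.get("message_id")) and obj.get("requested_action") in ALLOWED_ACTIONS


def route(obj: dict, valid: bool) -> str:
    """Route: send invalid, low-confidence, or risky results to human review."""
    if not valid or obj["confidence"] == "low" or obj["risks"]:
        return "human_review"
    return "agent_task"


def run(raw: bytes) -> tuple:
    captured = capture(raw)
    text = normalize(captured)
    obj = extract_fields(captured["message_id"], text)
    return obj, route(obj, validate(obj))
```

Because the cut happens before extraction, a refund mentioned only in quoted history does not change the intent of a new status question. Routing errs toward review: any failed check, low confidence, or listed risk sends the item to a human.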
section 02
Minimum schema
The minimum useful schema includes message ID, sender, recipient mailbox, normalized intent, entities, confidence, risks, requested action, and review requirement. That gives the workflow enough context to choose between automation and review without rereading the full email.
field | purpose | example
message_id | Dedupe and audit. | provider message ID
intent | Classify the sender's request. | refund_request
entities | Capture important objects. | order_id, invoice_id, date
confidence | Decide between automation and review. | high, medium, low
risks | Expose policy concerns. | new recipient, attachment, money movement
requested_action | Map to a tool. | send_reply, create_ticket, update_crm
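One way to pin the minimum schema down in code is a frozen dataclass with enums for the closed vocabularies. The class and enum names here are illustrative assumptions; the confidence levels and action values come from the examples in the table, and a real deployment would likely extend them. Every field is required, and `from_dict` raises on unknown values instead of guessing, so a malformed model response fails loudly rather than reaching a tool.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class Action(Enum):
    SEND_REPLY = "send_reply"
    CREATE_TICKET = "create_ticket"
    UPDATE_CRM = "update_crm"


@dataclass(frozen=True)
class Extraction:
    message_id: str            # provider message ID, for dedupe and audit
    sender: str
    recipient_mailbox: str
    intent: str                # e.g. "refund_request"
    entities: dict             # e.g. {"order_id": "1234"}
    confidence: Confidence
    risks: tuple               # e.g. ("money movement",)
    requested_action: Action
    review_required: bool

    @classmethod
    def from_dict(cls, data: dict) -> "Extraction":
        """Build from model output; raises KeyError or ValueError on bad input."""
        return cls(
            message_id=data["message_id"],
            sender=data["sender"],
            recipient_mailbox=data["recipient_mailbox"],
            intent=data["intent"],
            entities=dict(data["entities"]),
            confidence=Confidence(data["confidence"]),       # ValueError if unknown
            risks=tuple(data["risks"]),
            requested_action=Action(data["requested_action"]),  # ValueError if unknown
            review_required=bool(data["review_required"]),
        )
```

Using enums for `confidence` and `requested_action` means an action outside the allowlist cannot even be represented, which keeps the mapping to tools narrow.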
section 03
Validation rules
Extraction is not complete until the object passes validation. Unknown intents, missing account identity, unsupported attachments, or low confidence should route to review instead of being patched by the model.
Set review_required when confidence is low, or when confidence is medium and the action is risky.
Treat attachments as references until they are scanned or inspected.
Keep the original payload attached to the audit record.
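The validation rules can be sketched as one function that never edits the extraction; it only collects reasons and forces review when any check fails. This is a sketch under several assumptions: the extraction is a plain dict, `account_id` is a hypothetical field standing in for "account identity" (the minimum schema above does not name one), `KNOWN_INTENTS` and `SUPPORTED_ATTACHMENT_TYPES` are hypothetical allowlists, and an action counts as risky whenever the `risks` list is non-empty.

```python
from dataclasses import dataclass

KNOWN_INTENTS = {"refund_request", "order_status"}            # hypothetical vocabulary
SUPPORTED_ATTACHMENT_TYPES = {"application/pdf", "image/png"}  # hypothetical allowlist


@dataclass
class Decision:
    review_required: bool
    reasons: list
    audit: dict  # original payload kept next to the extraction


def validate(extraction: dict, raw_payload: bytes) -> Decision:
    reasons = []
    if extraction.get("intent") not in KNOWN_INTENTS:
        reasons.append("unknown intent")
    if not extraction.get("account_id"):  # hypothetical account identity field
        reasons.append("missing account identity")
    for att in extraction.get("attachments", []):
        # Only metadata is checked; attachment contents are never read here,
        # so they stay references until a separate scanner inspects them.
        if att.get("content_type") not in SUPPORTED_ATTACHMENT_TYPES:
            reasons.append(f"unsupported attachment: {att.get('filename')}")
    confidence = extraction.get("confidence")
    risky = bool(extraction.get("risks"))
    if confidence not in {"high", "medium", "low"}:
        reasons.append("missing or invalid confidence")
    elif confidence == "low":
        reasons.append("low confidence")
    elif confidence == "medium" and risky:
        reasons.append("medium confidence on risky action")
    return Decision(
        review_required=bool(reasons),
        reasons=reasons,
        audit={"raw": raw_payload, "extraction": extraction},
    )
```

Returning the reasons alongside the flag gives reviewers a short explanation of why an item landed in their queue, and the audit dict keeps the untouched payload with the decision.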
section 04
Provider fit
ParseForce is the most schema-oriented inbound option in the current provider set. Inbound and CloudMailin fit typed webhook routing. Mailgun, Postmark, and SendGrid can parse inbound mail, but the extraction layer usually belongs in the application.