Smart Extract

AI-powered document data extraction

Smart Extract is Flux Capture's core AI engine that automatically reads and extracts data from your invoices and financial documents. This guide explains how it works and how to get the best results.

How Smart Extract Works

When you upload a document, Smart Extract runs five analysis steps:

1. Document Recognition

The AI first identifies what type of document you've uploaded:

Invoice
Receipt
Credit Memo
Expense Report
Purchase Order

2. Layout Analysis

Smart Extract analyzes the document structure to identify:

Header zone (vendor info, invoice details)
Line items table
Totals section
Footer information

3. Field Extraction

Using OCR and natural language processing, the system extracts:

Field	Description
Vendor Name	Company or individual billing you
Vendor Address	Vendor's mailing address
Invoice Number	Unique invoice identifier
Invoice Date	Date the invoice was issued
Due Date	Payment due date
PO Number	Purchase order reference
Payment Terms	Net 30, Net 60, etc.
Subtotal	Pre-tax amount
Tax Amount	Taxes charged
Total Amount	Final amount due
Currency	USD, CAD, EUR, etc.

4. Line Item Extraction

For each line item, Smart Extract captures:

Item code or SKU
Description
Quantity
Unit of measure
Unit price
Line amount
Tax (if itemized)

5. Vendor Matching

After extraction, the system matches the vendor name to your NetSuite vendor list using:

Fuzzy string matching
Tax ID correlation
Email domain matching
Historical alias matching
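
The fuzzy string matching signal can be illustrated with a minimal sketch. This is not Flux Capture's actual matcher: the function names, the list of legal suffixes, and the 0.8 threshold are all assumptions made for illustration, and the similarity measure is Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}

def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    words = "".join(c if c.isalnum() else " " for c in name.lower()).split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)

def best_vendor_match(
    extracted: str, vendors: List[str], threshold: float = 0.8
) -> Optional[Tuple[str, float]]:
    """Return (vendor, score) for the closest name, or None if nothing is close enough."""
    if not vendors:
        return None
    target = normalize(extracted)
    scored = [(v, SequenceMatcher(None, target, normalize(v)).ratio()) for v in vendors]
    vendor, score = max(scored, key=lambda pair: pair[1])
    return (vendor, score) if score >= threshold else None

vendors = ["Acme Supplies Inc.", "Globex LLC", "Initech Inc."]
print(best_vendor_match("ACME Suplies, Inc", vendors))  # misspelled name still matches
print(best_vendor_match("Zzyzx Holdings", vendors))     # no close vendor, prints None
```

A real matcher would presumably combine this score with the other signals listed above (tax ID, email domain, known aliases) rather than rely on name similarity alone.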

Extraction Providers

Smart Extract supports multiple AI providers, each with different strengths:

OCI Document Understanding

Oracle's built-in document AI:

Native integration with NetSuite
No additional API keys required
Good general-purpose extraction

Azure Form Recognizer

Microsoft's document AI service:

Excellent line item extraction
High accuracy on complex layouts
Strong multi-page support

Mindee

Specialized invoice processing:

Purpose-built for invoices
Fast processing times
Great for standard invoice formats

✅ Tip: If you're unsure which provider to use, start with OCI. If you need better line item accuracy, try Azure.

Understanding Extraction Results

Field Confidence

Each extracted field has a confidence score:

Score	Meaning
90%+	Very confident, likely correct
70-89%	Reasonably confident
50-69%	Uncertain, verify carefully
Below 50%	Low confidence, may need manual entry
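
As a rough illustration of how these bands could drive a review workflow, the sketch below maps a score to the guidance in the table and lists the fields that fall below a chosen cutoff. The function names, the field dictionary, and the cutoff of 70 are hypothetical, not part of the product.

```python
from typing import Dict, List

def confidence_band(score: float) -> str:
    """Map a confidence percentage (0-100) to the guidance in the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "Very confident, likely correct"
    if score >= 70:
        return "Reasonably confident"
    if score >= 50:
        return "Uncertain, verify carefully"
    return "Low confidence, may need manual entry"

def fields_to_review(scores: Dict[str, float], cutoff: float = 70) -> List[str]:
    """Return the names of fields whose score is below the cutoff, lowest first."""
    flagged = [name for name, score in scores.items() if score < cutoff]
    return sorted(flagged, key=lambda name: scores[name])

# Hypothetical scores for one extracted invoice.
scores = {"Vendor Name": 97, "Invoice Number": 92, "Due Date": 64, "PO Number": 41}
for name in fields_to_review(scores):
    print(name, "->", confidence_band(scores[name]))
```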

Extraction Warnings

Smart Extract may flag issues during extraction:

Warning	Meaning
Low confidence field	One or more fields uncertain
Missing required field	Invoice number or amount not found
Date interpretation	Ambiguous date format detected
Amount mismatch	Line items don't sum to total
Multi-page limited	Only first N pages processed
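
The "Amount mismatch" warning boils down to simple arithmetic. The sketch below assumes that line amounts should sum to the subtotal and that subtotal plus tax should equal the total, within a one-cent tolerance; the function name, the tolerance, and the exact rule Smart Extract applies are assumptions.

```python
from decimal import Decimal
from typing import List

def check_amounts(
    line_amounts: List[str],
    subtotal: str,
    tax: str,
    total: str,
    tolerance: Decimal = Decimal("0.01"),
) -> List[str]:
    """Return a list of mismatch descriptions; an empty list means the amounts agree."""
    lines_sum = sum((Decimal(a) for a in line_amounts), Decimal("0"))
    problems = []
    if abs(lines_sum - Decimal(subtotal)) > tolerance:
        problems.append(f"Line items sum to {lines_sum}, but subtotal is {subtotal}")
    if abs(Decimal(subtotal) + Decimal(tax) - Decimal(total)) > tolerance:
        problems.append(f"Subtotal {subtotal} plus tax {tax} does not equal total {total}")
    return problems

# A consistent invoice produces no warnings.
print(check_amounts(["100.00", "50.50"], "150.50", "15.05", "165.55"))
# A missing line item shows up as a mismatch against the subtotal.
print(check_amounts(["100.00"], "150.50", "15.05", "165.55"))
```

Amounts are passed as strings and converted to `Decimal` so that currency values are not distorted by binary floating-point rounding.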

Getting Better Extraction Results

Document Quality

The quality of your source documents significantly affects extraction:

Good documents:

Digital PDFs (not scanned images)
High-resolution scans (300 DPI+)
Straight, uncropped pages
Clear, readable text

Problematic documents:

Low-resolution photos
Skewed or rotated pages
Handwritten invoices
Heavily stylized fonts

Tips for Best Results

Use native PDFs - Digital PDFs extract better than scanned images
Scan at 300 DPI or higher - Higher resolution improves OCR accuracy
Keep pages straight - Skewed documents reduce accuracy
Include full pages - Don't crop out headers or footers
Avoid photos - Take screenshots or scan instead

Multi-Page Documents

Smart Extract handles multi-page invoices:

Automatic Page Merging

Tables that span multiple pages are automatically combined into a single list of line items.

Page Limits

By default, all pages are processed. You can limit this in Settings to speed up processing for very long documents.

⚠️ Warning: Limiting pages may cause missed line items on later pages.

Currency Detection

Smart Extract identifies currency from multiple signals:

Currency symbols - $, €, £, etc.
Currency codes - USD, CAD, EUR
Vendor location - Based on address
Account settings - Your default currency

When currency is ambiguous (e.g., $ could be USD or CAD), the system uses:

Vendor's default currency (if known)
Your account's default currency
The most likely currency based on context
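
One way to picture this fallback is as a priority chain. The sketch below assumes the three items above are tried in the order listed, and it stands in for "most likely currency based on context" by simply taking the first candidate for the symbol. The symbol table and function name are hypothetical.

```python
from typing import Optional

# Hypothetical mapping from a symbol to the currencies it may denote.
SYMBOL_CANDIDATES = {"$": ["USD", "CAD", "AUD"], "€": ["EUR"], "£": ["GBP"]}

def resolve_currency(
    symbol: str,
    code: Optional[str] = None,
    vendor_default: Optional[str] = None,
    account_default: Optional[str] = None,
) -> Optional[str]:
    """Pick a currency code, preferring explicit evidence over defaults."""
    if code:  # an explicit code such as "CAD" printed on the document wins
        return code
    candidates = SYMBOL_CANDIDATES.get(symbol, [])
    if len(candidates) == 1:  # unambiguous symbol
        return candidates[0]
    # Ambiguous or unknown symbol: try the vendor default, then the account default.
    for preferred in (vendor_default, account_default):
        if preferred and (not candidates or preferred in candidates):
            return preferred
    # Placeholder for a context-based guess: take the first candidate, if any.
    return candidates[0] if candidates else None

print(resolve_currency("$", vendor_default="CAD", account_default="USD"))  # vendor default
print(resolve_currency("$", account_default="USD"))                        # account default
```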

Date Parsing

Dates are parsed intelligently based on context:

Format Detection

The system handles various formats:

MM/DD/YYYY (US standard)
DD/MM/YYYY (European/International)
YYYY-MM-DD (ISO format)
Written dates (January 15, 2024)

Ambiguous Dates

For ambiguous dates like "03/04/2024":

Vendor history is checked for patterns
Regional conventions are considered
You can correct the date during review, which trains the system
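
To see why "03/04/2024" is ambiguous, the sketch below tries both numeric conventions using Python's standard `datetime.strptime` and reports every valid reading. The function names are made up for illustration, and the sketch only detects ambiguity; it does not model vendor history or regional settings.

```python
from datetime import date, datetime
from typing import Dict

def interpretations(text: str) -> Dict[str, date]:
    """Return every valid reading of a slash-separated numeric date."""
    readings = {}
    for label, fmt in (("MM/DD/YYYY", "%m/%d/%Y"), ("DD/MM/YYYY", "%d/%m/%Y")):
        try:
            readings[label] = datetime.strptime(text, fmt).date()
        except ValueError:
            pass  # this convention does not produce a valid calendar date
    return readings

def is_ambiguous(text: str) -> bool:
    """A date is ambiguous when the conventions yield different calendar dates."""
    return len(set(interpretations(text).values())) > 1

print(interpretations("03/04/2024"))  # two readings: March 4 and April 3
print(is_ambiguous("15/01/2024"))     # only DD/MM/YYYY is valid, so not ambiguous
```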

Amount Parsing

Amount parsing accounts for regional number formats and negative-amount notation:

Number Formats

US/UK: 1,234.56 (comma thousands, period decimal)
European: 1.234,56 (period thousands, comma decimal)
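
A common heuristic for telling the two formats apart is to treat the last separator in the string as the decimal mark. The sketch below is one such heuristic, not the product's parser; the function name is hypothetical, and a value such as "1,234", which could be either format, is deliberately rejected rather than guessed.

```python
from decimal import Decimal

def parse_amount(text: str) -> Decimal:
    """Parse amounts like '1,234.56' (US/UK) or '1.234,56' (European)."""
    cleaned = text.strip()
    has_comma, has_dot = "," in cleaned, "." in cleaned
    if has_comma and has_dot:
        if cleaned.rfind(",") > cleaned.rfind("."):
            # Comma comes last: European style, so periods are thousands separators.
            cleaned = cleaned.replace(".", "").replace(",", ".")
        else:
            # Period comes last: US/UK style, so commas are thousands separators.
            cleaned = cleaned.replace(",", "")
    elif has_comma or has_dot:
        sep = "," if has_comma else "."
        whole, _, frac = cleaned.rpartition(sep)
        if cleaned.count(sep) > 1:
            cleaned = cleaned.replace(sep, "")  # repeated separator: thousands
        elif len(frac) != 3:
            cleaned = f"{whole}.{frac}"         # single separator, not 3 digits: decimal
        else:
            raise ValueError(f"Ambiguous amount: {text!r}")
    return Decimal(cleaned)

print(parse_amount("1,234.56"))  # 1234.56
print(parse_amount("1.234,56"))  # 1234.56
```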

Negative Amounts

The system recognizes:

Parentheses: (100.00)
Minus sign: -100.00
CR suffix: 100.00 CR
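
The three notations above can be normalized to a signed value with a few string checks. This is a minimal sketch with a hypothetical function name; it assumes the numeric part is a plain number such as 100.00, with no thousands separators or currency symbols.

```python
from decimal import Decimal

def parse_signed(text: str) -> Decimal:
    """Convert '(100.00)', '-100.00', or '100.00 CR' to a negative Decimal."""
    cleaned = text.strip()
    negative = False
    if cleaned.startswith("(") and cleaned.endswith(")"):
        negative, cleaned = True, cleaned[1:-1]   # accounting parentheses
    elif cleaned.upper().endswith("CR"):
        negative, cleaned = True, cleaned[:-2]    # credit suffix
    elif cleaned.startswith("-"):
        negative, cleaned = True, cleaned[1:]     # leading minus sign
    value = Decimal(cleaned.strip())
    return -value if negative else value

for sample in ("(100.00)", "-100.00", "100.00 CR", "100.00"):
    print(sample, "->", parse_signed(sample))
```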

Custom Field Extraction

For fields specific to your business, Smart Extract can:

Auto-Match Custom Fields

When you have custom fields on your vendor bill form, the system attempts to match extracted data to those fields based on:

Field labels
Similar terminology
Position on document

Manual Mapping

In the review interface, you can manually map extracted values to any field on your form.

Extraction Limits

File Size

Maximum file size: 20 MB

Processing Time

Typical processing times:

Document Type	Time
1-page invoice	2-5 seconds
5-page invoice	5-10 seconds
10+ page invoice	10-30 seconds

Concurrent Processing

Multiple documents can be processed simultaneously. Large batches are queued automatically.

Troubleshooting Extraction

No Data Extracted

Verify the file isn't corrupted
Open the document yourself and check that the text is legible
Try a different file format (PDF instead of image)
Ensure the document isn't password-protected

Wrong Field Values

Check document quality
Review the original document
Correct the value (this trains the system)
Consider switching extraction providers

Missing Line Items

Verify all pages were processed
Check page limit settings
Ensure the table isn't split across pages
Try increasing max extraction pages

Next Steps

Learn about the Learning Engine to improve accuracy over time
Set up Fraud Shield to catch extraction issues
Configure extraction providers for your needs