Smart Extract
AI-powered document data extraction
Smart Extract is Flux Capture's core AI engine that automatically reads and extracts data from your invoices and financial documents. This guide explains how it works and how to get the best results.
How Smart Extract Works
When you upload a document, Smart Extract runs five analysis steps:
1. Document Recognition
The AI first identifies what type of document you've uploaded:
- Invoice
- Receipt
- Credit Memo
- Expense Report
- Purchase Order
2. Layout Analysis
Smart Extract analyzes the document structure to identify:
- Header zone (vendor info, invoice details)
- Line items table
- Totals section
- Footer information
3. Field Extraction
Using OCR and natural language processing, the system extracts the following header fields:
| Field | Description |
|---|---|
| Vendor Name | Company or individual billing you |
| Vendor Address | Vendor's mailing address |
| Invoice Number | Unique invoice identifier |
| Invoice Date | Date the invoice was issued |
| Due Date | Payment due date |
| PO Number | Purchase order reference |
| Payment Terms | Net 30, Net 60, etc. |
| Subtotal | Pre-tax amount |
| Tax Amount | Taxes charged |
| Total Amount | Final amount due |
| Currency | USD, CAD, EUR, etc. |
4. Line Item Extraction
For each line item, Smart Extract captures:
- Item code or SKU
- Description
- Quantity
- Unit of measure
- Unit price
- Line amount
- Tax (if itemized)
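To make the shape of an extraction result concrete, here is a minimal sketch of how the header fields and line items listed above could be modeled. The class and attribute names are hypothetical and are not Flux Capture's actual schema. Treating the invoice number and total as mandatory is an assumption based on the "Missing required field" warning.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import List, Optional


@dataclass
class LineItem:
    """One row of the line items table (hypothetical model)."""
    description: str
    quantity: Decimal
    unit_price: Decimal
    line_amount: Decimal
    item_code: Optional[str] = None        # item code or SKU
    unit_of_measure: Optional[str] = None
    tax: Optional[Decimal] = None          # only set when tax is itemized


@dataclass
class ExtractedInvoice:
    """Header fields plus line items (hypothetical model)."""
    vendor_name: str
    invoice_number: str
    invoice_date: date
    total_amount: Decimal
    currency: str                          # e.g. "USD", "CAD", "EUR"
    vendor_address: Optional[str] = None
    due_date: Optional[date] = None
    po_number: Optional[str] = None
    payment_terms: Optional[str] = None    # e.g. "Net 30"
    subtotal: Optional[Decimal] = None
    tax_amount: Optional[Decimal] = None
    line_items: List[LineItem] = field(default_factory=list)
```

Amounts use `Decimal` rather than `float` so that sums of line amounts compare exactly against totals.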
5. Vendor Matching
After extraction, the system matches the vendor name to your NetSuite vendor list using:
- Fuzzy string matching
- Tax ID correlation
- Email domain matching
- Historical alias matching
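As an illustration of the fuzzy string matching signal only, the sketch below compares a normalized extracted name against a vendor list using Python's standard `difflib`. The function names, the suffix list, and the 0.8 threshold are assumptions made for this example. The algorithm Smart Extract actually uses is not documented here, and the other signals (tax ID, email domain, aliases) are not modeled.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional

# Hypothetical set of legal suffixes ignored when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop trailing legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def best_vendor_match(extracted: str, vendors: List[str],
                      threshold: float = 0.8) -> Optional[str]:
    """Return the most similar vendor name, or None if nothing reaches the threshold."""
    target = normalize(extracted)
    best_name, best_score = None, 0.0
    for vendor in vendors:
        score = SequenceMatcher(None, target, normalize(vendor)).ratio()
        if score > best_score:
            best_name, best_score = vendor, score
    return best_name if best_score >= threshold else None
```

With a vendor list of `["ACME Industries Inc", "Apex Supplies LLC"]`, the input `"Acme Industries, Inc."` normalizes to the same string as the first entry and matches it, while an unrelated name such as `"Globex"` falls below the threshold and returns `None`.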
Extraction Providers
Smart Extract supports multiple AI providers, each with different strengths:
OCI Document Understanding
Oracle's built-in document AI:
- Native integration with NetSuite
- No additional API keys required
- Good general-purpose extraction
Azure Form Recognizer
Microsoft's document AI service (since renamed Azure AI Document Intelligence):
- Excellent line item extraction
- High accuracy on complex layouts
- Strong multi-page support
Mindee
Specialized invoice processing:
- Purpose-built for invoices
- Fast processing times
- Great for standard invoice formats
✅ Tip: If you're unsure which provider to use, start with OCI. If you need better line item accuracy, try Azure.
Understanding Extraction Results
Field Confidence
Each extracted field has a confidence score:
| Score | Meaning |
|---|---|
| 90%+ | Very confident, likely correct |
| 70-89% | Reasonably confident |
| 50-69% | Uncertain, verify carefully |
| Below 50% | Low confidence, may need manual entry |
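If you work with exported results, the bands in the table above can be expressed as a small helper. This is a sketch that assumes scores arrive as fractions between 0 and 1. The function names and the review threshold are illustrative, not part of the product.

```python
from typing import Dict, List


def confidence_band(score: float) -> str:
    """Map a confidence score in [0, 1] to the bands from the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= 0.90:
        return "very confident"
    if score >= 0.70:
        return "reasonably confident"
    if score >= 0.50:
        return "uncertain"
    return "low confidence"


def fields_to_review(scores: Dict[str, float], threshold: float = 0.70) -> List[str]:
    """List the fields whose score falls below the review threshold."""
    return [name for name, score in scores.items() if score < threshold]
```

For example, `fields_to_review({"vendor_name": 0.97, "due_date": 0.62, "po_number": 0.41})` returns `["due_date", "po_number"]`.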
Extraction Warnings
Smart Extract may flag issues during extraction:
| Warning | Meaning |
|---|---|
| Low confidence field | One or more fields have a low confidence score |
| Missing required field | Invoice number or amount was not found |
| Date interpretation | Ambiguous date format detected |
| Amount mismatch | Line items don't sum to the total |
| Multi-page limited | Only the first N pages were processed |
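The amount mismatch warning can be reproduced by hand when you want to double-check a document. The sketch below assumes that line amounts are pre-tax and that the total equals their sum plus the tax amount. The one-cent rounding tolerance is also an assumption, and the function name is hypothetical.

```python
from decimal import Decimal
from typing import Iterable


def has_amount_mismatch(line_amounts: Iterable[Decimal],
                        tax_amount: Decimal,
                        total: Decimal,
                        tolerance: Decimal = Decimal("0.01")) -> bool:
    """Return True when line items plus tax differ from the total by more than the tolerance."""
    expected = sum(line_amounts, Decimal("0")) + tax_amount
    return abs(expected - total) > tolerance
```

For line amounts of 100.00 and 50.50 with 12.04 tax, a total of 162.54 passes the check, while a total of 170.00 is flagged.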
Getting Better Extraction Results
Document Quality
The quality of your source documents significantly affects extraction:
Good documents:
- Digital PDFs (not scanned images)
- High-resolution scans (300 DPI+)
- Straight, uncropped pages
- Clear, readable text
Problematic documents:
- Low-resolution photos
- Skewed or rotated pages
- Handwritten invoices
- Heavily stylized fonts
Tips for Best Results
- Use native PDFs - Digital PDFs extract better than scanned images
- Scan at 300 DPI - Higher resolution improves OCR accuracy
- Keep pages straight - Skewed documents reduce accuracy
- Include full pages - Don't crop out headers or footers
- Avoid photos - Take screenshots or scan instead
Multi-Page Documents
Smart Extract handles multi-page invoices:
Automatic Page Merging
Tables that span multiple pages are automatically combined into a single list of line items.
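Conceptually, merging comes down to concatenating each page's rows while dropping column headers that repeat on continuation pages. The following is a simplified sketch of that idea, assuming each page's table is already available as a list of rows. It is not the product's implementation.

```python
from typing import List, Optional

Row = List[str]


def merge_page_tables(pages: List[List[Row]]) -> List[Row]:
    """Concatenate per-page tables, keeping the header row only once."""
    merged: List[Row] = []
    header: Optional[Row] = None
    for rows in pages:
        if not rows:
            continue  # page without a table
        if header is None:
            # The first non-empty page supplies the header row.
            header = rows[0]
            merged.extend(rows)
        else:
            # Skip the header if the continuation page repeats it.
            start = 1 if rows[0] == header else 0
            merged.extend(rows[start:])
    return merged
```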
Page Limits
By default, all pages are processed. You can limit this in Settings to speed up processing for very long documents.
⚠️ Warning: Limiting pages may cause missed line items on later pages.
Currency Detection
Smart Extract identifies currency from multiple signals:
- Currency symbols - $, €, £, etc.
- Currency codes - USD, CAD, EUR
- Vendor location - Based on address
- Account settings - Your default currency
When currency is ambiguous (e.g., $ could be USD or CAD), the system uses:
- Vendor's default currency (if known)
- Your account's default currency
- The most likely currency based on context
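The fallback logic can be pictured as a short chain of checks. The sketch below assumes the list above is applied in order, which the product may or may not do, and it leaves out the final context-based guess. The function name and the symbol table are hypothetical.

```python
from typing import Optional

# Symbols treated as unambiguous for this sketch; "$" is deliberately absent.
UNAMBIGUOUS_SYMBOLS = {"€": "EUR", "£": "GBP"}


def resolve_currency(code: Optional[str],
                     symbol: Optional[str],
                     vendor_default: Optional[str],
                     account_default: str) -> str:
    """Pick a currency from the available signals, most specific first."""
    if code:
        return code.upper()                 # an explicit ISO code on the document wins
    if symbol in UNAMBIGUOUS_SYMBOLS:
        return UNAMBIGUOUS_SYMBOLS[symbol]  # symbol maps to a single currency
    # Ambiguous or missing symbol: fall back to the vendor, then the account.
    return vendor_default or account_default
```

For an invoice that shows only `$`, a vendor whose default currency is CAD resolves to `"CAD"`. With no vendor default, the account default is used.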
Date Parsing
Dates are parsed based on their format and the surrounding context:
Format Detection
The system handles various formats:
- MM/DD/YYYY (US standard)
- DD/MM/YYYY (European/International)
- YYYY-MM-DD (ISO format)
- Written dates (January 15, 2024)
Ambiguous Dates
For ambiguous dates like "03/04/2024":
- Vendor history is checked for patterns
- Regional conventions are considered
- You can correct the date during review, which trains the system
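To see why a date such as 03/04/2024 is ambiguous, it helps to enumerate every valid reading. This sketch handles only numeric dates separated by slashes with a four-digit year last, and the function name is hypothetical. It illustrates the problem rather than Smart Extract's resolution logic.

```python
from datetime import date
from typing import List


def candidate_dates(text: str) -> List[date]:
    """Return every valid reading of a numeric A/B/YYYY date."""
    first, second, year = (int(part) for part in text.split("/"))
    candidates: List[date] = []
    # Try MM/DD/YYYY first, then DD/MM/YYYY.
    for month, day in ((first, second), (second, first)):
        try:
            reading = date(year, month, day)
        except ValueError:
            continue  # e.g. month 13 is not a valid reading
        if reading not in candidates:
            candidates.append(reading)
    return candidates
```

`candidate_dates("03/04/2024")` yields two readings (March 4 and April 3), so extra context such as vendor history is needed. `candidate_dates("13/04/2024")` yields only April 13, because 13 cannot be a month.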
Amount Parsing
Amount parsing accounts for regional number formats and the different ways negative values are written:
Number Formats
- US/UK: 1,234.56 (comma thousands, period decimal)
- European: 1.234,56 (period thousands, comma decimal)
Negative Amounts
The system recognizes:
- Parentheses: (100.00)
- Minus sign: -100.00
- CR suffix: 100.00 CR
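The formats above can be handled with one common heuristic: treat the right-most separator as the decimal mark. The sketch below uses that heuristic as an assumption and is not a description of Smart Extract's parser. It does not resolve values with a single separator type such as "1,234", which could be read either way. The function name is hypothetical.

```python
import re
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Parse US/UK or European formatted amounts, including negative notations."""
    s = text.strip()
    negative = False
    if s.startswith("(") and s.endswith(")"):   # accounting style: (100.00)
        negative, s = True, s[1:-1].strip()
    if s.upper().endswith("CR"):                # credit suffix: 100.00 CR
        negative, s = True, s[:-2].strip()
    if s.startswith("-"):                       # leading minus: -100.00
        negative, s = True, s[1:]
    s = re.sub(r"[^\d.,]", "", s)               # drop currency symbols and spaces
    if s.rfind(",") > s.rfind("."):
        # European style: period thousands, comma decimal
        s = s.replace(".", "").replace(",", ".")
    else:
        # US/UK style: comma thousands, period decimal
        s = s.replace(",", "")
    value = Decimal(s)
    return -value if negative else value
```

Both `parse_amount("1,234.56")` and `parse_amount("1.234,56")` return `Decimal("1234.56")`, and `(100.00)`, `-100.00`, and `100.00 CR` all parse to a negative 100.00.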
Custom Field Extraction
Smart Extract supports fields specific to your business in two ways:
Auto-Match Custom Fields
When you have custom fields on your vendor bill form, the system attempts to match extracted data to those fields based on:
- Field labels
- Similar terminology
- Position on document
Manual Mapping
In the review interface, you can manually map extracted values to any field on your form.
Extraction Limits
File Size
Maximum file size: 20 MB
Processing Time
Typical processing times:
| Document Type | Time |
|---|---|
| 1-page invoice | 2-5 seconds |
| 5-page invoice | 5-10 seconds |
| 10+ page invoice | 10-30 seconds |
Concurrent Processing
Multiple documents can be processed simultaneously. Large batches are queued automatically.
Troubleshooting Extraction
No Data Extracted
- Verify the file isn't corrupted
- Open the document yourself and confirm the text is legible
- Try a different file format (PDF instead of image)
- Ensure the document isn't password-protected
Wrong Field Values
- Check document quality
- Review the original document
- Correct the value (this trains the system)
- Consider switching extraction providers
Missing Line Items
- Verify all pages were processed
- Check page limit settings
- Check whether the line items table is split across pages
- Try increasing max extraction pages
Next Steps
- Learn about the Learning Engine to improve accuracy over time
- Set up Fraud Shield to catch extraction issues
- Configure extraction providers for your needs