Goal: How to extract structured information (JSON) from a complex PDF document with Magic xpi
1/ Sign in to the Adobe Developer Console (Adobe Developer website)
Create a new project
data:image/s3,"s3://crabby-images/2b089/2b08912a860471e48048e6d848f4f0427cb6edeb" alt=""
2/ Create an API Key
data:image/s3,"s3://crabby-images/80b09/80b09cac0af8d614ba5da94895b409ca993c8978" alt=""
Copy your client ID and client secret, and check the scopes (openid, AdobeID, DCAPI)
data:image/s3,"s3://crabby-images/98fde/98fde903f845fa75449852265cb5a4adf49b73ba" alt=""
3/ You can now use Postman to test the Adobe PDF Services API
Assume that you want to retrieve the dimensions of your product from the PDF document below:
data:image/s3,"s3://crabby-images/8f88d/8f88dd9a450448abb8aefb8e4ecfe3cc08bf0baf" alt=""
The sequence of calls to retrieve the information is:
data:image/s3,"s3://crabby-images/a85c2/a85c2b7bb97b36c313881e1724fb7d98113f21b3" alt=""
After getting the token, send a POST to https://pdf-services.adobe.io/assets to create an asset; the response contains an uploadUri and an assetID
data:image/s3,"s3://crabby-images/9d61d/9d61db0bb32a266d8182e5c32e3e4c17bf6a7a04" alt=""
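For readers who want to reproduce this call outside Postman, here is a minimal Python sketch of the asset request. It only builds the request and parses a response body; nothing is sent over the network. The header names (X-API-Key, Authorization), the mediaType body field and the function names are assumptions made for illustration, so check them against your own Postman collection.

```python
import json
import urllib.request

ASSETS_URL = "https://pdf-services.adobe.io/assets"  # URL used in the step above


def build_asset_request(client_id: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that asks for an upload slot."""
    # Assumed body: the media type of the file that will be uploaded.
    body = json.dumps({"mediaType": "application/pdf"}).encode("utf-8")
    headers = {
        "X-API-Key": client_id,  # assumed header name for the client ID
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(ASSETS_URL, data=body, headers=headers, method="POST")


def parse_asset_response(raw: bytes) -> tuple:
    """Return (uploadUri, assetID) from the JSON response body."""
    payload = json.loads(raw)
    return payload["uploadUri"], payload["assetID"]
```

To actually send the request you would pass the object to urllib.request.urlopen and feed the response body to parse_asset_response.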
Send a PUT to the uploadUri with your PDF file as the request body
data:image/s3,"s3://crabby-images/e8b50/e8b505d098f0b6b770ae40a3e9840ffd90fb7f36" alt=""
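The upload itself is a plain PUT of the file bytes. The sketch below builds such a request in Python without sending it. It assumes the uploadUri is a pre-signed URL that needs only a Content-Type header and no Authorization header; the function name is illustrative.

```python
import urllib.request


def build_upload_request(upload_uri: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) the PUT that uploads the PDF content."""
    return urllib.request.Request(
        upload_uri,
        data=pdf_bytes,  # raw file content, not multipart form data
        headers={"Content-Type": "application/pdf"},
        method="PUT",
    )
```

In practice pdf_bytes would come from reading the file in binary mode, for example with pathlib.Path(...).read_bytes().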
Send a POST to https://pdf-services-ue1.adobe.io/operation/extractpdf and pass the assetID in the body
data:image/s3,"s3://crabby-images/5966d/5966d17bb3527307f9f92a9a0adef873415dfded" alt=""
From the response headers, retrieve the value of the key: location
data:image/s3,"s3://crabby-images/faea2/faea2ad4cc19dec5a452b8974e624edf8b60e61b" alt=""
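As a rough illustration of these two steps, the Python sketch below builds the extractpdf body and reads the location header from a header mapping. The elementsToExtract field and its values are assumptions about the request format, and both function names are hypothetical.

```python
import json
from typing import Mapping, Optional


def build_extract_body(asset_id: str, elements=("text", "tables")) -> bytes:
    """JSON body for the extractpdf call, referencing the uploaded asset."""
    # "elementsToExtract" is an assumed option; adjust it to your needs.
    body = {"assetID": asset_id, "elementsToExtract": list(elements)}
    return json.dumps(body).encode("utf-8")


def get_location(headers: Mapping[str, str]) -> Optional[str]:
    """Find the location header; header names are case-insensitive."""
    for name, value in headers.items():
        if name.lower() == "location":
            return value
    return None
```

The case-insensitive lookup matters because HTTP clients do not all report header names with the same capitalization.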
Send a GET to the location URL to retrieve the downloadUri of the JSON content
data:image/s3,"s3://crabby-images/a95ad/a95ad806afa4561675c726552d209a0712595697" alt=""
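The status response has to be inspected before the download can start. Here is a minimal sketch of that check in Python. It assumes the response is a JSON object with a status field ("in progress", "done" or "failed") and, once done, a content.downloadUri field; verify these names against the response you see in Postman.

```python
import json
from typing import Optional


def extract_download_uri(raw_status: str) -> Optional[str]:
    """Return the download URI when the job is done, None while it is running."""
    status = json.loads(raw_status)
    state = status.get("status")  # assumed field name
    if state == "failed":
        raise RuntimeError(f"extraction failed: {raw_status}")
    if state != "done":
        return None  # still running, ask again later
    return status["content"]["downloadUri"]  # assumed location of the URI
```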
Send a GET to the downloadUri to download the JSON result
data:image/s3,"s3://crabby-images/9bc50/9bc50b801ff4dce4aa1bb65e262a5d55a99859ea" alt=""
4/ Define Resources in the Magic xpi resources repository
One REST Client resource with 3 paths (token, assets, extractpdf) and one HTTP resource to do the PUT.
data:image/s3,"s3://crabby-images/76e4e/76e4eba95b453ef3d037792c9a30a9405a960bd6" alt=""
data:image/s3,"s3://crabby-images/80a43/80a430d42482f98183397f4613e49f57c6df12ea" alt=""
data:image/s3,"s3://crabby-images/9713d/9713dfc3bf812d21a86a09395dbf7c2ba5c6acd6" alt=""
data:image/s3,"s3://crabby-images/b6fee/b6feeb0c7a941b2b323222fad13e829893f98d85" alt=""
5/ The structure of the flow is shown below:
data:image/s3,"s3://crabby-images/fb55e/fb55e8295d717583dcac0282ee78c869f31cc34f" alt=""
Use the « Set body token » step to update the HTTP body
data:image/s3,"s3://crabby-images/a14b3/a14b33478b3bad95833ab5efe95deb20bbbec20a" alt=""
Drag and drop a REST Client connector and call the token URL, passing the body in the DataBlob.
data:image/s3,"s3://crabby-images/83813/8381317224df73050ef3fd2f73d32ed7dbc3b0a9" alt=""
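To make the token body concrete, here is a small Python sketch of a form-encoded body like the one passed in the DataBlob. The field names client_id and client_secret, the access_token response field and the function names are assumptions; depending on the token endpoint you use, extra fields such as grant_type or scope may be required, which is why the sketch accepts them as optional extras.

```python
import json
from typing import Dict, Optional
from urllib.parse import urlencode


def build_token_body(client_id: str, client_secret: str,
                     extra: Optional[Dict[str, str]] = None) -> str:
    """Form-encoded body (application/x-www-form-urlencoded) for the token call."""
    fields = {"client_id": client_id, "client_secret": client_secret}
    fields.update(extra or {})  # e.g. grant_type or scope, if your endpoint needs them
    return urlencode(fields)  # takes care of escaping special characters


def read_access_token(raw_response: str) -> str:
    """Pull the token out of the JSON response (assumed field: access_token)."""
    return json.loads(raw_response)["access_token"]
```

Encoding the body with urlencode rather than string concatenation avoids broken requests when the secret contains characters such as & or +.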
Next, use a Flow Data component to update 2 flow variables (F.accesstoken, F.BodyAsset)
data:image/s3,"s3://crabby-images/da654/da6542357f5ccab91b42bccd7f23b3b4b572a0be" alt=""
6/ Drag and drop a REST Client connector to call the assets path
data:image/s3,"s3://crabby-images/7eec3/7eec38fd33b3a108d03a67edfa04470d06236425" alt=""
Click Parameters to pass the API key and the Bearer token
In the mapping, pass F.BodyAsset in the DataBlob
Use the « Asset Response Parsing » step to get the « uploadUri » and the « assetID »
data:image/s3,"s3://crabby-images/3f001/3f0018c467cc3fe02aba6b6e930d883263b755af" alt=""
7/ To do the PUT, I'll use a PowerShell script generated from a Magic xpi template
Define a template with 2 tags, as below:
Invoke-WebRequest -Uri '<!$MG_Url>' -Method 'Put' -ContentType 'application/pdf' -InFile '<!$MG_PDFFile>' > 'c:/tmp/trace.txt'
Use the Data Mapper to merge values into the 2 tags
data:image/s3,"s3://crabby-images/6eb1f/6eb1fa503e2c018b0ee5cad4976ebdfe7ffad90e" alt=""
After this step, you should have a PowerShell script (uploadPDFAdobe.ps1)
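The effect of the tag merge can be pictured with a few lines of Python. This is only an illustration of the substitution, not how Magic xpi implements its templates. The merge_template function is hypothetical, and doubling single quotes is an extra precaution added here so that a value containing a quote does not break the PowerShell string.

```python
TEMPLATE = (
    "Invoke-WebRequest -Uri '<!$MG_Url>' -Method 'Put' "
    "-ContentType 'application/pdf' -InFile '<!$MG_PDFFile>' > 'c:/tmp/trace.txt'"
)


def merge_template(template: str, values: dict) -> str:
    """Replace each <!$Name> tag with its value, escaped for PowerShell."""
    merged = template
    for tag, value in values.items():
        # Inside a single-quoted PowerShell string, ' is escaped by doubling it.
        safe_value = value.replace("'", "''")
        merged = merged.replace(f"<!${tag}>", safe_value)
    return merged


script = merge_template(
    TEMPLATE,
    {"MG_Url": "https://example.invalid/upload?sig=abc", "MG_PDFFile": "c:/tmp/product.pdf"},
)
```

The URL and file path in the usage lines are placeholders; in the flow they come from the uploadUri and from the location of your PDF file.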
Then execute the PowerShell script from the command line using a File Management step
data:image/s3,"s3://crabby-images/dcf50/dcf50a2151cf9531901b8ab0b09073f031cd4f42" alt=""
8/ Once this step has run, your PDF file is uploaded to the Adobe cloud platform
9/ Call the extractpdf method
Drag and drop a Flow Data component and update the flow variable that holds the request body (F.BodyExtract)
Populate it with the assetID obtained in step 6
data:image/s3,"s3://crabby-images/9367a/9367af9b25949502b69611db3d157f56103e0aef" alt=""
Use a REST Client connector to call the extractpdf method
data:image/s3,"s3://crabby-images/49393/4939380c50ff04d863b943c005ad79c9756510a9" alt=""
Then set a delay of 5 seconds to give the extraction time to finish.
10/ Retrieve the status of the result
The status URL is taken from the location key of the extractpdf response headers
data:image/s3,"s3://crabby-images/4c3aa/4c3aa80c0ab3d850086e9baf0635deaeb3011fcd" alt=""
Drag and drop an HTTP connector to call the status URL
data:image/s3,"s3://crabby-images/59390/593906e2280e4655c02fcde33266af428914a9d9" alt=""
Next, use the Data Mapper to parse the JSON response
data:image/s3,"s3://crabby-images/c903b/c903b0ed969fe9a14229f52097585732547fc00b" alt=""
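A single fixed delay may not be enough for a large document. As an alternative, the wait-then-check logic can be written as a polling loop. The Python sketch below is a generic version under stated assumptions: check is any function you supply that returns the download URI once the job is done and None otherwise, and the default of 12 attempts is an arbitrary choice.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    check: Callable[[], Optional[str]],
    delay_seconds: float = 5.0,
    max_attempts: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call check() until it returns a value, sleeping between attempts."""
    for attempt in range(max_attempts):
        result = check()
        if result is not None:
            return result
        if attempt < max_attempts - 1:  # no need to wait after the last try
            sleep(delay_seconds)
    raise TimeoutError(f"job not finished after {max_attempts} attempts")
```

The sleep parameter exists so the loop can be exercised without real waiting; in a Magic xpi flow the same idea would be a loop around the delay and the status call.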
11/ Use the HTTP connector to download the JSON result
Update the environment variable with the downloadUri
data:image/s3,"s3://crabby-images/b5513/b5513d60e59255b935b6e1f2a0435adb3393a436" alt=""
data:image/s3,"s3://crabby-images/5b576/5b57650a88978a93e3d515fd1b6cab35020073ff" alt=""
Use the Data Mapper with a JSON schema to extract the PDF information from the downloaded JSON
data:image/s3,"s3://crabby-images/75c39/75c39be9ffbe1509de565277f1290a8bcd00074f" alt=""
Use a condition on the destination node to retrieve the « Product Width » value
data:image/s3,"s3://crabby-images/39062/39062f7bf8e3e4bcfa28ba64d45c4cdf88255854" alt=""
data:image/s3,"s3://crabby-images/3fb0d/3fb0d2bcee0bb6d0cc02a9bcec2978f4a54cdb76" alt=""
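For comparison, the same kind of lookup can be sketched in Python. This assumes the downloaded JSON contains an elements array whose entries carry a Text field, and that the value sits in the text element immediately after its label. Both are assumptions about the layout of this particular document, and find_value_after_label is a hypothetical helper.

```python
from typing import Optional


def find_value_after_label(structured: dict, label: str) -> Optional[str]:
    """Return the text of the element that follows the one matching label."""
    # Keep only elements that carry text (figures, for example, may not).
    texts = [e["Text"].strip() for e in structured.get("elements", []) if "Text" in e]
    for current, following in zip(texts, texts[1:]):
        if current.lower() == label.lower():
            return following
    return None
```

Calling find_value_after_label(result, "Product Width") on the parsed JSON would then play the role of the condition on the destination node.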