Address

Hitech House
Near Gurukul Tower
Gurukul, Ahmedabad - 380052
Gujarat, India

Fax: +91-794-000-3202

Email: sales@hitechdigital.com


Web Scraping of Unstructured Healthcare Data from Multiple Web Threads Using NLP and RPA

Client Profile

Founded in 2007, the US-based medical knowledge management company manages a large repository of healthcare data and shares relevant insights with consumers, patients, healthcare service providers and caregivers. These insights support evidence-based healthcare decisions, from healthy living and prevention to diagnosis, treatment and home care.

Business Need

To grow its healthcare data management business, the company was building a comprehensive repository of healthcare data gathered from threads in more than 70 Reddit subgroups. The subreddits covered a wide spectrum of conditions, from common ones such as blood pressure, diabetes, diet and weight loss to serious ones like depression, cancer, liver disease, alcoholism and many more. The types of data to be extracted included:

  • Disease condition
  • Username of topic creator
  • Usernames of respondents on main comments
  • Usernames of respondents on sub-comments
  • Text of all the topics, discussions, comments, replies and sub-comments

The company approached Hitech to capture and standardize Reddit posts and discussions and to classify them by race, ethnicity, age, etc.
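
The fields listed above map naturally onto one flat record per post or reply. The sketch below is a minimal illustration, not Hitech's implementation: the `ThreadRecord` class, the `flatten_thread` function, and the nested-dictionary input shape are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class ThreadRecord:
    condition: str  # disease condition the thread belongs to
    author: str     # username of the topic creator or respondent
    level: str      # "topic", "comment", or "sub-comment"
    text: str       # body text of the post or reply


def flatten_thread(condition: str, thread: dict) -> Iterator[ThreadRecord]:
    """Walk a nested thread (hypothetical dict shape) and yield flat records."""
    yield ThreadRecord(condition, thread["author"], "topic", thread["text"])
    # Stack of (node, depth): depth 1 is a main comment, deeper is a sub-comment.
    stack = [(c, 1) for c in reversed(thread.get("replies", []))]
    while stack:
        node, depth = stack.pop()
        level = "comment" if depth == 1 else "sub-comment"
        yield ThreadRecord(condition, node["author"], level, node["text"])
        stack.extend((c, depth + 1) for c in reversed(node.get("replies", [])))


# Made-up sample thread, used only to exercise the function.
sample = {
    "author": "user_a", "text": "Any tips for managing blood pressure?",
    "replies": [
        {"author": "user_b", "text": "Cutting salt helped me.",
         "replies": [{"author": "user_a", "text": "Thanks!", "replies": []}]},
        {"author": "user_c", "text": "Talk to your doctor first.", "replies": []},
    ],
}
records = list(flatten_thread("blood pressure", sample))
```

Keeping the nesting level on each record preserves the distinction between main comments and sub-comments after the tree has been flattened.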

Challenges

The team at Hitech studied the processes to understand the scope of work, the technology to be used, and the workflow to be designed. The following project requirements made the process more challenging:

Solution

Hitech designed an ongoing process of web scraping and unstructured data extraction using NLP and RPA to collect data from healthcare posts and discussions on Reddit subgroups. The automated data management workflow ensured that once the collected data reached the final stage, a macro was triggered to move it through a predefined quality and profiling process.
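
One way to picture the trigger described above is a completed batch being handed automatically to a quality and profiling step. The sketch below is illustrative only: `profile_batch`, `on_batch_complete`, and the specific checks (non-empty text, non-empty username) are assumptions, not the checks Hitech used.

```python
from collections import Counter


def profile_batch(records: list[dict]) -> dict:
    """Split records into passed/rejected and summarize the batch."""
    passed, rejected = [], []
    for rec in records:
        text = (rec.get("text") or "").strip()
        author = (rec.get("author") or "").strip()
        # Assumed rules: a record needs both body text and a username.
        if text and author:
            passed.append(rec)
        else:
            rejected.append(rec)
    return {
        "passed": passed,
        "rejected": rejected,
        # Simple profile: how many valid records exist per condition.
        "per_condition": Counter(r.get("condition", "unknown") for r in passed),
    }


def on_batch_complete(records: list[dict], quality_step=profile_batch) -> dict:
    """Hypothetical hook, called when collection reaches its final stage."""
    return quality_step(records)


# Made-up batch, used only to exercise the hook.
batch = [
    {"condition": "diabetes", "author": "user_a", "text": "Diet question"},
    {"condition": "diabetes", "author": "user_b", "text": "   "},
    {"condition": "depression", "author": "", "text": "Looking for support"},
    {"condition": "depression", "author": "user_c", "text": "Try a support group"},
]
report = on_batch_complete(batch)
```

Passing the quality step in as a parameter keeps the trigger separate from the checks, so the rules can change without touching the collection side.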

Approach

Manual data extraction requires complex workflows and significant hand-coding to extract, cleanse, and validate unstructured data. Data professionals at Hitech therefore began by devising a simpler way to automate unstructured data extraction workflows.

Implementation:

Quality Check and Audit:

Basic Process

Dispatch:

Automated upload of comprehensive output text file after conversion, transformation, and validation

Business Impact
