Our client*, a leading consumer advocacy website, turned to us to optimize their data classification workflow.
Before our solution, adding each new company required manual handling by the team of moderators, who first had to determine which category the company belonged to.
This was a routine process, and the company had already accumulated a large base of text samples, so we set out to automate the classification.
*The company name is protected by an NDA.
Our Data Science team developed the following strategy:
- Web scraping from different Internet sources;
- Preprocessing of the gathered data;
- Selecting and optimizing the classification approach;
- Creating the model;
- Integrating the model into the website's existing system.
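The preprocessing and classification steps above can be illustrated with a minimal sketch. It is not the model built for the client: the source does not name the algorithm, so a multinomial Naive Bayes classifier is assumed here purely for illustration, and the `tokenize` helper, the `NaiveBayesClassifier` class, the category names, and the training samples are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Preprocessing: lowercase the text and keep alphabetic tokens of 3+ letters."""
    return re.findall(r"[a-z]{3,}", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one smoothing (an assumed technique)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()              # category -> number of samples
        self.vocabulary = set()

    def fit(self, samples):
        """Train on an iterable of (text, category) pairs."""
        for text, category in samples:
            tokens = tokenize(text)
            self.word_counts[category].update(tokens)
            self.doc_counts[category] += 1
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        """Return the category with the highest log-probability for the text."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_tokens = sum(counts.values())
            score = math.log(n_docs / total_docs)  # class prior
            for token in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Made-up samples standing in for the client's accumulated text base.
TRAINING_SAMPLES = [
    ("Book cheap flights and hotels for your next holiday trip", "travel"),
    ("Airline tickets, hotel deals and vacation packages", "travel"),
    ("Open a savings account or apply for a loan online", "finance"),
    ("Credit cards, mortgage rates and bank loans", "finance"),
]

model = NaiveBayesClassifier().fit(TRAINING_SAMPLES)
print(model.predict("Compare hotel prices and flights"))
```

In a real project the choice of classifier would be compared against alternatives on held-out data, which is what the "selecting and optimizing" step refers to.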
As a result, we developed an API that finds a link to a new company's website and, based on the site's content, determines which category the company belongs to.
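The content-based part of that flow might look roughly like the sketch below. It assumes the page HTML has already been downloaded (link discovery and fetching are not shown), and every name in it, including `VisibleTextExtractor`, `extract_text`, `categorize_company`, and the stand-in `keyword_classifier`, is hypothetical rather than part of the actual API.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def categorize_company(html, classify):
    """Classify a company page; `classify` is any callable mapping text to a category."""
    return classify(extract_text(html))


def keyword_classifier(text):
    """Stand-in for the trained model, used only to make the sketch runnable."""
    return "travel" if "flight" in text.lower() else "other"


page = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Acme Travel</h1><p>Cheap flights</p>"
    "<script>var flight = 1;</script></body></html>"
)
print(extract_text(page))
print(categorize_company(page, keyword_classifier))
```

Passing the classifier in as a callable keeps the text extraction independent of whichever model is eventually deployed.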
This development allowed us to automate and speed up the expansion of the client’s database while streamlining the workflow of their moderation team.