Problem definition and goal
Building network infrastructure for a telecommunications operator generates many different documents, such as lease contracts, CAD drawings, measurements and installation materials. Once a site is operational, further activities produce additional documents, such as revision reports and site visit records.
All these documents were originally stored on a single shared drive. As the number of documents grew, it became harder and harder to find the right ones. To work around this problem, users started to create their own folder hierarchies and duplicated some of the documents, which made document management even harder. The shared drive also had other problems, such as no full-text search, no traceability and poor user rights management.
The project was characterized by a large amount of unstructured data that needed to be categorized so that users could find what they need regardless of their role. The documents were created and used by internal Orange employees as well as by users from the external companies building the network. We identified the following problems:
- Document categorization – Analyzing the documents stored on the shared drive made it obvious that users had different “views” on document categorization. For some user groups the view was “Site-centric”, while for others it was more “Time-centric”. This resulted in different folder hierarchies and document fragmentation.
- Structured and unstructured data – Although most data was stored in the documents themselves, a non-trivial amount was stored in the folder structure. Document type, site number, the year the document belonged to and much more were encoded in folder names, which prevented proper use of this information.
- Duplication and quality assurance – Users often skipped mandatory folders, which led to wrong document categorization, or uploaded a document that a colleague had already uploaded.
- Search – The primary concern with the shared drive was poor usability when searching for documents. Full-text search was not possible, and it was hard to navigate the folder hierarchies and resolve duplicates.
- Security and traceability – If a user moved documents to another location or accidentally deleted them, it was not possible to trace who had performed the action. Protecting documents from unauthorized access was possible, but not easy.
- Data import – A large set of documents was already stored on the shared drive and had to be loaded into the new document management system (DMS) without losing information about their categorization.
- Data to information – Users had no easy way to answer a simple question such as: how many sites do not have an electrical revision for the year 2020?
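To illustrate how much structure was hidden in folder names, the sketch below parses a path into named fields. The folder layout (site, year, document type, file name), the sample paths and the class and method names are all hypothetical and chosen only for illustration; the real shared drive contained several competing hierarchies.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: recover metadata that is encoded only in folder names.
final class FolderPathSketch {

    // Assumed layout: <site>/<year>/<document type>/<file name>
    private static final Pattern LAYOUT = Pattern.compile(
            "(?<site>[^/]+)/(?<year>\\d{4})/(?<type>[^/]+)/(?<file>[^/]+)");

    /** Returns the attributes encoded in the path, or empty if the path does not fit the layout. */
    static Optional<Map<String, String>> parse(String relativePath) {
        Matcher m = LAYOUT.matcher(relativePath);
        if (!m.matches()) {
            // For example, a user skipped a mandatory folder level.
            return Optional.empty();
        }
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("site", m.group("site"));
        attributes.put("year", m.group("year"));
        attributes.put("documentType", m.group("type"));
        attributes.put("fileName", m.group("file"));
        return Optional.of(attributes);
    }

    public static void main(String[] args) {
        // Path that follows the assumed layout.
        System.out.println(parse("BA0123/2020/Electric revision/report.pdf"));
        // Path with the year folder skipped: nothing can be recovered reliably.
        System.out.println(parse("BA0123/Electric revision/report.pdf"));
    }
}
```

The second call returns an empty result, which mirrors the quality problem described above: as soon as one folder level is skipped or renamed, the encoded information is lost to any automated processing.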
At OBJECTIFY we created a quick web-based prototype of a DMS that was not organized around folders. Instead, it allowed custom fields to be defined on documents and used for searching, without creating folder hierarchies. The idea was adopted, and we started to implement solutions for all identified problems:
- Document categorization and structured data – Using folders to categorize documents has major drawbacks. Users must agree on a single folder hierarchy and assign documents accordingly. Additionally, folders have no semantics; they are just folders. Nothing tells you that the name of a folder is, for example, the name of a project. That is why we decided early on not to use folders at all. In the DMS, all structured information (project, creation date, document type, anything you like) is stored in a specific document attribute. This allows advanced filtering, any hierarchy that is needed, advanced reporting on documents, validations, rules based on attribute values and more.
- Prevent duplication and wrong categorization – Users can define document types and their attributes. They can further customize which fields are mandatory, which are calculated and much more. To prevent users from duplicating content, we implemented a search for similar documents, and users get a warning if a similarity threshold is met. For the most common cases, users can define “Upload trees”, which make it easy to upload different types of documents for a given scenario. Users can also configure rules that derive structured information from the names of uploaded folders.
- Search – We use Apache Solr to index documents and provide full-text search with word occurrence highlighting, type-ahead and spellcheck corrections. We also use our structured data repository to provide a faceted search experience. The faceted search is hierarchy-independent, so “Site-centric” users can start by filtering by site and “Time-centric” users can start with a date range.
- Ensure security and traceability – We implemented a security mechanism in the DMS that can be fine-tuned down to the level of a single document. Users can specify rights individually or use configuration rules that base security on document field values. We also keep a detailed activity log for each document, so no user action gets lost.
- Data import – To migrate data to the DMS, we implemented a stand-alone migration tool with a configurable rule engine. It not only migrates documents but can also synchronize a data source (shared drive, SharePoint, database and so on) with the DMS. This way the DMS can be used not only as document storage but also as a search engine over other applications.
- Data to information – Having all structured data stored in dedicated fields allowed us to create a user-defined reporting engine, where users can define tabular reports and follow the progress of installations, document statistics and much more.
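The attribute-based approach, and the reporting question raised earlier (which sites have no electrical revision for 2020), can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the DMS implementation: the `Document` record, the attribute keys (`site`, `documentType`, `year`) and the sample data are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical sketch: documents carry named attributes instead of living in folders.
final class AttributeReportSketch {

    // A document is a name plus a set of attribute values (all keys are assumptions).
    record Document(String name, Map<String, String> attributes) {}

    /** Returns the sites that have no document of the given type for the given year. */
    static Set<String> sitesMissing(List<Document> documents, Set<String> allSites,
                                    String documentType, String year) {
        // Sites that do have a matching document.
        Set<String> covered = documents.stream()
                .filter(d -> documentType.equals(d.attributes().get("documentType")))
                .filter(d -> year.equals(d.attributes().get("year")))
                .map(d -> d.attributes().get("site"))
                .filter(Objects::nonNull)
                .collect(Collectors.toSet());
        // Everything else is missing; TreeSet keeps the result sorted.
        Set<String> missing = new TreeSet<>(allSites);
        missing.removeAll(covered);
        return missing;
    }

    public static void main(String[] args) {
        List<Document> documents = List.of(
                new Document("rev-a.pdf", Map.of(
                        "site", "S001", "documentType", "ELECTRIC_REVISION", "year", "2020")),
                new Document("rev-b.pdf", Map.of(
                        "site", "S002", "documentType", "ELECTRIC_REVISION", "year", "2019")),
                new Document("lease.pdf", Map.of(
                        "site", "S003", "documentType", "LEASE_CONTRACT", "year", "2020")));
        Set<String> allSites = Set.of("S001", "S002", "S003");
        System.out.println(sitesMissing(documents, allSites, "ELECTRIC_REVISION", "2020"));
    }
}
```

With this sample data the program prints `[S002, S003]`. The same attributes serve both user groups: a “Site-centric” user filters on `site` first, a “Time-centric” user on `year`, and neither needs a folder hierarchy.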
- Project duration: 6 years
- Production installations: Orange Slovakia, GNOC Africa
- Number of documents: close to 1 million
- Users: 250+
- Domain: Telecommunications
- Competence: High scalability, High Availability, Data Integration, Automated Testing
- Technologies used: Java, MySQL, Spring stack, REST, Vue, Vuetify, Apache Solr, Docker, Kubernetes, OpenShift