
MykolaMelnyk

Update README.md

55da6dd verified over 1 year ago


	---
	title: README
	emoji: 💻
	colorFrom: indigo
	colorTo: indigo
	sdk: static
	pinned: false
	---

	# Hi there 👋

	StabRise - Document Processing Solutions

	# Our projects

	## PDF DataSource for Apache Spark

	<a href="https://stabrise.com/spark-pdf/"><img alt="Spark Pdf" src="https://stabrise.com/media/filer_public_thumbnails/filer_public/16/d6/16d6a0d6-f162-42ad-a5a3-7dc20361ad24/sparkpdf.png__1000x300_subsampling-2.webp" height="120"></a>

	---

	Source Code: [https://github.com/StabRise/spark-pdf](https://github.com/StabRise/spark-pdf)

	Home page: [https://stabrise.com/spark-pdf/](https://stabrise.com/spark-pdf/)

	Quick Start Jupyter Notebook: [https://github.com/StabRise/spark-pdf/blob/main/examples/PdfDataSource.ipynb](https://github.com/StabRise/spark-pdf/blob/main/examples/PdfDataSource.ipynb)

	---

	The project provides a custom data source for Apache Spark that lets you read PDF files into a Spark DataFrame.

	### Key features:

	- Read PDF documents into a Spark DataFrame
	- Read PDF files lazily, page by page
	- Handle large files, up to 10k pages
	- Process scanned PDF files by calling OCR
	- No need to install Tesseract OCR; it is included in the package
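
As a rough illustration of reading PDFs through the data source, here is a minimal PySpark-style sketch. The `read_pdf` helper and `DEFAULT_PDF_OPTIONS` are hypothetical names introduced here, and the format name `"pdf"` and the option names are assumptions based on the project's quick-start material, so check them against the Quick Start notebook before relying on them.

```python
from typing import Any, Mapping, Optional

# Assumed option names and values; verify against the spark-pdf documentation.
DEFAULT_PDF_OPTIONS = {
    "imageType": "BINARY",     # assumed: how rendered page images are encoded
    "resolution": "200",       # assumed: DPI used when rendering pages
    "pagePerPartition": "2",   # assumed: number of pages per Spark partition
    "reader": "pdfBox",        # assumed: backend used to parse the PDF
}


def read_pdf(spark: Any, path: str, options: Optional[Mapping[str, str]] = None):
    """Hypothetical helper: load PDF file(s) at `path` into a DataFrame.

    `spark` is expected to be a SparkSession with the spark-pdf package
    on its classpath. Caller-supplied options override the defaults.
    """
    merged = {**DEFAULT_PDF_OPTIONS, **(options or {})}
    # DataFrameReader.format / options / load are standard PySpark calls.
    return spark.read.format("pdf").options(**merged).load(path)


# Usage (requires pyspark and the spark-pdf package):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   df = read_pdf(spark, "path/to/documents/*.pdf", {"resolution": "300"})
#   df.printSchema()
```

Keeping the options in a dictionary makes it easy to override a single setting per call while leaving the rest at their defaults.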

	## ScaleDP

	<a href="https://stabrise.com/scaledp/"><img alt="ScaleDP" src="https://stabrise.com/media/filer_public_thumbnails/filer_public/4a/7d/4a7d97c2-50d7-4b7a-9902-af2df9b574da/scaledplogo.png__1000x300_subsampling-2.webp" height="120" /></a>

	---

	Source Code: [https://github.com/StabRise/scaledp](https://github.com/StabRise/scaledp)

	Home page: [https://stabrise.com/scaledp/](https://stabrise.com/scaledp/)

	Quick Start Jupyter Notebook: [https://github.com/StabRise/ScaleDP-Tutorials/blob/master/1.QuickStart.ipynb](https://github.com/StabRise/ScaleDP-Tutorials/blob/master/1.QuickStart.ipynb)

	---

	ScaleDP is an open-source library for processing documents with Apache Spark.

	### Key features:

	- Load PDF documents and images
	- Extract text from PDF documents and images
	- Extract images from PDF documents
	- Run OCR on images and PDF documents
	- Run NER on text extracted from PDF documents and images
	- Visualize NER results


	## De-Identify

	<a href="https://deidentify.online"><img alt="De-Identify" src="https://stabrise.com/media/filer_public_thumbnails/filer_public/fb/fe/fbfe4b0c-dadb-4878-88ad-1c0ece0dc053/deidentifylogo.png__1000x300_subsampling-2.webp" height="120" /></a>

	De-Identify is a tool for de-identifying and anonymizing data.

	### Supported formats
	- Text
	- Images
	- PDF documents
	- DICOM files