From patchwork Sun Mar 19 08:07:53 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gilles Talis
X-Patchwork-Id: 740655
From: Gilles Talis
To: buildroot@buildroot.org
Date: Sun, 19 Mar 2017 09:07:53 +0100
Message-Id: <1489910873-8450-3-git-send-email-gilles.talis@gmail.com>
X-Mailer: git-send-email 2.5.0
In-Reply-To: <1489910873-8450-1-git-send-email-gilles.talis@gmail.com>
References: <1489910873-8450-1-git-send-email-gilles.talis@gmail.com>
Subject: [Buildroot] [PATCH v2 2/2] tesseract-ocr: new package
List-Id: Discussion and development of buildroot

Signed-off-by: Gilles Talis
---
Changes v2 (following review by Thomas P.)
 - Added language data file support to the main package instead of
   a separate package for each language
 - Explicitly selected PNG, JPEG and TIFF libraries as dependencies
 - Added DEVELOPERS file change
 - Fixed indentation issues
 - Added extra comments
 - Added limitations found using test-pkg script
---
 DEVELOPERS                               |  1 +
 package/Config.in                        |  1 +
 package/tesseract-ocr/Config.in          | 44 ++++++++++++++++++++
 package/tesseract-ocr/tesseract-ocr.hash |  8 ++++
 package/tesseract-ocr/tesseract-ocr.mk   | 69 ++++++++++++++++++++++++++++++++
 5 files changed, 123 insertions(+)
 create mode 100644 package/tesseract-ocr/Config.in
 create mode 100644 package/tesseract-ocr/tesseract-ocr.hash
 create mode 100644 package/tesseract-ocr/tesseract-ocr.mk

diff --git a/DEVELOPERS b/DEVELOPERS
index 8802fc7..bdc93d9 100644
--- a/DEVELOPERS
+++ b/DEVELOPERS
@@ -589,6 +589,7 @@ F:	package/httping/
 F:	package/iozone/
 F:	package/leptonica/
 F:	package/ocrad/
+F:	package/tesseract-ocr/
 F:	package/webp/
 
 N:	Gregory Dymarek
diff --git a/package/Config.in b/package/Config.in
index ed48058..66c87d5 100644
--- a/package/Config.in
+++ b/package/Config.in
@@ -244,6 +244,7 @@ comment "Graphic applications"
 	source "package/mesa3d-demos/Config.in"
 	source "package/qt5cinex/Config.in"
 	source "package/rrdtool/Config.in"
+	source "package/tesseract-ocr/Config.in"
 
 comment "Graphic libraries"
 	source "package/cegui06/Config.in"
diff --git a/package/tesseract-ocr/Config.in b/package/tesseract-ocr/Config.in
new file mode 100644
index 0000000..4fd0668
--- /dev/null
+++ b/package/tesseract-ocr/Config.in
@@ -0,0 +1,44 @@
+comment "tesseract-ocr needs a toolchain w/ threads, C++, gcc >= 4.8 & dynamic library"
+	depends on BR2_USE_MMU
+	depends on !BR2_INSTALL_LIBSTDCPP || !BR2_TOOLCHAIN_HAS_THREADS || \
+		!BR2_TOOLCHAIN_GCC_AT_LEAST_4_8 || BR2_STATIC_LIBS
+
+menuconfig BR2_PACKAGE_TESSERACT_OCR
+	bool "tesseract-ocr"
+	depends on BR2_INSTALL_LIBSTDCPP
+	depends on BR2_TOOLCHAIN_HAS_THREADS
+	depends on BR2_TOOLCHAIN_GCC_AT_LEAST_4_8 # C++11
+	depends on BR2_USE_MMU # fork()
+	depends on !BR2_STATIC_LIBS
+	select BR2_PACKAGE_JPEG
+	select BR2_PACKAGE_LEPTONICA
+	select BR2_PACKAGE_LIBPNG
+	select BR2_PACKAGE_TIFF
+	help
+	  Tesseract is an OCR (Optical Character Recognition) engine.
+	  It can be used directly, or (for programmers) using an API.
+	  It supports a wide variety of languages.
+
+	  https://github.com/tesseract-ocr/tesseract
+
+if BR2_PACKAGE_TESSERACT_OCR
+comment "tesseract-ocr languages support"
+
+config BR2_PACKAGE_TESSERACT_OCR_LANG_ENG
+	bool "English"
+
+config BR2_PACKAGE_TESSERACT_OCR_LANG_FRA
+	bool "French"
+
+config BR2_PACKAGE_TESSERACT_OCR_LANG_GER
+	bool "German"
+
+config BR2_PACKAGE_TESSERACT_OCR_LANG_SPA
+	bool "Spanish"
+
+config BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_SIM
+	bool "Simplified Chinese"
+
+config BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_TRA
+	bool "Traditional Chinese"
+endif
diff --git a/package/tesseract-ocr/tesseract-ocr.hash b/package/tesseract-ocr/tesseract-ocr.hash
new file mode 100644
index 0000000..9bb5b52
--- /dev/null
+++ b/package/tesseract-ocr/tesseract-ocr.hash
@@ -0,0 +1,8 @@
+# locally computed
+sha256 3fe83e06d0f73b39f6e92ed9fc7ccba3ef734877b76aa5ddaaa778fac095d996  tesseract-ocr-3.05.00.tar.gz
+sha256 c0515c9f1e0c79e1069fcc05c2b2f6a6841fb5e1082d695db160333c1154f06d  eng.traineddata
+sha256 86afb23ad146467f263e8ade56fd3951b1cc28f8c4eebc34f993d3c02d88a7ab  fra.traineddata
+sha256 cb7eb42a7e972cec7ef904fe81825d7b547c46df684c814fdb11a930b13bca3a  deu.traineddata
+sha256 f23985996bbcfe2b57864ccb082783c1c74c87429f04411a04a6ba4d3da2efda  spa.traineddata
+sha256 323ae74d4a2ff49e932dbb4d6282fe0e67ddfafda075ec85803ecd077207454c  chi_sim.traineddata
+sha256 774d566bd0b36e4b6c07415dfa5b6b57feb2575b1f5f231d7fe01a52dac5dd0e  chi_tra.traineddata
diff --git a/package/tesseract-ocr/tesseract-ocr.mk b/package/tesseract-ocr/tesseract-ocr.mk
new file mode 100644
index 0000000..5ddacda
--- /dev/null
+++ b/package/tesseract-ocr/tesseract-ocr.mk
@@ -0,0 +1,69 @@
+################################################################################
+#
+# tesseract-ocr
+#
+################################################################################
+
+TESSERACT_OCR_VERSION = 3.05.00
+TESSERACT_OCR_DATA_VERSION = 3.04.00
+TESSERACT_OCR_SITE = $(call github,tesseract-ocr,tesseract,$(TESSERACT_OCR_VERSION))
+TESSERACT_OCR_LICENSE = Apache-2.0
+TESSERACT_OCR_LICENSE_FILES = COPYING
+
+# Source from github, no configure script provided
+TESSERACT_OCR_AUTORECONF = YES
+
+TESSERACT_OCR_DEPENDENCIES += leptonica jpeg libpng tiff
+
+TESSERACT_OCR_INSTALL_STAGING = YES
+
+TESSERACT_OCR_CONF_ENV += \
+	LIBLEPT_HEADERSDIR=$(STAGING_DIR)/usr/include/leptonica
+
+# Language data files download
+ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_ENG),y)
+TESSERACT_OCR_DATA_FILES += eng.traineddata
+endif
+
+ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_FRA),y)
+TESSERACT_OCR_DATA_FILES += fra.traineddata
+endif
+
+ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_GER),y)
+TESSERACT_OCR_DATA_FILES += deu.traineddata
+endif
+
+ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_SPA),y)
+TESSERACT_OCR_DATA_FILES += spa.traineddata
+endif
+
+ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_SIM),y)
+TESSERACT_OCR_DATA_FILES += chi_sim.traineddata
+endif
+
+ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_TRA),y)
+TESSERACT_OCR_DATA_FILES += chi_tra.traineddata
+endif
+
+TESSERACT_OCR_EXTRA_DOWNLOADS = \
+	$(addprefix https://github.com/tesseract-ocr/tessdata/raw/$(TESSERACT_OCR_DATA_VERSION)/,\
+		$(TESSERACT_OCR_DATA_FILES))
+
+define TESSERACT_OCR_PRECONFIGURE
+	# Autoreconf step fails due to missing m4 directory
+	mkdir -p $(@D)/m4
+endef
+
+TESSERACT_OCR_PRE_CONFIGURE_HOOKS += TESSERACT_OCR_PRECONFIGURE
+
+# Language data files installation
+define TESSERACT_OCR_INSTALL_LANG_DATA
+	$(foreach langfile,$(TESSERACT_OCR_DATA_FILES), \
+		$(INSTALL) -D -m 0644 $(DL_DIR)/$(langfile) \
+			$(TARGET_DIR)/usr/share/tessdata/$(langfile)
+	)
+endef
+
+TESSERACT_OCR_POST_INSTALL_TARGET_HOOKS += TESSERACT_OCR_INSTALL_LANG_DATA
+
+$(eval $(autotools-package))
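
Note on the language data handling: the sketch below is a rough Python model of what the make fragments in tesseract-ocr.mk appear to compute, namely which data files are selected, which URLs end up in TESSERACT_OCR_EXTRA_DOWNLOADS, and where the post-install hook copies each file. It is not Buildroot code. The function names, the LANG_FILES table and the default target directory are hypothetical, and the option suffixes are taken from Config.in as posted.

```python
# Hypothetical model of the language-data logic in tesseract-ocr.mk.
# Nothing here is part of Buildroot; the names are for illustration only.

DATA_VERSION = "3.04.00"  # TESSERACT_OCR_DATA_VERSION in the patch
BASE_URL = "https://github.com/tesseract-ocr/tessdata/raw/" + DATA_VERSION + "/"

# Config.in option suffix -> traineddata file added to TESSERACT_OCR_DATA_FILES
LANG_FILES = {
    "ENG": "eng.traineddata",
    "FRA": "fra.traineddata",
    "GER": "deu.traineddata",
    "SPA": "spa.traineddata",
    "CHI_SIM": "chi_sim.traineddata",
    "CHI_TRA": "chi_tra.traineddata",
}


def selected_data_files(enabled):
    """Return the data files for the enabled option suffixes, in table order."""
    return [name for lang, name in LANG_FILES.items() if lang in enabled]


def extra_downloads(enabled):
    """Mimic $(addprefix <base URL>,$(TESSERACT_OCR_DATA_FILES))."""
    return [BASE_URL + name for name in selected_data_files(enabled)]


def install_paths(enabled, target_dir="output/target"):
    """Mimic the post-install hook: one file per language in usr/share/tessdata."""
    return [
        f"{target_dir}/usr/share/tessdata/{name}"
        for name in selected_data_files(enabled)
    ]


if __name__ == "__main__":
    # Example: English and German enabled in menuconfig.
    for url in extra_downloads({"ENG", "GER"}):
        print(url)
    for path in install_paths({"ENG", "GER"}):
        print(path)
```

With no language enabled, all three functions return an empty list, which matches the make behaviour of an empty TESSERACT_OCR_DATA_FILES: nothing extra is downloaded and the foreach in the install hook expands to nothing.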