Manually segment the tabular structure before running Layout Recognition to detect lines
If your documents contain tables, the best current approach is to draw the tabular structure on the pages manually and then run the automatic line detection with the "Layout Recognition" option.
If the table layout of several pages is similar, you can copy and paste the tabular structure from one page to another.
First, open a page and select the "Add a Table" button on the left of the image. Click on the image once to start the table and once to finish it. Press ESC or click "Selection Mode" to leave the "Add a Table" mode.
To create rows, select the table and hold H while moving the cursor across the page and clicking wherever you want to create a row.
To create columns, hold V while moving the cursor across the page and clicking wherever you want to create a column. Continue in this way until all the cells are marked.
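The way horizontal and vertical separators divide a table into cells can be pictured with a short sketch. This is only an illustration of the idea, not how Transkribus stores tables internally: the function name grid_cells, the box format (x_min, y_min, x_max, y_max) and the pixel values are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in image pixels
Box = Tuple[float, float, float, float]


def grid_cells(table: Box, row_cuts: List[float], col_cuts: List[float]) -> List[List[Box]]:
    """Return the cells (row by row) produced by cutting a table box.

    row_cuts are the y positions of horizontal separators (the "H" clicks),
    col_cuts are the x positions of vertical separators (the "V" clicks).
    """
    x0, y0, x1, y1 = table
    # Ignore duplicate cuts and cuts outside the table, then sort them
    ys = [y0] + sorted(y for y in set(row_cuts) if y0 < y < y1) + [y1]
    xs = [x0] + sorted(x for x in set(col_cuts) if x0 < x < x1) + [x1]
    return [
        [(xs[c], ys[r], xs[c + 1], ys[r + 1]) for c in range(len(xs) - 1)]
        for r in range(len(ys) - 1)
    ]


# Two horizontal separators and one vertical separator give 3 rows x 2 columns
cells = grid_cells((0, 0, 300, 200), row_cuts=[50, 120], col_cuts=[100])
print(len(cells), "rows,", len(cells[0]), "columns")
```

Each additional separator adds one row or one column, which is why you continue clicking until every cell of the table is marked.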
If you need to merge two adjacent cells or all the cells of a row/column, select the shape, right-click, and use the merge options in the context menu.
Depending on the layout of your table, you might want to treat the book's spine like an extra column. You can also mark up this column with a structural tag (e.g. "book-binding"): select the column, right-click and select "Assign structure type" in the context menu. Read the Structural Tag page for more information about managing and creating new structural tags.
If the page contains other information not belonging to the table (e.g. heading, page number, annotations…), draw text regions around it.
Often multiple pages follow the same table template. After drawing the tabular structure on the first page, select it together with all the other text regions, press CTRL+C to copy the selected shapes, move to another page, and press CTRL+V to paste them. Some adjustments may be necessary: hold SHIFT to move and scale the shapes, or click on an individual line, drag it, and release it in the new position.
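Conceptually, pasting shapes onto another page and then adjusting them amounts to reusing the same coordinates with an optional shift and scale. The sketch below illustrates this under simple assumptions: shapes are lists of (x, y) points, and paste_shapes, dx, dy and scale are hypothetical names chosen for the example, not part of Transkribus.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Shape = List[Point]  # polygon outline of a table, cell, or text region


def paste_shapes(shapes: List[Shape], dx: float = 0.0, dy: float = 0.0,
                 scale: float = 1.0) -> List[Shape]:
    """Return copies of the shapes, scaled about the origin and then shifted."""
    return [[(x * scale + dx, y * scale + dy) for x, y in shape] for shape in shapes]


# A table outline copied from page 1 and moved 10 px right, 20 px down on page 2
table_outline = [(0, 0), (100, 0), (100, 50), (0, 50)]
pasted = paste_shapes([table_outline], dx=10, dy=20)
print(pasted[0])
```

The original shapes are left untouched, so the same template can be pasted onto any number of pages and adjusted separately on each.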
Once you have created the table structure on all the pages, go back to the Document view and run the Layout Recognition to add lines automatically. Remember to uncheck the "Find Text-Regions" box in the Configure Settings to detect only the lines.
If lines belonging to different cells are very close to each other, the automatic layout recognition may recognise them as one long line. To prevent this and make the lines strictly obey the cell borders, check the "Split lines on region border" option in the Configure Settings of the layout recognition.
Conversely, lines that span multiple cells may be divided. You can merge those partial lines, but first you must move them to the same cell. Open the layout tree with the "Layout" button on the left-side menu and select, in the image, the line that belongs to the wrong cell: the corresponding line is automatically highlighted in the layout tree. Within the layout tree, move the highlighted line to the correct cell (probably the previous or following cell). Now that both lines belong to the same cell, you can hold CTRL, select both lines and press M on your keyboard to merge them.
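To picture what splitting a line on region borders means, here is a minimal sketch. It is an illustration under stated assumptions, not the algorithm Transkribus uses: the line is a polyline running left to right with strictly increasing x values, cell borders are vertical and given only by their x positions, and split_baseline is a hypothetical helper name.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def split_baseline(baseline: List[Point], borders_x: List[float]) -> List[List[Point]]:
    """Split a left-to-right polyline at every vertical border it crosses.

    Assumes x increases strictly along the line; borders that coincide
    exactly with a vertex are not treated as crossings in this sketch.
    """
    pieces: List[List[Point]] = []
    current: List[Point] = [baseline[0]]
    for (xa, ya), (xb, yb) in zip(baseline, baseline[1:]):
        # Borders strictly inside this segment, in reading order
        for bx in sorted(b for b in borders_x if xa < b < xb):
            t = (bx - xa) / (xb - xa)          # position of the cut along the segment
            cut = (bx, ya + t * (yb - ya))     # interpolated point on the border
            current.append(cut)
            pieces.append(current)             # close the piece ending at the border
            current = [cut]                    # start the next piece at the border
        current.append((xb, yb))
    pieces.append(current)
    return pieces


# One long line crossing two cell borders becomes three lines, one per cell
parts = split_baseline([(10, 50), (290, 50)], borders_x=[100, 200])
print(len(parts))
```

With the option unchecked, the equivalent of this sketch would return the long line unchanged, which is the situation described above.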
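The reason for moving a line before merging can be shown with a toy model of the layout tree. Everything here is an assumption made for illustration: the tree is a plain dictionary mapping cell ids to their lines, and move_line and merge_lines are hypothetical helpers that mimic the two manual steps rather than any real Transkribus function.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
# Hypothetical layout tree: cell id -> line id -> baseline points
LayoutTree = Dict[str, Dict[str, List[Point]]]


def move_line(tree: LayoutTree, line_id: str, src: str, dst: str) -> None:
    """Move a line from one cell to another (the drag step in the layout tree)."""
    tree[dst][line_id] = tree[src].pop(line_id)


def merge_lines(tree: LayoutTree, cell: str, first: str, second: str) -> None:
    """Merge two lines of the same cell into the first one."""
    if first not in tree[cell] or second not in tree[cell]:
        raise ValueError("both lines must belong to the same cell before merging")
    # Join the points and order them left to right
    tree[cell][first] = sorted(tree[cell][first] + tree[cell].pop(second))


tree: LayoutTree = {
    "cell_1": {"l1": [(10, 50), (95, 50)]},
    "cell_2": {"l2": [(105, 50), (190, 50)]},
}
move_line(tree, "l2", "cell_2", "cell_1")    # step 1: put both lines in one cell
merge_lines(tree, "cell_1", "l1", "l2")      # step 2: merge them
print(tree["cell_1"]["l1"])
```

Calling merge_lines before move_line raises an error in this model, mirroring the rule that both lines must sit in the same cell first.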
Transkribus eXpert (deprecated)
Segmenting printed or hand-drawn tables using the Table Editor in Transkribus will add graphical lines to your image and assign a tabular structure to the layout of your documents.
Currently, tables must be drawn manually using the Table Editor in Transkribus. However, if multiple pages follow the same table template, the table markup can be done on the first page and then copied to the remaining pages.
First, create text regions for any information not belonging to the table.
This refers to information at the top, bottom or sides of the page which is clearly not part of the table, such as page numbers, line numbers, dates and any other markings or annotations.
Then, you can create the table. In the Canvas Menu, select the “Add other item” button and then click “Add a table”. Click on the top left corner of the table in the image and then click on the bottom right corner.
You can now segment your table into rows and columns. To begin, make sure you are in “Selection mode”: press the “ESC” key on your keyboard or click the “Selection mode” button in the Main Menu. Click on the table that you have created.
To create rows, click the H-button in the Canvas Menu: move your cursor across the page and click wherever you want to create a horizontal line.
To create columns, click the V-button in the Canvas Menu: move your cursor across the page and click wherever you want to create a vertical line. Continue until all table cells are marked.
In some cases, it may be necessary to merge cells in order to reflect cells spanning multiple rows or columns. To select cells to merge, hold down the “CTRL/CMD” key on your keyboard, click on the relevant cells in your table and then click the “Merges the selected shapes” button in the Canvas Menu.
If you need a precise table segmentation, you may also have to correct the shapes of some cells so that the green segmentation lines match the lines of your table as closely as possible. To do so, select the table cell you wish to edit, then click and drag the green dots to reposition the lines.
Depending on the layout of your table, you might want to treat the spine of the book like an extra column. You can also mark up this column on the table cell level using the “book-binding” tag in the “Metadata/Structural” tab.
If the table layout of several pages is similar, you can transfer the table format from one page to others. To do this, open “Other segmentation tools” in the Canvas Menu and choose “Copy regions (texts or tables) to other pages”. In the window that appears, define the pages the layout should be copied to, unselect “Dry run” so that the tool actually runs, and confirm with “OK”. The table layout will be copied to the indicated pages. The position of the table on the new pages may need to be corrected: select the whole table and move it while holding CTRL + SHIFT on your keyboard.
Before manually or automatically transcribing the table, the next step is adding baselines. The baselines should reflect the logical flow of text and can therefore run over the cell borders if necessary. You can either draw the baselines by hand or use the automatic Layout Analysis tool.
You may find that the automatic layout tool strictly obeys the cell borders, so baselines stretching across multiple cells are divided. You can combine those partial baselines with the merging tool, but you must first move them to the same cell. To do so, open the “Layout” tab in the Tools&Managing Bar and select, in the image, the line that belongs to the wrong cell: the corresponding line is automatically highlighted in the layout tree. Within the layout tree, move the highlighted line to the correct cell (probably the previous or following cell). Now that both lines belong to the same cell, you can select both and click the “Merges the selected shapes” button in the Canvas Menu.