Added Implementation for TableParser #46
Hello @nnilayy, thank you for the pull request! I wanted to suggest a small enhancement for your development workflow. Have you considered integrating the Ruff linter into your IDE? It's a fantastic tool that can automatically resolve various issues upon saving a file. This could significantly streamline your coding process and reduce the time spent on manual corrections. For the current pull request, there's no need for you to take any action; I'll take care of applying the necessary fixes. However, it might be a valuable addition to future work. Warm regards, and happy coding! 🙌
Firstly, a big thank you for your dedication and hard work on the table parsing feature. It's a crucial part of our project at Alphanome AI! I've taken the liberty of refining a few minor points (with Pull Request #47) highlighted by the Ruff linter and other checks. For your reference, you can run these checks on your own.

Unit Tests and Quality Assurance

Ensuring the quality of our parser is pivotal. We're focusing on introducing robust unit tests to achieve this. You can understand more about our testing philosophy here. Here's what I've added to guide our testing efforts:
sec-parser/tests/unit/semantic_elements/table_element/_data_for_table_parser.py Lines 1 to 5 in 46b5ff8
sec-parser/tests/unit/semantic_elements/table_element/test_table_parser.py Lines 11 to 30 in 46b5ff8
Could you review these test cases and adjust them if they don't align with your expectations for the output? Also, it would be great if you could add more diverse or edge-case examples to our test suite (you could do it just by adding more items).

Thank you for your continuous contributions and support. Your efforts are greatly valued in our community!
Hey Elijas, as for the tests, I'll work on adding more edge-case tests for the parser. As for the output, I do have something different in mind, as I think the parser could be refined even further to reduce data redundancy. In this example, if we look at the output data frame, columns 2 and 3 have the same rows except for the $ signs, so they could be merged, ultimately leaving a 2-column data frame. The implementation for this is already in the parser. So yeah, regarding the output and further refining the parser, I would love to discuss that with you on Discord. Again, thank you so much for the feedback and for merging this PR.
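To make the merging idea above concrete, here is a minimal sketch in plain Python. It is not the parser's actual implementation: the function name `merge_redundant_columns` and the `symbols` parameter are hypothetical, rows are assumed to be rectangular lists of strings, and it assumes that a cell holding only a currency sign (or nothing) can simply be dropped in favour of its neighbour.

```python
def _mergeable(left, right, symbols):
    """Two cells agree if they are equal or one is a bare symbol/empty cell."""
    return left == right or left in symbols or right in symbols


def merge_redundant_columns(rows, symbols=("$", "")):
    """Merge adjacent columns that agree in every row once bare symbols are ignored.

    Hypothetical helper for illustration only; assumes every row has the
    same number of cells.
    """
    if not rows:
        return []
    table = [list(row) for row in rows]  # work on a copy
    col = 0
    while col + 1 < len(table[0]):
        if all(_mergeable(row[col], row[col + 1], symbols) for row in table):
            for row in table:
                left, right = row[col], row[col + 1]
                # Keep the informative cell, discard the bare symbol.
                row[col] = right if left in symbols else left
                del row[col + 1]
            # Stay on the same column: it may also merge with the next one.
        else:
            col += 1
    return table


rows = [
    ["Item", "2023", "2023"],
    ["Revenue", "$", "100"],
    ["Cost", "$", "40"],
]
merged = merge_redundant_columns(rows)
print(merged)
```

With these sample rows, the second and third columns differ only where one of them holds a lone `$`, so they collapse into one column and the table ends up with two columns.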
I forgot to add a link to the refinement PR, in case you'd want to have a reference: #47
No worries at all, GitHub provides a very convenient "Squash and merge" feature, which replaces the entire PR commit history with a single commit that has a custom message. So for contributor convenience, we're just doing 1 PR = 1 commit 👍
Once again, no worries at all: not having the latest linters and stuff like that should never be a reason to reconsider creating a PR or delay submitting a contribution 🙌 Thanks again for the PR! 🎉
The pull request introduces a new `TableParser` module, which helps parse `TableElement` objects and retrieve the parsed table data accordingly. To use it, follow these steps:

1. Obtain a `TableElement` object.
2. Create a `TableParser` by passing the `TableElement` object to the constructor.
3. Call the `parse` method to process the table.
4. Get the parsed table as a data frame with the `table_as_df` method.
5. Get the parsed table as JSON with the `table_as_json` method.

Example:
![TableParser](https://private-user-images.githubusercontent.com/114939419/281966798-c860cd37-71df-4551-aaee-0f7437bfd7d7.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg4MzMxMzgsIm5iZiI6MTczODgzMjgzOCwicGF0aCI6Ii8xMTQ5Mzk0MTkvMjgxOTY2Nzk4LWM4NjBjZDM3LTcxZGYtNDU1MS1hYWVlLTBmNzQzN2JmZDdkNy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA2JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwNlQwOTA3MThaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0xNmUwYzU4M2E2YzhjZWNmYzI5NDkwMzc0NDYwMzk5YTRjMzE2MzE0MDdhMTZlNzhmOTVlYmQ0OTI1OTdiMjU3JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.JgXU2Kfh9nA2WpzNErd8Z6R1UTAQ7ydA4HJPu8oMjBQ)
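Since the screenshot may not be readable in every context, the steps above can also be sketched in code. This is a self-contained illustration only, not the implementation from this pull request: the `TableElement` and `TableParser` classes below are simplified stand-ins whose internals (including the `html` attribute and the `_RowCollector` helper) are assumptions. Only the call sequence, constructor, `parse`, then `table_as_df` or `table_as_json`, comes from the description.

```python
import json
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class TableElement:
    """Stand-in for the real element: it just wraps the raw HTML of a table."""
    html: str


class _RowCollector(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


class TableParser:
    """Stand-in parser mirroring the workflow described above."""

    def __init__(self, table_element):
        self._element = table_element
        self._rows = None

    def parse(self):
        collector = _RowCollector()
        collector.feed(self._element.html)
        collector.close()
        self._rows = collector.rows

    def _header_and_body(self):
        if self._rows is None:
            raise RuntimeError("call parse() before requesting the table")
        if not self._rows:
            return [], []
        return self._rows[0], self._rows[1:]

    def table_as_json(self):
        header, body = self._header_and_body()
        return json.dumps([dict(zip(header, row)) for row in body])

    def table_as_df(self):
        import pandas as pd  # imported lazily; only needed for this method

        header, body = self._header_and_body()
        return pd.DataFrame(body, columns=header)


element = TableElement(
    "<table><tr><th>Item</th><th>2023</th></tr>"
    "<tr><td>Revenue</td><td>$ 100</td></tr></table>"
)
parser = TableParser(element)
parser.parse()
print(parser.table_as_json())
```

The first row is treated as the header here, which is a simplification; real filings often have multi-row headers and merged cells that need extra handling.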