[docs] Add file-based metastore config to deployment.rst #24511

Open
wants to merge 1 commit into
base: master

steveburnett
Contributor

Description

Add instructions for configuring Presto to use a file-based Hive Metastore (HMS) to installation/deployment.rst.

Motivation and Context

@ethanyzhang suggested this would be a good addition to the Presto documentation during an internal discussion in which @nmahadevuni contributed the configuration. I discussed with @tdcmeehan where this information would fit best in the Presto documentation.

Impact

Documentation. Readers wanting to try out Presto quickly can bypass the need for the steps in Configure Hive MetaStore.

Test Plan

Local doc build. Screenshot with existing text above and below included for context.
Screenshot 2025-02-06 at 4 41 07 PM

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add documentation for file-based Hive metastore to :doc:`/installation/deployment`.

@steveburnett requested review from elharo and a team as code owners February 6, 2025 21:52
@prestodb-ci added the from:IBM (PR from IBM) label Feb 6, 2025
@prestodb-ci requested review from a team, psnv03 and NivinCS and removed request for a team February 6, 2025 21:52
@steveburnett self-assigned this Feb 6, 2025
@github-actions bot added the docs label Feb 6, 2025
@majetideepak
Collaborator

@steveburnett I have an issue with some more details: #19112.
We need to list the restrictions as well. For example, the file-based metastore does not support partitioning. @nmahadevuni should confirm.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For testing purposes Presto can be configured to use a local directory as a Hive
Metastore. Set the following properties in ``etc/config.properties``:
Member

The metastore concept applies only to Hive and Lakehouse Connectors (Iceberg, Delta, and Hudi), so we should probably mention it.

Additionally, these are catalog properties: they must be added to etc/catalog/catalog_name.properties, and they are valid only for the connectors listed above.
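
As a rough sketch of what that could look like, assuming a Hive catalog defined in a file named etc/catalog/hive.properties: the file name and the connector.name and hive.metastore lines are assumptions that do not appear in the quoted diff, and <catalog-dir> is the placeholder the diff uses.

```properties
# etc/catalog/hive.properties (assumed catalog file name)
connector.name=hive-hadoop2
# Select the file-based metastore rather than a remote metastore service
hive.metastore=file
# Directory that holds the metastore data; placeholder taken from the PR diff
hive.metastore.catalog.dir=file:///<catalog-dir>
```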

Contributor Author

Thanks, will do!

@imjalpreet
Member

file-based metastore does not support partitioning

@majetideepak I haven't used the file-based Hive metastore much myself, but did you encounter any issues when trying to create partitioned tables?

Ideally, it should be possible and I can see that partition-specific metadata calls are implemented even for the FileHiveMetastore.

https://github.com/prestodb/presto/blob/master/presto-hive-metastore/src/main/java/com/facebook/presto/hive/metastore/file/FileHiveMetastore.java

@ethanyzhang
Contributor

@imjalpreet should this be in the hive.properties or config.properties?

@imjalpreet
Member

should this be in the hive.properties or config.properties?

@ethanyzhang These should be in the catalog properties file. I also added a review comment above.

@steveburnett
Contributor Author

steveburnett commented Feb 6, 2025

@steveburnett I have an issue here with some more details #19112 We need to list the restrictions as well. file-based metastore does not support partitioning for example. @nmahadevuni should confirm

Thanks for the additional information @majetideepak!

With the new information in #19112, I am going to move this topic from where I initially put it in this PR as a small topic in Deploying Presto.

I thought about moving it into the Hive Connector doc, but because it is relevant to "Hive and Lakehouse Connectors (Iceberg, Delta, and Hudi)", I think I will move it to its own page in /installation and include Deepak's instructions on how to use it, which will be a big help to readers.

Member

@imjalpreet left a comment

Would it make more sense to include this in the connectors section?

Right now, metastore properties are spread across different parts of the Hive Connector documentation, mainly in Hive Configuration Properties, Metastore Configuration Properties, and Glue Configuration Properties.

Perhaps we could extract the metastore-related documentation into a separate subsection, similar to Hive Security. This would make sense since metastore properties are relevant not just to Hive, but also to other connectors like Iceberg, Delta, and Hudi.

@majetideepak
Collaborator

Ideally, it should be possible and I can see that partition-specific metadata calls are implemented even for the FileHiveMetastore.

@imjalpreet I remember seeing an issue with partitions and the file metastore. It could be due to my setup. I think it's good to test this once before documenting.

@hantangwangd
Member

I use a file-based HMS for my development environment, and I just confirmed again that it supports Hive partitioned tables. Or maybe you encountered some specific problems when using partitioned tables with a file-based HMS, @majetideepak?

Configuring a File-Based Metastore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For testing purposes Presto can be configured to use a local directory as a Hive
Member

Should we also mention that it is well suited for development purposes?

@majetideepak
Collaborator

@hantangwangd If you use it, then it's good. My issue was likely related to my setup.

hive.metastore.catalog.dir=file:///<catalog-dir>
Replace ``<catalog-dir>`` in the example with the path to a directory on the
local filesystem that is relative to the Presto installation directory
Member

The path need not be relative to the installation directory. It can be any path on the local filesystem.
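
For example, an absolute path should work just as well. The location /tmp/hive_catalog below is only a hypothetical directory chosen for illustration:

```properties
# Any directory on the local filesystem; /tmp/hive_catalog is a made-up example
hive.metastore.catalog.dir=file:///tmp/hive_catalog
```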

@nmahadevuni
Member

@majetideepak I have also used partitioned tables recently with a file-based HMS. I didn't have any issues.

Labels
docs from:IBM PR from IBM
7 participants