Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 fix knowledge king oom bug #150

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

w305jack
Copy link
Collaborator

@w305jack w305jack commented Jun 28, 2023

Why need this change? / Root cause:

  • Fix the OOM error in Knowledge King

knowledge-king-oom-error-log

Changes made:

Test Scope / Change impact:

  • tw/waterballsa/utopia/chatgpt/JTokkit.kt

knowledge-king-before-after-impact

Note
The "Before" image is from a nice spec VM, just for intended to obtain memory usage information

Summary by OpenAI

Release Notes:

  • New Feature: Upgraded jtokkit version from 0.4.0 to 0.5.1.
  • New Feature: Added lazy encoding feature to JTokkit class.
  • Refactor: Initialized encoding registry and encoding using a companion object in JTokkit class.
  • Refactor: Updated measureNumOfTokens function to use the countTokens method of the encoding object.

"Upgraded jtokkit,
Lazy encoding now a hit.
With a companion object,
Refactored for better effect.
Counting tokens with ease,
Code changes bring us peace."

w305jack added 2 commits June 28, 2023 15:57
- upgrade jtokkit version from 0.4.0 to 0.5.0
- use lazy load register that the new method of the Encodings class,
which can resolve the issue of initializing all encoding types problem
@github-actions
Copy link

github-actions bot commented Jun 28, 2023

🤖 OpenAI

High-level Summary

The code changes in this pull request involve upgrading the jtokkit library from version 0.4.0 to 0.5.1 and utilizing a new feature for lazy encoding. The JTokkit class has been modified to use a companion object for initializing the encoding registry and encoding. Additionally, the measureNumOfTokens function now utilizes the countTokens method of the encoding object. These changes aim to improve performance and maintainability by leveraging the updated library features.

File Changes

File Summary
chatgpt-api/src/main/kotlin/tw/waterballsa/utopia/chatgpt/JTokkit.kt Upgraded jtokkit version to 0.5.1. Modified JTokkit class to use a companion object for initializing encoding registry and encoding.
pom.xml Upgraded jtokkit version to 0.5.1.

Please review the specific file changes for more detailed information on each modification.


Chat with 🤖 OpenAI Bot (@openai)

  • Reply on review comments left by this bot to ask follow-up questions. A review comment is a comment on a diff or a file.
  • Invite the bot into a review comment chain by tagging @openai in a reply.

Code suggestions

  • The bot may make code suggestions, but please review them carefully before committing since the line number ranges may be misaligned.
  • You can edit the comment made by the bot and manually tweak the suggestion if it is slightly off.

Ignoring further reviews

  • Type @openai: ignore anywhere in the PR description to ignore further reviews from the bot.


In the recent run, only the files that changed from the base of the PR and between 90166edab7d511e43a394ada74aa5776b3abcec4 and 0b62dd7b99045eac3af780654fd4af80eb20bc87 commits were reviewed.

Comment on lines 20 to 22
fun measureNumOfTokens(string: String): Int {
val encoding: Encoding = registry.getEncoding(encodingType)
val encoded: List<Int> = encoding.encode(string)
return encoded.size
return encoding.countTokens(string)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 OpenAI

The measureNumOfTokens function has been modified to use the countTokens method of the encoding object. This change improves the efficiency of token counting by utilizing the built-in functionality provided by the jtokkit library.

Suggested change
fun measureNumOfTokens(string: String): Int {
val encoding: Encoding = registry.getEncoding(encodingType)
val encoded: List<Int> = encoding.encode(string)
return encoded.size
return encoding.countTokens(string)
}
fun measureNumOfTokens(string: String): Int {
return encoding.countTokens(string)
}

Wally5077
Wally5077 previously approved these changes Jun 28, 2023
Copy link
Collaborator

@Wally5077 Wally5077 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 20 to 22
fun measureNumOfTokens(string: String): Int {
val encoding: Encoding = registry.getEncoding(encodingType)
val encoded: List<Int> = encoding.encode(string)
return encoded.size
return encoding.countTokens(string)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fun measureNumOfTokens(string: String): Int = encoding.countTokens(string)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修正

Comment on lines 11 to 18
companion object {
private val registry = Encodings.newLazyEncodingRegistry()
private lateinit var encoding: Encoding
}

init {
encoding = registry.getEncodingForModel(ModelType.GPT_3_5_TURBO)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q:這裡不能直接初始化嗎?

companion object {
        private val registry = Encodings.newLazyEncodingRegistry()
        private val encoding = registry.getEncodingForModel(ModelType.GPT_3_5_TURBO)
    }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以直接初始化,已修

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants