resolve #22 Trim text before converting #23

Merged: 1 commit, Dec 3, 2023
39 changes: 28 additions & 11 deletions CHANGELOG.md
@@ -1,38 +1,55 @@
<h1 align=center>📜 Changelog - سجل التغيير</h1>
<p align=center>All notable changes to this project will be documented in this file.</p>

+## [Unreleased](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/compare/v1.0.5...HEAD)
+
-## [Unreleased](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/compare/v1.0.4...HEAD)
+## [1.0.5](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/releases/tag/v1.0.5) - 2023-12-03 (19 Jumada al-awwal 1445)
+
+### Updated
+
+- Trim text before converting ([#22](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/22))
+
+[Full Changelog](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/compare/v1.0.4...v1.0.5)

## [1.0.4](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/releases/tag/v1.0.4) - 2023-11-30 (16 Jumada al-awwal 1445)

### Updated
-- rename `Panned` to `Banned`, which is the correct translation of `محظور` ([PR #19](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/pull/19))
+
+- rename `Panned` to `Banned`, which is the correct translation of `محظور` ([PR #19](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/pull/19))

[Full Changelog](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/compare/v1.0.3...v1.0.4)

## [1.0.3](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/releases/tag/v1.0.3) - 2023-11-18 (04 Jumada al-awwal 1445)

### Added
-- Convert To Old Arabic And Tashfeer Banned Words: Transform Arabic text into old script and replace Banned Arabic text with visually similar characters for encoding purposes. (Banned words are words that are considered hate speech on social media) ([#18](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/18))
+
+- Convert To Old Arabic And Tashfeer Banned Words: Transform Arabic text into old script and replace Banned Arabic text with visually similar characters for encoding purposes. (Banned words are words that are considered hate speech on social media) ([#18](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/18))

[Full Changelog](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/compare/v1.0.2...v1.0.3)

## [1.0.2](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/releases/tag/v1.0.2) - 2023-11-18 (04 Jumada al-awwal 1445)

### Added
-- Tashfeer Banned Words: Replaces Banned Arabic text with visually similar characters for encoding purposes. (Banned words are words that are considered hate speech on social media) ([#16](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/16))
-- Remove Arabic Affixes: Removes predefined affixes (prefixes and suffixes) from an Arabic word if it starts or ends with those affixes. ([#17](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/17))
+
+- Tashfeer Banned Words: Replaces Banned Arabic text with visually similar characters for encoding purposes. (Banned words are words that are considered hate speech on social media) ([#16](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/16))
+- Remove Arabic Affixes: Removes predefined affixes (prefixes and suffixes) from an Arabic word if it starts or ends with those affixes. ([#17](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/17))

[Full Changelog](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/compare/v1.0.1...v1.0.2)

## [1.0.1](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/releases/tag/v1.0.1) - 2023-11-15 (Jumada al-awwal 1445)

### Added
-- Word To Letters: Convert Arabic word to its pronounced letters. ([#2](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/2))
+
+- Word To Letters: Convert Arabic word to its pronounced letters. ([#2](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/2))

[Full Changelog](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/compare/1.0.0...v1.0.1)

## [1.0.0](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/releases/tag/1.0.0) - 2023-11-07 (23 Rabi` al Thani 1445)

### Added
-- Initial release of the library with the following functionalities:
-  - Tashkeel Removal: Easily remove Tashkeel from Arabic text. ([#12](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/12))
-  - Tatweel Removal: Remove Tatweel character from Arabic phrases. ([#8](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/8))
-  - Convert To Old Arabic: Transform Arabic text into old script. ([#9](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/9))
-  - Tashfeer: Replaces Arabic text with visually similar characters for encoding purposes. ([#13](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/13))
+
+- Initial release of the library with the following functionalities:
+  - Tashkeel Removal: Easily remove Tashkeel from Arabic text. ([#12](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/12))
+  - Tatweel Removal: Remove Tatweel character from Arabic phrases. ([#8](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/8))
+  - Convert To Old Arabic: Transform Arabic text into old script. ([#9](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/9))
+  - Tashfeer: Replaces Arabic text with visually similar characters for encoding purposes. ([#13](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/13))
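
The 1.0.5 entry ("Trim text before converting") changes observable behavior: callers that relied on surrounding whitespace being preserved will now get it stripped. A minimal sketch of what `String.prototype.trim()` does and does not touch; `removeTatweelTrimmed` is a hypothetical stand-in written for illustration, not the library's export:

```typescript
// Hypothetical stand-in mirroring the trim-then-replace pattern; not the library's own code.
function removeTatweelTrimmed(text: string): string {
  // trim() strips whitespace only at the two ends of the string
  return text.trim().replace(/ـ/g, '');
}

// Leading and trailing spaces are dropped, tatweel characters are removed.
const cleaned: string = removeTatweelTrimmed('  رائـــع  ');

// Interior whitespace is left alone: only the ends are trimmed.
const interior: string = '  a  b  '.trim();

console.log(JSON.stringify(cleaned));
console.log(JSON.stringify(interior)); // "a  b"
```

Interior runs of spaces are unaffected, so trimming alone does not normalize spacing between words.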
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
 {
   "name": "arabic-services",
-  "version": "1.0.4",
+  "version": "1.0.5",
   "description": "Utility functions on Arabic text",
   "main": "./dist/index.js",
   "types": "./dist/index.d.ts",
22 changes: 15 additions & 7 deletions src/scripts/scripts.ts
@@ -21,7 +21,10 @@ import { setCharAt, similarityScore } from '../utils';
  * Output: "الخيل والليل والبيداء تعرفني"
  */
 export function removeTashkeel(text: string): string {
-  return text.replace(new RegExp('[' + TASHKEEL.join('') + ']', 'g'), '').replace(/ٱ/g, 'ا');
+  return text
+    .trim()
+    .replace(new RegExp('[' + TASHKEEL.join('') + ']', 'g'), '')
+    .replace(/ٱ/g, 'ا');
 }

 /**
@@ -33,7 +36,7 @@ export function removeTashkeel(text: string): string {
  * Output: "الحىل واللىل والٮىدا ٮعرڡٮى"
  */
 export function toOldArabic(sentence: string): string {
-  sentence = removeTashkeel(sentence);
+  sentence = removeTashkeel(sentence.trim());
   let newSentence = '';
   for (let letter = 0; letter < sentence.length; letter++) {
     // if letter is not Arabic letter => append to newSentence
@@ -61,7 +64,8 @@ export function toOldArabic(sentence: string): string {

 export function toOldArabicAndTashfeerBannedWords(sentence: string, levelOfTashfeer: number = 2): string {
   let new_sentence = '';
-  for (const word of sentence.split(' ')) {
+  const words = sentence.trim().split(' ');
+  for (const word of words) {
     if (checkIfBannedWord(word)) {
       new_sentence += tashfeerHandler(word, levelOfTashfeer) + ' ';
     } else {
@@ -80,7 +84,7 @@ export function toOldArabicAndTashfeerBannedWords(sentence: string, levelOfTashf
  * Output: "رائع"
  */
 export function removeTatweel(text: string): string {
-  return text.replace(/ـ/g, '');
+  return text.trim().replace(/ـ/g, '');
 }

 /**
@@ -89,18 +93,19 @@ export function removeTatweel(text: string): string {
  * @returns {string} The word with pronounced letters separated by spaces.
  */
 export function wordToLetters(word: string): string {
+  const trimmedWord = word.trim();
   let newWord = '';

   // Loop through each character in the input word
-  for (let i = 0; i < word.length; i++) {
-    const letter = word[i];
+  for (let i = 0; i < trimmedWord.length; i++) {
+    const letter = trimmedWord[i];

     // Check if the current letter has a pronunciation in PRONOUNCED_LETTERS
     if (PRONOUNCED_LETTERS.hasOwnProperty(letter)) {
       newWord += PRONOUNCED_LETTERS[letter];

       // Add a space after the pronounced letter unless it's the last letter in the word
-      if (i !== word.length - 1) {
+      if (i !== trimmedWord.length - 1) {
         newWord += ' ';
       }
     } else {
@@ -121,6 +126,7 @@ export function wordToLetters(word: string): string {
  * @returns {string} The word after removing any matching affixes. Returns the original word if no affix matches are found.
  */
 export function removeArabicAffixes(word: string): string {
+  word = word.trim();
   if (ARABIC_PREFIXES.includes(word.substring(0, 2))) {
     // For: ALEF & LAM
     word = setCharAt(word, 0, '');
@@ -263,6 +269,7 @@ function tashfeerHandler(word: string, level: number = 0): string {
  * @returns {string} The encrypted sentence.
  */
 export function tashfeer(sentence: string, levelOfTashfeer: number = 1): string {
+  sentence = sentence.trim();
   let new_sentence = '';
   for (const word of sentence.split(' ')) {
     new_sentence += tashfeerHandler(word, levelOfTashfeer) + ' ';
@@ -308,6 +315,7 @@ function checkIfBannedWord(string: string): boolean {
  */
 export function tashfeerBannedWords(sentence: string, levelOfTashfeer: number = 2): string {
   let new_sentence = '';
+  sentence = sentence.trim();
   for (const word of sentence.split(' ')) {
     if (checkIfBannedWord(word)) {
       new_sentence += tashfeerHandler(word, levelOfTashfeer) + ' ';
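
Why the sentence-level functions trim before `split(' ')`: splitting on a single space produces an empty string for every leading or trailing space, and each of those would otherwise be handed to the per-word handlers. A minimal sketch of that effect; `splitWords` is a hypothetical helper introduced only for illustration, and how the elided parts of the real functions treat empty words is not shown in this diff:

```typescript
// Hypothetical helper, not part of arabic-services: splits on single spaces,
// optionally trimming first, mirroring the pattern the diff adds.
function splitWords(sentence: string, trimFirst: boolean): string[] {
  const source = trimFirst ? sentence.trim() : sentence;
  return source.split(' ');
}

// Two leading spaces and one trailing space around a two-word sentence.
const padded = '  مرحبا بالعالم ';

// Without trimming, each outer space contributes an empty "word".
const untrimmed: string[] = splitWords(padded, false);

// With trimming, only the real words remain.
const trimmed: string[] = splitWords(padded, true);

console.log(untrimmed.length, trimmed.length); // 5 2
```

Trimming only handles the ends; a sentence with consecutive interior spaces would still yield empty entries from `split(' ')`.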