Commit

Merge pull request #23 from Seen-Arabic/22-trim-text-before-converting
resolve #22 Trim text before converting
AbdelrahmanBayoumi authored Dec 3, 2023
2 parents c83d43f + 0572c01 commit 4b9c2ab
Showing 3 changed files with 44 additions and 19 deletions.
39 changes: 28 additions & 11 deletions CHANGELOG.md
@@ -1,38 +1,55 @@
<h1 align=center>📜 Changelog - سجل التغيير</h1>
<p align=center>All notable changes to this project will be documented in this file.</p>

+## [Unreleased](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/compare/v1.0.5...HEAD)

-## [Unreleased](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/compare/v1.0.4...HEAD)
+## [1.0.5](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/releases/tag/v1.0.5) - 2023-12-03 (19 Jumada al-awwal 1445)

+### Updated

+- Trim text before converting ([#22](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/22))

+[Full Changelog](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/compare/v1.0.4...v1.0.5)

## [1.0.4](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/releases/tag/v1.0.4) - 2023-11-30 (16 Jumada al-awwal 1445)

### Updated
-- rename `Panned` to `Banned`: which is the correct translation of `محظور` ([PR #19](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/pull/19))

+- rename `Panned` to `Banned`: which is the correct translation of `محظور` ([PR #19](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/pull/19))

[Full Changelog](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/compare/v1.0.3...v1.0.4)

## [1.0.3](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/releases/tag/v1.0.3) - 2023-11-18 (04 Jumada al-awwal 1445)

### Added
-- Convert To Old Arabic And Tashfeer Banned Words: Transform Arabic text into old script and replace Banned Arabic text with visually similar characters for encoding purposes. (Banned words are words that are considered hate speech on social media) ([#18](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/18))

+- Convert To Old Arabic And Tashfeer Banned Words: Transform Arabic text into old script and replace Banned Arabic text with visually similar characters for encoding purposes. (Banned words are words that are considered hate speech on social media) ([#18](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/18))

[Full Changelog](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/compare/v1.0.2...v1.0.3)

## [1.0.2](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/releases/tag/v1.0.2) - 2023-11-18 (04 Jumada al-awwal 1445)

### Added
-- Tashfeer Banned Words: Replaces Banned Arabic text with visually similar characters for encoding purposes. (Banned words are words that are considered hate speech on social media) ([#16](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/16))
-- Remove Arabic Affixes: Removes predefined affixes (prefixes and suffixes) from an Arabic word if it starts or ends with those affixes. ([#17](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/17))

+- Tashfeer Banned Words: Replaces Banned Arabic text with visually similar characters for encoding purposes. (Banned words are words that are considered hate speech on social media) ([#16](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/16))
+- Remove Arabic Affixes: Removes predefined affixes (prefixes and suffixes) from an Arabic word if it starts or ends with those affixes. ([#17](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/17))

[Full Changelog](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/compare/v1.0.1...v1.0.2)

## [1.0.1](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/releases/tag/v1.0.1) - 2023-11-15 (Jumada al-awwal 1445)

### Added
-- Word To Letters: Convert Arabic word to its pronounced letters. ([#2](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/2))

+- Word To Letters: Convert Arabic word to its pronounced letters. ([#2](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/2))

[Full Changelog](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/compare/1.0.0...v1.0.1)

## [1.0.0](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/releases/tag/1.0.0) - 2023-11-07 (23 Rabi` al Thani 1445)

### Added
-- Initial release of the library with the following functionalities:
-  - Tashkeel Removal: Easily remove Tashkeel from Arabic text. ([#12](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/12))
-  - Tatweel Removal: Remove Tatweel character from Arabic phrases. ([#8](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/8))
-  - Convert To Old Arabic: Transform Arabic text into old script. ([#9](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/9))
-  - Tashfeer: Replaces Arabic text with visually similar characters for encoding purposes. ([#13](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/13))

+- Initial release of the library with the following functionalities:
+  - Tashkeel Removal: Easily remove Tashkeel from Arabic text. ([#12](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/12))
+  - Tatweel Removal: Remove Tatweel character from Arabic phrases. ([#8](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/8))
+  - Convert To Old Arabic: Transform Arabic text into old script. ([#9](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/9))
+  - Tashfeer: Replaces Arabic text with visually similar characters for encoding purposes. ([#13](https://github.com/Seen-Arabic/Arabic-Services-JavaScript/issues/13))
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "arabic-services",
-  "version": "1.0.4",
+  "version": "1.0.5",
"description": "Utility functions on Arabic text",
"main": "./dist/index.js",
"types": "./dist/index.d.ts",
22 changes: 15 additions & 7 deletions src/scripts/scripts.ts
@@ -21,7 +21,10 @@ import { setCharAt, similarityScore } from '../utils';
* Output: "الخيل والليل والبيداء تعرفني"
*/
export function removeTashkeel(text: string): string {
-  return text.replace(new RegExp('[' + TASHKEEL.join('') + ']', 'g'), '').replace(/ٱ/g, 'ا');
+  return text
+    .trim()
+    .replace(new RegExp('[' + TASHKEEL.join('') + ']', 'g'), '')
+    .replace(/ٱ/g, 'ا');
}
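
A minimal sketch of how the trimmed `removeTashkeel` might behave on padded input. `TASHKEEL_SAMPLE` is an assumed stand-in for the library's `TASHKEEL` constant, whose contents are not shown in this diff, and `removeTashkeelSketch` is a hypothetical name used only for illustration.

```typescript
// Assumed stand-in for the library's TASHKEEL constant (its real contents are not in this diff).
const TASHKEEL_SAMPLE: string[] = [
  '\u064B', // fathatan
  '\u064C', // dammatan
  '\u064D', // kasratan
  '\u064E', // fatha
  '\u064F', // damma
  '\u0650', // kasra
  '\u0651', // shadda
  '\u0652', // sukun
];

// Hypothetical re-creation of the post-commit function body.
function removeTashkeelSketch(text: string): string {
  return text
    .trim() // drop leading/trailing whitespace first
    .replace(new RegExp('[' + TASHKEEL_SAMPLE.join('') + ']', 'g'), '') // strip diacritics
    .replace(/\u0671/g, '\u0627'); // alef wasla -> plain alef
}

// "  مَرْحَبًا  " with surrounding spaces
const paddedGreeting = '  \u0645\u064E\u0631\u0652\u062D\u064E\u0628\u064B\u0627  ';
console.log(removeTashkeelSketch(paddedGreeting) === '\u0645\u0631\u062D\u0628\u0627'); // true
```

Only whitespace at the ends is affected; spaces between words are left as they are.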

/**
@@ -33,7 +36,7 @@ export function removeTashkeel(text: string): string {
* Output: "الحىل واللىل والٮىدا ٮعرڡٮى"
*/
export function toOldArabic(sentence: string): string {
-  sentence = removeTashkeel(sentence);
+  sentence = removeTashkeel(sentence.trim());
let newSentence = '';
for (let letter = 0; letter < sentence.length; letter++) {
// if letter is not Arabic letter => append to newSentence
@@ -61,7 +64,8 @@ export function toOldArabic(sentence: string): string {

export function toOldArabicAndTashfeerBannedWords(sentence: string, levelOfTashfeer: number = 2): string {
let new_sentence = '';
-  for (const word of sentence.split(' ')) {
+  const words = sentence.trim().split(' ');
+  for (const word of words) {
if (checkIfBannedWord(word)) {
new_sentence += tashfeerHandler(word, levelOfTashfeer) + ' ';
} else {
@@ -80,7 +84,7 @@ export function toOldArabicAndTashfeerBannedWords(sentence: string, levelOfTashf
* Output: "رائع"
*/
export function removeTatweel(text: string): string {
-  return text.replace(/ـ/g, '');
+  return text.trim().replace(/ـ/g, '');
}
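
As a quick illustration of the new behavior, the sketch below re-creates the one-line body under a hypothetical name, `removeTatweelSketch`, and feeds it padded input. It uses the escape `\u0640` (ARABIC TATWEEL) rather than the literal character purely for readability.

```typescript
// Hypothetical re-creation of the post-commit removeTatweel body.
function removeTatweelSketch(text: string): string {
  // trim() strips whitespace (including newlines) from both ends,
  // then every tatweel character (U+0640) is removed.
  return text.trim().replace(/\u0640/g, '');
}

// " رــائع \n": padded, with two tatweel characters after the first letter
const stretched = ' \u0631\u0640\u0640\u0627\u0626\u0639 \n';
console.log(removeTatweelSketch(stretched) === '\u0631\u0627\u0626\u0639'); // true
```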

/**
@@ -89,18 +93,19 @@ export function removeTatweel(text: string): string {
* @returns {string} The word with pronounced letters separated by spaces.
*/
export function wordToLetters(word: string): string {
+  const trimmedWord = word.trim();
let newWord = '';

// Loop through each character in the input word
-  for (let i = 0; i < word.length; i++) {
-    const letter = word[i];
+  for (let i = 0; i < trimmedWord.length; i++) {
+    const letter = trimmedWord[i];

// Check if the current letter has a pronunciation in PRONOUNCED_LETTERS
if (PRONOUNCED_LETTERS.hasOwnProperty(letter)) {
newWord += PRONOUNCED_LETTERS[letter];

// Add a space after the pronounced letter unless it's the last letter in the word
-      if (i !== word.length - 1) {
+      if (i !== trimmedWord.length - 1) {
newWord += ' ';
}
} else {
@@ -121,6 +126,7 @@ export function wordToLetters(word: string): string {
* @returns {string} The word after removing any matching affixes. Returns the original word if no affix matches are found.
*/
export function removeArabicAffixes(word: string): string {
+  word = word.trim();
if (ARABIC_PREFIXES.includes(word.substring(0, 2))) {
// For: ALEF & LAM
word = setCharAt(word, 0, '');
@@ -263,6 +269,7 @@ function tashfeerHandler(word: string, level: number = 0): string {
* @returns {string} The encrypted sentence.
*/
export function tashfeer(sentence: string, levelOfTashfeer: number = 1): string {
+  sentence = sentence.trim();
let new_sentence = '';
for (const word of sentence.split(' ')) {
new_sentence += tashfeerHandler(word, levelOfTashfeer) + ' ';
@@ -308,6 +315,7 @@ function checkIfBannedWord(string: string): boolean {
*/
export function tashfeerBannedWords(sentence: string, levelOfTashfeer: number = 2): string {
let new_sentence = '';
+  sentence = sentence.trim();
for (const word of sentence.split(' ')) {
if (checkIfBannedWord(word)) {
new_sentence += tashfeerHandler(word, levelOfTashfeer) + ' ';
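
The word loops in this diff split on a single space, so it may help to see why trimming first matters. The sketch below compares the token counts before and after trimming. `naiveSplit`, `trimmedSplit`, and `splitWords` are hypothetical helpers for illustration only; `splitWords` in particular is a stricter variant that is not part of this commit.

```typescript
// Split on a single space, as the loops did before this commit.
function naiveSplit(sentence: string): string[] {
  return sentence.split(' ');
}

// Same split after trimming, mirroring the change in this commit.
function trimmedSplit(sentence: string): string[] {
  return sentence.trim().split(' ');
}

// Hypothetical stricter variant (NOT in this commit): split on any run of
// whitespace and drop empty tokens, so repeated interior spaces are handled too.
function splitWords(sentence: string): string[] {
  return sentence.trim().split(/\s+/).filter((word) => word.length > 0);
}

// " كلمة أخرى ": two words with one leading and one trailing space
const padded = ' \u0643\u0644\u0645\u0629 \u0623\u062E\u0631\u0649 ';
console.log(naiveSplit(padded).length);   // 4 (empty token at each end)
console.log(trimmedSplit(padded).length); // 2
```

Without the trim, each empty token goes through the same per-word handling as a real word. Note that `trim()` alone does not collapse repeated spaces inside the sentence, so a doubled interior space still yields an empty token with `split(' ')`.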
