0
How to make it faster (C++)?
We have 2 (char) arrays with size 256 bytes; And third array with size 512 MB. All arrays initialized... arr1[256], arr2[256], arr3[536870912]; My code: for(int x=0; x < 536870912; x++){ for(int y = 0; y < 256; y++){ if(arr3[x] == arr2[y]){ arr3[x] = arr1[y]; break; } } } I use byte dictionary to change values in third array. If some element of arr3 equal some element of arr2 i change value arr3[some element] to value from arr1. If there are many elements, then the comparison takes too long. How can you speed up this code? I would be very grateful to you.
10 Antworten
+ 2
Josh Greig you can even further streamline your algorithm by preparing char rep[256]; so that rep[x] is character to which x is replaced:
//preparation
char rep[256];
for (int x=0; x<256; ++x) {
rep[x] = x;
}
for (int x = 255; x >= 0; --x) {
// going backwards, so that the first match is saved at the end
rep[(unsigned char)arr2[x]] = arr1[x];
}
//replacement
for (char& x: arr3) {
x = rep[(unsigned char)x];
}
+ 3
That algorithm needs to take at least O(arr3's length) steps but it doesn't need to be related to O(arr3's length * arr2's length).
The trick is to change data structure of arr2 so the index of the current character in arr2 can be retrieved without your nested loop.
The optimized algorithm below moves the data from arr2 into a new arr2i variable. This variable will be a much more efficient lookup of array indexes. You give the int version of any ASCII character and it'll tell you what index is needed to find it in arr2.
Try the code below:
char arr1[256], arr2[256], arr3[536870912];
int arr2i[256];
// TO DO: initialize those arrays the way you want.
// Store the indexes of every character code from 0 to 255 in arr2i.
// arr2i[y] will be -1 if the character y is not in arr2.
// Otherwise, arr2i[y] will be the index of character y in arr2.
// Store the indexes of every character code from 0 to 255 in arr2i.
// arr2i[y] will be -1 if the character y is not in arr2.
// Otherwise, arr2i[y] will be the index of character y in arr2.
for (int y = 0; y < 256; y++) {
arr2i[y] = -1; // indicate that (char)y is not in arr2
for (int t = 0; t < 256; t++) {
if (arr2[t] == (char)y) {
arr2i[y] = t;
break;
}
}
}
for(int x=0; x < 536870912; x++){
if(arr2i[(int)arr3[x]] >= 0){ // if found
arr3[x] = arr1[ arr2i[(int)arr3[x]] ]; // copy the character.
break;
}
}
I didn't test it other than seeing that it compiled. I tried on Sololearn but it looked like the 536MB variable took too much of their memory so it didn't actually run. It failed with a segmentation fault but I suspect it'll work on a machine that has plenty of RAM.
+ 2
Josh Greig even without any further assumptions about the arrays the previous code should work -- the case when a character is not in arr2 is taken into account by initial assignment rep[x] = x, the case when a character occurs several times in arr2 is taken into account by going through arr2 in backwards order -- at the end rep[x] == arr1[i], where i is the smallest index such that arr2[i] ==x (or rep[x] == x, if there's no such index i)
Влад Цислевский if bytes array is already in order (bytes[x] == x), then you can just use the translate array as rep (no need to search for the correct index):
for(int i = 0; i < buff_size; ++i) {
BUFF[i] = translate[(unsigned char)BUFF[i]];
}
Note that in any approach you might be replecing some character with 0 (or 0 at the end with some other character) so even if the initial array was a c string, you cannot treat it like a c string after the replacement (unless you make sure that translate[0] == 0, and it is the only occurence of 0 in translate array)
+ 2
Sorry, forgot that char may (and in most cases will) be signed.
Fixed that now.
Josh Greig are you sure converting to int helps -- as far as i understand it will preserve negative values (so it just silences the warning)?
+ 1
It sounds like Volodymyr's solution is best then.
0
Volodymyr that's a cleaner solution if we can assume that every character is in arr2. I wasn't sure if there were duplicated characters in arr2 which is why I represent -1 in my answer.
This sentence from the question leads me to think there may be duplicated characters in arr2 and that not all characters that are in arr3 exist in arr2:
"If some element of arr3 equal some element of arr2 i change value arr3[some element] to value from arr1."
He could have left out the if statement in the sentence if it was always true.
You could use an array of char* instead of arr2i even if arr2 contains duplicates and let NULL represent the case of a character not being found in arr2. The asker just seemed like enough of a beginner that an array of pointers would confuse more than the benefit of a slightly more efficient solution.
0
in array arr1 are all the existing bytes in the world. There are 256 of them. That is, any file can be assembled out of 256. There are any repeating elements. And there will be a coincidence in any case. in the code, the logic is as follows: in arr2 all bytes from \x00 to \xFF.
in arr1 translation of bytes.if byte in arr3 equals the byte in arr2 with index 3 then I assign a value from the array arr1 with index 3.
0
to make it easier I will describe it like this:
char * BUFF = new char [536870912]; // data from file
char bytes[256]; // values from \x00 to \x FF in order
char translate[256]; // values for translation (bytes in random order)
in the last two arrays, the values are never duplicated. in one all bytes from NULL (that is 00 in HEX) to FF.
if in BUFF some element is equal \ x00 and this is the first element in array bytes.
Then this element is from array BUFF will be equal to the first element from the array translate.
0
in array BUFF bytes can be repeated any number of times. And some may not be. I just compare each byte of data in a row and change the value to whatever is in another array with the same index.
0
Volodymyr Chelnokov, i compare buff and bytes to know what value I'm changing. I can't just take and replace all values without searching. The value that matches the first element of the array bytes will be replaced with the first value of the array translate. if the byte value from the buff will be equal to the second element, then we change the value to the value of the second element in the array translate.
for example
char translate [3] = { '\x53','\x79','\xfd'};
char bytes[3] = {'\x00','\x01','\x02'};
If the value in buff will be equal to 00 it will be replaced by 53.
If it is equal to 01 it will be replaced by 79.
If 02 it will be replaced by FD.
All this is done in HEX mode.
No numbers.